
Coroutines vs explicitly async APIs

I’ve expressed before (rather tactlessly) the opinion that Node.js is not a good choice for developing web applications, because its async APIs will usually turn your code into callback hell.

Many people responded negatively, perhaps rightly so. But I think some of them didn’t actually understand why the explicitly async APIs of Node.js are a poor way to handle the “blocking I/O” problem; or rather, they didn’t understand that gevent’s coroutines and Go’s goroutines provide a much cleaner solution. They seem to think that if it isn’t callback hell, it won’t scale. I want to show that they’re wrong.

I’m going to try to explain the difference between an API that’s explicitly asynchronous and one that appears synchronous but is asynchronous under the hood.

Why asynchronous IO?

The short answer is that OS threads don’t scale: if you spawn a new thread per request, your server will not scale to tens of thousands of simultaneous requests.

I won’t go deep into the details, but OS-level threads are relatively expensive (each one needs its own stack and kernel scheduling), and the cost becomes noticeable once you’ve spawned thousands of them.

Request handlers typically perform some kind of blocking I/O operation, such as reading a file or making a network connection (to the database, for instance). To handle more connections, you have to handle them in parallel, which is usually done by spawning new threads; but as we just mentioned, threads don’t “scale”.

So for scalability, we need a different mechanism for handling large numbers of concurrent connections, one that does not involve spawning a kernel-level thread per request.

One such mechanism is asynchronous I/O, which is non-blocking. You basically tell the system: read data from here, and when it arrives, call this function; meanwhile, I’ll keep working on other stuff. Because the I/O operations don’t block, you don’t have to spawn a thread for every connection.
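
As a rough illustration (my sketch, not part of the original discussion), here is that register-a-callback pattern in Python using the standard selectors module; the host and request bytes are just placeholders:

import selectors
import socket

sel = selectors.DefaultSelector()

def on_readable(conn):
    data = conn.recv(4096)  # safe to call: the selector reported data is ready
    print("received", len(data), "bytes")

sock = socket.create_connection(("example.com", 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
sock.setblocking(False)
sel.register(sock, selectors.EVENT_READ, data=on_readable)

# meanwhile, we're free to do other work; when data arrives,
# select() returns and we invoke the callback we registered
for key, _mask in sel.select(timeout=5):
    key.data(key.fileobj)

One event loop like this can watch thousands of sockets at once, which is the core of the scalability argument.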

This is basically what Node.js does, and it’s how it can scale to heavy loads while doing everything in one thread.

A typical request handler merely reads data from files or databases and does simple computations that take negligible CPU time on modern hardware.

Let’s suppose we have a request handler that takes a user ID and fetches some “items” belonging to that user so they can be displayed.

In a typical synchronous IO style, it might look something like this:

def serve_items(request):
    user_id = grab_user_id(request) # simple function
    items = fetch_user_items_from_db(user_id) # blocking IO - blocks current process or thread
    items = sort_items_by_score(items) # super fast operation
    return json_response(items) # respond with the items in json format

That would be the synchronous version of this request handler.

In Node.js, it would look more like this:

var serve_items = function(request, response) {
    var user_id = grab_user_id(request); // simple function
    var on_items_ready = function(items) {
        items = sort_items_by_score(items); // simple computation
        response.send_json(items); // respond with the items in json format
    };
    fetch_user_items_from_db(user_id, on_items_ready);
};

The way you write the code is different.

Now, if you pay attention, you will notice that this function returns almost immediately! It doesn’t wait for the data fetch to complete; it returns as soon as it fires off the fetch request. The actual sorting of the items and sending of the response happen later, after the data has arrived.

The disadvantage here is the way you write the code:

  1. More cognitive effort is required to express the control flow in a soup of callback handlers.

  2. The resulting code is harder to read.

I’d argue that from 1 and 2, we can conclude that the resulting code is harder to maintain, because maintenance requires both reading and writing.

I’m not going to delve into more detailed examples; there is plenty of material on the net. Here is one good starting point:

http://bjouhier.wordpress.com/2011/01/09/asynchronous-javascript-the-tale-of-harry/

There are some solutions to this within the Node.js ecosystem. Here are some examples:

http://alexeypetrushin.github.io/synchronize/docs/index.html

https://github.com/Sage/streamlinejs

But now I’d like to take a look at another approach to the issue.

Goroutines

What makes blocking code “bad” or “undesirable” is that processes and threads are expensive.

Why are they expensive? Can’t we make them cheap? If we can’t make them cheap, can’t we come up with other concurrency constructs that are cheap?

Well, it turns out we actually can. This is exactly what goroutines are: cheap concurrency constructs.

When a goroutine blocks on I/O, it’s not a problem, because goroutines are very cheap to create. You can create and tear down hundreds of thousands of goroutines in under a second; that’s how cheap they are. So “blocking” on I/O does not cause any scalability issues.

So in Go, you can write code in the usual synchronous style, and it will scale. It just works!

Actually, goroutines still do asynchronous I/O; it all just happens under the covers. The language’s APIs are not explicitly asynchronous; they are ordinary, synchronous APIs.
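
For a rough sense of scale (my sketch, using Python’s gevent library, which comes up again below), here is the same “cheap concurrency” idea outside of Go:

import gevent

def task(n):
    gevent.sleep(0.01)  # "blocks", but only this one coroutine; the rest keep running
    return n * 2

# spawning a hundred thousand coroutines like this completes quickly;
# try the same with OS threads and the machine will grind to a halt
greenlets = [gevent.spawn(task, n) for n in range(100000)]
gevent.joinall(greenlets)
print(len(greenlets), "coroutines ran to completion")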

Coroutines and Continuations

Goroutines are based on the concept of coroutines. The term “coroutine” suggests “cooperative routines”: we call them cooperative because they aren’t scheduled preemptively by the OS or any other scheduler; each one runs for a while and then “yields”. When a coroutine yields, other coroutines get to run. When a coroutine gets control back, it continues running from where it left off last time.
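
Python’s generators give a minimal, concrete taste of this suspend-and-resume behavior (my example, not from the original article):

def coro():
    print("step 1")
    yield            # suspend; control goes back to whoever resumed us
    print("step 2")  # on resumption, we continue exactly here

c = coro()
next(c)  # runs the coroutine until the yield: prints "step 1"
print("other code runs while the coroutine is suspended")
try:
    next(c)  # resumes after the yield: prints "step 2"
except StopIteration:
    pass  # the coroutine ran to completion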

This tutorial gives a nice overview of coroutines in Lua.

The concept of suspending and resuming a coroutine is closely related to the concept of continuations.

When a coroutine suspends, we can think of this as a snapshot being taken of the coroutine’s execution context. This snapshot can then be invoked at a later time and it will resume the execution of the coroutine.

Let’s look at a simple example of a yielding operation:

print("hello")
sleep(10) // milliseconds
print("world")

The call to sleep basically says: capture the state of the current coroutine (i.e.: save the current continuation), and don’t resume it before 10 milliseconds pass. After the sleep time passes (10 milliseconds, in this case), the coroutine may resume whenever it’s given the chance.

The same can happen with I/O operations that usually block. Depending on the implementation, coroutines may “yield” execution when they make, for example, an HTTP request, and not resume until the response comes back. So while the request is in flight, other coroutines can keep running (inside the same process).
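
To make that concrete, here is a sketch using gevent (described further below): two coroutines share one OS thread, and sleeping suspends only the coroutine that called sleep, so their output interleaves:

import gevent

def greeter(name):
    print(name, "hello")
    gevent.sleep(0.01)  # yields; the other greenlet runs in the meantime
    print(name, "world")

gevent.joinall([gevent.spawn(greeter, "a"), gevent.spawn(greeter, "b")])
# prints: a hello / b hello / a world / b world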

If you’re familiar with JavaScript, the sleep example might remind you of something like this:

console.log("hello");
setTimeout(function() {
    console.log("world");
}, 10); // 10 milliseconds

Basically, in JavaScript you have to manually create something similar to a continuation, except it’s a closure rather than a continuation, and you pass it to the “sleep” function (setTimeout here) to say: here, forget about me, I’ll be gone now, but when the time comes, do this thing that I’m handing you right here.

Instead of capturing the state of the current execution context, you terminate the current execution context and set the stage for a new execution context to be created in the future.

You may think continuations are conceptually the same thing as closures, but they are not. A continuation includes the call stack, so when a continuation is resumed, it may hit a “return” statement, and when it does, the function returns its result to its caller.

Consider this:

# python syntax, but don’t assume that this is python per se ..

def logError(errorCode):
    message = getExplanation(errorCode)
    time = getTime()
    print(time + ": error " + errorCode + ": " + message)

def getExplanation(code):
    resp_text = http_get_request("http://api.service.example/errors/" + code) # this line dispatches the request and yields execution
    resp_json = json.parse(resp_text)
    return resp_json.human_readable_explanation

Now, in this example, http_get_request is assumed to be a function that makes an HTTP request; just like sleep in the example above, it captures the current continuation and yields execution to other coroutines. When the response comes back, the captured continuation may be invoked to resume the execution of the coroutine.

Now, when this continuation is resumed, the entire execution context is resumed, including the fact that getExplanation was called from inside logError and that logError is waiting for getExplanation to return a string.

So, when the HTTP response comes back and the continuation is resumed, we hit the line return resp_json.human_readable_explanation; we actually take this value and return it, and then execution continues at the line message = getExplanation(errorCode) inside the caller, and so on.

You can’t do this with closures.

If the getExplanation function is written in the JavaScript callback style, it can’t really “return” anything.

Here, let’s try to make it return something. It will be absurd - and it won’t work as expected - but let’s do it just for fun:

var logError = function(errorCode) {
    var message = getExplanation(errorCode);
    var time = getTime();
    console.log(time + ": error " + errorCode + ": " + message);
}

var getExplanation = function(code) {
    http_get_request("http://api.service.example/errors/" + code, function(resp_text) {
        var resp_json = JSON.parse(resp_text);
        return resp_json.human_readable_explanation; // absurd!!!
    });
    // getExplanation will in fact return here, and it will return undefined
}

To help see the absurdity of this return attempt more clearly, let’s rewrite the async http request:

var getExplanation = function(code) {
    var url = "http://api.service.example/errors/" + code;
    var callback = function(resp_text) {
        var resp_json = JSON.parse(resp_text);
        return resp_json.human_readable_explanation; // absurdity!!!
    };

    http_get_request(url, callback);
    // getExplanation will in fact return here, and it will return undefined
}

There’s no way to tell the callback “oh, by the way, return to logError, please!”. No, you can’t do that: it’s a brand-new execution context; all information about the original call stack is completely lost.

So what you end up having to do instead is pass a callback to getExplanation itself!

// notice the "caller_callback"
var getExplanation = function(code, caller_callback) {
    var url = "http://api.service.example/errors/" + code;

    var callback = function(resp_text) {
        var resp_json = JSON.parse(resp_text);
        var result = resp_json.human_readable_explanation;
        caller_callback(result); // "return" the result by calling the caller's callback
    };

    http_get_request(url, callback);
    // getExplanation will in fact return here, and it will return undefined
}

// and rewrite the `logError` function too
var logError = function(errorCode) {
    var callback = function(message) {
        var time = getTime();
        console.log(time + ": error " + errorCode + ": " + message);
    };

    getExplanation(errorCode, callback);
}

So, I hope that I have illustrated why closures are not equivalent to continuations, and are in fact inferior to them.

Note that when I say “continuation”, I’m talking about a concept, not any particular implementation detail. When an OS process is suspended and then resumed, we may think of that as a continuation being resumed; likewise when a thread is suspended and resumed.

The key aspect of coroutines is that they are much cheaper than threads or processes, and are not managed by the kernel.

The gevent library for Python does exactly what I described about coroutines and continuations: it wraps asynchronous I/O in a coroutine-yielding mechanism, thus providing a synchronous API on top of an asynchronous one.
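
As a minimal sketch of what that looks like in practice (based on gevent’s standard monkey-patching approach; the fetch function is just an illustration):

# monkey.patch_all() swaps the blocking standard-library primitives
# (sockets, time.sleep, ...) for versions that yield to gevent's event
# loop instead of blocking the OS thread
from gevent import monkey
monkey.patch_all()

import urllib.request

def fetch(url):
    # reads like ordinary blocking code, but under the hood the socket
    # operations suspend this greenlet and resume it when data arrives
    return urllib.request.urlopen(url).read()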

The obvious conclusion:

With an explicitly asynchronous API, such as the one that Node.js gives you, you must manually construct “callback” functions that emulate continuations. You get the benefit of asynchronous IO but lose all the benefits of synchronous IO APIs.

With things like gevent and goroutines, you get all the benefits of asynchronous IO while keeping all the benefits of the traditional synchronous IO APIs.

Another way to look at it is related to closures and continuations.

With Node.js, you have to create closures, and call the async IO APIs with these closures as the thing to execute when the response arrives.

With coroutines, the language/compiler does it for you, such that when the response comes back, the current continuation is resumed.

Notice that what the compiler generates for you (continuations) is much more powerful than what you have to write by hand (closures).

Footnote:

Now, if only it were possible to specify the callbacks as continuations … it might then be possible to easily wrap async IO functions and make them appear synchronous.

I’m imagining something like this:

var wrap_async_api = function(async_fn) {
    var sync_version = function() {
        // imaginary syntax: capture the current continuation as `cc`
        return call_with_current_continuation as cc {
            var callback = function(data) {
                // imaginary syntax: resume the captured continuation so that
                // the wrapped call "returns" the async result
                resume_continuation cc {
                    return data;
                }
            };
            var args = Array.prototype.slice.call(arguments);
            args.push(callback);
            async_fn.apply(this, args); // apply, so args are spread as individual arguments
        }
    };
    return sync_version;
}

Not sure if this is a reasonable way to imagine using continuations to provide a synchronous API on top of an async one. I haven’t actually used call/cc in any real program.

For a good understanding of Scheme’s continuations and call/cc, I recommend this article:

http://community.schemewiki.org/?call-with-current-continuation

It’s really good; read it slowly and carefully.

Footnote #2:

It appears that there are continuation-based solutions for Node.js:

https://github.com/laverdet/node-fibers

