“Non-blocking”, “scalable” and “high performance” – can they all be achieved on a single web solution?
A non-blocking web server can serve multiple requests at the same time within a single process or thread, usually by means of non-blocking IO. The approach stands in contrast to how a standard web server handles multiple concurrent requests. When a request arrives at a synchronous web server, it gets assigned to one of the workers, which then processes the request from start to finish. While the server retrieves a page from storage, or performs some calculations and saves the results to a database, the worker is blocked on the request being processed and is effectively ignoring all other calls.
A real-life metaphor
To clarify, we might use a real-life metaphor: visiting a fast food restaurant vs. paying a visit to a bank. In a fast food restaurant, when you reach the front of the queue you place your order with the cashier. While the cashier is entering the dishes on the touch screen menu, somebody else sees the order and starts working on it. Having sorted out the payment, you step aside and thus allow the cashier to serve the next person in the queue. Hopefully, you will soon get your meal.
Now, let’s imagine a visit to a bank. You are first in line, talking to the clerk. The procedure requires the clerk to fill in a form for you. The other people queuing are forced to wait until the clerk has finished serving you. Definitely not much fun – like a poor and frustrating user experience on a web application.
Coming up with a solution
Referring back to our metaphor, how could we possibly fix or, at least, greatly improve the situation in the bank? Assuming the people involved cannot process the tasks faster, you can speed things up by choosing one of two options:
1. Open more windows to process more clients at the same time and thus reduce the length of the queues. In the web server world, this would mean spawning additional worker processes (and possibly killing them as the traffic decreases). Such an approach requires extra management work and does not address the root of the problem – the extra processes still spend most of their time waiting.
2. Alternatively, change the workflow in our bank so that when a client is about to fill in a form, he gets a pencil and is politely asked to come back after he has done the job or if he encounters a problem. While the client is busy with the task, the clerk is free to serve another client.
And this is precisely where a non-blocking web server architecture steps in. In this approach, a single thread manages multiple requests by interleaving non-blocking IO calls across all of them. There is no waiting or idle time here: an asynchronous read does not make the process wait for the back-end to send the data. As long as there is nothing to read from a given connection, the process moves on and looks for work elsewhere, e.g. accepting a new request, reading from another client, or checking whether the database has returned results.
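This interleaving can be sketched in a few lines of Python with the standard library’s `selectors` module. The sketch below uses in-process socket pairs to stand in for two concurrent clients; the names and payloads are illustrative, not taken from any real server:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

# Two in-process socket pairs stand in for two concurrent clients.
pairs = [socket.socketpair() for _ in range(2)]
for i, (server_side, client_side) in enumerate(pairs):
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, data=f"client-{i}")
    client_side.sendall(f"request {i}".encode())

handled = []
while len(handled) < 2:
    # A single thread waits on ALL connections at once instead of
    # blocking on any one of them.
    for key, _ in sel.select(timeout=1):
        payload = key.fileobj.recv(1024)
        handled.append((key.data, payload.decode()))
        sel.unregister(key.fileobj)
        key.fileobj.close()

for _, client_side in pairs:
    client_side.close()

print(sorted(handled))
```

The single `sel.select()` call is the heart of every event loop mentioned in this article: one thread is told which connections are ready and only ever touches those.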
So what are the options available? Well, it depends on where you are coming from as well as on your specific needs and preferences. For this article, I have decided to take a tour of what is available for Python, Java, Scala and Ruby programmers. I have also added some notes on the solutions based on Node.js and Erlang.
Python developers seem to be very lucky: they have a number of choices to pick from, and there are some noteworthy examples of production usage I will get back to soon. The reason for this, in my opinion, is the relative maturity of the programming language, its decent performance, and the Global Interpreter Lock, which pushes developers toward handling concurrent connections in a single thread rather than with native threads.
There exists a great comparison of asynchronous servers for Python compiled by Nicholas Piël; you can find it on his blog. I will not go into the details and will limit myself to mentioning only the most important and/or innovative examples. Interestingly enough, the variety of options translates into a variety of implementations, as each non-blocking Python framework often achieves asynchronous concurrency in its own way, based on specific patterns. Twisted and Tornado are at the top of Piël’s comparison list and should probably be adopted as the starting reference points for anyone who is considering a solution in Python.
I cannot resist mentioning Eventlet. “Eventlet is a concurrent networking library for Python that allows you to change how you run your code, not how you write it” – to use the Eventlet creators’ very own words. Neat and powerful. I find the approach of making asynchronous code look like synchronous code the most powerful one out there; it is easy to get started – you do not have to dive into non-blocking IO theory to begin working with the framework and can still be productive. There is a huge difference between rewriting your code (Tornado) and adding a few lines to it (Eventlet) in order to leverage asynchronous IO.
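The green-thread idea Eventlet builds on can be illustrated with nothing but plain generators. This toy scheduler is my own illustrative sketch, not Eventlet code: each `yield` marks a point where a real framework would wait on non-blocking IO, yet every handler still reads top-down, as if it were synchronous:

```python
from collections import deque

def scheduler(tasks):
    """Round-robin over generator-based 'green threads': each yield is a
    point where a real framework would wait on non-blocking IO."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until it 'waits' again
            ready.append(task)  # then put it back in the ready queue
        except StopIteration:
            pass                # task finished

trace = []

def handler(name, steps):
    # Reads like ordinary sequential code, despite being interleaved.
    for i in range(steps):
        trace.append(f"{name}:{i}")
        yield  # pretend to wait on a socket here

scheduler([handler("a", 2), handler("b", 2)])
print(trace)  # the two handlers' steps are interleaved
```

Eventlet (and Ruby’s Fibers, which we will meet shortly) automate exactly this suspend-and-resume dance, hooking it up to real sockets instead of a toy `yield`.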
So how about a “big” usage case of a non-blocking web server written in Python? In 2009, Facebook developers announced in a blog post that they were open sourcing Tornado. Originally, it was a piece of infrastructure that powered FriendFeed’s real-time functionality; later, when Facebook acquired FriendFeed, FB used Tornado to implement their own features based on real-time updates.
Some argue you cannot build a good asynchronous IO web server with Ruby, since Ruby is not fast enough when it comes to performance and also due to certain implementation issues, such as Ruby’s Global Interpreter Lock (setting aside JRuby, which runs on the JVM). Still, there is Goliath, an asynchronous open source web server built on the EventMachine library, which provides event-driven IO based on the Reactor pattern. Goliath was designed for speed and with a focus on the following features: bare metal performance, Rack API and middleware support, simple configuration, fully asynchronous processing, as well as readable and maintainable code. All of the above was possible thanks to a feature of Ruby 1.9: Fibers. For each request, Goliath creates a Fiber which can be paused and resumed by EventMachine callbacks on IO operations.
What is interesting about Goliath’s implementation strategy is how it deals with callbacks. Non-blocking servers typically rely on a callback pattern and, though callbacks may not be complex in themselves, they can lead to complicated and hard-to-maintain code. Not so with Goliath: the callback pattern is completely hidden from the developer. As a result, a new developer who starts to work on a Goliath-based project may be completely unaware that the typical top-down code they are writing is actually asynchronous. We saw a similar approach when discussing Python’s Eventlet.
Another great thing about Goliath is that it has had a long track record in a production environment; most notably, Goliath has been successfully used at PostRank, the largest aggregator of social engagement data in the industry, serving more than 500 requests per second with uptime measured in months.
When looking for an asynchronous web server solution written in Ruby, you are not limited to Goliath. On 29 March 2012, Engine Yard released version 1.0 of the web server called Puma, advertised as “a modern, concurrent web server for Ruby”. Puma, like many other Ruby web server implementations, is derived from Mongrel. Since Mongrel was written in the pre-Rack world and almost the entire Ruby world has since transitioned to Rack as the primary interface for web apps, the decision was made to cut out all the unnecessary abstractions and support the Rack interface directly (yes, this means that Puma is designed to run Rack apps exclusively). The second important thing about Puma is that it is strongly influenced by the concurrency handling patterns found in Ruby implementations like Rubinius and JRuby. Although the current version of Puma runs on all Ruby implementations, it performs best when paired with Rubinius or JRuby, where genuine concurrency can be achieved.
As stated, Ruby’s Goliath and Python’s Eventlet hide callback and non-blocking patterns from developers and thus enable them to write code in a traditional top-down style. Node.js is meant to aid developers in a different way – it puts them in a kind of non-blocking “jail” and guards them against the things they should not do, such as developing synchronous IO code. That is safe in a way. However, since you still have to write code in an obscure callback way, I would, personally, never trade the ability to use a non-blocking technique (Eventlet or Goliath) for such “safety”. Developers need to weigh the pros and cons.
With all the hype about Node.js in the air, I am compelled to kill some myths about it. First, the non-blocking myth. The IO is non-blocking, but CPU-bound calls are blocking. Having a single thread handle the callbacks is a neat idea as long as all you need to worry about is the delay between calling the database, file system, or network and getting a response. Large portions of client-server communication do not burden the CPU, so there is no need to worry. But sometimes a request involves a lot of number crunching on the CPU. In that case, a Node.js thread will lock your entire server for as long as such a request is processed, as noted by Ted Dziuba and other bloggers.
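The pitfall is not specific to Node.js – it applies to any single-threaded event loop, which is easy to demonstrate in Python’s asyncio (sticking to Python, the language of the other sketches in this article). The task names are made up for the demo:

```python
import asyncio

order = []

async def cpu_hog():
    # No await inside: this monopolizes the event loop's single thread.
    sum(i * i for i in range(1_000_000))
    order.append("cpu-bound task finished")

async def quick_io():
    # Needs almost no work, but cannot run until the hog yields control.
    order.append("lightweight task finished")

async def main():
    # gather() schedules both tasks on the same loop; the CPU-bound one
    # runs to completion before the lightweight one gets a turn.
    await asyncio.gather(cpu_hog(), quick_io())

asyncio.run(main())
print(order)
```

The lightweight task finishes last even though it has almost nothing to do – exactly the head-of-line blocking a CPU-heavy request causes in a Node.js server.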
The second issue may not be a myth but an over-promise: the scalability potential of Node.js. Let us drill into it. To scale the app, you can, for example, share a server socket between instances with sendmsg and let the kernel “load balance” incoming connections; you may as well go the traditional IPC way. However, there is no JS worker process pool that is trivial to invoke, and creating one is like reinventing the wheel of a traditional UNIX threaded environment. Things start to look complex, and it is hard to reconcile that impression with Node’s “marketing good news” for “less-than-expert programmers”. The Cluster plugin – extensible multi-core server management for Node.js – may help keep the promise, though I am not sure it is production ready. It might be a good idea to fold it into the core of the framework rather than leave it as a plugin.
The point is that with all the hype that comes with a new technology, it is desirable to perform an occasional sanity check; programmers are inherently tempted to try and solve problems with new tools (or should I say “new toys”;-). So make sure you use the right tool for the job: if you have many users and the service is something like a live chat or a file upload service, Node.js is a good choice. If you hog your CPU with each request, you should probably consider other options.
So, is Node.js used in any well-known production systems? One notable case is the LinkedIn application for mobile devices. It is mostly HTML5 embedded in a native package, with the entire server side developed in Node. The team started off with a “legacy” Ruby on Rails application. Since, however, the app was not meant to perform any massive data analytics (e.g. CPU-burdening live calculation of suggested connections) and just needed to communicate well with other services, like the LinkedIn platform’s API and the database, Node.js turned out to be a great performer – in fact, much better than Ruby on Rails. It is nice that the LinkedIn mobile development team agreed to share an account of their Node.js performance tuning.
Java and Scala
When discussing scalable and high performance solutions, we cannot omit the Java platform. Java 1.4 shipped with an alternative version of the IO API, NIO, which stands for New IO. The difference is that while the standard Java IO library operates on byte streams and character streams, NIO deals with channels and buffers, which enables asynchronous IO. You can find Node.js-style event-driven asynchronous web servers written in Java – for instance, Deft (inspired by Python’s Tornado) – but that is not the only reason why Java and Scala have found their way into this overview.
If you have some concerns about emerging technologies like Node.js (dating back to 2010), you can consider something that has been optimized for more than ten years: the JVM. The JVM scales, right out of the box, to thousands of users, which means you do not have to do much to take advantage of multiple cores in your production machine. If you want to explore Java, there are tools like Grizzly. A quote from the Grizzly project site casts some light on the benefits provided: “Grizzly’s goal is to help developers to build scalable and robust servers using NIO and we are also offering extended framework components: Web Framework (HTTP/S), Bayeux Protocol, Servlet, HttpService OSGi and Comet.”
Scala takes it a bit further. It runs on the Java platform and is compatible with the existing Java programs; because of the same compilation model, with some exceptions, Scala code can be decompiled to readable Java code. That is why “many existing companies who depend on Java for business critical applications are turning to Scala to boost their development productivity, applications scalability and overall reliability,” as mentioned on the Scala project page.
You can find some good web frameworks for Scala – they will also work with Java, by the way. One of them is the Typesafe Stack. Typesafe Stack 2.0 consists of the Play web framework as well as the interesting Akka middleware. The Akka philosophy is simple: because “threads and non-blocking IO are complex and error-prone to work with by hand”, Akka’s implementation of the Actor concurrency model frees the developer from thinking about how to scale out (that is the job of the framework!) and allows them to focus on the business logic.
Akka developers claim that “on a commodity machine, you might run several million Actors — quite a step up from mere thousands of threads in a traditional Java application.” Furthermore, they say that “Akka is an ideal platform for asynchronous event-driven architectures, and can be configured to use non-blocking IO frameworks.” The set of features makes Scala into an alternative to Node.js in some business applications.
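To make the Actor model concrete, here is a deliberately minimal sketch in Python (the language of the other examples here), pairing a mailbox queue with one private thread. Akka’s Actors are far richer – supervision, routing, millions of actors per machine – but the message-passing core is the same idea; all names below are mine:

```python
import queue
import threading

class Actor:
    """Minimal actor: a mailbox drained by one private thread, so the
    handler never shares mutable state with its callers."""

    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def send(self, msg):
        # The only way in: asynchronous message passing, never a direct call.
        self.mailbox.put(msg)

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:       # poison pill shuts the actor down
                break
            self.handler(msg)

    def stop(self):
        self.mailbox.put(None)
        self.thread.join()

seen = []
doubler = Actor(lambda msg: seen.append(msg * 2))
for n in (1, 2, 3):
    doubler.send(n)
doubler.stop()
print(seen)
```

Because only the actor’s own thread ever touches its state, no locks are needed – which is precisely what lets frameworks like Akka scale the same pattern to millions of lightweight actors.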
One of the prominent usage cases of Scala is Twitter. Twitter was originally a Ruby on Rails startup; today Rails is limited to the front-end. As Twitter grew, the development team attempted to tune Rails to handle the increasing number of requests per second; they eventually gave up (Ruby Fibers were not available at the time) and rewrote some parts of the back-end in Scala. You can find out more about the case in the interview with the Twitter engineers involved in the project.
It was when Facebook rolled out the live chat functionality that Erlang attracted some attention in the world of scalable and responsive web applications. Erlang is another place where we encounter the Actor model (just as with Akka) – independent lightweight processes that communicate with each other via message passing, coordinated by a scheduler. That means you do not get a loop that stalls your entire program; an error such as an Erlang badmatch takes down a single Erlang process and notifies the other linked processes of what has just happened, unlike in C, for example, where an error takes down the whole OS process along with the current server state.
It is also different from the Node.js continuation-passing style, where there is a model of information flow which makes certain parts of the system concurrent without the programmer being aware of the fact; under the hood there is no real concurrency.
You can choose to study the detailed case study about the introduction of the live chat functionality for Facebook users. To grasp the scale of the problem Facebook developers faced, imagine sending a notification to all friends whenever a user goes online or offline. Once you take into account the average friend-list size, user online peaks and the frequency of going online/offline, it no longer sounds simple.
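A quick back-of-envelope calculation shows why. Every number below is an illustrative assumption of mine, not Facebook’s actual figure – the point is only how the factors multiply:

```python
# All numbers are illustrative assumptions, not Facebook's actual figures.
users_online = 1_000_000       # concurrent users at peak (assumed)
avg_friends = 130              # average friend-list size (assumed)
transitions_per_user_hour = 4  # sign-ons/sign-offs per user per hour (assumed)

# Each online/offline transition fans out to the whole friend list.
notifications_per_second = (
    users_online * transitions_per_user_hour / 3600 * avg_friends
)
print(f"{notifications_per_second:,.0f} presence notifications/second")
```

Even with these modest assumptions, presence alone generates on the order of a hundred thousand messages per second – before a single chat line is sent.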
It is noteworthy that the Facebook live chat implementation used MochiWeb – an Erlang library for building lightweight HTTP servers – to handle the massive number of long polling connections.
So what should I choose for my app?
There we are, with the main available options at hand – a number of different languages and approaches, and a variety of frameworks and web servers. You may now need to make a choice. Before you do, let me present some final thoughts which will hopefully make the choice a bit easier.
First, no technology is a silver bullet. You most probably have an idea of what you want to build and are thus able to come up with requirements; these should be confronted with the strengths and weaknesses of the different technologies. Second, do not blindly believe the benchmarks you can find on the Net, and do not treat them as the only or the most important factor when picking one solution over another. Sometimes it may be reasonable to give in a little on one dimension and gain a lot more on another – for example, to lose 5% on performance in order to significantly reduce the workload and the time needed for implementation.
Try to benchmark for your specific solution rather than rely on generic benchmark graphs found on the Internet. The results may differ significantly. You will have some requirements / expectations concerning the volume of traffic and the environment your web application is going to handle. You can now use such knowledge to quickly build a very simple application mimicking the behavior of your real solution. You can further use such a test app to generate your own graphs and compare the results for different technologies; check the two cases as examples of how the approach can be applied in practice: case 1 and case 2.
They are both perfect illustrations of the approach I advocate – do not just jump onto a technology bandwagon simply because of the hype around it; carefully check the options available and consider the pros and cons of the various technologies in your specific context. An established, less sexy technology with a long track record of production use and serious usage cases may well be the appropriate solution.
Next, keep in mind the rule of thumb of the programming world: it is not a good idea to optimize your code prematurely. Your best bet is to first build something that simply works and only then scale it, if needed. Do have a general plan for how to scale in the future, though. You may even plan to scale the application by rewriting some parts of it in another, better performing (though less flexible) programming language.
Web apps evolve dynamically and are often fundamentally changed before they emerge as successful products / services on the Internet. Consequently, you need to ensure enough flexibility – the agility of dynamic languages and frameworks like Ruby on Rails and Django turns out to be very beneficial in such circumstances. When you reach the point where the need for performance and scalability sufficiently outweighs the need for flexibility, e.g. when you grow to the size of Facebook or Twitter, you will be in the enviable position of being able to scale some parts of your service with, e.g., Java or Erlang. The relative value of flexibility vs. performance/scalability changes throughout the lifecycle of your product / service, and so does the relative strength of the technologies available. Select and modify your technology stack based on the current, changing needs of your web venture.
You also do not need to give up Java’s non-blocking IO library or multi-threading when you opt for the productivity and flexibility of Ruby. Use Rubinius – it has an API that provides concurrency without mutexes or explicit locking when sharing state between threads. Alternatively, use JRuby, which runs in the JVM environment and brings all the benefits associated with it.
What is more, you do not need to choose either events or threads; put your Rubinius or JRuby web application on the event loop server like nginx. If you want to explore the topics of “threads versus events in Ruby” or “scaling a Ruby webservice,” check these two great presentations: “High Performance Ruby: Threading versus Evented” and “How to scale a Ruby webservice.”
Imagine you started to develop your app with Ruby on Rails. Your app evolves and eventually becomes successful and “big” (TM), but it has also become monolithic on the way. What do you do? You can isolate the underperforming parts of the business logic in your application and create web service apps to cover such parts, using, for instance, Goliath for this purpose. In this way you leverage a solution designed to serve many requests per second. The approach also enables you to apply the resources available where they are needed the most.
Special thanks to Michał Czyż for his feedback on this article.