Server architecture for a scalable web application Server architecture for a scalable web application elasticsearch elasticsearch

Server architecture for a scalable web application


if our architecture might have any design flaws.

Well, keep in mind that we can't tell much from a generic diagram. But here are some notes:

1) MongoDB isn't as easy to scale as other databases such as DynamoDB, Riak or Cassandra. For example, if you ever exceed the capacity of a single master (no matter how many slaves you have, all writes go to the single master), you'll have to shard. But switching to sharding is very disruptive and very tedious to set up.

If you don't expect to exceed the write capacity of one node, then you'll be fine on MongoDB.

2) What will you do for async tasks such as sending emails, creating long reports, etc?

It's possible to do these things in the request loop, and that's probably a fine way to get started. But as you have more boxes, the chances of failure go up. When a box dies, all the async tasks go away and nobody will know what they were. You also can have problems where one box gets heavily loaded with async tasks (using too much CPU or memory), and the problem will get worse and worse as it gets more tasks and completes them more slowly.

Also, a front-end like ELB will have a 60-second limit, which can cause problems if some of your requests could take longer. (Spin them off into async jobs with polling or something.)

3) ELB doesn't support web sockets. Consider that if you think you might want websockets down the road.


There's no such thing as a master in elastic search. You have master copies of shards and replicas of shards but they are basically moved around through your cluster by elastic search. Nodes might be master for one shard and a replica for another. So, you could simply put a load balancer in front of it.

However, you can specialize nodes to be data nodes or routing nodes as explained here: http://www.elasticsearch.org/guide/reference/modules/node/

The routing nodes effectively become load balancers. You could have a few of those (redundancy) and distribute load between those. Alternatively, you could run a dedicated router node on each web server. Basically routing nodes are pretty light and you save a bit of bandwidth/latency since your web server talks to localhost and from there it is all elastic search internal cluster traffic.


I'd recommend to replace MongoDB with Amazon Dynamo DB (it has node.js SDK).