
How should I set up a MongoDB cluster to handle 20K+ simultaneous connections?


About your cluster architecture:

Running several instances of mongod on the same server is usually not a good idea; do you have any particular reason for doing this? The primary of each shard will put heavy pressure on the server, and replication adds I/O pressure as well, so mixing them won't be good for performance. IMO, you should rather have 6 shards (1 primary and 2 secondaries each) and give each instance its own server. (Config and arbiter instances are not very resource-hungry, so it's OK to leave them on the same servers.)
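For example, a minimal sketch of one shard laid out that way, with each replica set member on its own host. The hostnames, paths, and the shard-00 set name are placeholders, not from the original setup:

# Run one member per host (e.g. mongo-a, mongo-b, mongo-c), same command on each box:
mongod --shardsvr --replSet shard-00 --dbpath /data/shard-00 --port 30000 --logpath /var/log/mongodb/shard-00.log --fork

Repeat for shard-01 through shard-05 on their own trios of hosts; config servers and arbiters can double up on existing boxes since they are light.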


Sometimes the limits don't apply to the process itself. As a test, go onto one of the servers and get the PID of the mongo service you want to check by doing

ps axu | grep mongodb

and then do

cat /proc/{pid}/limits

That will tell you whether the limits have taken effect. If the limit isn't in effect, you need to specify the limit in the startup file, then stop and start the mongo service and test again.
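For example, a quick way to pull out just the open-files line once the service is back up (using pgrep here to grab the oldest mongod PID is my own shortcut; substitute the PID you found above if you prefer):

grep 'Max open files' /proc/$(pgrep -o mongod)/limits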

A sure-fire way to know if this is happening is to tail -f the mongo log on a dying server and watch for those "too many files" messages.
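For instance, something like this (the log path is an assumption; point it at wherever your mongod actually logs):

tail -f /var/log/mongodb/mongod.log | grep -i 'too many open files'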

We set our limit to 20000 per server, do the same on all mongod and mongos instances, and this seems to work.
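One way to make a limit like 20000 stick across restarts is to raise the open-file limit for the user that runs mongod/mongos in /etc/security/limits.conf. A minimal sketch, assuming the processes run as a user named mongodb (adjust the user name and value to your setup), then restart the instances so they pick it up:

# /etc/security/limits.conf
mongodb   soft   nofile   20000
mongodb   hard   nofile   20000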


We're running a 4-shard replica set on 4 machines: 2 shard primaries on 2 hosts, 2 shard secondaries on the other 2 boxes, with arbiters and config servers spread out.

We're getting messages:

./checkMongo.bash: fork: retry: Resource temporarily unavailable
./checkMongo.bash: fork: retry: Resource temporarily unavailable
./checkMongo.bash: fork: retry: Resource temporarily unavailable
Write failed: Broken pipe

Checking ulimit -a:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 773713
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Okay, so we're possibly hitting a process limit because of the fork message. Here's how to check that:

$ ps axo pid,ppid,rss,vsz,nlwp,cmd | egrep mongo
27442     1 36572   59735772 275 /path/mongod --shardsvr --replSet shard-00 --dbpath /path/rs-00-p --port 30000 --logpath /path/rs-00-p.log --fork
27534     1 4100020 59587548 295 /path/mongod --shardsvr --replSet shard-02 --dbpath /path/rs-02-p --port 30200 --logpath /path/rs-02-p.log --fork
27769     1 57948   13242560 401 /path/mongod --configsvr --dbpath /path/configServer_1 --port 35000 --logpath /path/configServer_1.log --fork

So, you can see the mongods have 275, 295, and 401 subprocesses/threads each. Though I'm not hitting a limit now, I probably was earlier. So, the solution: change the system's ulimit for the user we're running under from 1024 to 2048 (or even unlimited). You can't change it via

ulimit -u unlimited

unless you sudo first or something; I don't have privs to do that.
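If someone with root can make the change, one common way to persist it (a sketch; the mongodb user name is an assumption, and you could use 2048 instead of unlimited) is to add per-user nproc entries to /etc/security/limits.conf and then restart the mongod/mongos processes from a fresh login session:

sudo tee -a /etc/security/limits.conf <<'EOF'
mongodb   soft   nproc   unlimited
mongodb   hard   nproc   unlimited
EOF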