
MongoDB Sharding Error in Production


It looks like the secondary was down for a very long time and can no longer sync with the primary. Syncing requires the oplog to contain every write made to the primary during the secondary's downtime. Because the oplog is a "capped" collection, those records may already have been rolled out of it if the secondary was down for too long. In that case you need to do a full resync:

http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
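To make the "too stale" condition concrete, here is a minimal sketch in plain Python. It does not talk to MongoDB; the function name and the two timestamps are made up for illustration. The idea is simply that the secondary can only catch up if the last operation it applied is still covered by the primary's oplog.

```python
from datetime import datetime


def can_catch_up(secondary_last_applied: datetime,
                 primary_oplog_oldest: datetime) -> bool:
    """Return True if the secondary's last applied operation is not older
    than the oldest entry still present in the primary's oplog.

    Hypothetical helper: both timestamps would have to be read from the
    replica set members by other means.
    """
    return secondary_last_applied >= primary_oplog_oldest


# Illustrative values only: the primary's oplog now starts at 12:00,
# but the secondary stopped applying operations at 09:00.
oldest_on_primary = datetime(2012, 5, 1, 12, 0)
stale_secondary = datetime(2012, 5, 1, 9, 0)
fresh_secondary = datetime(2012, 5, 1, 13, 30)

needs_full_resync = not can_catch_up(stale_secondary, oldest_on_primary)
still_ok = can_catch_up(fresh_secondary, oldest_on_primary)
```

When the check fails, the missing writes are gone from the oplog, which is why a full resync is the only way back.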

Thereafter, consider increasing the oplog size to avoid a similar situation in future.


Aafreen's answer is correct and his advice is good.

Just a few things to note when sizing your oplog so that the RS102 error does not recur.

The oplog size is going to depend on how much data you change and how often. It is very much application dependent (have a think about what your normal write patterns are). You basically want an oplog that covers many times your longest recovery time after a failure, or your longest maintenance window.
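As a rough back-of-the-envelope sketch (not an official formula), you can turn that rule of thumb into arithmetic. The function names, the safety factor of 3, and the example numbers below are all assumptions for illustration; the real oplog growth rate has to be measured on your own system.

```python
def required_oplog_mb(oplog_mb_per_hour: float,
                      longest_outage_hours: float,
                      safety_factor: float = 3.0) -> float:
    """Estimate an oplog size that covers the longest expected outage
    several times over. The safety factor is an arbitrary assumption."""
    return oplog_mb_per_hour * longest_outage_hours * safety_factor


def oplog_window_hours(oplog_size_mb: float, oplog_mb_per_hour: float) -> float:
    """Estimate how many hours of writes a given oplog can hold."""
    if oplog_mb_per_hour <= 0:
        raise ValueError("oplog_mb_per_hour must be positive")
    return oplog_size_mb / oplog_mb_per_hour


# Illustrative numbers: about 200 MB of oplog entries per hour and an
# 8-hour maintenance window.
suggested_mb = required_oplog_mb(200, 8)      # 200 * 8 * 3
window_1gb = oplog_window_hours(1024, 200)    # hours covered by a 1 GB oplog
```

With these made-up numbers a 1 GB oplog would cover only about five hours, less than the 8-hour window, so a secondary taken down for maintenance would come back stale.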

Oplog

The oplog is a capped collection that stores all operations that modify the data stored in MongoDB. All members of the replica set have oplogs that allow them to maintain the current state of the database. Unless you modify the size of your oplog with the oplogSize option, the default size of the oplog will be as follows:

  • For 64-bit Linux, Solaris, and FreeBSD systems, MongoDB will allocate 5% of the available free disk space to the oplog.

    If this amount is smaller than a gigabyte, then MongoDB will allocate 1 gigabyte of space.

  • For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog.

  • For 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.
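The defaults quoted above can be written out as a small Python sketch. This is only a restatement of the quoted rules, not MongoDB's actual code; the function name and platform strings are hypothetical, and 1 gigabyte is taken to be 1024 megabytes.

```python
def default_oplog_mb(platform: str, is_64bit: bool,
                     free_disk_mb: float = 0.0) -> float:
    """Default oplog size in MB according to the rules quoted above.

    Hypothetical helper; platform names are illustrative strings.
    """
    if not is_64bit:
        return 48.0                      # "about 48 megabytes" on 32-bit
    if platform == "osx":
        return 183.0                     # 64-bit OS X
    if platform in ("linux", "solaris", "freebsd"):
        five_percent = free_disk_mb * 5 / 100
        return max(five_percent, 1024.0)  # never less than 1 GB
    raise ValueError("platform not covered by the quoted rules: " + platform)


big_disk = default_oplog_mb("linux", True, free_disk_mb=100_000)
small_disk = default_oplog_mb("linux", True, free_disk_mb=10_000)
mac = default_oplog_mb("osx", True)
```

Note how a machine with little free disk space ends up at the 1 GB floor, which may be a very short window on a write-heavy system.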

As I mentioned above, there's no formula per se. However, if you're performing a lot of writes (inserts/deletes/updates) then you may want a larger oplog (than 5%), whereas if it's mostly reads, you could possibly get away with less than 5%. It really depends on your app.

Here's another introductory link to sizing oplog, which may help explain things a little more and I also recommend reading the Replication Fundamentals document.

The oplog on the primary is the most important and it is recommended that all oplogs (in the replica set) are the same size.