Spark rdd.count() yields inconsistent results

mongodb scala hadoop apache-spark cluster-computing

As you already spotted, the problem does not appear to be with spark (or scala) but with MongoDB.

As such the question regarding the difference seems to be resolved.

You will still want to troubleshoot the actual MongoDB error, the provided link can be a good starting point for that: http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember

mongodb scala hadoop apache-spark cluster-computing

count returns an estimated count. As such, the value returned can change even if the number of documents hasn't changed.

countDocuments was added to MongoDB 4.0 to provide an accurate count (that also works in multi-document transactions).

CodeHunter