Spark rdd.count() yields inconsistent results


As you already spotted, the problem does not appear to be with Spark (or Scala) but with MongoDB.

So the question of why the counts differ seems to be resolved.

You will still want to troubleshoot the actual MongoDB error. The link provided is a good starting point for that: http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember


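In MongoDB, count can return an estimate rather than an exact number: when called without a query predicate, it relies on the collection's metadata instead of counting the documents. As such, the value returned can change even if the number of documents hasn't changed.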

countDocuments was added in MongoDB 4.0 to provide an accurate count (it also works inside multi-document transactions).
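
To see the difference in practice, here is a minimal sketch that reads both counts side by side. It is written against PyMongo, whose Collection class offers estimated_document_count() and count_documents(filter). The helper name compare_counts and the connection details in the usage comment are made up for illustration and are not from the original question.

```python
def compare_counts(collection, query=None):
    """Return (estimated, exact) document counts for a PyMongo-style collection.

    `collection` is assumed to expose `estimated_document_count()` and
    `count_documents(filter)`, as PyMongo's Collection does.
    """
    # Metadata-based estimate: fast, but may drift from the true value.
    estimated = collection.estimated_document_count()
    # Counts the documents matching the filter; an empty filter matches all.
    exact = collection.count_documents(query if query is not None else {})
    return estimated, exact


# Hypothetical usage (URI, database and collection names are placeholders):
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://localhost:27017")["mydb"]["mycoll"]
#   estimated, exact = compare_counts(coll)
#   print(estimated, exact)
```

If the two numbers disagree, or the estimate changes between runs while the data is untouched, that points at the estimated count rather than at Spark.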