Mongodb insert speed slower after sharding

Here is the possible culprit collection.bulkWrite(list);

In case of bulk writes, mongos needs to break up your batches into smaller batches that go to each shard.

As you haven't specified anything about the insertion order of docs in your batch, MongoDB must honor the requirement that the inserts happen in the order that they are specified. The consequence is that consecutive inserts can be batched if and only if they correspond to the same shard.

mongos maintains the original document order, hence only the consecutive inserts which belong to the same shard can be grouped together

For eg. Consider the case where "k" is the shard key. There are two shards, corresponding to ranges

[MinKey, 10], (20, MaxKey]

Now suppose we batch insert the following documents:

[{k: 1}, {k: 25}, {k: 2}]

Doc1 -> Shard1, Doc2 -> Shard2, Doc3 -> Shard3

No two consecutive documents belongs to the same shard, hence a call to getLastError is required after each document in this case.

In the case of Hashed keys, documents will be distributed more randomly among the shards. i.e. documents belonging the same shards may be more scattered and hence will create more number of batches The more random is the distribution, smaller the size of batches, more the number of total batches, higher the incurred cost for getLastError which effectively means poorer the performance.

FIX : specify "ordered: false".

collection.bulkWrite(list, new BulkWriteOptions().ordered(false));

This tells the database that you do not care about strictly preserving the order in which insertions take place. With "ordered: false", mongos will create a single batch per shard, obviating the extra getLastError calls. Each batch operation can be performed on the appropriate shard concurrently, without waiting for the getLastError response from the previous batch.

Also,

MongoClient mongoClient = new MongoClient(host, port);

Creates a Mongo instance based on a single mongodb node and will not be able to discover other nodes in your replica-set or sharded cluster.

In this case, all your write requests are routed to a single node which is being responsible for all the additional bookkeeping work because of sharded-cluster. What you should use is

MongoClient(final List<ServerAddress> seeds)

When there is more than one server to choose from based on the type of request (read or write) and the read preference (if it's a read request), the driver will randomly select a server to send a request. This applies to both replica sets and sharded clusters.
Note : Put as many servers as you can in the list and the system will figure out the rest.

java mongodb performance bulkinsert

Generally speaking, whenever you use a sharded solution you need to think that either:

Your client application will be cluster-aware and therefore able to do the routing itself
Your client application will contact intermediate nodes that perform the routing

My suspect is Mongo Client is not cluster-aware "automatically", meaning that it doesn't look-up the nodes belonging to the cluster if you do not specify them. This feeling is reinforced by the following:

The MongoDB official documentation for clustering and sharding introduces routers component (https://docs.mongodb.com/v3.2/core/sharded-cluster-components/)
The javadoc explicitly says to use the ServerAddress[] constructor to connect either to a replica set or a sharded cluster.

You can connect to a replica set using the Java driver by passing a ServerAddress list to the MongoClient constructor. For example:
MongoClient mongoClient = new MongoClient(Arrays.asList( new ServerAddress("localhost", 27017), new ServerAddress("localhost", 27018), new ServerAddress("localhost", 27019)));
You can connect to a sharded cluster using the same constructor. MongoClient will auto-detect whether the servers are a list of replica set members or a list of mongos servers.

CodeHunter

Mongodb insert speed slower after sharding

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last