
MongoDB bulkWrite multiple updateOne vs updateMany


Short answer:

Using updateMany is at least twice as fast, but it can accidentally update more documents than you intended. Keep reading to learn how to avoid that while still gaining the performance benefit.

Long answer:

We ran the following experiment to find out; these are the steps:

  1. Create a bankaccounts MongoDB collection, where each document contains only one field (balance).
  2. Insert 1 million documents into the bankaccounts collection.
  3. Randomize the order of all 1 million documents in memory, so the database cannot benefit from ids being updated in the same sequence they were inserted, simulating a real-world scenario.
  4. Build the write operations for bulkWrite from the documents, setting each balance to a random number between 0 and 100.
  5. Execute the bulkWrite.
  6. Log the time the bulkWrite took.

The difference between the two variations of the experiment lies in step 4.

In one variation of the experiment we build an array of 1 million updateOne operations; each updateOne has a filter matching a single document and its respective update object.

In the second variation, we build 100 updateMany operations, each with a filter containing 10K document ids and their respective update.
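For illustration, here is a minimal sketch of the two operation shapes; the small placeholder arrays and the balance value 42 stand in for the benchmark's data (the full script is below):

```js
// Sketch only: two tiny placeholder arrays stand in for the benchmark's
// 1 million _id values; the shapes of the operations are what matter here.
const docIds = ['id1', 'id2', 'id3', 'id4'];
const idChunks = [['id1', 'id2'], ['id3', 'id4']];

// Variation 1: one updateOne operation per document.
const updateOneOps = docIds.map((id) => ({
  updateOne: {
    filter: { _id: id },
    update: { $set: { balance: 42 } }
  }
}));

// Variation 2: one updateMany operation per chunk of ids; every document in a
// chunk receives the same update object.
const updateManyOps = idChunks.map((chunk) => ({
  updateMany: {
    filter: { _id: { $in: chunk } },
    update: { $set: { balance: 42 } }
  }
}));
```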

Results:

updateMany with multiple document ids is about 2.4x (243%) as fast as multiple updateOne operations. This cannot be used everywhere though; please read the "The risk" section to learn when it should be used.

Details:

We ran the script 5 times for each variation; the detailed results are as follows:

With updateOne: 51.28 seconds on average.
With updateMany: 21.04 seconds on average.

The risk:

As many people have already pointed out, updateMany is not a direct substitute for updateOne, since it can incorrectly update multiple documents when the intention was to update only one.

This approach is only valid when the filter uses a unique field such as _id (or any other field with a unique index). If the filter depends on fields that are not unique, multiple documents will be updated and the results will not be equivalent.
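To make the risk concrete, here is a hedged sketch; `demonstrateRisk`, `status`, and `someIds` are illustrative names (not part of the benchmark), and `BankAccount` is the mongoose model from the script below:

```js
// `status` is a hypothetical non-unique field (not in the benchmark schema),
// and `someIds` is a placeholder array of _id values.
async function demonstrateRisk (BankAccount, someIds) {
  // Risky: the filter is on a non-unique field, so this single operation
  // updates every matching document, not just the one you had in mind.
  await BankAccount.bulkWrite([
    { updateMany: { filter: { status: 'pending' }, update: { $set: { balance: 0 } } } }
  ]);

  // Safe: filtering on the unique _id field with $in touches exactly the
  // listed documents, equivalent to many updateOne operations that share
  // the same update object.
  await BankAccount.bulkWrite([
    { updateMany: { filter: { _id: { $in: someIds } }, update: { $set: { balance: 0 } } } }
  ]);
}
```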

65831219.js

```js
// 65831219.js
'use strict';

const mongoose = require('mongoose');
const { Schema } = mongoose;

const DOCUMENTS_COUNT = 1_000_000;
const UPDATE_MANY_OPERATIONS_COUNT = 100;
const MINIMUM_BALANCE = 0;
const MAXIMUM_BALANCE = 100;
const SAMPLES_COUNT = 10;

const bankAccountSchema = new Schema({
  balance: { type: Number }
});

const BankAccount = mongoose.model('BankAccount', bankAccountSchema);

mainRunner().catch(console.error);

async function mainRunner () {
  for (let i = 0; i < SAMPLES_COUNT; i++) {
    await runOneCycle(buildUpdateManyWriteOperations).catch(console.error);
    await runOneCycle(buildUpdateOneWriteOperations).catch(console.error);
    console.log('-'.repeat(80));
  }

  process.exit(0);
}

/**
 * Drops the database, inserts fresh documents, builds the write operations
 * with the given builder, and times the bulkWrite.
 * @param {buildUpdateManyWriteOperations|buildUpdateOneWriteOperations} buildBulkWrite
 */
async function runOneCycle (buildBulkWrite) {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });

  await mongoose.connection.dropDatabase();

  const { accounts } = await createAccounts({ accountsCount: DOCUMENTS_COUNT });

  const { writeOperations } = buildBulkWrite({ accounts });

  const writeStartedAt = Date.now();

  await BankAccount.bulkWrite(writeOperations);

  const writeEndedAt = Date.now();

  console.log(`Write operations took ${(writeEndedAt - writeStartedAt) / 1000} seconds with \`${buildBulkWrite.name}\`.`);
}

async function createAccounts ({ accountsCount }) {
  const rawAccounts = Array.from({ length: accountsCount }, () => ({ balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) }));

  const accounts = await BankAccount.insertMany(rawAccounts);

  return { accounts };
}

// Variation 1: one updateOne operation per document, filtering on a single _id.
function buildUpdateOneWriteOperations ({ accounts }) {
  const writeOperations = shuffleArray(accounts).map((account) => ({
    updateOne: {
      filter: { _id: account._id },
      update: { balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) }
    }
  }));

  return { writeOperations };
}

// Variation 2: 100 updateMany operations, each filtering on a chunk of _ids with $in.
function buildUpdateManyWriteOperations ({ accounts }) {
  shuffleArray(accounts);
  const accountsChunks = chunkArray(accounts, accounts.length / UPDATE_MANY_OPERATIONS_COUNT);
  const writeOperations = accountsChunks.map((accountsChunk) => ({
    updateMany: {
      filter: { _id: { $in: accountsChunk.map(account => account._id) } },
      update: { balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) }
    }
  }));

  return { writeOperations };
}

function getRandomInteger (min = 0, max = 1) {
  min = Math.ceil(min);
  max = Math.floor(max);

  return min + Math.floor(Math.random() * (max - min + 1));
}

// Fisher-Yates shuffle, in place.
function shuffleArray (array) {
  let currentIndex = array.length;
  let temporaryValue;
  let randomIndex;

  // While there remain elements to shuffle...
  while (0 !== currentIndex) {
    // Pick a remaining element...
    randomIndex = Math.floor(Math.random() * currentIndex);
    currentIndex -= 1;

    // And swap it with the current element.
    temporaryValue = array[currentIndex];
    array[currentIndex] = array[randomIndex];
    array[randomIndex] = temporaryValue;
  }

  return array;
}

// Splits `array` into sub-arrays of at most `sizeOfTheChunkedArray` elements.
function chunkArray (array, sizeOfTheChunkedArray) {
  const chunked = [];

  for (const element of array) {
    const last = chunked[chunked.length - 1];

    if (!last || last.length === sizeOfTheChunkedArray) {
      chunked.push([element]);
    } else {
      last.push(element);
    }
  }

  return chunked;
}
```

Output

```
$ node 65831219.js
Write operations took 20.803 seconds with `buildUpdateManyWriteOperations`.
Write operations took 50.84 seconds with `buildUpdateOneWriteOperations`.
--------------------------------------------------------------------------------
```

Tests were run using MongoDB version 4.0.4.


At a high level, if you have the same update object, you can use updateMany rather than bulkWrite.

Reason:

bulkWrite is designed to send multiple different commands to the server, as mentioned here.

If you have the same update object for all documents, updateMany is best suited.
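For example, a minimal sketch (`resetBalances` and `someIds` are placeholder names, and `BankAccount` is a mongoose model as in the script above): when the update object is identical for every document, a single updateMany command replaces the whole bulkWrite.

```js
// One command instead of thousands of bulkWrite operations; valid only
// because every matched document receives the same update object.
async function resetBalances (BankAccount, someIds) {
  await BankAccount.updateMany(
    { _id: { $in: someIds } },
    { $set: { balance: 0 } }
  );
}
```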

Performance:

If you have 10k update commands in a bulkWrite, they will be executed in batches internally, which may affect the execution time.

Exact lines from the reference about batching:

Each group of operations can have at most 1000 operations. If a group exceeds this limit, MongoDB will divide the group into smaller groups of 1000 or less. For example, if the bulk operations list consists of 2000 insert operations, MongoDB creates 2 groups, each with 1000 operations.
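If you want that grouping to be explicit on the client side, one possible sketch is to split the operations array yourself (`bulkWriteInBatches` and `BATCH_SIZE` are illustrative names, not a driver API):

```js
// Illustrative only: split a large operations array into groups of 1000 and
// send each group as its own bulkWrite, mirroring the server-side grouping.
const BATCH_SIZE = 1000;

async function bulkWriteInBatches (model, operations, batchSize = BATCH_SIZE) {
  for (let i = 0; i < operations.length; i += batchSize) {
    const batch = operations.slice(i, i + batchSize);
    // `ordered: false` lets the operations in each batch be applied independently.
    await model.bulkWrite(batch, { ordered: false });
  }
}

// Usage with the benchmark's model and operations:
// await bulkWriteInBatches(BankAccount, writeOperations);
```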

Thanks @Alex