
Why is Node.js executing in this manner?


Neil has done a great job of providing a solution, but I just wanted to touch on your question:

Can anyone tell me why Node would be doing this, and why switching to async.eachSeries fixes this problem?

If you look at the details of async.each vs. async.eachSeries, you may notice that the documentation for async.each states:

Applies the function iterator to each item in arr, in parallel

However, async.eachSeries states:

The same as each, only iterator is applied to each item in arr in series. The next iterator is only called once the current one has completed. This means the iterator functions will complete in order.

In detail, if we look at the source you'll see that each ends up calling the native forEach function on the array itself, and the iterator is called for each element (link to source):

_each(arr, function (x) {
    iterator(x, only_once(done) );
});

which calls:

var _each = function (arr, iterator) {
    if (arr.forEach) {
        return arr.forEach(iterator);
    }

However, each call to your iterator function ends up calling model.save. This Mongoose function (among other things) ends up performing I/O to save your data to the database. If you were to trace the code path, you'd see that it ends up in a function which calls process.nextTick (link to source).

Node's process.nextTick function is typically used in situations such as this (I/O), and will process the callback once the flow of execution has ended. In this situation, each callback will only be called once the forEach loop has completed. (This was purposeful, and meant to not block any code execution.)
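
To illustrate, here is a minimal sketch (plain Node, not Mongoose's actual code) showing that callbacks deferred with process.nextTick only run after the synchronous loop that queued them has finished:

[1, 2, 3].forEach(function (n) {
  process.nextTick(function () {
    console.log("callback for " + n);
  });
  console.log("queued " + n);
});

// prints "queued 1", "queued 2", "queued 3" first,
// then "callback for 1", "callback for 2", "callback for 3"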

So to sum up:

When using async.each, the code you have above will run through all your users, queuing up the saves, but only begin to process them once the code has completed iterating over all the users.

When using async.eachSeries, the code you have above will process each user one at a time, and only process the next user once the save has completed -- when the eachSeries callback has been called.
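
A quick way to see this difference is with a dummy asynchronous "save". This is only an illustrative sketch (fakeSave stands in for model.save, and is not part of your code); run one of the two calls at a time to compare the output ordering:

var async = require("async");

// stands in for model.save: completes asynchronously
function fakeSave(user, callback) {
  setImmediate(function () {
    console.log("saved " + user);
    callback();
  });
}

// async.each: every "starting" line prints before any "saved" line
async.each(["a", "b", "c"], function (user, callback) {
  console.log("starting " + user);
  fakeSave(user, callback);
}, function () {
  console.log("each done");
});

// async.eachSeries: "starting" and "saved" alternate, one user at a time
async.eachSeries(["a", "b", "c"], function (user, callback) {
  console.log("starting " + user);
  fakeSave(user, callback);
}, function () {
  console.log("eachSeries done");
});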


Well there certainly is a problem with throwing the kitchen sink at your process. It is doing essentially what you ask it to, and therefore trying to asynchronously "spin up" all of these "save" operations at once. The basic reality is that you only have so many connections to MongoDB to work with, so there is going to be a bottleneck somewhere when you do this.

A better approach than doing this in "series", if you don't actually need the operations to complete in an explicit order, would be to put a "limit" on the number of operations you are queuing up. There is async.eachLimit() to do exactly this.

The calling convention seems a bit odd here, so this seems a bit cleaner, to me at least:

async.eachLimit(users, 500, function(user, callback) {
    var model = new Model({
        id: user.id,
        name: {
            first: user.fname,
            last: user.lname
        }
    });

    model.save(function(err, model) {
        console.log("saving user: " + model.id);
        callback(err);
    });

}, function(err) {
    if (err) {
        console.log("there was a problem");
    } else {
        console.log("all successful");
    }
});

Or as a basic translation to CoffeeScript:

async.eachLimit users, 500, ((user, callback) ->
  model = new Model(
    id: user.id
    name:
      first: user.fname
      last: user.lname
  )
  model.save (err, model) ->
    console.log "saving user: " + model.id
    callback err
    return
  return
), (err) ->
  if err
    console.log "there was a problem"
  else
    console.log "all successful"
  return

And the final callback will then fire only after all the iterator callbacks have returned, but you are "throttling" what you are throwing at mongoose, and indeed at MongoDB.

You also might want to look into the Bulk Operations API of MongoDB unless you explicitly need to use the "validation" functions or others from your model. This essentially allows you to send a "batch" of inserts at once, rather than sending each document to the database "one at a time".

A contrived example here, using eachSeries, but with the actual "writes" grouped:

var async = require("async"),
    mongoose = require("mongoose"),
    Schema = mongoose.Schema;

mongoose.connect('mongodb://localhost/test');

var tenSchema = new Schema({
  value: Number
});

var Ten = mongoose.model( "Ten", tenSchema, "ten" );

var ten = [1,2,3,4,5,6,7,8,9,10];
var pos = 0;

mongoose.connection.on("open", function(err,conn) {

  var bulk = Ten.collection.initializeOrderedBulkOp();

  async.eachSeries(ten, function(item,callback) {
    // queue the insert; nothing is sent to the server yet
    bulk.insert({ "value": item });
    pos++;

    if ( pos % 2 == 0 ) {
      // flush the batch every 2 items, then start a new one
      bulk.execute(function(err,res) {
        pos = 0;
        bulk = Ten.collection.initializeOrderedBulkOp();
        callback(err);
      });
    } else {
      callback();
    }
  }, function(err) {
    if (err)
      throw err;

    // flush any remaining queued inserts
    if ( pos != 0 ) {
      bulk.execute(function(err,result) {
        console.log("done");
      });
    } else {
      console.log("done");
    }
  });

});

So in your case just "up" the value used for the modulo check, say to 500, and that will process the array but only write to the database once every 500 items.

The only thing to be aware of is that this uses a native driver method rather than the mongoose API, so you need to be careful (in the case of a migration script or similar) to make sure the current connection is established before referencing these methods. The contrived way here is to listen for "open", but in a real script you would make sure the connection is ready by whatever means you normally use.
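
For instance, a small sketch of one way to guard this, assuming the Ten model from the example above (mongoose.connection.readyState of 1 means the connection is up):

// run fn once the connection is usable; readyState 1 === connected
function whenConnected(fn) {
  if (mongoose.connection.readyState === 1) {
    fn();
  } else {
    mongoose.connection.once("open", fn);
  }
}

whenConnected(function () {
  var bulk = Ten.collection.initializeOrderedBulkOp();
  // ... queue and execute the bulk writes as above
});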

You could get fancier with queues of parallel "bulk writes", but the general performance should be better than with any other method without going to further lengths.
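
If you did want to experiment with that, a rough sketch using async.queue might look like the following. The batch size of 500, the concurrency of 2, and the sourceDocs array are all placeholders, and Ten is the model from the contrived example above (with the connection already open):

// Rough sketch only: batches of documents are pushed onto a queue and
// written in parallel, with up to 2 batches in flight at a time.
var q = async.queue(function (batch, callback) {
  var bulk = Ten.collection.initializeUnorderedBulkOp();
  batch.forEach(function (doc) {
    bulk.insert(doc);
  });
  bulk.execute(callback);
}, 2);

// split the source data into batches of 500 and queue them
var batch = [];
sourceDocs.forEach(function (doc) {
  batch.push(doc);
  if (batch.length === 500) {
    q.push([batch]);   // wrap so the whole batch is a single task
    batch = [];
  }
});
if (batch.length) q.push([batch]);

// called once every queued batch has been written
q.drain = function () {
  console.log("all batches written");
};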