
Finding mongoDB records in batches (using mongoid ruby adapter)


With Mongoid, you don't need to manually batch the query.

In Mongoid, Model.all returns a Mongoid::Criteria instance. Upon calling #each on this Criteria, a Mongo driver cursor is instantiated and used to iterate over the records. This underlying cursor already fetches records in batches; by default the batch size is 100.

For more information on this topic, read this comment from the Mongoid author and maintainer.

In summary, you can just do this:

Model.all.each do |r|
  Sunspot.index(r)
end


If you are iterating over a collection where each record requires a lot of processing (e.g. querying an external API for each item), it is possible for the cursor to time out. In this case you need to perform multiple queries so as not to leave the cursor open.

require 'mongoid'

module Mongoid
  class Criteria
    def in_batches_of(count = 100)
      Enumerator.new do |y|
        total = 0
        loop do
          batch = 0
          self.limit(count).skip(total).each do |item|
            total += 1
            batch += 1
            y << item
          end
          break if batch == 0
        end
      end
    end
  end
end

The helper method above adds the batching functionality. It can be used like so:

Post.all.order_by(:id => 1).in_batches_of(7).each_with_index do |post, index|
  # call external slow API
end

Just make sure you ALWAYS have an order_by on your query; otherwise the paging might not do what you want it to. Also, I would stick with batches of 100 or less: as noted in the accepted answer, Mongoid queries in batches of 100, and you never want to leave the cursor open while doing the processing.
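To see the batching loop in isolation, here is a minimal sketch using a hypothetical FakeCriteria stub (plain Ruby, not Mongoid) that mimics limit/skip over an in-memory array; the in_batches_of body is the same limit/skip Enumerator pattern as the helper above.

```ruby
# Hypothetical stand-in for a criteria object, used only to demonstrate
# the batching loop without a database. Not part of Mongoid.
class FakeCriteria
  def initialize(items)
    @items = items
  end

  # limit/skip record the window and return self, mimicking a chainable criteria.
  def limit(count)
    @limit = count
    self
  end

  def skip(offset)
    @offset = offset
    self
  end

  def each(&block)
    @items.slice(@offset || 0, @limit || @items.size).to_a.each(&block)
  end

  # Same batching logic as the Mongoid helper above: keep paging with
  # limit/skip until a page comes back empty.
  def in_batches_of(count = 100)
    Enumerator.new do |y|
      total = 0
      loop do
        batch = 0
        limit(count).skip(total).each do |item|
          total += 1
          batch += 1
          y << item
        end
        break if batch == 0
      end
    end
  end
end

criteria = FakeCriteria.new((1..10).to_a)
p criteria.in_batches_of(3).to_a  # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```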


It is faster to send batches to Sunspot as well. This is how I do it:

records = []
Model.batch_size(1000).no_timeout.only(:your_text_field, :_id).all.each do |r|
  records << r
  if records.size > 1000
    Sunspot.index! records
    records.clear
  end
end
Sunspot.index! records

no_timeout: prevents the cursor from being disconnected (after 10 minutes, by default)

only: selects only the _id and the fields that are actually indexed

batch_size: fetches 1000 entries at a time instead of the default 100
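The accumulate-and-flush loop above can also be expressed with Ruby's built-in Enumerable#each_slice. This sketch demonstrates the chunking on a plain array (the data and batch counts are illustrative); with a real Mongoid criteria you would iterate the criteria's enumerator the same way and call Sunspot.index! on each slice.

```ruby
# Demonstrate the chunking that the manual counter implements, using
# Enumerable#each_slice on a plain array (illustrative data, not a real model).
records = (1..2500).to_a
batch_sizes = []
records.each_slice(1000) do |batch|
  # In the real code this would be: Sunspot.index!(batch)
  batch_sizes << batch.size
end
p batch_sizes  # => [1000, 1000, 500]
```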