Finding MongoDB records in batches (using the Mongoid Ruby adapter)
With Mongoid, you don't need to manually batch the query.
In Mongoid, `Model.all` returns a `Mongoid::Criteria` instance. Calling `#each` on this criteria instantiates a Mongo driver cursor and uses it to iterate over the records. The underlying driver cursor already fetches records in batches; by default the `batch_size` is 100.
For more information on this topic, read this comment from the Mongoid author and maintainer.
In summary, you can just do this:
```ruby
Model.all.each do |r|
  Sunspot.index(r)
end
```
If you are iterating over a collection where each record requires a lot of processing (e.g. querying an external API for each item), the cursor can time out. In that case you need to perform multiple queries so as not to leave the cursor open.
Here is a helper method you can use to add the batching functionality:

```ruby
require 'mongoid'

module Mongoid
  class Criteria
    def in_batches_of(count = 100)
      Enumerator.new do |y|
        total = 0
        loop do
          batch = 0
          self.limit(count).skip(total).each do |item|
            total += 1
            batch += 1
            y << item
          end
          break if batch == 0
        end
      end
    end
  end
end
```

It can be used like so:

```ruby
Post.all.order_by(:id => 1).in_batches_of(7).each_with_index do |post, index|
  # call external slow API
end
```
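The skip/limit pagination in the helper can be illustrated with a plain-Ruby stand-in, with no database involved. Here an Array takes the place of a `Mongoid::Criteria`, and the free-standing `paginate` method (a hypothetical name for this sketch) mirrors the loop in `in_batches_of`:

```ruby
# Plain-Ruby sketch of the skip/limit loop in in_batches_of:
# fetch `count` items starting at the running offset, and stop
# as soon as a page comes back empty.
def paginate(data, count)
  Enumerator.new do |y|
    total = 0
    loop do
      page = data.drop(total).take(count) # stands in for limit(count).skip(total)
      break if page.empty?
      page.each do |item|
        total += 1
        y << item
      end
    end
  end
end

letters = ('a'..'z').to_a
result  = paginate(letters, 7).to_a # yields all 26 letters across 4 pages
```

The `break if page.empty?` line is what terminates the loop once the offset passes the end of the collection, which is exactly why the Mongoid helper breaks when `batch == 0`.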
Just make sure you ALWAYS have an `order_by` on your query; otherwise the paging might not do what you want. Also, I would stick with batches of 100 or fewer. As said in the accepted answer, Mongoid queries in batches of 100, and you never want to leave the cursor open while doing the processing.
It is faster to send batches to Sunspot as well. This is how I do it:

```ruby
records = []
Model.batch_size(1000).no_timeout.only(:your_text_field, :_id).all.each do |r|
  records << r
  if records.size >= 1000
    Sunspot.index! records
    records.clear
  end
end
Sunspot.index! records unless records.empty? # flush the remainder
```
- `no_timeout`: prevents the cursor from disconnecting (after 10 minutes, by default)
- `only`: selects only the `_id` and the fields that are actually indexed
- `batch_size`: fetches 1000 entries at a time instead of the default 100
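The buffer-and-flush pattern above can be sketched without Mongoid or Sunspot. In this illustration, `fake_records` stands in for the cursor and `indexed_batches` collects what each `Sunspot.index!` call would receive (both names are made up for the sketch):

```ruby
# Accumulate records, flush whenever the buffer reaches the batch
# size, and flush any remainder after iteration ends: the same
# shape as the Sunspot indexing loop above.
BATCH = 1000
fake_records = (1..2500).to_a        # stands in for the Mongoid cursor
indexed_batches = []                 # records each simulated Sunspot.index! call

buffer = []
fake_records.each do |r|
  buffer << r
  if buffer.size >= BATCH
    indexed_batches << buffer.dup    # Sunspot.index! buffer would go here
    buffer.clear
  end
end
indexed_batches << buffer.dup unless buffer.empty? # flush the remainder
```

Flushing on `>=` keeps the indexing batches aligned with the 1000-record fetches, and the final guarded flush ensures a trailing partial batch is never silently dropped.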