
ActiveRecord find_each combined with limit and order


The documentation says that find_each and find_in_batches don't retain sort order and limit because:

  • Sorting ASC on the PK is used to make the batch ordering work.
  • Limit is used to control the batch sizes.
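The PK-ascending requirement is what makes keyset batching work: each batch query asks for ids greater than the last id seen, sorted ascending, limited to the batch size. A minimal pure-Ruby sketch of that paging logic (plain integers stand in for rows; no database involved):

```ruby
# Simulate find_in_batches-style paging: ids stand in for rows.
ids = (1..10).to_a.shuffle
batch_size = 3

batches = []
last_id = 0
loop do
  # Each "query": WHERE id > last_id ORDER BY id ASC LIMIT batch_size
  batch = ids.select { |id| id > last_id }.sort.first(batch_size)
  break if batch.empty?
  batches << batch
  last_id = batch.last
end

batches # => [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```

Because the order and limit are consumed by this paging mechanism, any order/limit you pass in yourself can't be honored.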

You could write your own version of this function like @rorra did. But you can get into trouble when mutating the objects. If, for example, you sort by created_at and save the object, it might come up again in one of the next batches. Similarly, you might skip objects because the order of results has changed by the time the query for the next batch is executed. Only use that solution with read-only objects.

Now my primary concern was that I didn't want to load 30000+ objects into memory at once; my concern was not the execution time of the query itself. Therefore I used a solution that executes the original query but only caches the IDs. It then divides the array of IDs into chunks and queries/creates the objects per chunk. This way you can safely mutate the objects, because the sort order is kept in memory.

Here is a minimal example similar to what I did:

batch_size = 512
ids = Thing.order('created_at DESC').pluck(:id) # Replace the order/scope with your own
ids.each_slice(batch_size) do |chunk|
  Thing.where(id: chunk).order(Arel.sql("field(id, #{chunk.join(',')})")).each do |thing|
    # Do things with thing
  end
end

The trade-offs to this solution are:

  • The complete query is executed to get the IDs
  • An array of all the IDs is kept in memory
  • It uses the MySQL-specific FIELD() function
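If the MySQL-only FIELD() dependency is a problem, each chunk can be re-sorted in Ruby instead: build an id-to-position map from the chunk and sort the fetched records by it. A sketch of just that sorting step (plain integers stand in for records here; in a real app you would sort the result of Thing.where(id: chunk) by each record's id):

```ruby
chunk = [42, 7, 19]                   # ids in the desired order
position = chunk.each_with_index.to_h # {42=>0, 7=>1, 19=>2}

fetched = [7, 19, 42]                 # simulates WHERE id IN (...) returning arbitrary order
ordered = fetched.sort_by { |id| position[id] }
ordered # => [42, 7, 19]
```

This keeps the solution database-agnostic at the cost of sorting each chunk in application memory.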

Hope this helps!


find_each uses find_in_batches under the hood.

It's not possible to select the order of the records: as described in the find_in_batches documentation, it is automatically set to ascending on the primary key (“id ASC”) to make the batch ordering work.

However, the other criteria are applied, so what you can do is:

Thing.active.find_each(batch_size: 50000) { |t| puts t.id }

Regarding the limit, it hasn't been implemented yet: https://github.com/rails/rails/pull/5696


Answering to your second question, you can create the logic yourself:

total_records = 50000
batch = 1000
(0..(total_records - batch)).step(batch) do |i|
  puts Thing.active.order("created_at DESC").offset(i).limit(batch).to_sql
end
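Note that the loop only covers offsets up to total_records - batch, so a final partial batch would be skipped if total_records weren't a multiple of batch. The offset arithmetic on its own (pure Ruby, reusing the names from the snippet above):

```ruby
total_records = 50000
batch = 1000

# The offsets the loop above would generate: 0, 1000, 2000, ..., 49000
offsets = (0..(total_records - batch)).step(batch).to_a
offsets.first # => 0
offsets.last  # => 49000
offsets.size  # => 50
```

Also keep in mind that large OFFSET values get progressively slower on most databases, which is one reason find_in_batches pages by primary key instead.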


Retrieving the ids first and processing them with in_groups_of

ordered_photo_ids = Photo.order(likes_count: :desc).pluck(:id)
ordered_photo_ids.in_groups_of(1000, false).each do |photo_ids|
  photos = Photo.order(likes_count: :desc).where(id: photo_ids)
  # ...
end

It's important to also add the ORDER BY clause to the inner query, because where(id: photo_ids) alone doesn't guarantee the rows come back in the order of the id list.
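For reference, in_groups_of(1000, false) is an ActiveSupport extension that splits the array into groups of up to 1000 elements without nil-padding the last group, i.e. it produces the same grouping as plain Ruby's each_slice(1000). A standalone check using the stdlib-only equivalent:

```ruby
ids = (1..2500).to_a
# Same grouping as ids.in_groups_of(1000, false) from ActiveSupport
groups = ids.each_slice(1000).to_a
groups.map(&:size) # => [1000, 1000, 500]
```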