
Read large MongoDB data


Your problem lies in the asList() call.

This forces the driver to iterate through the entire cursor (80,000 docs, a few gigabytes), keeping all of it in memory.

batchSize(someLimit) and Cursor.batch() won't help here, as you still traverse the whole cursor no matter what the batch size is.
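
For reference, the call that triggers the problem presumably looks something like the Morphia query below; the Order entity and dao names are just placeholders:

    // Hypothetical reconstruction of the failing call: asList() walks the whole
    // cursor and materializes every document into a single in-memory list.
    List<Order> everything = dao.find().asList();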

Instead you can:

1) Iterate the cursor instead of materializing a list: DBCursor cursor = datasource.getCollection("mycollection").find()

2) Read the documents one at a time and feed them into a buffer (say, a list)

3) Every 1000 documents (say), call the Hadoop API, clear the buffer, then start again (see the sketch after this list).
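
A minimal sketch of that loop, assuming the legacy com.mongodb driver classes and a placeholder flushToHadoop(...) method standing in for whatever Hadoop write call you already use:

    import com.mongodb.DB;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;

    import java.util.ArrayList;
    import java.util.List;

    public class CursorToHadoop {

        private static final int BATCH_SIZE = 1000;

        public static void stream(DB db) {
            DBCursor cursor = db.getCollection("mycollection").find();
            List<DBObject> buffer = new ArrayList<>(BATCH_SIZE);
            try {
                while (cursor.hasNext()) {
                    buffer.add(cursor.next());      // read one document at a time
                    if (buffer.size() == BATCH_SIZE) {
                        flushToHadoop(buffer);      // hand the batch over to Hadoop
                        buffer.clear();             // release it before the next batch
                    }
                }
                if (!buffer.isEmpty()) {
                    flushToHadoop(buffer);          // last, partial batch
                }
            } finally {
                cursor.close();
            }
        }

        private static void flushToHadoop(List<DBObject> batch) {
            // placeholder: write the batch with your Hadoop API of choice
        }
    }

This way at most one batch of documents is in memory at any time, regardless of the collection size.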


The asList() call will try to load the whole MongoDB collection into memory, building an in-memory list object bigger than 3 GB.

Iterating the collection with a cursor will fix this problem. You can do this with the Datasource class, but I prefer the type-safe abstractions that Morphia offers with its DAO classes:

    class Dao extends BasicDAO<Order, String> {
        Dao(Datastore ds) {
            super(Order.class, ds);
        }
    }

    Datastore ds = morphia.createDatastore(mongoClient, DB_NAME);
    Dao dao = new Dao(ds);

    // fetch() streams results through the cursor instead of loading them all
    Iterator<Order> iterator = dao.find().fetch();
    while (iterator.hasNext()) {
        Order order = iterator.next();
        hadoopStrategy.add(order);
    }
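
If you also need to hand the documents to Hadoop in batches rather than one at a time, the same buffer-and-flush approach from the answer above works inside this while loop as well.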