MongoDB embedded vs. reference from performance perspective


1. Paging is possible with the $slice operator:

db.blogs.find({}, {posts:{$slice: [10, 10]}}) // skip 10, limit 10

2. Filtering is also possible:

db.blogs.find({"posts.title":"Mongodb!"}, {posts:{$slice: 1}}) //take one post

3, 4. Generally, I guess you are talking about a small performance difference. It's not rocket science; it's just a blog with at most 1000 posts.

You said:

Is this the correct conclusion?

No, not if you care about performance (in general, if the system will be small, you can go with a separate document).

I've done a small performance test regarding 3 and 4; here are the results:

| Count | Inserting posts | Adding to nested collection |
|-------|-----------------|-----------------------------|
| 1     | 1 ms            | 28 ms                       |
| 1000  | 81 ms           | 590 ms                      |
| 10000 | 759 ms          | 2723 ms                     |
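
For reference, here is a minimal sketch of how such a measurement can be reproduced in the mongo shell. The collection names, counts, and field names are assumptions, not the original test code:

// Hypothetical benchmark: N separate inserts vs. N $push updates into one embedded array.
var N = 1000;

var t0 = new Date();
for (var i = 0; i < N; i++) {
    db.posts.insert({blogId: "1234ABCD", title: "post " + i});
}
print("inserting posts: " + (new Date() - t0) + " ms");

db.blogs.insert({_id: "1234ABCD", posts: []});
var t1 = new Date();
for (var j = 0; j < N; j++) {
    db.blogs.update({_id: "1234ABCD"}, {$push: {posts: {title: "post " + j}}});
}
print("adding to nested collection: " + (new Date() - t1) + " ms");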


As for 3 & 4, if you are inserting into a nested document, it is basically an update.

This can be terribly bad for performance, because inserts are generally appended to the end of the data file, which is fast. Updates, on the other hand, can be much trickier.
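
Concretely, appending a post to an embedded array is not an insert at all but an update with $push (collection and field names are assumed here):

db.posts.insert({blogId: "1234ABCD", title: "Mongodb!"})                   // plain insert, appended to the data file
db.blogs.update({_id: "1234ABCD"}, {$push: {posts: {title: "Mongodb!"}}})  // update, may force the document to grow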

If your update does not change the size of a document (meaning that you had a key/value pair and simply changed the value to a new value that takes up the same amount of space), then you will be OK. But when you start modifying documents and adding new data, a problem arises.

The problem is that while MongoDB allots more space than it needs for each document, it may not be enough. If you insert a document that is 1 KB in size, MongoDB may allot 1.5 KB for it to ensure that minor changes have enough room to grow. If you use more than the allocated space, MongoDB has to fetch the entire document and re-write it at the tail end of the data file.
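
A rough way to inspect this from the shell of that era: Object.bsonsize is a shell helper, and paddingFactor is a field the old MMAPv1 storage engine exposed in collection stats (it no longer exists in newer versions):

Object.bsonsize(db.blogs.findOne({_id: "1234ABCD"}))  // actual BSON size of one blog document, in bytes
db.blogs.stats().paddingFactor                        // extra space MongoDB allots per document (MMAPv1 only)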

There is obviously a performance implication in fetching and re-writing the data which will be amplified by the frequency of such an operation. To make matters worse, when this happens you end up leaving holes or pockets of unused space in your data files.

This ultimately gets copied into memory, which means that you may end up using 2 GB of RAM to store your data set while the data itself only takes up 1.5 GB, because there are 0.5 GB worth of pockets. This fragmentation can be avoided by doing inserts as opposed to updates. It can also be fixed by doing a database repair.

In the next version of MongoDB there will be an online compaction function.
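
For reference, the repair mentioned above is available from the shell, and per-collection compaction later shipped as the compact command. Both block other operations, so treat this as a sketch rather than a recommendation:

db.repairDatabase()                // rewrites the data files, reclaiming the holes left by moved documents
db.runCommand({compact: "blogs"})  // per-collection compaction, available in later MongoDB versions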


  1. You can do paging with $slice on an embedded element.
  2. You can search with "field1.field2": /aRegex/, where aRegex is the word you are searching for, but take care of performance (see the example after this list).
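
For example, assuming the blog/posts schema used above (an unanchored, case-insensitive regex cannot use an index efficiently, hence the performance warning):

db.blogs.find({"posts.title": /Mongodb/i})                         // regex match on an embedded field
db.blogs.find({"posts.title": /Mongodb/i}, {posts: {$slice: 5}})   // same search, projecting only the first 5 posts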

About 3 and 4, I have no data to prove anything.

BTW, two collections can be easier to code, use, and manage. You can simply store a blogId in each post document and add "blogId": "1234ABCD" to all your queries, as sketched below.
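
A minimal sketch of that two-collection layout (the collection and field names are just an assumption):

db.posts.insert({blogId: "1234ABCD", title: "Mongodb!", body: "..."})  // each post carries a reference to its blog
db.posts.find({blogId: "1234ABCD"}).skip(10).limit(10)                 // every query just adds the blogId filter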