How to use MongoDB or other document database to keep video files, with options of adding to existing binary files and parallel read/write


I've used MongoDB's GridFS to store media files for a messaging system we built on Mongo, so I can share what we ran into.

Before I get into the details: for your use case I would recommend not using GridFS, and instead using something like Amazon S3 (which has excellent REST APIs for multipart uploads) and storing the metadata in Mongo. This is the approach we settled on in our project after first implementing it with GridFS. It's not that GridFS isn't great; it's just not well suited to chunking/appending and rewriting small portions of files. For more info, here's a quick rundown of what GridFS is and isn't good for:

http://www.mongodb.org/display/DOCS/When+to+use+GridFS
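
To make the "file in S3, metadata in Mongo" idea concrete, here is a minimal sketch rather than our actual code. It assumes `s3` is a boto3-style S3 client and `videos` is a pymongo-style collection; the function name `upload_video` and the metadata field names are made up for illustration. Keep in mind that S3 requires every part except the last to be at least 5 MiB.

```python
def upload_video(s3, videos, bucket, key, parts):
    """Upload an iterable of byte strings as one S3 object via multipart
    upload, then record a small metadata document in MongoDB.

    s3     -- assumed to behave like a boto3 S3 client
    videos -- assumed to behave like a pymongo collection
    """
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    uploaded, size = [], 0
    try:
        # S3 part numbers start at 1.
        for number, body in enumerate(parts, start=1):
            resp = s3.upload_part(
                Bucket=bucket, Key=key, PartNumber=number,
                UploadId=upload_id, Body=body,
            )
            uploaded.append({"ETag": resp["ETag"], "PartNumber": number})
            size += len(body)
        s3.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": uploaded},
        )
    except Exception:
        # Do not leave a half-finished upload lying around.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise

    # Hypothetical metadata layout; store whatever your application needs.
    doc = {"bucket": bucket, "key": key, "length": size, "parts": len(uploaded)}
    videos.insert_one(doc)
    return doc
```

With real clients this would be called as something like `upload_video(boto3.client("s3"), db.videos, "my-bucket", "clip.mp4", part_iter)`, where the bucket, key, and collection names are placeholders.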

Now, if you are bent on using GridFS, you need to understand how the driver and read/write concurrency work.

In Mongo 2.2 there is one write lock per database. This means that while one thread is writing, other threads are essentially blocked from performing operations on that database. In real-life usage this is very fast, because the lock yields after each chunk is written (256 KB by default), so your reader thread can get some data back in between. Please look at this concurrency presentation for more details:

http://www.10gen.com/presentations/concurrency-internals-mongodb-2-2
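
Because GridFS data is stored chunk by chunk, a reader only has to fetch the chunks that cover the bytes it wants. The following is a small sketch of that arithmetic, assuming the 256 KB chunk size mentioned above (each file records its real value in the `chunkSize` field of its `fs.files` document); the helper name `chunks_for_range` is made up for illustration.

```python
def chunks_for_range(offset, size, chunk_size=256 * 1024):
    """Return (n, start, end) tuples: chunk number n and the slice
    [start:end] inside that chunk, covering bytes [offset, offset + size)."""
    if size <= 0:
        return []
    first = offset // chunk_size
    last = (offset + size - 1) // chunk_size
    slices = []
    for n in range(first, last + 1):
        chunk_start = n * chunk_size
        start = max(offset, chunk_start) - chunk_start
        end = min(offset + size, chunk_start + chunk_size) - chunk_start
        slices.append((n, start, end))
    return slices
```

The resulting chunk numbers could then be used in a query on `fs.chunks` by `files_id` and `n`, so that several readers pull different parts of the same video independently.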

So if you look at my two links, we can essentially say question 2 is answered. You should also understand a little about how Mongo writes large data sets and how page faults give reader threads a chance to get information.

Now let's tackle your first question. The Mongo driver does not provide a way to append data to a GridFS file. Writing a file is meant to be a fire-and-forget, atomic-style operation. However, if you understand how the data is stored in chunks and how the checksum is calculated, you can do it manually by working with the fs.files and fs.chunks collections directly, as this poster describes:

Append data to existing gridfs file
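
As a rough sketch of what such a manual append involves (this is not code from the linked post), the function below works on plain dictionaries shaped like `fs.files` and `fs.chunks` documents and computes what would have to be written. The name `plan_append` is made up for illustration, and the sketch simply drops the `md5` field, if present, because it goes stale after an append and would have to be recomputed over the whole file.

```python
def plan_append(file_doc, last_chunk, new_data):
    """Return (updated_file_doc, chunks_to_write) for appending new_data.

    file_doc   -- dict shaped like an fs.files document (_id, length, chunkSize)
    last_chunk -- dict shaped like the file's last fs.chunks document,
                  or None if the file is empty or ends on a chunk boundary
    """
    chunk_size = file_doc["chunkSize"]
    length = file_doc["length"]
    n = length // chunk_size          # index of the first chunk we touch
    tail = b""
    if length % chunk_size:           # last chunk is only partly filled
        tail = bytes(last_chunk["data"])

    buf = tail + new_data
    chunks = []
    for off in range(0, len(buf), chunk_size):
        chunks.append({
            "files_id": file_doc["_id"],
            "n": n,
            "data": buf[off:off + chunk_size],
        })
        n += 1

    updated = dict(file_doc, length=length + len(new_data))
    updated.pop("md5", None)          # stale checksum; recompute if you rely on it
    return updated, chunks
```

Against a real database, each returned chunk would be upserted into `fs.chunks` keyed on `files_id` and `n`, and the `fs.files` document updated afterwards. Nothing makes those writes atomic as a group, so a concurrent reader can see a partially appended file.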

So, going through those, you can see that it is possible to do what you want, but my general recommendation is to use a service (such as Amazon S3) that is designed for this type of interaction instead of doing extra work to make Mongo fit your needs. Of course, you can also go to the filesystem directly, which would be the poor man's choice, but you lose the redundancy, sharding, replication, etc. that you get with GridFS or S3.

Hope that helps.

-Prasith