GridFS use filename as index


To address your questions:

1) When you initialize a GridFS collection using the Java driver, the driver automatically creates indexes on the .files and .chunks collections.

2) MongoDB requires every document to have an '_id' field, backed by a unique '_id' index. The default '_id' is only 12 bytes long, so there's really no significant overhead from having it present.

Reference: http://www.mongodb.org/display/DOCS/Object+IDs
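
To put the 12 bytes in perspective, here is a minimal Java sketch (standard library only, not the driver's own code). It decodes the 24-character hex form of a default '_id' into raw bytes and estimates the raw '_id' storage for a number of files. The sample id and the file count are made-up illustration values, the class and method names are hypothetical, and the estimate ignores index and per-document overhead.

```java
import java.util.Locale;

final class ObjectIdSize {
    // Decode a hex string (two hex digits per byte) into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Raw bytes spent on '_id' values alone for the given number of documents.
    static long rawIdBytes(long documentCount, int idLength) {
        return documentCount * idLength;
    }

    public static void main(String[] args) {
        String sampleId = "507f1f77bcf86cd799439011"; // illustrative id, not from the question
        int idLength = hexToBytes(sampleId).length;
        long files = 1_000_000L;                      // assumed file count
        double mib = rawIdBytes(files, idLength) / (1024.0 * 1024.0);
        System.out.println(idLength + " bytes per _id");
        System.out.println(String.format(Locale.ROOT,
                "%.1f MiB of raw _id data for %d files", mib, files));
    }
}
```

Even at a million files, the '_id' values themselves add up to only a few megabytes.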

3) The stats on the "filename_1_uploadDate_1" index indicate only the size of the index itself. This index contains only the values of the filename and uploadDate fields; it does not contain any of the photo data. For performance reasons, you want the active portion of the index to fit in RAM.

4) If you want to have advanced statistics and monitoring, enroll your system in the free MMS monitoring system provided by 10gen. For more information, start here: https://mms.10gen.com/help/

5) Page faults are normal when loading in new data. MongoDB uses memory-mapped files, so every time you write to a new location within the data file, the OS will need to fault in that page.

For more information about memory mapped files, look here: http://docs.mongodb.org/manual/faq/storage/

6) The MongoDB Java driver provides its own connection pool. Unless you're building a really high-performance application, you're probably best off using the Mongo object as a singleton.
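
One common way to get such a singleton in Java is the initialization-on-demand holder idiom. The sketch below uses a hypothetical stand-in class, PooledClient, in place of the driver's Mongo object so that it runs without the driver on the classpath; in a real application the holder would construct the Mongo object instead. The host name is an assumed example value.

```java
final class MongoHolder {
    /** Hypothetical stand-in for the driver's Mongo object (not a real driver class). */
    static final class PooledClient {
        final String host;

        PooledClient(String host) {
            this.host = host;
        }
    }

    private MongoHolder() {
    }

    // The JVM initializes this nested class lazily and exactly once,
    // which makes the instance creation thread-safe without explicit locking.
    private static final class Holder {
        static final PooledClient INSTANCE = new PooledClient("localhost"); // assumed host
    }

    static PooledClient get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        PooledClient[] seen = new PooledClient[4];
        Thread[] workers = new Thread[seen.length];
        for (int i = 0; i < workers.length; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> seen[slot] = MongoHolder.get());
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        boolean same = true;
        for (PooledClient c : seen) {
            same &= (c == MongoHolder.get());
        }
        System.out.println("all threads share one instance: " + same);
    }
}
```

Every request handler would then call MongoHolder.get() rather than opening its own connection, leaving pooling to the driver.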


It looks like you have to have an _id field in every 'regular' document:

http://www.mongodb.org/display/DOCS/Object+IDs

If you don't specify how it is generated, MongoDB will auto-generate it using the ObjectId datatype and also automatically create an index on it, because it can rely on this field being unique. But if you don't want to use the generated value, as in your case, you can put filename + uploadDate in the _id field and let MongoDB handle the index on it.
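
As a rough sketch of that idea, the compound key can be modelled as an ordered map of filename and upload date. This uses only the standard library: LinkedHashMap stands in for the driver's document type, the class and method names are hypothetical, and actually storing the value as _id through the driver is not shown. The field order is kept stable on purpose, because MongoDB treats embedded documents with the same fields in a different order as different values.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

final class CompositeFileId {
    // Build the compound key with a fixed field order: filename first, then uploadDate.
    static Map<String, Object> of(String filename, Date uploadDate) {
        Map<String, Object> id = new LinkedHashMap<>();
        id.put("filename", filename);
        id.put("uploadDate", new Date(uploadDate.getTime())); // defensive copy of the mutable Date
        return id;
    }

    public static void main(String[] args) {
        Date uploaded = new Date(0L); // illustrative timestamp
        Map<String, Object> a = of("cat.jpg", uploaded);
        Map<String, Object> b = of("cat.jpg", uploaded);
        Map<String, Object> c = of("dog.jpg", uploaded);
        System.out.println("same file, same date  -> equal keys: " + a.equals(b));
        System.out.println("different file name   -> equal keys: " + a.equals(c));
    }
}
```

Two uploads with the same name and the same timestamp would collide on such a key, so this only works if that combination is unique in your data.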

Also, the 125084624 figure you mentioned is the size of the index on _id, not of your data. The total size of your photos might be much larger. 125 MB in RAM looks harmless to me.
I don't know how you could better investigate the faults, but I'm assuming you are using a 64-bit build. On 32-bit, the DB size is limited to about 2 GB, and your inserts will start failing at some point before that.
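
For a sense of scale, this small Java sketch converts the reported index size into megabytes and checks it against an amount of RAM. The 4 GiB figure is an assumed example value, not something from the question, and the class and method names are hypothetical.

```java
import java.util.Locale;

final class IndexSizeCheck {
    // Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    // True when the index is smaller than the RAM figure it is compared against.
    static boolean fitsInRam(long indexBytes, long ramBytes) {
        return indexBytes < ramBytes;
    }

    public static void main(String[] args) {
        long indexBytes = 125_084_624L;             // index size reported in the question
        long assumedRam = 4L * 1024 * 1024 * 1024;  // assumed 4 GiB of RAM
        System.out.println(String.format(Locale.ROOT,
                "index size: %.1f MiB", toMiB(indexBytes)));
        System.out.println("fits in assumed RAM: " + fitsInRam(indexBytes, assumedRam));
    }
}
```

The reported figure works out to roughly 119 MiB (about 125 MB in decimal units), which is small next to the RAM of a typical server.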

Anyway, regarding connections, try a test with a few requests, once with individual connections and once with a singleton. I'm guessing the singleton will perform better. To measure performance or carry out a load test, you might use JMeter:

http://jmeter.apache.org/