MySQL Binary Storage using BLOB VS OS File System: large files, large quantities, large problems MySQL Binary Storage using BLOB VS OS File System: large files, large quantities, large problems mysql mysql

MySQL Binary Storage using BLOB VS OS File System: large files, large quantities, large problems


I work on a large software system that has done both mechanisms for storing attachments and other content. The first iteration of the system stored all data in BLOBs in the DB. I cursed it at the time. As a programmer, I could write side scripts to immediately operate on the data and change it whenever I wanted to.

Advance about 10 years and I still manage the same software but the architecture has changed and it was written with filesystem pointers. I curse it now and wish it were back in the DB. I have the added benefit of several years and having worked this application in much greater capacity in many more and many larger situations, I feel my opinion now is better educated. Promotion or system migration of the application requires extensive scripting and copying of millions of files. On one occasion we changed the OS and all the file pointers had the wrong directory separator, or the server name changes where the file was and we had to write and schedule simple SQL update statements with the DBA on the weekend to fix. Another is that the filesystem and DB records get out of sync, why is uncertain but after thousands of days of operation, sometimes non-transactional systems (filesystem and DB don't share transactional contexts) simply become out of sync. Sometimes files mysteriously go missing.

When all this was in the DB, migration or environment promotion was a matter of dump and import the DB. Row changes could be properly audited, everything in sync and logs can be replayed to point-in-time if necessary. Sure the DB gets big, but it is 2011 and this stuff is simply not a challenge for databases.

For what it is worth we had some similar issues with large data buffers when streaming some data, but A) we could pump the data in byte buffers with the Input|OutputStreams in JDBC and B) when using other tools, we wrote a stored procedure that would chunk the BLOB into a temp table and iteratively serve the chunks from the temp table. Works great.

I don't care what the technical reason for not putting this stuff in the DB, but it is so much easier to manage in a consolidated location I could double and triple the hardware or grid the DB for the time wasted by consultants and customers just in a short period of time managing the disparate files.


Update: go easy on the commenters, they're just giving their opinion on the matter.