Best way storing binary or image files Best way storing binary or image files database database

Best way storing binary or image files


There is no real best way, just a bunch of trade offs.

Database Pros:
1. Much easier to deal with in a clustering environment.
2. No reliance on additional resources like a file server.
3. No need to set up "sync" operations in load balanced environment.
4. Backups automatically include the files.

Database Cons:
1. Size / Growth of the database.
2. Depending on DB Server and your language, it might be difficult to put in and retrieve.
3. Speed / Performance.
4. Depending on DB server, you have to virus scan the files at the time of upload and export.


File Pros:
1. For single web/single db server installations, it's fast.
2. Well understood ability to manipulate files. In other words, it's easy to move the files to a different location if you run out of disk space.
3. Can virus scan when the files are "at rest". This allows you to take advantage of scanner updates.

File Cons:
1. In multi web server environments, requires an accessible share. Which should also be clustered for failover.
2. Additional security requirements to handle file access. You have to be careful that the web server and/or share does not allow file execution.
3. Transactional Backups have to take the file system into account.


The above said, SQL 2008 has a thing called FILESTREAM which combines both worlds. You upload to the database and it transparently stores the files in a directory on disk. When retrieving you can either pull from the database; or you can go direct to where it lives on the file system.


Pros of Storing binary files in a DB:

  • Some decrease in complexity since thedata access layer of your system needonly interface to a DB and not a DB +file system.
  • You can secure your files using thesame comprehensive permissions-basedsecurity that protects the rest ofthe database.
  • Your binary files are protectedagainst loss along with the rest ofyour data by way of database backups.No separate filesystem backup systemrequired.

Cons of Storing binary files in a DB:

  • Depending on size/number of files,can take up significant spacepotentially decreasing performance(dependening on whether your binaryfiles are stored in a table that isqueried for other content often ornot) and making for longer backuptimes.

Pros of Storing binary files in file system:

  • This is what files systems are goodat. File systems will handledefragmenting well and retrievingfiles (say to stream a video file tothrough a web server) will likely befaster that with a db.

Cons of Storing binary files in file system:

  • Slightly more complex data accesslayer. Needs its own backup system.Need to consider referentialintegrity issues (e.g. deletedpointer in database will need toresult in deletion of file so as tonot have 'orphaned' files in thefilesystem).

On balance I would use the file system. In the past, using SQL Server 2005 I would simply store a 'pointer' in db tables to the binary file. The pointer would typically be a GUID.

Here's the good news if you are using SQL Server 2008 (and maybe others - I don't know): there is built in support for a hybrid solution with the new VARBINARY(MAX) FILESTREAM data type. These behave logically like VARBINARY(MAX) columns but behind the scenes, SQL Sever 2008 will store the data in the file system.


There is no best way.

What? You need more info?

There are three ways I know of... One, as byte arrays in the database. Two, as a file with the path stored in the database. Three, as a hybrid (only if DB allows, such as with the FileStream type).

The first is pretty cool because you can query and get your data in the same step. Which is always nice. But what happens when you have LOTS of files? Your database gets big. Now you have to deal with big database maintenance issues, such as the trials of backing up databases that are over a terabyte. And what happens if you need outside access to the files? Such as type conversions, mass manipulation (resize all images, appy watermarks, etc)? Its much harder to do than when you have files.

The second is great for somewhat large numbers of files. You can store them on NAS devices, back them up incrementally, keep your database small, etc etc. But then, when you have LOTS of files, you start running into limitations in the file system. And if you spread them over the network, you get latency issues, user rights issues, etc. Also, I take pity on you if your network gets rearranged. Now you have to run massive updates on the database to change your file locations, and I pity you if something screws up.

Then there's the hybrid option. Its almost perfect--you can get your files via your query, yet your database isn't massive. Does this solve all your problems? Probably not. Your database isn't portable anymore; you're locked to a particular DBMS. And this stuff isn't mature yet, so you get to enjoy the teething process. And who says this solves all the different issues?

Fact is, there is no "best" way. You just have to determine your requirements, make the best choice depending on them, and then suck it up when you figure out you did the wrong thing.