
Using a Filesystem (Not a Database!) for Schemaless Data - Best Practices


Yes, a filesystem can be treated as a special case of a NoSQL-like database system. It may have some limitations that should be weighed in any design decision:

pros:

  • simple, intuitive

  • takes advantage of years of tuning and caching algorithms
  • easy backup, potentially easy clustering
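
To make the "simple, intuitive" point concrete, here is a minimal sketch of a directory used as a key-to-document store. The `FileStore` class, its `put`/`get` methods, and the one-JSON-file-per-key layout are all assumptions for illustration, not an established API. It writes through a temporary file and renames it into place, since `os.replace` is atomic on POSIX systems, so a reader should never see a half-written document.

```python
import json
import os
import tempfile
from pathlib import Path


class FileStore:
    """Hypothetical key -> JSON document store backed by one directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Restrict keys to a safe character set so a key can never
        # contain path separators or ".." and escape the root directory.
        if not key or not all(c.isalnum() or c in "-_" for c in key):
            raise ValueError(f"unsupported key: {key!r}")
        return self.root / f"{key}.json"

    def put(self, key, document):
        target = self._path(key)
        # Write to a temporary file in the same directory, then rename it
        # over the target so the update appears all at once.
        fd, tmp_name = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(document, handle)
            os.replace(tmp_name, target)
        except BaseException:
            os.unlink(tmp_name)
            raise

    def get(self, key):
        with open(self._path(key), encoding="utf-8") as handle:
            return json.load(handle)


if __name__ == "__main__":
    store = FileStore(tempfile.mkdtemp())
    # No schema: each document can carry whatever fields it likes.
    store.put("user-42", {"name": "Ada", "tags": ["admin", "beta"]})
    print(store.get("user-42")["tags"])
```

Everything lands in a single flat directory here, which keeps the sketch short but runs straight into the "too many items per node" concern once the number of keys gets large.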

things to think about:

  • richness of metadata - what types of data does it store, how does it let you query them, can you have hierarchical or multivalued attributes

  • speed of querying metadata - not all filesystems are particularly well optimized for querying anything other than size and dates

  • inability to join queries (though that's pretty much common to NoSQL)

  • inefficient storage usage (unless the filesystem performs block suballocation, you'll typically blow 4-16K per item stored regardless of size)

  • may not have the kind of caching algorithm you want for its directory structure
  • tends to be less tunable, etc.
  • backup solutions may have trouble depending on how you store things - too deep, too many items per node, etc. - which might negate an obvious advantage of such a structure
  • locking for a LOCAL filesystem works pretty well, of course, if you call the right routines, but not necessarily for a network-based filesystem (those problems have been solved in various ways, but it's certainly a design issue)
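
The storage-overhead point is easy to put numbers on. The sketch below is only an illustration: the function names are made up, the 4096-byte default block size is an assumption (check your own filesystem), and `st_blocks` is reported in 512-byte units on POSIX systems and is not available on Windows.

```python
import math
import os


def estimated_allocation(size_bytes, block_size=4096):
    """Bytes consumed if a file always occupies whole blocks
    (that is, the filesystem does no block suballocation)."""
    if size_bytes == 0:
        return 0
    return math.ceil(size_bytes / block_size) * block_size


def measured_allocation(path):
    """Return (logical_size, allocated_bytes) for an existing file.

    Uses st_blocks, which counts 512-byte units on POSIX systems.
    """
    info = os.stat(path)
    return info.st_size, info.st_blocks * 512
```

Under those assumptions, 100,000 records of 200 bytes each hold 20,000,000 bytes of actual data, but `100_000 * estimated_allocation(200)` comes to 409,600,000 bytes on disk. Running `measured_allocation` on a few real files will show whether your filesystem behaves that way or packs small files more tightly.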


One thing you may want to consider is Oracle's BFILE datatype, which is a pointer to a file on disk. Perhaps that might be the best of both worlds? Microsoft SQL Server doesn't seem to offer this capability.


Amazon's S3 is a large-scale example of this kind of implementation.

http://aws.amazon.com/s3/

This sort of implementation is what a lot of companies are moving toward, because it scales fundamentally better than a relational database can. The approach is simple, it works, and for some problems it's a great solution. In the case of Amazon's S3, it's particularly nice for cloud storage if you don't want the hassle of storing the data yourself.