Using a Filesystem (Not a Database!) for Schemaless Data - Best Practices
Yes, a filesystem can be treated as a special case of a NoSQL-like database system. It has some limitations that should be considered in any design decision:
pros:
- simple, intuitive
- takes advantage of years of tuning and caching algorithms
- easy backup, potentially easy clustering
things to think about:
- richness of metadata: what types of data does it store, how does it let you query them, and can you have hierarchical or multivalued attributes?
- speed of querying metadata: not all filesystems are particularly well optimized for anything other than size and dates
- inability to join queries (though that's pretty much common to NoSQL)
- inefficient storage usage (unless the filesystem performs block suballocation, you'll typically blow 4-16K per item stored regardless of size)
- may not have the kind of caching algorithm you want for its directory structure
- tends to be less tunable, etc.
- backup solutions may have trouble depending on how you store things (too deep, too many items per node, etc.), which might obviate an obvious advantage of such a structure
- locking for a LOCAL filesystem works pretty well, of course, if you call the right routines, but not necessarily for a network-based filesystem (those problems have been solved in various ways, but it's certainly a design issue)
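Two of the points above, per-item storage overhead and too many items per node, are easy to make concrete. What follows is only a minimal sketch, not a recommended implementation: the function names (`allocated_size`, `shard_path`, `put`, `get`), the 4096-byte block size, and the two-level SHA-256 directory layout are all assumptions chosen for illustration.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def allocated_size(logical_bytes: int, block_size: int = 4096) -> int:
    """Bytes consumed on disk if files are allocated in whole blocks.

    Assumes no block suballocation; block_size is a hypothetical default.
    """
    blocks = -(-logical_bytes // block_size)  # ceiling division
    return blocks * block_size


def shard_path(root: Path, key: str, levels: int = 2, width: int = 2) -> Path:
    """Map a key to root/ab/cd/<digest> so items spread across directories."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, digest)


def put(root: Path, key: str, data: bytes) -> Path:
    """Store data under key: write a temp file, then rename it into place."""
    target = shard_path(root, key)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so the rename below
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def get(root: Path, key: str) -> bytes:
    """Read back the bytes stored under key."""
    return shard_path(root, key).read_bytes()
```

With these assumed defaults, `allocated_size(100)` returns 4096, so a 100-byte item occupies roughly 41 times its logical size. The two-level, two-hex-character layout gives 65,536 leaf directories, which keeps each directory small for backup tools, at the cost of a deeper tree.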
One thing you may want to consider is Oracle's BFILE datatype, which is a pointer to a file on disk. Perhaps that might be the best of both worlds? Microsoft SQL Server doesn't seem to offer this capability.
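The general idea of keeping queryable metadata in a database while the payload stays in an ordinary file can be sketched without Oracle. The example below uses Python's built-in `sqlite3` module purely as a stand-in; it is not BFILE and does not reproduce its semantics. The table name `items`, its columns, and the helper functions are all hypothetical.

```python
import sqlite3
from pathlib import Path


def open_index(db_path: str) -> sqlite3.Connection:
    """Open (or create) a metadata index; the schema here is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " key TEXT PRIMARY KEY,"
        " content_type TEXT,"
        " size_bytes INTEGER,"
        " file_path TEXT NOT NULL)"
    )
    return conn


def register(conn: sqlite3.Connection, key: str, path: Path, content_type: str) -> None:
    """Record metadata plus a pointer to the file; the bytes stay on disk."""
    conn.execute(
        "INSERT OR REPLACE INTO items VALUES (?, ?, ?, ?)",
        (key, content_type, path.stat().st_size, str(path)),
    )
    conn.commit()


def load(conn: sqlite3.Connection, key: str) -> bytes:
    """Follow the stored pointer and read the payload from the filesystem."""
    row = conn.execute(
        "SELECT file_path FROM items WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    return Path(row[0]).read_bytes()


def find_by_type(conn: sqlite3.Connection, content_type: str) -> list:
    """A metadata query the filesystem alone would not answer cheaply."""
    rows = conn.execute(
        "SELECT key FROM items WHERE content_type = ? ORDER BY key",
        (content_type,),
    )
    return [r[0] for r in rows]
```

The trade-off in a layout like this is that the index and the files can drift apart (a file deleted behind the database's back leaves a dangling pointer), so it answers the metadata-query concerns but adds a consistency problem of its own.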
Amazon's S3 is a big example of this kind of implementation.
A lot of companies are moving toward this sort of implementation because, for simple key-based storage and retrieval, it scales fundamentally better than a relational database can. The approach is simple, it works, and for some problems it's a great solution. In the case of Amazon's S3, it's particularly nice for cloud storage if you don't want the hassle of storing the data yourself.