What are the use cases of Graph-based Databases (http://neo4j.org/)? [closed] What are the use cases of Graph-based Databases (http://neo4j.org/)? [closed] database database

What are the use cases of Graph-based Databases (http://neo4j.org/)? [closed]


I used a graph database in a previous job. We weren't using neo4j, it was an in-house thing built on top of Berkeley DB, but it was similar. It was used in production (it still is).

The reason we used a graph database was that the data being stored by the system and the operations the system was doing with the data were exactly the weak spot of relational databases and were exactly the strong spot of graph databases. The system needed to store collections of objects that lack a fixed schema and are linked together by relationships. To reason about the data, the system needed to do a lot of operations that would be a couple of traversals in a graph database, but that would be quite complex queries in SQL.

The main advantages of the graph model were rapid development time and flexibility. We could quickly add new functionality without impacting existing deployments. If a potential customer wanted to import some of their own data and graft it on top of our model, it could usually be done on site by the sales rep. Flexibility also helped when we were designing a new feature, saving us from trying to squeeze new data into a rigid data model.

Having a weird database let us build a lot of our other weird technologies, giving us lots of secret-sauce to distinguish our product from those of our competitors.

The main disadvantage was that we weren't using the standard relational database technology, which can be a problem when your customers are enterprisey. Our customers would ask why we couldn't just host our data on their giant Oracle clusters (our customers usually had large datacenters). One of the team actually rewrote the database layer to use Oracle (or PostgreSQL, or MySQL), but it was slightly slower than the original. At least one large enterprise even had an Oracle-only policy, but luckily Oracle bought Berkeley DB. We also had to write a lot of extra tools - we couldn't just use Crystal Reports for example.

The other disadvantage of our graph database was that we built it ourselves, which meant when we hit a problem (usually with scalability) we had to solve it ourselves. If we'd used a relational database, the vendor would have already solved the problem ten years ago.

If you're building a product for enterprisey customers and your data fits into the relational model, use a relational database if you can. If your application doesn't fit the relational model but it does fit the graph model, use a graph database. If it only fits something else, use that.

If your application doesn't need to fit into the current blub architecture, use a graph database, or CouchDB, or BigTable, or whatever fits your app and you think is cool. It might give you an advantage, and its fun to try new things.

Whatever you chose, try not to build the database engine yourself unless you really like building database engines.


We've been working with the Neo team for over a year now and have been very happy. We model scholarly artifacts and their relationships, which is spot on for a graph db, and run recommendation algorithms over the network.

If you are already working in Java, I think that modeling using Neo4j is very straight forward and it has the flattest / fastest performance for R/W of any other solutions we tried.

To be honest, I have a hard time not thinking in terms of a Graph/Network because it's so much easier than designing convoluted table structures to hold object properties and relationships.

That being said, we do store some information in MySQL simply because it's easier for the Business side to run quick SQL queries against. To perform the same functions with Neo we would need to write code that we simply don't have the bandwidth for right now. As soon as we do though, I'm moving all that data to Neo!

Good luck.


Two points:

First, on the data I've been working with the past 5 years in SQL Server, I've recently hit the scalability wall with SQL for the type of queries we need to run (nested relationhsips...you know...graphs). I've been playing around with neo4j, and my lookup times are several orders of magnitude faster when I need this kind of lookup.

Second, to the point that graph databases are outdated. Um...no. Early on, as people were trying to figure out how to store and lookup data efficiently, they created and played with graph and network style database models. These were designed so the physical model reflected the logical model, so their efficiency wasnt that great. This type of data structure was good for semi-structured data, but not as good for structured dense data. So, this IBM dude named Codd was researching efficient ways to arrange and store structured data and came up with the idea for the relational database model. And it was good, and people were happy.

What do we have here? Two tools for two different purposes. Graph database models are very good for representing semi-structured data and the relationships between entities (that may or may not exist). Relational databases are good for structured data that has a very static schema, and where join depths do not go very deep. One is good for one kind of data, the other is good for other kinds of data.

To coin the phrase, there is no Silver Bullet. Its very short sighted to say that graph database models are out of date and to use one gives up 40 years of progress. That's like saying using C is giving up all the technological progress we've gone through to get things like Java and C#. That's not true though. C is a tool that is needed for certain tasks. And Java is a tool for other tasks.