What are the best ways to store Graphs in persistent storage


Graph Databases:

  1. HyperGraphDB: a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism.
  2. InfoGrid: an Internet Graph Database with many additional software components that make the development of REST-ful web applications on a graph foundation easy.
  3. vertexdb: a high performance graph database server that supports automatic garbage collection.

Source: http://nosql.mypopescu.com/post/498705278/quick-review-of-existing-graph-databases

Graph Libraries:

  1. WebGraph is a framework to study the web graph. From their page: "It provides simple ways to manage very large graphs, exploiting modern compression techniques."
  2. Dex is a high performance library to manage very large graphs or networks.
  3. This blog post - On Building a Stupidly Fast Graph Database - provides some guidelines on building a graph database. The technique they use is "memory-mapped I/O, disk-based linear-hashing".
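
The "memory-mapped I/O" half of that technique can be illustrated with a minimal Python sketch. This is not the blog post's implementation: it swaps the disk-based linear hashing for a plain offset table (a compressed-sparse-row layout), and the file layout as well as the names `write_graph` and `MappedGraph` are assumptions made up for illustration.

```python
import mmap
import struct


def write_graph(path, adjacency):
    """Write a graph to disk in a flat binary layout (hypothetical format).

    adjacency is a list of neighbor lists; node ids are 0..n-1.
    Layout: node count (uint64), n+1 offsets (uint64), then all neighbor ids (uint32).
    """
    n = len(adjacency)
    offsets = [0]
    for neighbors in adjacency:
        offsets.append(offsets[-1] + len(neighbors))
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", n))
        f.write(struct.pack(f"<{n + 1}Q", *offsets))
        for neighbors in adjacency:
            f.write(struct.pack(f"<{len(neighbors)}I", *neighbors))


class MappedGraph:
    """Read-only view of a file written by write_graph, backed by mmap."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Map the whole file; the OS pages data in on demand instead of
        # the program reading the entire graph into memory up front.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (self.num_nodes,) = struct.unpack_from("<Q", self._map, 0)
        self._edges_start = 8 + 8 * (self.num_nodes + 1)

    def neighbors(self, node):
        if not 0 <= node < self.num_nodes:
            raise IndexError(node)
        # Two consecutive offsets delimit this node's slice of the edge array.
        start, end = struct.unpack_from("<2Q", self._map, 8 + 8 * node)
        if start == end:
            return []
        fmt = f"<{end - start}I"
        return list(struct.unpack_from(fmt, self._map, self._edges_start + 4 * start))

    def close(self):
        self._map.close()
        self._file.close()
```

Looking up a node's neighbors touches only the offset entry and that node's slice of the edge array, so only those pages need to be resident. A layout like this is static; supporting inserts is where a scheme such as linear hashing would come in.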


Disclaimer: I am speaking from the graph analysis standpoint.

There are several file formats for storing graph data: GraphML, GXL and several others. Storage itself is usually not the problem; working with the graphs without fully loading them into RAM is the tricky part.
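
As a rough illustration of reading such a file without building the whole document in memory, here is a minimal sketch that streams edges out of GraphML with Python's standard `xml.etree.ElementTree.iterparse`. The function name `iter_edges` and the sample document are assumptions for illustration; the sketch ignores attributes, hyperedges, and nested graphs.

```python
import io
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="a"/>
    <node id="b"/>
    <node id="c"/>
    <edge source="a" target="b"/>
    <edge source="b" target="c"/>
  </graph>
</graphml>
"""


def iter_edges(source):
    """Yield (source_id, target_id) pairs from a GraphML path or binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == GRAPHML_NS + "edge":
            yield elem.get("source"), elem.get("target")
        # Discard the element's attributes and children once it has been seen.
        # The emptied element itself stays attached to its parent, so memory
        # still grows slowly with the number of elements.
        elem.clear()


if __name__ == "__main__":
    for src, dst in iter_edges(io.BytesIO(SAMPLE)):
        print(src, "->", dst)
```

This only solves sequential reading. Random access to neighbors, which most analysis algorithms need, still requires an index or an in-memory structure.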

The RDF model is too generic for serious graph analysis. If you don't mind your analysis being slow and programming the algorithms yourself, go with one of the existing graph databases (see Wikipedia on this).

For real analysis, load all data into RAM using an existing graph analysis library such as SNAP, or see this question.


There is no absolutely correct answer here; there is a large variety of options, and the right choice depends heavily on your needs. With large-scale retrievals/traversals (e.g. social networks and similar back-ends) you're quickly going to run into the random I/O bottleneck; I believe storing your graph in RAM is currently the only practical course of action. Less latency-sensitive applications have quite a wide variety of options, including Neo4j (open source with a commercial flavor) and AllegroGraph (commercial with a limited free edition).

At Delver we ended up implementing our own denormalized data model (essentially an adjacency list to represent the graph) in RAM on top of GigaSpaces (some info can be found in this presentation), with custom map-reduce code for queries and data analysis. If you go this route, Cassandra seems to be a viable open source platform to build on.
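
To make the adjacency-list idea concrete, here is a minimal single-process sketch in Python. It is not Delver's code and does not use GigaSpaces or Cassandra; the class `AdjacencyGraph` and its methods are hypothetical names, and a real deployment would partition this structure across machines.

```python
from collections import deque


class AdjacencyGraph:
    """Directed graph held in RAM as a dict of node -> set of neighbors."""

    def __init__(self):
        self._out = {}

    def add_edge(self, src, dst):
        self._out.setdefault(src, set()).add(dst)
        self._out.setdefault(dst, set())  # make sure the target is a known node

    def neighbors(self, node):
        return self._out.get(node, set())

    def within_hops(self, start, max_hops):
        """Breadth-first search: map each node within max_hops of start to its distance."""
        distances = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if distances[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for neighbor in self.neighbors(node):
                if neighbor not in distances:
                    distances[neighbor] = distances[node] + 1
                    queue.append(neighbor)
        return distances


if __name__ == "__main__":
    g = AdjacencyGraph()
    g.add_edge("alice", "bob")
    g.add_edge("bob", "carol")
    g.add_edge("carol", "dave")
    print(g.within_hops("alice", 2))
```

Each traversal step is a hash lookup in memory, which is what avoids the random disk I/O that makes large traversals slow on disk-backed stores.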