Neo4j output format


There is more to it.

First of all, as you said, tabular results from queries are very common and are needed to integrate with other systems and databases.

Secondly, you often don't return raw graph data from your queries, but information that has been aggregated, projected, sliced, or extracted out of your graph. So the relationships to the original graph data are already lost in most of the query results I see being used.

The only time people need or use the raw graph data is when they export subgraph data from the database as a query result.

The problem with doing that as a de-duplicated graph is that the db has to fetch all the result data into memory first in order to deduplicate it, extract the needed relationships, etc.

Normally it just streams the data out as it comes, which uses little memory.

Even if you use Geoff, GraphML or the Gephi format, you have to keep all the data in memory to deduplicate the results (which are returned as paths with potentially duplicate nodes and relationships).
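To make the memory cost concrete, here is a minimal Python sketch of that deduplication step. The row shape is an assumption for illustration only (each path as a list of `(start_node, relationship, end_node)` triples, each item a dict with a unique `"id"`), not what any Neo4j driver actually returns. The point is that both dictionaries keep growing until the last path has been seen, so nothing can be streamed out early.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shapes: nodes and relationships are dicts carrying a unique "id".
Node = Dict[str, object]
Rel = Dict[str, object]
Path = List[Tuple[Node, Rel, Node]]


def dedupe_paths(paths: Iterable[Path]) -> Tuple[Dict[object, Node], Dict[object, Rel]]:
    """Collect every distinct node and relationship seen across all paths."""
    nodes: Dict[object, Node] = {}
    rels: Dict[object, Rel] = {}
    for path in paths:
        for start, rel, end in path:
            # setdefault keeps the first copy and ignores later duplicates
            nodes.setdefault(start["id"], start)
            nodes.setdefault(end["id"], end)
            rels.setdefault(rel["id"], rel)
    return nodes, rels
```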

There is also the question of what you want to include in your output: just the nodes and rels returned? Or additionally all the other rels between the nodes that you return? Or all the rels of the returned nodes (but then you also have to include the end nodes of those relationships)?
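The difference between the last two options can be sketched in a few lines of Python. The tuple shape `(rel_id, start_node_id, end_node_id)` and both function names are made up for this illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical relationship shape: (rel_id, start_node_id, end_node_id).
Rel = Tuple[int, int, int]


def rels_between(returned: Set[int], all_rels: Iterable[Rel]) -> List[Rel]:
    """Relationships whose two endpoints are both among the returned nodes."""
    return [r for r in all_rels if r[1] in returned and r[2] in returned]


def rels_touching(returned: Set[int], all_rels: Iterable[Rel]) -> Tuple[List[Rel], Set[int]]:
    """All relationships of the returned nodes, plus the extra end nodes they drag in."""
    rels = [r for r in all_rels if r[1] in returned or r[2] in returned]
    extra_nodes = {n for r in rels for n in (r[1], r[2])} - returned
    return rels, extra_nodes
```

The second variant is the one that can grow the output well beyond what the query itself returned, since every extra end node has to be exported too.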

You could just return the paths that you get, which are unique paths through the graph in themselves:

MATCH p = (n)-[r]-(m) WHERE ... RETURN p

Another way to address this problem in Neo4j is to use sensible aggregations.

E.g. you can use collect to aggregate data per node (i.e. a kind of subgraph):

MATCH (n)-[r]-(m) WHERE ... RETURN n, collect([r, type(r), m])

or use the new literal map syntax (Neo4j 2.0)

MATCH (n)-[r]-(m) WHERE ... RETURN {node: n, neighbours: collect({rel: r, type: type(r), node: m})}

The dump command of the neo4j-shell takes the approach of pulling the Cypher results into an in-memory structure, enriching it, and then outputting it as Cypher CREATE statement(s).
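As a rough illustration of that last rendering step (not the shell's actual implementation or output format), the following Python sketch turns an in-memory subgraph into a single CREATE statement. The input shapes and the `_<id>` identifier naming are assumptions, property keys are assumed to be plain identifiers, and values are rendered with `json.dumps`, which only covers simple strings, numbers and booleans.

```python
import json
from typing import Dict, List, Tuple


def to_create_statement(nodes: Dict[int, dict], rels: List[Tuple[int, str, int]]) -> str:
    """Render nodes {id: properties} and rels [(start_id, type, end_id)] as one CREATE."""
    parts = []
    for node_id, props in nodes.items():
        body = ", ".join(f"{key}: {json.dumps(value)}" for key, value in props.items())
        parts.append(f"(_{node_id} {{{body}}})")
    for start, rel_type, end in rels:
        # Reuse the node identifiers so relationships attach to the created nodes
        parts.append(f"(_{start})-[:{rel_type}]->(_{end})")
    return "CREATE " + ",\n  ".join(parts)
```

Everything goes into one statement so that the node identifiers are still in scope when the relationships are created.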

A similar approach can be used for other output formats too if you need it. But so far there hasn't been the need.

If you really need this functionality, it makes sense to write a server extension that uses Cypher for query specification but doesn't allow RETURN statements. Instead you would always use RETURN *, aggregate the data into an in-memory structure (SubGraph in the org.neo4j.cypher packages), and then render it in a suitable format (e.g. JSON or one of those listed above).

These could be starting points for that:

There are also other efforts, like GraphJSON from GraphAlchemist: https://github.com/GraphAlchemist/GraphJSON

And the d3 json format is also pretty useful. We use it in the neo4j console (console.neo4j.org) to return the graph visualization data that is then consumed by d3 directly.
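For reference, a minimal Python sketch of producing that kind of structure. It assumes the common nodes/links layout used by d3 force-layout examples, where each link points at its endpoints by position in the nodes list; the exact fields the console emits are not shown here, and the input shapes are made up for illustration.

```python
import json
from typing import Dict, List, Tuple


def to_d3(nodes: Dict[int, dict], rels: List[Tuple[int, str, int]]) -> dict:
    """Build {"nodes": [...], "links": [...]}; links refer to nodes by list index."""
    # Map each node id to its position in the output "nodes" list
    index = {node_id: i for i, node_id in enumerate(nodes)}
    return {
        "nodes": [{"id": node_id, **props} for node_id, props in nodes.items()],
        "links": [
            {"source": index[start], "target": index[end], "type": rel_type}
            for start, rel_type, end in rels
        ],
    }


if __name__ == "__main__":
    graph = to_d3({7: {"name": "A"}, 9: {"name": "B"}}, [(7, "KNOWS", 9)])
    print(json.dumps(graph, indent=2))
```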


I've been working with Neo4j for a while now, and I can tell you that if you are concerned about memory and performance you should drop Cypher altogether and use indexes and the other graph-traversal methods instead (e.g. retrieve all the relationships of a certain type from or to a start node, and then iterate over the found nodes).

As the documentation says, Cypher is not intended for in-app usage, but more as an administration tool. Furthermore, in production-scale environments, it is VERY easy to crash the server by running the wrong query.

Secondly, the docs don't mention any API method to retrieve the output as a graph-like structure. You will have to process the output of the query and build it yourself.
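Building it yourself does not take much. A minimal Python sketch, assuming the rows of a query like `MATCH (A)-->(B) RETURN A, B` have already been reduced to `(start, end)` pairs of hashable identifiers (the function name and row shape are made up for this example):

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_adjacency(rows: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Turn tabular (start, end) rows into an adjacency map {node: {neighbours}}."""
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for start, end in rows:
        adjacency[start].add(end)
        # Make sure end nodes also appear as keys, even with no outgoing rels
        adjacency.setdefault(end, set())
    return dict(adjacency)
```

Repeated start nodes in the rows collapse into a single key, which is exactly the deduplication the tabular output does not do for you.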

That said, in the example you give you say that there is only one A and that you know it before the data is fetched, so you don't need to do:

MATCH (A)-->(B) RETURN A, B 

but just

MATCH (A)-->(B) RETURN B

(you don't need to receive A three times because you already know these are the nodes connected with A)

or better (if you need info about the relationships) something like

MATCH (A)-[r]->(B) RETURN r