
Creating Family Tree with Neo4J


Genealogical data can follow the GEDCOM standard and be modeled with two types of nodes: Person and Union. A Person node has its identifier and the usual demographic facts. A Union node has a union_id and the facts about the union. In GEDCOM, Family is a third element that brings these two together, but in Neo4j I found it convenient to store the union_id on the Person nodes as well. I used five relationship types: father, mother, husband, wife and child. A family is then the two parents, connected to the Union by relationships pointing inward, and each child, connected by a relationship pointing outward. The image illustrates this.

This is very handy for visualizing connections and generating hypotheses. For example, consider the attached picture and my ancestor Edward G Campbell, the product of union 1917, where three brothers married three Vaught sisters from union 8944 and two married Gaither sisters from union 2945. Also note, in the upper left, how Mahala Campbell married her step-brother John Greer Armstrong. Next to Mahala is an Elizabeth Campbell who is connected by marriage to other Campbells but is likely directly related to them. Similarly, you can form hypotheses about Rachael Jacobs in the upper right and how she might be related to the other Jacobs. Notice the query. From the few nodes initially visualized, you can click to expand others.

I use bulk inserts, which can populate ~30,000 Person nodes and ~100,000 relationships in just over a minute. I have a small .NET function that returns the JSON from a DataView; this generic solution works with any DataView, so it is scalable. I'm now working on adding other data, such as locations (lat/long) and documentation (particularly documents linking people, such as a census).
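A minimal sketch, in Python, of how the Person/Union model could be bulk-loaded from JSON. It is not the author's .NET code: it swaps in Cypher's UNWIND over a JSON list parameter, one common way to do bulk inserts. The answer does not say exactly where each of the five relationship types attaches, so this sketch assumes husband and wife point from a parent Person in to the Union, and child points out from the Union to each child; father and mother are left out. The input shape and the person_id/union_id property names are hypothetical.

```python
import json
from collections import defaultdict


def relationship_batches(unions):
    """Group relationship rows by type so that each type can be bulk-loaded
    with a single UNWIND statement (Cypher cannot take a relationship type
    as a parameter, so one statement per type is needed)."""
    batches = defaultdict(list)
    for u in unions:
        uid = u["union_id"]
        # Assumed direction: each parent points in to the Union node...
        for role in ("husband", "wife"):
            if u.get(role) is not None:
                batches[role].append({"person_id": u[role], "union_id": uid})
        # ...and the Union node points out to each child.
        for child in u.get("children", []):
            batches["child"].append({"person_id": child, "union_id": uid})
    return dict(batches)


def query_for(rel_type):
    """Build the UNWIND statement for one relationship type.
    person_id and union_id are hypothetical property names."""
    if rel_type == "child":
        pattern = "(u)-[:child]->(p)"
    else:
        pattern = f"(p)-[:{rel_type}]->(u)"
    return (
        "UNWIND $rows AS row "
        "MATCH (p:Person {person_id: row.person_id}), "
        "(u:Union {union_id: row.union_id}) "
        "CREATE " + pattern
    )


if __name__ == "__main__":
    # Hypothetical sample: one union with two partners and two children.
    sample = [{"union_id": 1917, "husband": 1, "wife": 2, "children": [3, 4]}]
    for rel_type, rows in relationship_batches(sample).items():
        print(query_for(rel_type))
        # This JSON array is what would be sent as the $rows parameter.
        print(json.dumps(rows))
```

Grouping by type keeps the number of round trips equal to the number of relationship types rather than the number of relationships.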


You might also have a look at Rik van Bruggen's blog about his family data:

Regarding your query

You already create a path pattern here: (p:Person)-[:PARENT*1..5]->(c:Person). You can assign it to a variable, tree, and then operate on that variable, e.g. return the whole path, nodes(tree) or relationships(tree), or process those collections in other ways:

MATCH tree = (p:Person)-[:PARENT*1..5]->(c:Person)
WHERE c.FirstName = 'Bob'
RETURN nodes(tree), relationships(tree), tree, length(tree),
       [n IN nodes(tree) | n.FirstName] AS names

See also the Cypher reference card (http://neo4j.com/docs/stable/cypher-refcard) and the online training (http://neo4j.com/online-training) to learn more about Cypher.

Don't forget to

create index on :Person(FirstName);


I'd suggest building a method to flatten your data into an array. If the objects don't have UUIDs, you would probably want to give them IDs as you flatten them and then add a parent_id key to each record.
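A minimal sketch of such a flattening method in Python, assuming a hypothetical nested input where each object is a dict with a "name", an optional "uuid" and an optional "children" list. Objects without a uuid get a generated ID (the "gen-N" format is an arbitrary choice), and every record carries the parent_id of its parent.

```python
from itertools import count


def flatten(root):
    """Flatten a nested {"name", "uuid"?, "children"?} tree into flat records,
    each with an "id" and the "parent_id" of its parent (None for the root)."""
    ids = count(1)
    records = []
    stack = [(root, None)]
    while stack:
        node, parent_id = stack.pop()
        # Reuse the object's uuid if it has one, otherwise assign a new ID.
        node_id = node.get("uuid") or f"gen-{next(ids)}"
        records.append({"id": node_id, "parent_id": parent_id, "name": node["name"]})
        # Push children in reverse so they are visited in their original order.
        for child in reversed(node.get("children", [])):
            stack.append((child, node_id))
    return records


if __name__ == "__main__":
    family = {"name": "Ann", "children": [{"name": "Bob"}, {"name": "Cy"}]}
    for record in flatten(family):
        print(record)
```

An explicit stack is used instead of recursion so that very deep trees do not hit Python's recursion limit.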

You can then load it with a set of Cypher queries (either by making multiple requests to the query REST API or by using the batch REST API), or alternatively dump the data to CSV and use Cypher's LOAD CSV command to load the objects.
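For the CSV route, a sketch in Python that writes flat records (a hypothetical shape with id, parent_id and name keys) to CSV, plus two LOAD CSV statements kept as strings: one pass creates the members and a second links each child to its parent. The file name members.csv is hypothetical; in recent Neo4j versions file:/// URLs resolve against the server's import directory by default.

```python
import csv
import io

# First pass: create one Member node per CSV row.
LOAD_NODES = """
LOAD CSV WITH HEADERS FROM 'file:///members.csv' AS row
CREATE (:Member {uuid: row.id, name: row.name})
"""

# Second pass: link each row to its parent (root rows have an empty parent_id).
LOAD_PARENTS = """
LOAD CSV WITH HEADERS FROM 'file:///members.csv' AS row
WITH row WHERE row.parent_id IS NOT NULL AND row.parent_id <> ''
MATCH (child:Member {uuid: row.id}), (parent:Member {uuid: row.parent_id})
CREATE (parent)-[:PARENT]->(child)
"""


def to_csv(records):
    """Serialize flat records (id, parent_id, name) to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "parent_id", "name"], lineterminator="\n")
    writer.writeheader()
    # The csv module writes None as an empty field, so roots get a blank parent_id.
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        {"id": "1", "parent_id": None, "name": "Ann"},
        {"id": "2", "parent_id": "1", "name": "Bob"},
    ]
    print(to_csv(sample), end="")
```

Running the two statements as separate passes ensures every parent node exists before the relationships are matched.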

An example Cypher statement with parameters would be:

CREATE (:Member {uuid: $uuid, name: $name})

Then run through the list again, this time with the parent and child IDs:

MATCH (m1:Member {uuid: $uuid1}), (m2:Member {uuid: $uuid2})
CREATE (m1)<-[:PARENT]-(m2)

Make sure to have an index on the ID for members!
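A sketch in Python of the two passes, assuming flat records with hypothetical id, parent_id and name keys. Rather than sending one request per record, it swaps in UNWIND variants of the two statements so that each chunk of parameter maps goes in a single request; the chunk size of 1000 is an arbitrary choice. With the official Python driver, a chunk could be sent as session.run(CREATE_MEMBERS, rows=chunk); the driver call is not part of the sketch.

```python
# UNWIND variants of the two statements: each chunk of rows is one request.
CREATE_MEMBERS = "UNWIND $rows AS row CREATE (:Member {uuid: row.uuid, name: row.name})"
CREATE_PARENTS = (
    "UNWIND $rows AS row "
    "MATCH (m1:Member {uuid: row.uuid1}), (m2:Member {uuid: row.uuid2}) "
    "CREATE (m1)<-[:PARENT]-(m2)"
)


def member_rows(records):
    """First pass: one parameter map per member."""
    return [{"uuid": r["id"], "name": r["name"]} for r in records]


def parent_rows(records):
    """Second pass: uuid1 is the child (m1) and uuid2 the parent (m2)."""
    return [
        {"uuid1": r["id"], "uuid2": r["parent_id"]}
        for r in records
        if r["parent_id"] is not None
    ]


def batches(rows, size):
    """Split rows into consecutive chunks of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


if __name__ == "__main__":
    records = [
        {"id": "a", "parent_id": None, "name": "Ann"},
        {"id": "b", "parent_id": "a", "name": "Bob"},
    ]
    for chunk in batches(member_rows(records), 1000):
        print(CREATE_MEMBERS, chunk)
    for chunk in batches(parent_rows(records), 1000):
        print(CREATE_PARENTS, chunk)
```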