Is there a way to convert CSV columns into hierarchical relationships?

To create the exact nested object you want, we'll use a mix of plain JavaScript and a D3 method, d3.stratify. However, keep in mind that 7 million rows (please see the post scriptum below) is a lot to compute.

It's very important to mention that, for this proposed solution, you'll have to separate the Kingdoms into different data arrays (for instance, using Array.prototype.filter). This restriction exists because d3.stratify needs a single root node, and in the Linnaean taxonomy there is no relationship between Kingdoms (unless you create "Domain" as a top rank, which would be the root for all eukaryotes, but then you'd have the same problem for Archaea and Bacteria).
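A minimal sketch of that split, assuming the rows are already parsed into objects (the shape d3.csvParse returns). The sample rows and the helper name `splitByKingdom` are hypothetical and used only for illustration. Each kingdom gets its own array via Array.prototype.filter, so each array can later be stratified on its own:

```javascript
// Hypothetical rows, shaped like the objects d3.csvParse returns
// (only a few columns kept for brevity)
const rows = [
  { RecordID: "1", kingdom: "Animalia", genus: "Homo" },
  { RecordID: "2", kingdom: "Plantae", genus: "Quercus" },
  { RecordID: "3", kingdom: "Animalia", genus: "Canis" }
];

// Build one array per kingdom with Array.prototype.filter,
// so that each array later yields a single root node
function splitByKingdom(data) {
  const kingdoms = [...new Set(data.map(r => r.kingdom))];
  const groups = {};
  kingdoms.forEach(k => {
    groups[k] = data.filter(r => r.kingdom === k);
  });
  return groups;
}

const byKingdom = splitByKingdom(rows);
console.log(Object.keys(byKingdom)); // [ 'Animalia', 'Plantae' ]
console.log(byKingdom.Animalia.map(r => r.genus)); // [ 'Homo', 'Canis' ]
```

Note that this scans the whole dataset once per kingdom. That is cheap when there are only a handful of kingdoms.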

So, suppose you have this CSV (I added some more rows) with just one Kingdom:

```
RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis latrans
3,Animalia,Chordata,Mammalia,Cetacea,Delphinidae,Tursiops,Tursiops truncatus
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Pan,Pan paniscus
```
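The code that follows relies on two things d3.csvParse gives you: an array with one object per row (all values as strings), and a `columns` property holding the header names in order. The sketch below is a naive stand-in for d3.csvParse, written only to show that shape. The name `naiveCsvParse` is hypothetical. It assumes no quoted fields, embedded commas or embedded newlines, so use d3.csvParse for real data:

```javascript
// Naive stand-in for d3.csvParse, for illustration only:
// assumes no quoted fields, no embedded commas, no embedded newlines
function naiveCsvParse(text) {
  const lines = text.trim().split(/\r?\n/);
  const columns = lines[0].split(",");
  const rows = lines.slice(1).map(line => {
    const values = line.split(",");
    const row = {};
    columns.forEach((col, i) => { row[col] = values[i]; });
    return row;
  });
  rows.columns = columns; // d3.csvParse also exposes the header as `columns`
  return rows;
}

const csv = `RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis latrans`;

const data = naiveCsvParse(csv);
console.log(data.columns.length); // 8
console.log(data[1].species); // Canis latrans
```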

Based on that CSV, we'll create an array named tableOfRelationships which, as the name implies, holds the relationships between the ranks:

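```javascript
// `csv` is the CSV text above, as a string
const data = d3.csvParse(csv);

// Every column except RecordID is a taxonomic rank, ordered from kingdom to species
const taxonomicRanks = data.columns.filter(d => d !== "RecordID");

const tableOfRelationships = [];

data.forEach(row => {
  taxonomicRanks.forEach((d, i) => {
    // Add each name only once; its parent is the value in the previous rank's column
    // (for the first rank, taxonomicRanks[-1] is undefined, so the parent becomes null)
    if (!tableOfRelationships.find(e => e.name === row[d])) {
      tableOfRelationships.push({
        name: row[d],
        parent: row[taxonomicRanks[i - 1]] || null
      });
    }
  });
});
```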
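With millions of rows, the `Array.prototype.find` call above becomes the bottleneck. It rescans the growing table for every cell, so the work grows with rows × ranks × table size. A possible variation, sketched here under the assumption that names are unique across ranks (the same assumption the code above makes), keys a Map by name instead. It produces the same entries in the same order, because the first occurrence wins and a Map preserves insertion order. The rows and the helper name `buildTable` are hypothetical:

```javascript
// Hypothetical rows, in the shape d3.csvParse would return for the CSV above
const data = [
  { RecordID: "1", kingdom: "Animalia", phylum: "Chordata", class: "Mammalia", order: "Primates", family: "Hominidae", genus: "Homo", species: "Homo sapiens" },
  { RecordID: "1", kingdom: "Animalia", phylum: "Chordata", class: "Mammalia", order: "Primates", family: "Hominidae", genus: "Pan", species: "Pan paniscus" }
];
const taxonomicRanks = ["kingdom", "phylum", "class", "order", "family", "genus", "species"];

// Same logic as the forEach/find version, but lookups go through a Map
function buildTable(rows, ranks) {
  const byName = new Map();
  rows.forEach(row => {
    ranks.forEach((rank, i) => {
      const name = row[rank];
      if (!byName.has(name)) {
        // For i = 0, ranks[-1] is undefined, so the parent becomes null
        byName.set(name, { name, parent: row[ranks[i - 1]] || null });
      }
    });
  });
  return [...byName.values()]; // insertion order = first-seen order
}

const tableOfRelationships = buildTable(data, taxonomicRanks);
console.log(tableOfRelationships.length); // 9
```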

For the data above, this is the tableOfRelationships:

```
+---------+----------------------+---------------+
| (Index) |         name         |    parent     |
+---------+----------------------+---------------+
|       0 | "Animalia"           | null          |
|       1 | "Chordata"           | "Animalia"    |
|       2 | "Mammalia"           | "Chordata"    |
|       3 | "Primates"           | "Mammalia"    |
|       4 | "Hominidae"          | "Primates"    |
|       5 | "Homo"               | "Hominidae"   |
|       6 | "Homo sapiens"       | "Homo"        |
|       7 | "Carnivora"          | "Mammalia"    |
|       8 | "Canidae"            | "Carnivora"   |
|       9 | "Canis"              | "Canidae"     |
|      10 | "Canis latrans"      | "Canis"       |
|      11 | "Cetacea"            | "Mammalia"    |
|      12 | "Delphinidae"        | "Cetacea"     |
|      13 | "Tursiops"           | "Delphinidae" |
|      14 | "Tursiops truncatus" | "Tursiops"    |
|      15 | "Pan"                | "Hominidae"   |
|      16 | "Pan paniscus"       | "Pan"         |
+---------+----------------------+---------------+
```

Note the null parent of Animalia: that is why you need to separate your dataset by Kingdom. There can be only one null value (that is, one root) in the whole table.
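Before calling d3.stratify, it can help to check that the table really has exactly one root, since stratify refuses a table with several. The sketch below does that check. It also shows the alternative hinted at earlier: hang every top-level name under one synthetic root instead of splitting the data. Both helper names are hypothetical. `rootName` is a placeholder of your choosing, not a real taxon, and it must not collide with any existing name:

```javascript
// Illustrative tables in the {name, parent} shape used by tableOfRelationships
const oneKingdom = [
  { name: "Animalia", parent: null },
  { name: "Chordata", parent: "Animalia" }
];
const twoKingdoms = [...oneKingdom, { name: "Plantae", parent: null }];

// Names whose parent is null; d3.stratify needs exactly one
function findRoots(table) {
  return table.filter(d => d.parent === null).map(d => d.name);
}

// Alternative to splitting: re-parent every top-level entry under one synthetic root
function withSyntheticRoot(table, rootName) {
  const rewired = table.map(d => (d.parent === null ? { ...d, parent: rootName } : d));
  return [{ name: rootName, parent: null }, ...rewired];
}

console.log(findRoots(oneKingdom)); // [ 'Animalia' ]
console.log(findRoots(twoKingdoms)); // [ 'Animalia', 'Plantae' ]
console.log(findRoots(withSyntheticRoot(twoKingdoms, "root"))); // [ 'root' ]
```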

Finally, based on that table, we create the hierarchy using d3.stratify():

```javascript
// Each row's id is its name; its parent id is the name in its `parent` field
const stratify = d3.stratify()
    .id(function(d) { return d.name; })
    .parentId(function(d) { return d.parent; });

const hierarchicalData = stratify(tableOfRelationships);
```
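To make it clearer what the stratify step does, here is a plain-JavaScript sketch of the same id/parentId-to-tree idea, followed by a depth-first walk that prints one indented line per level. This is not d3.stratify's implementation. D3 returns hierarchy nodes with `id`, `data`, `depth`, `parent` and `children` (leaves have no `children`), while this sketch returns plain `{ name, children }` objects. The table is a hypothetical subset rooted at Mammalia for brevity, and `nest` and `outline` are illustrative names:

```javascript
// Illustrative {name, parent} table, rooted at Mammalia for brevity
const table = [
  { name: "Mammalia", parent: null },
  { name: "Primates", parent: "Mammalia" },
  { name: "Hominidae", parent: "Primates" },
  { name: "Carnivora", parent: "Mammalia" },
  { name: "Canidae", parent: "Carnivora" }
];

// Plain-JS sketch of the id/parentId -> tree step (not d3's own code)
function nest(rows) {
  // Create every node first, so rows may appear in any order
  const nodes = new Map(rows.map(d => [d.name, { name: d.name, children: [] }]));
  let root = null;
  rows.forEach(d => {
    const node = nodes.get(d.name);
    if (d.parent === null) {
      if (root) throw new Error(`Multiple roots: ${root.name}, ${d.name}`);
      root = node;
    } else {
      const parent = nodes.get(d.parent);
      if (!parent) throw new Error(`Missing parent: ${d.parent}`);
      parent.children.push(node);
    }
  });
  if (!root) throw new Error("No root");
  return root;
}

// Depth-first walk returning one indented line per node
function outline(node, depth = 0, lines = []) {
  lines.push("  ".repeat(depth) + node.name);
  (node.children || []).forEach(child => outline(child, depth + 1, lines));
  return lines;
}

console.log(outline(nest(table)).join("\n"));
// Mammalia
//   Primates
//     Hominidae
//   Carnivora
//     Canidae
```

The `node.children || []` guard also lets the same walk run over d3 nodes, whose leaves lack `children`. For those you would print `node.id` instead of `node.name`.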

And here is the demo. Open your browser's console (the snippet's own console is not well suited to this task) and inspect the several levels (children) of the object: