Easiest way to join two files from the unix command line, inserting zero entries for missing keys

unix join

join -o 0,1.2,2.2 -e 0 -a1 -a2 a.txt b.txt

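-o 0,1.2,2.2 → output the join field, then the 2nd field of the 1st file, then the 2nd field of the 2nd file.
-e 0 → output 0 in place of missing (empty) fields.
-a1 -a2 → also print unpairable lines from file 1 and from file 2, so keys present in only one file are not dropped.

join expects both files to be sorted on the join field (e.g. with sort -k1,1).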
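To see the flags in action, here is a minimal sketch with made-up sample files (the keys and counts are hypothetical). It sorts both files on the first field before calling join and runs both steps under `LC_ALL=C` so that sort and join agree on the ordering. Note that it writes a.txt and b.txt in the current directory.

```shell
# Hypothetical sample data: one "key count" pair per line, unsorted.
printf '%s\n' 'cherry 2' 'apple 3' 'banana 5' > a.txt
printf '%s\n' 'date 1' 'banana 7' 'apple 4' > b.txt

# join requires both inputs sorted on the join field; use the same
# (byte-order) collation for sort and join so they agree.
LC_ALL=C sort -k1,1 -o a.txt a.txt
LC_ALL=C sort -k1,1 -o b.txt b.txt

LC_ALL=C join -o 0,1.2,2.2 -e 0 -a1 -a2 a.txt b.txt
# Prints:
# apple 3 4
# banana 5 7
# cherry 2 0
# date 0 1
```

Keys found in both files get both counts, while cherry (only in a.txt) and date (only in b.txt) get a 0 for the missing side.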

Write a script in whatever language you want. Parse both files into a map/hashtable/dictionary data structure (let's just call it a dictionary). Each dictionary uses the first word of a line as the key and the count (or even a string of counts) as the value. Here is some pseudocode of the algorithm:

Dict fileA, fileB; // Already parsed
while (!fileA.isEmpty()) {
    string check = fileA.top().key();
    int val1 = fileA.top().value();
    if (fileB.contains(check)) {
        printToFile(check + " " + val1 + " " + fileB.getValue(check));
        fileB.remove(check);
    }
    else {
        printToFile(check + " " + val1 + " 0");
    }
    fileA.pop();
}
while (!fileB.isEmpty()) {
    // Any key still in fileB is known not to exist in fileA
    string check = fileB.top().key();
    int val2 = fileB.top().value();
    printToFile(check + " 0 " + val2);
    fileB.pop();
}

Instead of top and pop, you can use any kind of iterator to walk the data structure. How you access the entries will depend on the language and data structure you use.
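The dictionary approach can be sketched in the same shell setting with awk, whose associative arrays play the role of the dictionary. This is a sketch under a few assumptions: each line is `key value` separated by whitespace, each key appears at most once per file, and `dict_join` is a hypothetical function name. Unlike join it does not need sorted input, but the leftover file-2 keys come out in awk's unspecified array order.

```shell
# Hypothetical helper: dictionary-based outer join of two "key value" files.
# File 2 ($2) is loaded into an awk associative array; file 1 ($1) is streamed.
dict_join() {
  awk '
    # First operand (file 2): build the dictionary key -> value.
    FILENAME == ARGV[1] { b[$1] = $2; next }
    # Second operand (file 1): look each key up in the dictionary.
    {
      if ($1 in b) {
        print $1, $2, b[$1]
        delete b[$1]           # matched: remove so it is not printed again
      } else {
        print $1, $2, 0        # missing from file 2
      }
    }
    # Whatever is left exists only in file 2 (unspecified order).
    END { for (k in b) print k, 0, b[k] }
  ' "$2" "$1"
}
```

For example, `dict_join a.txt b.txt | sort` prints one line per key with 0 wherever a file has no entry for it. Only file 2 is held in memory: streaming file 1 corresponds to iterating fileA in the pseudocode, while the array serves the contains/getValue/remove lookups on fileB. Matching on `FILENAME == ARGV[1]` rather than the common `NR == FNR` idiom keeps the script correct when file 2 is empty.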

@ninjalj's answer is much saner, but here's a shell script implementation just for fun:

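# a.txt and b.txt must both be sorted on the key (in byte order, e.g. LC_ALL=C sort)
exec 8< a.txt
exec 9< b.txt
while true; do
  if [ -z "$k1" ]; then
    read k1 v1 <&8
  fi
  if [ -z "$k2" ]; then
    read k2 v2 <&9
  fi
  if [ -z "$k1$k2" ]; then break; fi
  if [ "$k1" = "$k2" ]; then
    echo $k1 $v1 $v2
    k1=
    k2=
  elif [ -z "$k2" ] || { [ -n "$k1" ] && [ "$k1" '<' "$k2" ]; }; then
    # b.txt is exhausted, or k1 sorts first: k1 is missing from b.txt
    echo $k1 $v1 0
    k1=
  else
    # a.txt is exhausted, or k2 sorts first: k2 is missing from a.txt
    echo $k2 0 $v2
    k2=
  fi
done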
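A cheap way to check a hand-written merge like this is to run it side by side with join on the same sorted data. The sketch below wraps the loop in a hypothetical `merge_join` function that takes the two file names as arguments. It includes an explicit `[ -z "$k2" ]` test so that keys left in the first file are still printed after the second file runs out; without such a test, that case falls through to the `else` branch and the loop keeps printing empty keys forever. Because bash's `[ ... '<' ... ]` compares strings in ASCII order, the inputs are assumed to be sorted with `LC_ALL=C sort`.

```shell
# Hypothetical wrapper around the two-pointer merge: reads the sorted files
# named by $1 and $2 and prints "key value1 value2", using 0 for a missing side.
merge_join() {
  local k1= v1= k2= v2=
  exec 8< "$1" 9< "$2"
  while true; do
    # Refill whichever side was consumed; read fails at EOF, leaving the key empty.
    if [ -z "$k1" ]; then read -r k1 v1 <&8 || true; fi
    if [ -z "$k2" ]; then read -r k2 v2 <&9 || true; fi
    # Both files exhausted.
    if [ -z "$k1$k2" ]; then break; fi
    if [ "$k1" = "$k2" ]; then
      echo "$k1 $v1 $v2"; k1= k2=
    elif [ -z "$k2" ] || { [ -n "$k1" ] && [ "$k1" '<' "$k2" ]; }; then
      # File 2 is exhausted or k1 sorts first: k1 has no partner in file 2.
      echo "$k1 $v1 0"; k1=
    else
      # File 1 is exhausted or k2 sorts first: k2 has no partner in file 1.
      echo "$k2 0 $v2"; k2=
    fi
  done
  exec 8<&- 9<&-
}
```

To compare, run `diff <(merge_join a.txt b.txt) <(LC_ALL=C join -o 0,1.2,2.2 -e 0 -a1 -a2 a.txt b.txt)`; no output means the two agree. Good test data includes a key that sorts after the last key of the other file, since that is where the exhausted-file branches matter.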
