Easiest way to join two files from the unix command line, inserting zero entries for missing keys
Write a script in whatever language you like. Parse both files into a map/hashtable/dictionary data structure (let's just say dictionary). Each dictionary uses the first word of a line as the key and the count (or even a string of counts) as the value. Here is some pseudocode for the algorithm:
```
Dict fileA, fileB; // Already parsed

while (!fileA.isEmpty()) {
    string check = fileA.top().key();
    int val1 = fileA.top().value();
    if (fileB.contains(check)) {
        printToFile(check + " " + val1 + " " + fileB.getValue(check));
        fileB.remove(check);
    } else {
        printToFile(check + " " + val1 + " 0");
    }
    fileA.pop();
}

while (!fileB.isEmpty()) {
    // Key is known not to exist in fileA
    string check = fileB.top().key();
    int val1 = fileB.top().value();
    printToFile(check + " 0 " + val1);
    fileB.pop();
}
```
Instead of `pop` and `top`, you can use any kind of iterator to walk the data structure. Obviously, the exact way you access the data will depend on the language and data structure you use.
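As a rough illustration of the dictionary approach without leaving the command line, here is a sketch using bash 4+ associative arrays (`declare -A`). It assumes each line is a whitespace-separated `key count` pair and that keys are plain words; the function name `merge_counts` is hypothetical. Each file's key order is kept in a plain array because bash does not guarantee any iteration order for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical helper: merge_counts FILE_A FILE_B
# Prints "key countA countB", using 0 for a key missing from one file.
# Assumes "key count" lines and simple word keys; requires bash 4+.
merge_counts() {
  local file_a=$1 file_b=$2 key val
  declare -A dict_a=() dict_b=()   # declare inside a function is local
  local -a order_a=() order_b=()

  # Parse file A, remembering the first-seen order of each key.
  while read -r key val; do
    [ -n "$key" ] || continue
    [ -n "${dict_a[$key]+set}" ] || order_a+=("$key")
    dict_a[$key]=$val
  done < "$file_a"

  # Parse file B the same way.
  while read -r key val; do
    [ -n "$key" ] || continue
    [ -n "${dict_b[$key]+set}" ] || order_b+=("$key")
    dict_b[$key]=$val
  done < "$file_b"

  # Keys from A: pair with B's count if present, else 0.
  for key in "${order_a[@]}"; do
    if [ -n "${dict_b[$key]+set}" ]; then
      echo "$key ${dict_a[$key]} ${dict_b[$key]}"
      unset 'dict_b[$key]'          # remove so it is not printed again
    else
      echo "$key ${dict_a[$key]} 0"
    fi
  done

  # Whatever is left in B never appeared in A.
  for key in "${order_b[@]}"; do
    if [ -n "${dict_b[$key]+set}" ]; then
      echo "$key 0 ${dict_b[$key]}"
    fi
  done
}
```

Usage: `merge_counts a.txt b.txt > merged.txt`. Unlike a merge, this version does not need sorted input, at the cost of holding both files in memory.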
@ninjalj's answer is much saner, but here's a shell script implementation just for fun:
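```bash
#!/bin/bash
# Merge-style walk over both files, so each must be sorted by key.
# '<' inside [ ] is a bash extension comparing in ASCII order,
# so sort the inputs with LC_ALL=C sort.
exec 8< a.txt
exec 9< b.txt
while true; do
    if [ -z "$k1" ]; then
        read -r k1 v1 <&8
    fi
    if [ -z "$k2" ]; then
        read -r k2 v2 <&9
    fi
    if [ -z "$k1$k2" ]; then break; fi   # both files exhausted
    if [ "$k1" = "$k2" ]; then
        echo "$k1 $v1 $v2"
        k1= k2=
    elif [ -n "$k1" ] && { [ -z "$k2" ] || [ "$k1" '<' "$k2" ]; }; then
        # Key only in a.txt (or b.txt is exhausted)
        echo "$k1 $v1 0"
        k1=
    else
        # Key only in b.txt (or a.txt is exhausted)
        echo "$k2 0 $v2"
        k2=
    fi
done
exec 8<&- 9<&-
```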
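For comparison, the standard-tools route with `join(1)` can be sketched as follows. This is not necessarily the exact command from ninjalj's answer, and `join_counts` is a hypothetical wrapper name. `-a 1 -a 2` keeps unpairable lines from both files, `-e 0` fills the missing fields with `0`, and `-o 0,1.2,2.2` prints the key followed by each file's count. `join` requires both inputs sorted on the key, so the sketch sorts them first in the C locale; the `<(...)` process substitutions need bash.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: join_counts FILE_A FILE_B
# Full outer join on field 1, writing 0 where a key is missing from one file.
join_counts() {
  LC_ALL=C join -a 1 -a 2 -e 0 -o 0,1.2,2.2 \
    <(LC_ALL=C sort -k1,1 "$1") \
    <(LC_ALL=C sort -k1,1 "$2")
}
```

Usage: `join_counts a.txt b.txt`. Output comes out sorted by key rather than in the original file order.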