How to use awk/shell scripting to do SQL Where clause and SQL join like filtering and merging of rows and columns?


There's a very simple database management program named "unity" available for UNIX at http://open-innovation.alcatel-lucent.com/projects/unity/. In unity you have 2 main files:

  1. a data file named whatever you like, e.g. "foo", and
  2. a descriptor file with the same base name as the data file but prefixed with "D" for Descriptor, e.g. "Dfoo"

These are both simple text files that you can edit with whatever editor you like (unity also has its own database-aware editor named uedit).

Dfoo has one row for each column in foo, describing the attributes of the data that appears in that column and its separator from the next column.

foo would have the data.

It's been a while since I used unity in the raw (I have scripts that use it behind the scenes) but for the first table you show above:

----------------------------------------
Col1 | Col2 | Col3 | Col4 | Col5 | Col6
----------------------------------------
 A   |  H1  | 123  | abcd | a1   | b1
----------------------------------------
 B   |  H1  | 124  | abcd | a2   | b1
----------------------------------------
 C   |  H2  | 127  | abd  | a3   | b1
----------------------------------------
 D   |  H1  | 128  | acd  | a4   | b1
----------------------------------------

the Descriptor file (Dfoo) would be something like:

Col1 | 5c
Col2 | 6c
Col3 | 6c
Col4 | 6c
Col5 | 6c
Col6 \n 6c

and the data file (foo) would be:

A|H1|123|abcd|a1|b1
B|H1|124|abcd|a2|b1
C|H2|127|abd|a3|b1
D|H1|128|acd|a4|b1

You can then run unity commands like:

uprint -d- foo

to print the table with rows separated by lines of underscores and cells of the width specified in your descriptor file (e.g. 6c = 6 characters Centered while 6r = 6 characters Right-justified).

uselect Col2 from foo where Col4 leq abd

to select the values from column Col2 where the corresponding value in Col4 is Lexically EQual to the string "abd".

There are unity commands to let you do joins, merges, inserts, deletes, etc. - basically whatever you'd expect to be able to do with a relational database but it's all just based on simple text files.

In unity you can specify different separators between each column but if all of the separators are the same (except the final one which will be '\n') then you can run awk scripts on the file too just by using awk -F with the separator.
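As a rough illustration of that last point, here is a minimal sketch that runs a WHERE-style filter directly on the data file with awk. It assumes the pipe-separated foo shown earlier and picks Col2 from rows where Col4 (the column that holds "abd" in the sample table) equals "abd". The scratch directory and the `result` variable are only there to make the example self-contained.

```shell
# Recreate the sample data file in a scratch directory (illustrative only).
workdir=$(mktemp -d)
cat > "$workdir/foo" <<'EOF'
A|H1|123|abcd|a1|b1
B|H1|124|abcd|a2|b1
C|H2|127|abd|a3|b1
D|H1|128|acd|a4|b1
EOF

# Roughly: SELECT Col2 FROM foo WHERE Col4 = 'abd'
# -F'|' makes awk split each line on the pipe character.
result=$(awk -F'|' '$4 == "abd" { print $2 }' "$workdir/foo")
echo "$result"    # prints: H2

rm -rf "$workdir"
```

The pattern before the braces plays the role of the WHERE clause and the print statement plays the role of the column list.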

A couple of other toolsets you could look at that might be easier to install, but probably don't have as much functionality as unity (which has been around since the 1970s!), would be recutils (from GNU) and csvDB, so your full homework/research list is unity, recutils, and csvDB.

Note that recutils has rec2csv and csv2rec tools for converting between the recutils and CSV formats.


For a pipe delimited file:

awk '$2=="H1"{y="";x=$4;for(i=1;i<=length($4);i++)y=y?y","substr(x,i,1):substr(x,i,1);print $1,$4,$5,$6,y;}' FS="|" OFS="|" file
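For readers who find the one-liner dense, the following is a sketch of the same logic spread over several lines, run against the sample table from the question. The scratch file and the `chars` and `expanded` names are illustrative choices, not part of the original command.

```shell
# Sample pipe-delimited input (the table from the question).
workdir=$(mktemp -d)
cat > "$workdir/file" <<'EOF'
A|H1|123|abcd|a1|b1
B|H1|124|abcd|a2|b1
C|H2|127|abd|a3|b1
D|H1|128|acd|a4|b1
EOF

expanded=$(awk '
$2 == "H1" {                       # keep only rows whose 2nd column is H1
    chars = ""
    n = length($4)
    for (i = 1; i <= n; i++) {     # split column 4 into single characters
        c = substr($4, i, 1)
        chars = (i == 1) ? c : chars "," c
    }
    print $1, $4, $5, $6, chars    # columns 1, 4, 5, 6 plus the new column
}' FS="|" OFS="|" "$workdir/file")
echo "$expanded"
# A|abcd|a1|b1|a,b,c,d
# B|abcd|a2|b1|a,b,c,d
# D|acd|a4|b1|a,c,d

rm -rf "$workdir"
```

The row for C is dropped because its second column is H2, and the extra last column is the fourth column with commas between its characters.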

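For a tab-delimited file, leave FS unset so that awk falls back to its default whitespace splitting (this also splits on spaces and collapses consecutive tabs, so set FS="\t" explicitly if fields can be empty or contain spaces):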

awk '$2=="H1"{y="";x=$4;for(i=1;i<=length($4);i++)y=y?y","substr(x,i,1):substr(x,i,1);print $1,$4,$5,$6,y;}'  OFS="\t" file
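The commands above cover the WHERE-like filtering. For the join-like half of the question, one common awk idiom is to load the lookup file into an array first and then match rows of the main file against it. The sketch below assumes a hypothetical second file, bar, keyed on the Col2 values; its name and contents are made up for illustration.

```shell
workdir=$(mktemp -d)

# Main table (from the question).
cat > "$workdir/foo" <<'EOF'
A|H1|123|abcd|a1|b1
B|H1|124|abcd|a2|b1
C|H2|127|abd|a3|b1
D|H1|128|acd|a4|b1
EOF

# Hypothetical lookup table: key|description.
cat > "$workdir/bar" <<'EOF'
H1|alpha
H3|gamma
EOF

# Roughly: SELECT foo.Col1, foo.Col2, bar.desc
#          FROM foo JOIN bar ON foo.Col2 = bar.key
# NR==FNR is true only while the first file (bar) is being read.
joined=$(awk -F'|' '
NR == FNR   { desc[$1] = $2; next }        # remember bar rows by key
$2 in desc  { print $1, $2, desc[$2] }     # emit foo rows that have a match
' OFS='|' "$workdir/bar" "$workdir/foo")
echo "$joined"
# A|H1|alpha
# B|H1|alpha
# D|H1|alpha

rm -rf "$workdir"
```

This behaves like an inner join: row C has no H2 entry in bar, so it is dropped. Printing unmatched rows with an empty third column instead would give something closer to a left join.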