rearrange data on a file (NOT a direct transpose)
Using GNU awk for true multi-dimensional arrays:
$ cat tst.awkNR==1 { split($0,hdr); next }{ idx = (NR-2)%4+1 val[idx][0] split($0,val[idx])}NR==5 { printf "%s", hdr[1] for (j=3; j in hdr; j++) { for (i=1; i<=idx; i++) { printf "%s%s", OFS, hdr[j]"/"val[i][2] } } print ""}idx==4 { printf "%s", $1 for (j=3; j<=NF; j++) { for (i=1; i<=idx; i++) { printf "%s%s", OFS, val[i][j] } } print ""}$ awk -f tst.awk fileNAME A/Y1 A/Y2 A/Y3 A/Y4 B/Y1 B/Y2 B/Y3 B/Y4 C/Y1 C/Y2 C/Y3 C/Y4JOHN 10,00 11,00 12,00 13,00 19,00 23,00 33,00 34,00 65,00 64,00 34,00 32,00PAUL 14,00 15,00 16,00 20,00 43,00 90,00 32,00 45,00 23,00 34,00 56,00 65,00RINGO 25,00 24,00 31,00 75,00 60,00 30,00 20,00 10,00 87,00 23,00 54,00 12,00
$ cat foo.awkNR==1{next} # skip the header{ printf "%s", (b!=$1?(b==""?"":ORS) $1:"") OFS; b=$1 # print name or OFS} { printf "%s", $3 OFS $4 OFS $5 # print fields} END {print ""} # finish with ORS
Spin it:
$ awk -f foo.awk foo.txtJOHN 10,00 19,00 65,00 11,00 23,00 64,00 12,00 33,00 34,00 13,00 34,00 32,00PAUL 14,00 43,00 23,00 15,00 90,00 34,00 16,00 32,00 56,00 20,00 45,00 65,00RINGO 25,00 60,00 87,00 24,00 30,00 23,00 31,00 20,00 54,00 75,00 10,00 12,00
My proposal in python (sed is clearly not up to the task, maybe awk, but that's a challenge). I have hardcoded the 4*3 aspect of the "matrix". Maybe something more elegant could have been done:
import collectionsnb_year = 4d = collections.defaultdict(lambda: [None]*nb_year*3) with open("input_file") as infile: next(infile) # skip title for l in infile: # read line by line fields = l.strip().split() # extract blank-separated fields if len(fields)<3: continue # protection against "accidental" blank lines target = d[fields[0]] # name offset = int(fields[1][1])-1 # extract year index 1 to 4 for i,f in enumerate(fields[2:]): # interleaved matrix fill target[offset+i*nb_year] = f # fill "matrix" print("NAME A/Y1 A/Y2 A/Y3 A/Y4 B/Y1 B/Y2 B/Y3 B/Y4 C/Y1 C/Y2 C/Y3 C/Y4") for k,v in sorted(d.items()): print("{} {}".format(k," ".join(v)))
output
NAME A/Y1 A/Y2 A/Y3 A/Y4 B/Y1 B/Y2 B/Y3 B/Y4 C/Y1 C/Y2 C/Y3 C/Y4JOHN 10,00 11,00 12,00 13,00 19,00 23,00 33,00 34,00 65,00 64,00 34,00 32,00PAUL 14,00 15,00 16,00 20,00 43,00 90,00 32,00 45,00 23,00 34,00 56,00 65,00RINGO 25,00 24,00 31,00 75,00 60,00 30,00 20,00 10,00 87,00 23,00 54,00 12,00