rearrange data on a file (NOT a direct transpose) rearrange data on a file (NOT a direct transpose) shell shell

rearrange data on a file (NOT a direct transpose)


Using GNU awk for true multi-dimensional arrays:

$ cat tst.awkNR==1 { split($0,hdr); next }{    idx = (NR-2)%4+1    val[idx][0]    split($0,val[idx])}NR==5 {    printf "%s", hdr[1]    for (j=3; j in hdr; j++) {        for (i=1; i<=idx; i++) {            printf "%s%s", OFS, hdr[j]"/"val[i][2]        }    }    print ""}idx==4 {    printf "%s", $1    for (j=3; j<=NF; j++) {        for (i=1; i<=idx; i++) {            printf "%s%s", OFS, val[i][j]        }    }    print ""}$ awk -f tst.awk fileNAME A/Y1 A/Y2 A/Y3 A/Y4 B/Y1 B/Y2 B/Y3 B/Y4 C/Y1 C/Y2 C/Y3 C/Y4JOHN 10,00 11,00 12,00 13,00 19,00 23,00 33,00 34,00 65,00 64,00 34,00 32,00PAUL 14,00 15,00 16,00 20,00 43,00 90,00 32,00 45,00 23,00 34,00 56,00 65,00RINGO 25,00 24,00 31,00 75,00 60,00 30,00 20,00 10,00 87,00 23,00 54,00 12,00


$ cat foo.awkNR==1{next}                                              # skip the header{    printf "%s", (b!=$1?(b==""?"":ORS) $1:"") OFS; b=$1  # print name or OFS} {    printf "%s", $3 OFS $4 OFS $5                        # print fields} END {print ""}                                           # finish with ORS

Spin it:

$ awk -f foo.awk foo.txtJOHN 10,00 19,00 65,00 11,00 23,00 64,00 12,00 33,00 34,00 13,00 34,00 32,00PAUL 14,00 43,00 23,00 15,00 90,00 34,00 16,00 32,00 56,00 20,00 45,00 65,00RINGO 25,00 60,00 87,00 24,00 30,00 23,00 31,00 20,00 54,00 75,00 10,00 12,00


My proposal in python (sed is clearly not up to the task, maybe awk, but that's a challenge). I have hardcoded the 4*3 aspect of the "matrix". Maybe something more elegant could have been done:

import collectionsnb_year = 4d = collections.defaultdict(lambda: [None]*nb_year*3)    with open("input_file") as infile:    next(infile)  # skip title    for l in infile:  # read line by line        fields = l.strip().split()  # extract blank-separated fields        if len(fields)<3: continue  # protection against "accidental" blank lines        target = d[fields[0]]       # name        offset = int(fields[1][1])-1    # extract year index 1 to 4        for i,f in enumerate(fields[2:]):  # interleaved matrix fill            target[offset+i*nb_year] = f      # fill "matrix"    print("NAME A/Y1 A/Y2 A/Y3 A/Y4 B/Y1 B/Y2 B/Y3 B/Y4 C/Y1 C/Y2 C/Y3 C/Y4")    for k,v in sorted(d.items()):        print("{} {}".format(k," ".join(v)))

output

NAME A/Y1 A/Y2 A/Y3 A/Y4 B/Y1 B/Y2 B/Y3 B/Y4 C/Y1 C/Y2 C/Y3 C/Y4JOHN 10,00 11,00 12,00 13,00 19,00 23,00 33,00 34,00 65,00 64,00 34,00 32,00PAUL 14,00 15,00 16,00 20,00 43,00 90,00 32,00 45,00 23,00 34,00 56,00 65,00RINGO 25,00 24,00 31,00 75,00 60,00 30,00 20,00 10,00 87,00 23,00 54,00 12,00