AWK Split File every n-th Row but group IDs together
Using any awk in any shell on every Unix box:
$ cat tst.awk
/^@/ {
    hdr = hdr $0 ORS
    next
}
( (++numLines) % 5 ) == 1 {
    if ( $0 == prev ) {
        --numLines
    }
    else {
        close(out)
        out = FILENAME "." (++numBlocks)
        printf "%s", hdr > out
        numLines = 1
    }
}
{
    print > out
    prev = $0
}
$ awk -f tst.awk text.txt
$ head text.txt.*
==> text.txt.1 <==
@something
@somethingelse
@anotherthing
1
2
2
3
3
3

==> text.txt.2 <==
@something
@somethingelse
@anotherthing
4
4
4
5
5

==> text.txt.3 <==
@something
@somethingelse
@anotherthing
6
7
7
8
9
9
9

==> text.txt.4 <==
@something
@somethingelse
@anotherthing
10
11
11
11
14

==> text.txt.5 <==
@something
@somethingelse
@anotherthing
15
Nice question.
With your example, this would work:
awk 'BEGIN{i=1;}/\@/{header= header == ""? $0 : header "\n" $0; next}c>=5 && $1!=prev{i++;c=0;}{if(!c) print header>FILENAME"."i; print > FILENAME"."i;c++;prev=$1;}' test.txt
You need to strip the header out and keep your own counter (c above). NR is just the current line number of the input, so it will not meet your needs when the blocks do not end on exact multiples of 5 lines.
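To see the difference, here is a minimal sketch that runs both approaches on a small sample. The file names demo.txt, naive.N and grouped.N are made up for the illustration, and the header handling is left out to keep it short.

```shell
# Hypothetical sample: runs of repeated IDs, no header lines.
printf '%s\n' 1 2 2 3 3 3 4 4 4 5 5 > demo.txt

# Naive split on NR: cuts after every 5th input line,
# even in the middle of a run of equal IDs.
awk '{ out = "naive." int((NR - 1) / 5 + 1); print > out }' demo.txt

# Counter-based split: start a new file only once at least 5 lines
# have been written AND the ID differs from the previous line.
awk 'c >= 5 && $1 != prev { close(out); n++; c = 0 }
     { out = "grouped." (n + 1); print > out; c++; prev = $1 }' demo.txt
```

The NR version puts two of the 3s in naive.1 and the third in naive.2, while the counter version keeps all three together in grouped.1.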
Broken up across lines and improved a tiny bit:
awk 'BEGIN{i=1;}
     /\@/{header= header == ""? $0 : header ORS $0; next}
     c>=5 && $1!=prev{i++;c=0;}
     !c {print header>FILENAME"."i;}
     {print > FILENAME"."i;c++;prev=$1;}
' test.txt
To solve the potential problems mentioned in the comment:
awk 'BEGIN{i=1}
     /\@/{header= header == ""? $0 : header ORS $0; next}
     c>=5 && $1!=prev{i++;c=0}
     !c {close(f);f=(FILENAME"."i);print header>f}
     {print>f;c++;prev=$1}
' test.txt
Or check Ed's answer, which is more precise and portable across platforms and awk versions.
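The two habits that matter for portability can be shown in isolation. This is only a sketch on a made-up input (parts.txt, split every 2 lines), and it assumes the concerns are the ones usually raised for this kind of script: an unparenthesized expression after ">" is parsed differently by different awks, and some awks run out of file handles if outputs are never closed.

```shell
# Hypothetical input file for the illustration.
printf '%s\n' a b c d e f > parts.txt

awk '
NR % 2 == 1 {                  # start a new output file every 2 lines
    close(out)                 # release the previous file first
    out = FILENAME "." (++n)   # build the name once, outside the redirection
}
{ print > out }                # a plain variable after ">" is unambiguous
' parts.txt
```

This writes parts.txt.1, parts.txt.2 and parts.txt.3 with two lines each.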
With your shown samples, please try the following awk program. Written and tested in GNU awk.
awk '
BEGIN{
  outFile="test.txt"
  count=1
}
/@/{
  header=(header?header ORS:"")$0
  next
}
{
  arr[$0]=(arr[$0]?arr[$0] ORS:"")$0
}
END{
  PROCINFO["sorted_in"] = "@ind_num_asc"
  print header > (outFile count)
  for(i in arr){
    num=split(arr[i],arr2,"\n")
    print arr[i] > (outFile count)
    len+=num
    if(len>=5){
      len=0
    }
    if(len==0){
      close(outFile count)
      count++
      print header > (outFile count)
    }
  }
}
' Input_file