AWK Split File every n-th Row but group IDs together

Using any awk in any shell on every Unix box:

$ cat tst.awk
/^@/ {
    hdr = hdr $0 ORS
    next
}
( (++numLines) % 5 ) == 1 {
    if ( $0 == prev ) {
        --numLines
    }
    else {
        close(out)
        out = FILENAME "." (++numBlocks)
        printf "%s", hdr > out
        numLines = 1
    }
}
{
    print > out
    prev = $0
}

$ awk -f tst.awk text.txt

$ head text.txt.*
==> text.txt.1 <==
@something
@somethingelse
@anotherthing
1
2
2
3
3
3

==> text.txt.2 <==
@something
@somethingelse
@anotherthing
4
4
4
5
5

==> text.txt.3 <==
@something
@somethingelse
@anotherthing
6
7
7
8
9
9
9

==> text.txt.4 <==
@something
@somethingelse
@anotherthing
10
11
11
11
14

==> text.txt.5 <==
@something
@somethingelse
@anotherthing
15
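If the block size should be configurable, the same logic can take it as a variable. This is only a minimal sketch: the variable name n, the scratch directory, and the sample input (inferred from the head output above) are assumptions, and n is assumed to be at least 2.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Sample input inferred from the head output above (an assumption).
printf '%s\n' @something @somethingelse @anotherthing \
    1 2 2 3 3 3 4 4 4 5 5 6 7 7 8 9 9 9 10 11 11 11 14 15 > text.txt

# Same logic as tst.awk, with the block size passed in as n (hypothetical name).
awk -v n=5 '
/^@/ { hdr = hdr $0 ORS; next }      # collect the header lines
( (++numLines) % n ) == 1 {          # first line past a full block
    if ( $0 == prev ) {
        --numLines                   # same ID as before: stay in the current file
    }
    else {
        close(out)                   # done with the previous file
        out = FILENAME "." (++numBlocks)
        printf "%s", hdr > out       # every output file starts with the header
        numLines = 1
    }
}
{ print > out; prev = $0 }
' text.txt

# List the line counts of the generated files.
wc -l text.txt.*
```

With n=5 this should produce the same five files as tst.awk; other values of n only change where a new block is allowed to start.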

unix awk split

Nice question.
With your example, this would work:

awk 'BEGIN{i=1;}/\@/{header= header == ""? $0 : header "\n" $0; next}c>=5 && $1!=prev{i++;c=0;}{if(!c) print header>FILENAME"."i; print > FILENAME"."i;c++;prev=$1;}' test.txt

You need to strip the header out and keep your own counter (c above). NR is just the current line number of the input, so it will not meet your needs when the split points do not fall on multiples of 5.
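To see why NR alone is not enough, here is a small sketch of a naive split that starts a new file every 5 input lines. The file names ids.txt and naive.N and the shortened sample data are made up for illustration.

```shell
# Work in a scratch directory.
dir=$(mktemp -d) && cd "$dir" || exit 1

# A shortened, header-less sample (hypothetical data).
printf '%s\n' 1 2 2 3 3 3 4 4 4 5 5 > ids.txt

# Naive split: open a new file every 5 input lines, based only on NR.
awk '(NR % 5) == 1 { close(out); out = "naive." (++n) } { print > out }' ids.txt

# Count the lines holding ID 3 in the first two files.
grep -c '^3$' naive.1 naive.2
```

Here ID 3 lands in both naive.1 and naive.2, because the sixth input line starts a new file regardless of its value. A separate counter that is only reset when the ID changes avoids that.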

Broken across lines and improved a tiny bit (ORS instead of a hard-coded "\n", and the header print moved into its own rule):

awk 'BEGIN{i=1;}
  /\@/{header= header == ""? $0 : header ORS $0; next}
  c>=5 && $1!=prev{i++;c=0;}
  !c {print header>FILENAME"."i;}
  {print > FILENAME"."i;c++;prev=$1;}
  ' test.txt

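To solve the potential problems mentioned in the comment, this version closes each output file before opening the next one and builds the file name in parentheses before redirecting to it: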

awk 'BEGIN{i=1}
  /\@/{header= header == ""? $0 : header ORS $0; next}
  c>=5 && $1!=prev{i++;c=0}
  !c {close(f);f=(FILENAME"."i);print header>f}
  {print>f;c++;prev=$1}
  ' test.txt

Or check Ed's answer, which is more precise and portable across platforms and awk versions.


With your shown samples, please try the following awk program. It was written and tested in GNU awk and relies on PROCINFO["sorted_in"], which is a GNU awk extension.

awk '
BEGIN{
  outFile="test.txt"
  count=1
}
/@/{
  header=(header?header ORS:"")$0
  next
}
{
  arr[$0]=(arr[$0]?arr[$0] ORS:"")$0
}
END{
  PROCINFO["sorted_in"] = "@ind_num_asc"
  print header > (outFile count)
  for(i in arr){
    num=split(arr[i],arr2,"\n")
    print arr[i] > (outFile count)
    len+=num
    if(len>=5){ len=0 }
    if(len==0){
      close(outFile count)
      count++
      print header > (outFile count)
    }
  }
}
'  Input_file
