
Delete n1 previous lines and n2 lines following with respect to a line containing a pattern


One way using sed, assuming that occurrences of the pattern are not too close to each other (the four lines read after a match are not checked for the pattern):

Content of script.sed:

## If line doesn't match the pattern...
/pattern/ ! {
    ## Append line to 'hold space'.
    H
    ## Copy content of 'hold space' to 'pattern space' to work with it.
    g
    ## If there are more than 5 lines saved, print and remove the first
    ## one. It's like a FIFO.
    /\(\n[^\n]*\)\{6\}/ {
        ## Delete the first '\n' automatically added by previous 'H' command.
        s/^\n//
        ## Print until first '\n'.
        P
        ## Delete data printed just before.
        s/[^\n]*//
        ## Save updated content to 'hold space'.
        h
    }
### Added to fix an error pointed out by potong in comments.
### =======================================================
    ## If last line, print lines left in 'hold space'.
    $ {
        x
        s/^\n//
        p
    }
### =======================================================
    ## Read next line.
    b
}
## If line matches the pattern...
/pattern/ {
    ## Remove all content of 'hold space'. It has the five previous
    ## lines, which won't be printed.
    x
    s/^.*$//
    x
    ## Read next four lines and append them to 'pattern space'.
    N ; N ; N ; N
    ## Delete all.
    s/^.*$//
}

Run like:

sed -nf script.sed infile


A solution using awk:

awk '$0 ~ "XXXX" { lines2del = 5; nlines = 0; }
     nlines == 5 { print lines[NR%5]; nlines-- }
     lines2del == 0 { lines[NR%5] = $0; nlines++ }
     lines2del > 0 { lines2del-- }
     END { while (nlines-- > 0) { print lines[(NR - nlines) % 5] } }' fv.out
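A quick way to check the one-liner, as a sketch: it wraps the same awk program in a shell function (the name del_5_before_4_after is made up for this example) that reads standard input instead of fv.out, and feeds it the numbers 1 to 20 with line 10 replaced by XXXX.

```shell
# Same awk program as the one-liner above, reading standard input.
# del_5_before_4_after is a hypothetical name used only for this example.
del_5_before_4_after() {
  awk '$0 ~ "XXXX" { lines2del = 5; nlines = 0; }
       nlines == 5 { print lines[NR%5]; nlines-- }
       lines2del == 0 { lines[NR%5] = $0; nlines++ }
       lines2del > 0 { lines2del-- }
       END { while (nlines-- > 0) { print lines[(NR - nlines) % 5] } }'
}

# Line 10 is the match: lines 5-9 (before), 10 and 11-14 (after) are
# removed, so this prints 1 2 3 4 and then 15 to 20, one per line.
seq 20 | sed 's/^10$/XXXX/' | del_5_before_4_after
```

Replacing line 22 with XXXX as well shows that a second, later occurrence is also handled (the output is then 1-4, 15, 16 and 27-30 for seq 30), because each match resets both counters.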

Update:

Here is how the script works:

  • I remember the last 5 lines in the array lines using rotating indexes (NR%5; NR is the record number, in this case the line number).
  • If I find the pattern in the current line ($0 ~ "XXXX"; $0 is the current record, in this case a line, and ~ is the regular expression match operator, which uses Extended Regular Expressions), I reset the number of buffered lines, which discards the previous lines, and note that I have 5 lines to delete (the current line and the 4 after it).
  • If I have already buffered 5 lines, I print the oldest one (lines[NR%5], read 5 lines earlier) and decrement the number of buffered lines.
  • If I do not have lines to delete (which is also the case right after printing from a full buffer), I put the current line in the buffer, in the slot that was just printed, and increment the number of lines. Note how the number of lines is decremented and then incremented again when a line is printed.
  • If lines need to be deleted, I do not print anything and decrement the number of lines to delete.
  • At the end of the script, I print the lines still in the array, oldest first.

My original version of the script was the following, but I ended up optimizing it to the above version:

awk '$0 ~ "XXXX" { lines2del = 5; nlines = 0; }
     lines2del == 0 && nlines == 5 { print lines[NR%5]; lines[NR%5] = $0 }
     lines2del == 0 && nlines < 5 { lines[NR%5] = $0; nlines++ }
     lines2del > 0 { lines2del-- }
     END { while (nlines-- > 0) { print lines[(NR - nlines) % 5] } }' fv.out
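Both versions hard-code 5 lines before and 4 lines after the match, while the question is about arbitrary n1 and n2. Below is a generalized sketch of the same circular-buffer idea for bash. The function name delete_around and the awk variables n1, n2 and pat are my own choices; the buffer index becomes NR % n1, and n1 = 0 is handled separately to avoid a modulo by zero. Since awk -v interprets backslash escapes, a pattern containing backslashes may need them doubled.

```shell
# Hypothetical helper (bash): delete_around N1 N2 PATTERN [FILE...]
# Drops the N1 lines before each line matching the ERE PATTERN, the
# matching line itself and the N2 lines after it; reads stdin if no FILE.
delete_around() {
  awk -v n1="$1" -v n2="$2" -v pat="$3" '
    $0 ~ pat {
      todel = n2 + 1   # the matching line plus the n2 lines after it
      nlines = 0       # discard the buffered lines before the match
    }
    todel > 0 { todel--; next }
    n1 == 0   { print; next }   # nothing to hold back
    {
      # Buffer full: the oldest line, read n1 lines ago, is safe to print.
      if (nlines == n1) { print buf[NR % n1]; nlines-- }
      buf[NR % n1] = $0
      nlines++
    }
    END {
      # Flush the lines still buffered, oldest first.
      for (i = nlines - 1; i >= 0; i--) print buf[(NR - i) % n1]
    }
  ' "${@:4}"
}

# Same result as the one-liners: 1-4, then 15-20.
seq 20 | sed 's/^10$/XXXX/' | delete_around 5 4 XXXX
```

With files, something like delete_around 5 4 XXXX fv.out should behave like the hard-coded versions.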

awk is a great tool! I strongly recommend finding a tutorial online and reading it. One important thing: awk uses Extended Regular Expressions (ERE). Their syntax is a little different from the Basic Regular Expressions (BRE) used by sed, and almost everything that can be done with BRE can also be done with ERE (back-references such as \1 are the main exception, since POSIX ERE does not define them).
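To illustrate the syntax difference on made-up input: the same "one or more repetitions of ab" test, written as a POSIX BRE for sed and as an ERE for awk. The function names are only for this example, and only portable constructs are used (GNU sed also accepts \+ in a BRE, but POSIX does not define it).

```shell
# Made-up input lines: ab, abab, aba
sample() { printf '%s\n' ab abab aba; }

# sed (BRE): grouping needs \( \), and POSIX BRE has no +, so "one or
# more" is written as one copy followed by zero or more copies.
bre_match() { sample | sed -n '/^\(ab\)\(ab\)*$/p'; }

# awk (ERE): parentheses group directly and + means "one or more".
ere_match() { sample | awk '/^(ab)+$/'; }

bre_match   # prints ab and abab
ere_match   # prints ab and abab
```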


The idea is to read 5 lines without printing them. If you find the pattern, delete the unprinted lines, the matching line and the 4 lines below it. If you do not find the pattern, remember the current line and print the oldest unprinted line. At the end, print whatever is still unprinted.

sed -n -e '/XXXX/,+4{x;s/.*//;x;d}' -e '1,5H' -e '6,${H;g;s/\n//;P;s/[^\n]*//;h}' -e '${g;/^\n/!d;s/\n//;p}' fv.out
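A check of the sed approach on generated input, as a sketch. It is written with a guard on the final $ block (/^\n/!d) so that nothing is printed when the hold space is already empty, which is the case on the last line once the pattern has been seen; without it an extra empty line appears at the end. The function name sed_del_5_4 is made up, and GNU sed is assumed for the addr,+N address.

```shell
# GNU sed only: "/XXXX/,+4" (a match plus the next 4 lines) is a GNU extension.
# sed_del_5_4 is a hypothetical name used only for this example.
sed_del_5_4() {
  sed -n -e '/XXXX/,+4{x;s/.*//;x;d}' \
         -e '1,5H' \
         -e '6,${H;g;s/\n//;P;s/[^\n]*//;h}' \
         -e '${g;/^\n/!d;s/\n//;p}'
}

# One occurrence at line 10 of 1..20: lines 5-14 go away,
# leaving 1-4 and 15-20 with no trailing empty line.
seq 20 | sed 's/^10$/XXXX/' | sed_del_5_4
```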

Of course, this sed approach only works if the pattern occurs once in the file. If it occurs several times, you need to buffer 5 new lines after each match, and it gets complicated if the pattern appears again within those lines. In that case, I think sed is not the right tool. Note also that the /XXXX/,+4 address form is a GNU sed extension.
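When the pattern can occur several times, possibly close together, one way around the buffering altogether is to read the file twice with awk: the first pass records which line numbers fall inside a deletion window, and the second pass prints only the other lines. This is a sketch, not one of the approaches above; del_windows and its arguments are my own names, and it needs a real file (not a pipe) because the file is read twice.

```shell
# Hypothetical helper: del_windows N1 N2 PATTERN FILE
# Pass 1 (NR == FNR): for each line matching the ERE PATTERN, mark the
# N1 lines before it, the line itself and the N2 lines after it.
# Pass 2: print only the unmarked lines. Overlapping windows simply merge.
del_windows() {
  awk -v n1="$1" -v n2="$2" -v pat="$3" '
    NR == FNR {
      if ($0 ~ pat) {
        for (i = FNR - n1; i <= FNR + n2; i++) del[i] = 1
      }
      next
    }
    !(FNR in del)
  ' "$4" "$4"
}
```

For example, del_windows 5 4 XXXX fv.out would print fv.out without the window around every XXXX line. The trade-off is reading the input twice and keeping the marked line numbers in memory, in exchange for not having to track pending deletions while streaming.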