Delete the n1 lines before and the n2 lines after a line containing a pattern
One way using sed, assuming that the occurrences of the pattern are not too close to each other:
Content of script.sed:
## If line doesn't match the pattern...
/pattern/ ! {
    ## Append line to 'hold space'.
    H
    ## Copy content of 'hold space' to 'pattern space' to work with it.
    g
    ## If there are more than 5 lines saved, print and remove the first
    ## one. It's like a FIFO.
    /\(\n[^\n]*\)\{6\}/ {
        ## Delete the first '\n' automatically added by previous 'H' command.
        s/^\n//
        ## Print until first '\n'.
        P
        ## Delete data printed just before.
        s/[^\n]*//
        ## Save updated content to 'hold space'.
        h
    }
### Added to fix an error pointed out by potong in comments.
### =======================================================
    ## If last line, print lines left in 'hold space'.
    $ {
        x
        s/^\n//
        p
    }
### =======================================================
    ## Read next line.
    b
}
## If line matches the pattern...
/pattern/ {
    ## Remove all content of 'hold space'. It has the five previous
    ## lines, which won't be printed.
    x
    s/^.*$//
    x
    ## Read next four lines and append them to 'pattern space'.
    N ; N ; N ; N
    ## Delete all.
    s/^.*$//
}
Run like:
sed -nf script.sed infile
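A quick way to see which lines the script removes. This is a sketch that assumes GNU sed, mktemp and seq are available; the copy of script.sed below is the same script with its comments dropped and /pattern/ replaced by /XXXX/ (the pattern used by the other solutions here), and the temporary file names are arbitrary.

```shell
#!/bin/sh
# Sketch: run a comment-free copy of script.sed (assumes GNU sed, mktemp, seq).
# /pattern/ is replaced by /XXXX/ for this demo.
tmp=$(mktemp -d)
cat > "$tmp/script.sed" <<'EOF'
/XXXX/!{
    H
    g
    /\(\n[^\n]*\)\{6\}/{
        s/^\n//
        P
        s/[^\n]*//
        h
    }
    ${
        x
        s/^\n//
        p
    }
    b
}
/XXXX/{
    x
    s/^.*$//
    x
    N ; N ; N ; N
    s/^.*$//
}
EOF
# Input: lines 1..20, with line 10 replaced by the pattern.
seq 1 20 | sed 's/^10$/XXXX/' > "$tmp/infile"
result=$(sed -nf "$tmp/script.sed" "$tmp/infile")
printf '%s\n' "$result"   # lines 1-4 and 15-20 survive
rm -r "$tmp"
```

Lines 5 to 9 (the five before the match), the matching line 10 and lines 11 to 14 (the four after it) are gone.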
A solution using awk:
awk '$0 ~ "XXXX"    { lines2del = 5; nlines = 0; }
     nlines == 5    { print lines[NR%5]; nlines-- }
     lines2del == 0 { lines[NR%5] = $0; nlines++ }
     lines2del > 0  { lines2del-- }
     END { while (nlines-- > 0) { print lines[(NR - nlines) % 5] } }' fv.out
Update:
Here is how the script works:
- I remember the last 5 lines in the array lines, using rotating indexes (NR%5; NR is the record number, which here is the line number).
- If I find the pattern in the current line ($0 ~ "XXXX", $0 being the current record, in this case a line, and ~ being the Extended Regular Expression match operator), I reset the number of buffered lines and note that I have 5 lines to delete (including the current line).
- If I have already buffered 5 lines, I print the oldest one, lines[NR%5], whose slot the current line is about to reuse, and decrement the number of lines.
- If I do not have lines to delete (which is always the case when 5 lines were already buffered), I put the current line in the buffer and increment the number of lines. Note how the number of lines is decremented and then incremented again when a line is printed.
- If lines need to be deleted, I do not print anything and decrement the number of lines to delete.
- At the end of the script, I print all the lines still in the array, oldest first.
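The title asks for n1 lines before and n2 lines after, and the same rotating-buffer idea generalises to that. The following is only a sketch, not part of the original answer: del_around, n1, n2 and pat are hypothetical names, it assumes a POSIX awk, and the pattern is passed with -v, so it is treated as an ERE and backslash escapes in it are processed by awk first. It uses next instead of the lines2del == 0 guards, and prints lines straight away when n1 is 0 (a modulo by zero would otherwise be needed).

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): delete the n1 lines
# before and the n2 lines after every line matching the ERE pat.
# Usage: del_around n1 n2 pat < input
del_around() {
    awk -v n1="$1" -v n2="$2" -v pat="$3" '
        # Pattern found: forget the buffered previous lines and delete
        # this line plus the next n2 lines.
        $0 ~ pat      { lines2del = n2 + 1; nlines = 0 }
        lines2del > 0 { lines2del--; next }
        # No previous lines to delete: print straight away.
        n1 == 0       { print; next }
        {
            # Rotating buffer of the last n1 lines, indexed by NR % n1.
            # When it is full, the slot for this line holds the oldest one.
            if (nlines == n1) print lines[NR % n1]
            else nlines++
            lines[NR % n1] = $0
        }
        # Flush the buffer, oldest first.
        END { for (i = nlines - 1; i >= 0; i--) print lines[(NR - i) % n1] }
    '
}

# Lines 1..20 with line 10 replaced by XXXX: lines 5 to 14 are removed.
seq 1 20 | sed 's/^10$/XXXX/' | del_around 5 4 XXXX
```

With n1 = 5 and n2 = 4 this behaves like the one-liner above; for example, del_around 2 1 XXXX would drop two lines before each match and one after it.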
My original version of the script was the following, but I ended up optimizing it to the above version:
awk '$0 ~ "XXXX" { lines2del = 5; nlines = 0; }
     lines2del == 0 && nlines == 5 { print lines[NR%5]; lines[NR%5] = $0 }
     lines2del == 0 && nlines < 5 { lines[NR%5] = $0; nlines++ }
     lines2del > 0 { lines2del-- }
     END { while (nlines-- > 0) { print lines[(NR - nlines) % 5] } }' fv.out
awk is a great tool! I strongly recommend that you find a tutorial on the net and read it. One important thing: awk works with Extended Regular Expressions (ERE). Their syntax is a little different from the Basic Regular Expressions (BRE) used by sed, and almost everything that can be done with BRE can also be done with ERE (back-references such as \1 are the notable exception: POSIX ERE does not have them).
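To make the syntax difference concrete, here is one regex, "one or more ab", written in both dialects. This sketch only assumes a POSIX sed and awk; GNU and BSD sed also accept -E to switch sed itself to ERE.

```shell
#!/bin/sh
# POSIX BRE (sed): groups are written \( \) and there is no + operator,
# so "one or more" is spelled as one copy followed by zero or more copies.
bre=$(echo ababc | sed 's/\(ab\)\(ab\)*/X/')
# POSIX ERE (awk): plain ( ) groups and the + operator.
ere=$(echo ababc | awk '{ sub(/(ab)+/, "X"); print }')
echo "$bre $ere"    # prints: Xc Xc
```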
With sed, the idea is to keep the last 5 lines unprinted. If you find the pattern, delete the unprinted lines and the 4 lines below it. If you do not find the pattern, remember the current line and print the oldest unprinted line. At the end, print what is still unprinted. (The /XXXX/,+4 address form is a GNU sed extension.)
sed -n -e '/XXXX/,+4{x;s/.*//;x;d}' -e '1,5H' -e '6,${H;g;s/\n//;P;s/[^\n]*//;h}' -e '${g;/^$/d;s/\n//;p}' fv.out
Of course, this only works if you have one occurrence of your pattern in the file. If you have many, you need to read 5 new lines after finding your pattern, and it gets complicated if you again have your pattern in those lines. In this case, I think sed is not the right tool.
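Multiple occurrences are where the awk solution helps: its pattern rule resets nlines and restarts lines2del, so each occurrence drops its own 5 previous lines and 4 following lines. A quick check of the awk one-liner, assuming seq is available, with the pattern placed on lines 10 and 22 of 1..30:

```shell
#!/bin/sh
# Sketch: the awk one-liner from this answer, fed two occurrences of XXXX.
out=$(seq 1 30 | sed 's/^10$/XXXX/; s/^22$/XXXX/' |
    awk '$0 ~ "XXXX"    { lines2del = 5; nlines = 0; }
         nlines == 5    { print lines[NR%5]; nlines-- }
         lines2del == 0 { lines[NR%5] = $0; nlines++ }
         lines2del > 0  { lines2del-- }
         END { while (nlines-- > 0) { print lines[(NR - nlines) % 5] } }')
printf '%s\n' "$out"   # 1 2 3 4 15 16 27 28 29 30, one per line
```

Lines 5 to 14 and 17 to 26 are removed, so both occurrences are handled.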