Splitting a file in Linux based on content [duplicate]

Suppose you have a file mail.txt:

$ cat mail.txt
<html>
    mail A
</html>

<html>
    mail B
</html>

<html>
    mail C
</html>

Run csplit to split the file at every <html> line:

$ csplit mail.txt '/^<html>$/' '{*}'

- mail.txt    => input file
- /^<html>$/  => pattern: match every `<html>` line
- {*}         => repeat the previous pattern as many times as possible

Check the output:

$ ls
mail.txt  xx00  xx01  xx02  xx03
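The default xx00, xx01, ... names are not very descriptive, and with this sample the first piece is likely empty because the pattern already matches line 1. A minimal sketch of a tidier invocation, assuming GNU coreutils csplit (the -z, -s, -f and -b options and the '{*}' repeat count may be missing from other implementations); the mail_ prefix and the scratch directory are illustrative choices, not part of the original answer:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing existing is overwritten (assumption: mktemp is available).
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the sample input shown above.
printf '%s\n' '<html>' '    mail A' '</html>' '' \
              '<html>' '    mail B' '</html>' '' \
              '<html>' '    mail C' '</html>' > mail.txt

# -z  drop empty output files
# -s  do not print the size of each piece
# -f  output file name prefix (hypothetical name: mail_)
# -b  printf-style suffix format
csplit -z -s -f mail_ -b '%02d.txt' mail.txt '/^<html>$/' '{*}'

ls
```

This should leave one mail_NN.txt file per message next to the untouched mail.txt.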

If you want to do it in awk:

$ awk '/<html>/{filename=NR".txt"}; {print >filename}' mail.txt
$ ls
1.txt  5.txt  9.txt  mail.txt
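The one-liner names each piece after the line number where it starts and keeps every output file open. A hedged variant that numbers the pieces sequentially, closes each file before opening the next (which may matter for a mail file with thousands of messages), and skips any lines before the first <html> could look like the sketch below. The mail_NNN.txt naming and the variable names out and n are assumptions made for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)   # scratch directory for the demo
cd "$workdir"

# Recreate the sample input shown above.
printf '%s\n' '<html>' '    mail A' '</html>' '' \
              '<html>' '    mail B' '</html>' '' \
              '<html>' '    mail C' '</html>' > mail.txt

awk '
/^<html>$/ {
    if (out) close(out)                   # finish the previous piece
    out = sprintf("mail_%03d.txt", ++n)   # next sequential name
}
out { print > out }                       # only write once a piece has started
' mail.txt

ls
```

Closing the previous file keeps at most one output open at a time, so the per-process limit on open files is not a concern.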

linux file bash sed awk

The csplit program solves your problem elegantly:

csplit "$FILE" '/<!DOCTYPE.*/' '{*}'


csplit is the best solution to this problem. I just thought I'd post a bash solution to show that there is no need to reach for Perl for this task:

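#!/usr/bin/bash
MAIL='mail'        # path to huge mail-file

# get line numbers of every line containing "html"
# (array is not called LINES: bash uses that name for the terminal height)
mapfile -t line_nums < <(grep -n html "$MAIL" | cut -d: -f1)

file=0
# walk the matches in pairs, stopping before the index runs past the array
for ((i = 0; i + 1 < ${#line_nums[@]}; i += 2)); do
    start=${line_nums[i]}
    end=$((line_nums[i+1] - 1))   # last line before the next match
    echo "$start, $end"
    sed -n "${start},${end}p" "$MAIL" > "${MAIL}${file}.txt"
    file=$((file+1))
done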
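The script above scans the file once with grep and then once more with sed for every piece. As an alternative, here is a rough single-pass sketch in plain bash that starts a new output file whenever it reads an <html> line. It assumes the opening tag sits alone on its line, as in the mail.txt sample; the part_N.txt naming is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)   # scratch directory for the demo
cd "$workdir"

# Recreate the sample input from the first answer.
printf '%s\n' '<html>' '    mail A' '</html>' '' \
              '<html>' '    mail B' '</html>' '' \
              '<html>' '    mail C' '</html>' > mail.txt

n=0
out=/dev/null          # anything before the first <html> is discarded
while IFS= read -r line; do
    if [[ $line == '<html>' ]]; then
        out="part_${n}.txt"
        n=$((n + 1))
        : > "$out"     # create or truncate the new piece
    fi
    printf '%s\n' "$line" >> "$out"
done < mail.txt

ls
```

A read loop like this is usually much slower than csplit or awk on large files, so it is mainly useful when neither tool is available.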
