Splitting a file in linux based on content [duplicate]
If you have a mail.txt:

    $ cat mail.txt
    <html>
     mail A
    </html>

    <html>
     mail B
    </html>

    <html>
     mail C
    </html>
Run csplit to split at every <html> line:

    $ csplit mail.txt '/^<html>$/' '{*}'

- mail.txt => input file
- /^<html>$/ => pattern that matches every `<html>` line
- {*} => repeat the previous pattern as many times as possible
Check the output:

    $ ls
    mail.txt  xx00  xx01  xx02  xx03
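
With the sample input, xx00 comes out empty because nothing precedes the first <html> line. As a rough sketch, assuming GNU csplit (-z, -b and {*} are GNU extensions; the mail_ prefix and .html suffix are names chosen only for illustration), the empty piece can be skipped and the pieces given friendlier names:

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample mail.txt (three mails separated by blank lines).
printf '%s\n' '<html>' ' mail A' '</html>' '' \
              '<html>' ' mail B' '</html>' '' \
              '<html>' ' mail C' '</html>' > mail.txt

# -s: do not print the byte count of each piece
# -z: do not create empty pieces (the would-be xx00)
# -f: prefix for output names, -b: printf-style suffix format
csplit -s -z -f mail_ -b '%02d.html' mail.txt '/^<html>$/' '{*}'

ls mail_*
```

The pieces should then be numbered consecutively from mail_00.html, one per mail.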
If you want to do it in awk:

    $ awk '/<html>/{filename=NR".txt"}; {print >filename}' mail.txt
    $ ls
    1.txt  5.txt  9.txt  mail.txt

Each piece is named after the line number of the <html> line that starts it.
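
The awk one-liner keeps every output file open until awk exits, and it has no file name to write to if any line comes before the first <html>. A slightly more defensive sketch, assuming sequential names are acceptable (the part_ prefix and the stray first line in the sample are made up for illustration), closes each piece once the next one begins:

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input with one stray line before the first mail.
printf '%s\n' 'stray line before the first mail' \
              '<html>' ' mail A' '</html>' '' \
              '<html>' ' mail B' '</html>' > mail.txt

awk '
    /^<html>$/ {
        if (out != "") close(out)            # finish the previous piece
        out = sprintf("part_%02d.txt", n++)  # part_00.txt, part_01.txt, ...
    }
    out != "" { print > out }                # skip anything before the first <html>
' mail.txt

ls part_*
```

Closing each file as soon as the next piece starts keeps the number of open files at one, which matters when the input contains thousands of mails.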
csplit is the best solution to this problem. Just thought I'd post a bash solution to show that there is no need to go Perl on this task:

    #!/usr/bin/bash
    MAIL='mail'   # path to huge mail-file

    # get line numbers of every <html> line that starts a mail
    mapfile -t starts < <(grep -n '^<html>$' "$MAIL" | cut -d: -f1)

    for i in "${!starts[@]}"; do
        start=${starts[i]}
        if (( i + 1 < ${#starts[@]} )); then
            end=$(( starts[i+1] - 1 ))   # stop just before the next mail
        else
            end='$'                      # last mail: run to end of file
        fi
        echo "$start, $end"
        sed -n "${start},${end}p" "$MAIL" > "${MAIL}${i}.txt"
    done