Splitting a file in linux based on content [duplicate]


If you have a mail.txt

$ cat mail.txt
<html>
    mail A
</html>

<html>
    mail B
</html>

<html>
    mail C
</html>

run csplit to split by <html>

$ csplit mail.txt '/^<html>$/' '{*}'

- mail.txt    => input file
- /^<html>$/  => pattern match every `<html>` line
- {*}         => repeat the previous pattern as many times as possible

check output

$ ls
mail.txt  xx00  xx01  xx02  xx03
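Because the first line of mail.txt already matches the pattern, xx00 is the (empty) piece before the first match. As a hedged sketch, assuming GNU csplit (the -z, -f, -b and -s options and '{*}' may not exist in other implementations), the empty piece can be suppressed and the output files given friendlier names. The mail_ prefix and the sample file recreated here are illustrative choices, not part of the original answer:

```shell
# Work in a scratch directory so the generated files are easy to inspect.
cd "$(mktemp -d)"

# Recreate the sample input: three messages, each followed by a blank line.
printf '<html>\n    mail %s\n</html>\n\n' A B C > mail.txt

# -z  skip zero-length output files (the empty piece before the first <html>)
# -f  output file prefix, -b  printf-style suffix format
# -s  do not print the byte count of each piece
csplit -s -z -f mail_ -b '%02d.txt' mail.txt '/^<html>$/' '{*}'

ls
```

Concatenating the pieces in order (cat mail_*.txt) should reproduce mail.txt byte for byte, which is a quick way to confirm nothing was lost in the split.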

If you want to do it in awk:

$ awk '/<html>/{filename=NR".txt"}; {print >filename}' mail.txt
$ ls
1.txt  5.txt  9.txt  mail.txt
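The awk one-liner names each piece after the line number where it starts and keeps every output file open until awk exits, which can hit the open-file limit on a very large mailbox in some awk implementations. A minimal sketch of a variant that numbers the pieces sequentially and closes each file before starting the next; the mail_NNN.txt naming and the variable names out and n are assumptions made for illustration:

```shell
# Work in a scratch directory and recreate the sample input.
cd "$(mktemp -d)"
printf '<html>\n    mail %s\n</html>\n\n' A B C > mail.txt

awk '
    /^<html>$/ {
        if (out != "") close(out)            # finish the previous piece
        out = sprintf("mail_%03d.txt", ++n)  # mail_001.txt, mail_002.txt, ...
    }
    out != "" { print > out }                # skip anything before the first <html>
' mail.txt

ls
```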


The csplit program solves your problem elegantly:

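# csplit takes the input file first, then the pattern; '{*}' (GNU csplit) repeats it for every match
csplit "$FILE" '/<!DOCTYPE.*/' '{*}'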


csplit is the best solution to this problem. Just thought I'd post a bash-solution to show that there is no need to go perl on this task:

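#!/usr/bin/env bash
MAIL='mail'        # path to huge mail-file

# Get the line numbers of all lines containing "html"; with the sample input
# these alternate between an opening <html> and its closing </html>.
# (LINES is reserved by bash for the terminal height, so use another name.)
mapfile -t line_nos < <(grep -n html "$MAIL" | cut -d: -f1)

file=0
# Step by 2 so that i points at an opening line and i+1 at its closing line.
for ((i = 0; i + 1 < ${#line_nos[@]}; i += 2)); do
    start=${line_nos[i]}
    end=${line_nos[i+1]}        # keep the closing </html> line in the piece
    echo "$start, $end"
    sed -n "${start},${end}p" "$MAIL" > "${MAIL}${file}.txt"
    file=$((file + 1))
done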
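The script above rescans the whole file with sed once per message. As a rough sketch of a single-pass alternative in plain shell, the loop below switches output files whenever it reads an <html> line. It assumes each message starts with a line that is exactly <html>; the variable names are illustrative, and on a really large mailbox csplit or awk will still be much faster than a shell read loop:

```shell
# Work in a scratch directory and recreate the sample input as "mail".
cd "$(mktemp -d)"
printf '<html>\n    mail %s\n</html>\n\n' A B C > mail

n=0
out=/dev/null                    # lines before the first <html> are discarded
while IFS= read -r line || [ -n "$line" ]; do
    if [ "$line" = '<html>' ]; then
        out="mail${n}.txt"       # mail0.txt, mail1.txt, ...
        : > "$out"               # create or truncate the new piece
        n=$((n + 1))
    fi
    printf '%s\n' "$line" >> "$out"
done < mail

ls
```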