Split access.log file by dates using command line tools


One way using awk:

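    awk 'BEGIN {
        # Map month abbreviations to two-digit numbers: Jan -> 01 ... Dec -> 12
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", months, " ")
        for (a = 1; a <= 12; a++)
            m[months[a]] = sprintf("%02d", a)
    }
    {
        # $4 looks like [10/Oct/2000:13:55:36
        split($4, array, "[:/]")
        year = array[3]
        month = m[array[2]]
        # Parenthesize the concatenated file name; some awks reject it otherwise
        print > (FILENAME "-" year "_" month ".txt")
    }' incendiary.ws-2009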

This will output files like:

    incendiary.ws-2010-2010_04.txt
    incendiary.ws-2010-2010_05.txt
    incendiary.ws-2010-2010_06.txt
    incendiary.ws-2010-2010_07.txt
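To see which pieces the `split()` call pulls out of the timestamp, here is a small sketch that runs the same `split` on a sample fourth field. The value of `field` is made up for illustration; it assumes the usual Apache common/combined format, where whitespace splitting leaves `[dd/Mon/yyyy:HH:MM:SS` in `$4`.

```shell
#!/usr/bin/env bash
# Hypothetical sample of what $4 holds for one common/combined-format log line
field='[10/Oct/2000:13:55:36'

# Split on ":" or "/" exactly as the answer does, then print index and piece
echo "$field" | awk '{
    n = split($1, parts, "[:/]")
    for (i = 1; i <= n; i++)
        print i, parts[i]
}'
```

This prints `1 [10`, `2 Oct`, `3 2000`, `4 13`, `5 55` and `6 36`. Element 2 is the month abbreviation and element 3 the year; element 1 still carries the leading `[`, so splitting per day would need to strip it first.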

Against a 150 MB log file, the answer by chepner took 70 seconds on a 3.4 GHz 8-thread Xeon E3-1270, while this method took 5 seconds.

Original inspiration: "How to split existing apache logfile by month?"


Pure bash, making one pass through the access log:

    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line; do
        # Skip lines without a [dd/Mon/yyyy: timestamp
        [[ $line =~ \[(..)/(...)/(....): ]] || continue
        d=${BASH_REMATCH[1]}
        m=${BASH_REMATCH[2]}
        y=${BASH_REMATCH[3]}
        printf -v fname "access.apache.%s_%s_%s.log" "$y" "$m" "$d"
        printf '%s\n' "$line" >> "$fname"
    done < access.log
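The pure-bash approach relies on the `=~` operator filling the `BASH_REMATCH` array. A minimal sketch of what it captures for one line; the sample `line` is made up, and the pattern `(..)` assumes a two-digit day.

```shell
#!/usr/bin/env bash
# A made-up line in Apache common log format (illustrative only)
line='127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326'

if [[ $line =~ \[(..)/(...)/(....): ]]; then
    # Index 0 is the whole match; indexes 1-3 are the capture groups
    printf '%s\n' "${BASH_REMATCH[@]}"
fi
```

This prints `[10/Oct/2000:`, then `10`, `Oct` and `2000`, which is why day, month and year come from indexes 1, 2 and 3.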


Perl came to the rescue:

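    perl -ne '
        # Skip lines without a [dd/Mon/yyyy: timestamp (otherwise $1..$3 keep values from an earlier match)
        next unless m@\[(\d{1,2})/(\w{3})/(\d{4}):@;
        open(my $log, ">>", "access.apache.$3_$2_$1.log") or die "cannot open: $!";
        print $log $_;
    ' access.log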

Perl is not exactly a "standard" command-line tool, but it was made for text manipulation nevertheless.

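I've also changed the order of the fields in the file name, so files are named like access.apache.yyyy_mon_dd.log for easier sorting. Because mon is the three-letter abbreviation (Jan, Feb, ...), months within a year still sort alphabetically rather than chronologically.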
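If chronological order matters, the month abbreviation can be mapped to a number before building the name, much like the awk answer's `sprintf("%02d", a)` table. A sketch in bash, assuming bash 4+ for associative arrays; `monthnum` and `log_name` are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical month table (needs bash 4+ for associative arrays)
declare -A monthnum=(
    [Jan]=01 [Feb]=02 [Mar]=03 [Apr]=04 [May]=05 [Jun]=06
    [Jul]=07 [Aug]=08 [Sep]=09 [Oct]=10 [Nov]=11 [Dec]=12
)

# Hypothetical helper: print the target file name for one log line,
# using a numeric month so names sort chronologically
log_name() {
    [[ $1 =~ \[(..)/(...)/(....): ]] || return 1
    printf 'access.apache.%s_%s_%s.log\n' \
        "${BASH_REMATCH[3]}" "${monthnum[${BASH_REMATCH[2]}]}" "${BASH_REMATCH[1]}"
}

# Made-up lines from April and February of the same year
log_name '1.2.3.4 - - [05/Apr/2011:10:00:00 +0000] "GET / HTTP/1.0" 200 1'
log_name '1.2.3.4 - - [20/Feb/2011:10:00:00 +0000] "GET / HTTP/1.0" 200 1'
```

This prints `access.apache.2011_04_05.log` and `access.apache.2011_02_20.log`; sorted, the February file now comes first, whereas `2011_Apr` would sort before `2011_Feb`.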