Split access.log file by dates using command line tools
One way, using awk:
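    awk 'BEGIN {
        # Map month abbreviations to zero-padded numbers: Jan -> 01, ..., Dec -> 12
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", months, " ")
        for (a = 1; a <= 12; a++)
            m[months[a]] = sprintf("%02d", a)
    }
    {
        # $4 looks like [10/Oct/2000:13:55:36; split it on "/" and ":"
        split($4, array, "[:/]")
        year = array[3]
        month = m[array[2]]
        # Write the line to <input file>-<year>_<month>.txt
        print > (FILENAME "-" year "_" month ".txt")
    }' incendiary.ws-2010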
This will output files like:
    incendiary.ws-2010-2010_04.txt
    incendiary.ws-2010-2010_05.txt
    incendiary.ws-2010-2010_06.txt
    incendiary.ws-2010-2010_07.txt
Against a 150 MB log file, the answer by chepner took 70 seconds on a 3.4 GHz Xeon E3-1270 (4 cores, 8 threads), while this method took 5 seconds.
Original inspiration: "How to split existing apache logfile by month?"
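To see what the awk script's `split($4, array, "[:/]")` call actually produces, here is a minimal sketch run on a single line. The sample line is the Common Log Format example from the Apache documentation, used only for illustration, and the sketch assumes awk's default whitespace field splitting, so `$4` is the `[day/Mon/year:hh:mm:ss` token.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative sample line in Apache Common Log Format (not from the original log).
line='127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

# With default field splitting, $4 is "[10/Oct/2000:13:55:36".
# Splitting it on "/" or ":" gives "[10", "Oct", "2000", "13", "55", "36",
# so array[2] is the month name and array[3] is the year.
parts=$(printf '%s\n' "$line" | awk '{ split($4, array, "[:/]"); print array[1], array[2], array[3] }')
echo "$parts"   # prints: [10 Oct 2000
```

The leading `[` stays attached to the day in `array[1]`; the awk script above never uses the day, so it does no harm there.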
Pure bash, making one pass through the access log:
    while read -r; do
        # Skip lines that have no [dd/Mon/yyyy: timestamp
        [[ $REPLY =~ \[(..)/(...)/(....): ]] || continue
        d=${BASH_REMATCH[1]}
        m=${BASH_REMATCH[2]}
        y=${BASH_REMATCH[3]}
        printf -v fname "access.apache.%s_%s_%s.log" "$y" "$m" "$d"
        printf '%s\n' "$REPLY" >> "$fname"
    done < access.log
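The pure-bash loop relies on `[[ ... =~ ... ]]` filling the `BASH_REMATCH` array with the regex's capture groups. A minimal sketch on one illustrative line (the same Apache documentation example, not taken from the original log) shows what ends up in each element and which file name the loop would build:

```bash
#!/usr/bin/env bash
set -euo pipefail

line='127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

if [[ $line =~ \[(..)/(...)/(....): ]]; then
    # BASH_REMATCH[0] is the whole match; [1]..[3] are the capture groups.
    d=${BASH_REMATCH[1]}   # 10
    m=${BASH_REMATCH[2]}   # Oct
    y=${BASH_REMATCH[3]}   # 2000
    printf -v fname 'access.apache.%s_%s_%s.log' "$y" "$m" "$d"
    echo "$fname"          # prints: access.apache.2000_Oct_10.log
else
    echo "no timestamp found" >&2
fi
```

Because `(..)` matches exactly two characters, the pattern assumes zero-padded days, which is what Apache's default `%t` timestamp format writes.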
Perl came to the rescue:
    perl -ne 'next unless m@\[(\d{1,2})/(\w{3})/(\d{4}):@; open(my $log, ">>", "access.apache.$3_$2_$1.log") or die "open: $!"; print $log $_;' access.log
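Well, Perl isn't exactly one of the "standard" command-line tools, but it was made for text manipulation nevertheless.
I've also changed the order of the fields in the file name, so that files are named like access.apache.yyyy_mon_dd.log for easier sorting. The year now sorts correctly, but the three-letter month names still sort alphabetically (Apr, Aug, Dec, ...) rather than chronologically.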