
Carving data from log file


Here's a Bash solution for version 4 and above, using an associative array:

#!/bin/bash

# Assoc array to hold data.
declare -A data

# Log file (the input file).
logfile=$1

# Output file.
output_file=$2

# Print column names for the required values.
printf '%-20s %-10s %-10s %-10s\n' time latency99 requests errors > "$output_file"

# Iterate over each line in $logfile.
while read -ra arr; do
    # Insert keys and values into the 'data' array.
    for i in "${arr[@]}"; do
        data["${i%=*}"]="${i#*=}"
    done

    # Convert time to GMT+2.
    gmt2_time=$(TZ=GMT+2 date -d "@${data[time]}" '+%T')

    # Append the parsed row to the output file.
    printf '%-20s %-10s %-10s %-10s\n' "$gmt2_time" "${data[latency99]%ms}" "${data[requests]}" "${data[errors]}" >> "$output_file"
done < "$logfile"

As you can see, the script accepts two arguments. The first is the name of the log file, and the second is the output file, to which one parsed line is written for each row of the log file.

Please notice that I used GMT+2 as the value of the TZ variable. Use your exact area as the value instead, for example TZ="Europe/Berlin". You might want to use the tool tzselect to find the correct string value for your area.
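For example, the conversion line inside the script could be switched to a named zone like this (Europe/Berlin and the local_time name are only illustrative):

# Illustrative only: the same conversion as in the script, using a named zone.
local_time=$(TZ="Europe/Berlin" date -d "@${data[time]}" '+%T')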

In order to test it, I created the following log file, containing three different rows of input:

time=1260196536.242325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=10ms requests=100 option1=0 option2=0 errors=1 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278
time=1460246536.244325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=20ms requests=200 option1=0 option2=0 errors=2 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278
time=1260236536.147325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=30ms requests=300 option1=0 option2=0 errors=3 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278

Let's run the test (the script name is sof):

$ ./sof logfile parsed_logfile
$ cat parsed_logfile
time                 latency99  requests   errors
12:35:36             10         100        1
22:02:16             20         200        2
23:42:16             30         300        3

EDIT:

According to the OP's request, as can be seen in the comments and as discussed further in chat, I edited the script to include the following features:

  • Remove the ms suffix from latency99's value (see the parameter-expansion sketch after this list).
  • Read input from a logfile, line by line, parse and output results to a selected file.
  • Include column names only in the first row of output.
  • Convert the time value to GMT+2.
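
To make the suffix handling concrete, here is a tiny sketch of the parameter expansions the script relies on (the pair variable and its value are made up for illustration):

# Illustrative values only, showing the expansions used in the script.
pair='latency99=10ms'
echo "${pair%=*}"    # key without the value:       latency99
val=${pair#*=}       # value without the key:       10ms
echo "${val%ms}"     # value without the ms suffix: 10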


Here is an awk script for you. Say the log file is mc.log and the script is saved as mc.awk; you would run it with GNU awk like this: awk -f mc.awk mc.log

mc.awk:

    BEGIN{
        OFS="\t"
        # some "" to align header and values in output
        print "time", "", "latency99", "requests", "errors"
    }
    function getVal( str) {
        # strip leading "key=" and trailing "ms" from str
        gsub(/^.*=/, "", str)
        gsub(/ms$/, "", str)
        return str
    }
    function fmtTime( timeStamp ){
        val=getVal( timeStamp )
        return strftime( "%H:%M:%S", val)
    }
    {
        # some "" to align header and values in output
        print fmtTime($1), getVal($4), "", getVal($5), "", getVal($8)
    }
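
gawk's strftime() formats the timestamp in the local time zone, so if you want the same GMT+2 output as in the Bash answer, one option (my assumption, not something this script does by itself) is to set TZ for the run and redirect the output:

# Illustrative invocation only; parsed_logfile is just an example target.
TZ=GMT+2 awk -f mc.awk mc.log > parsed_logfile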


Here's an awk version (not GNU). Converting the date would require a call to an external program:

#!/usr/bin/awk -f
BEGIN {
    FS="([[:alpha:]]+)?[[:blank:]]*[[:alnum:]]+="
    OFS="\t"
    print "time", "latency99", "requests", "errors"
}
{
    print $2, $5, $6, $9
}
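
For the time column, a possible sketch (my own, assuming GNU date is available) is to let awk build a date command per input line and read its output back:

# Sketch only: same field splitting as above, shelling out to date for the time.
awk -F '([[:alpha:]]+)?[[:blank:]]*[[:alnum:]]+=' -v OFS='\t' '{
    cmd = "TZ=GMT+2 date -d \"@" $2 "\" +%T"   # e.g. date -d "@1260196536.242325" +%T
    cmd | getline t                            # read the converted time
    close(cmd)
    print t, $5, $6, $9
}' logfile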