Carving data from log file
Here's a Bash solution for version 4 and above, using an associative array:
    #!/bin/bash

    # Associative array to hold the key/value pairs of the current line.
    declare -A data

    # Log file (the input file).
    logfile=$1

    # Output file.
    output_file=$2

    # Print column names for required values.
    printf '%-20s %-10s %-10s %-10s\n' time latency99 requests errors > "$output_file"

    # Iterate over each line in $logfile.
    while read -ra arr; do

        # Insert keys and values into the 'data' array.
        for i in "${arr[@]}"; do
            data["${i%=*}"]="${i#*=}"
        done

        # Convert time to GMT+2.
        gmt2_time=$(TZ=GMT+2 date -d "@${data[time]}" '+%T')

        # Append the results to the output file.
        printf '%-20s %-10s %-10s %-10s\n' "$gmt2_time" "${data[latency99]%ms}" "${data[requests]}" "${data[errors]}" >> "$output_file"

    done < "$logfile"
As you can see, the script accepts two arguments: the first is the name of the logfile, and the second is the output file, to which one parsed line is written for each row of the logfile.
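The key/value split inside the loop relies on two parameter expansions, and the ms suffix is removed with a third. Here is a minimal sketch of what they do to a single token; the helper names kv_key, kv_value and strip_ms are hypothetical and not part of the script:

```shell
# Hypothetical helpers that isolate the expansions used in the script.
kv_key()   { printf '%s\n' "${1%=*}"; }   # drop the shortest suffix matching "=*"
kv_value() { printf '%s\n' "${1#*=}"; }   # drop the shortest prefix matching "*="
strip_ms() { printf '%s\n' "${1%ms}"; }   # drop a trailing "ms", if present

token='latency99=10ms'                    # one whitespace-separated token of a log line

kv_key "$token"                           # latency99
kv_value "$token"                         # 10ms
strip_ms "$(kv_value "$token")"           # 10
```

Because % removes the shortest matching suffix, a token whose value itself contains = (say a=b=c) would yield the key a=b; using %% instead would avoid that. The sample log has no such values, so the script works as written.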
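Please note that I used GMT+2 as the value of the TZ variable. Use the name of your exact area instead, for example TZ="Europe/Berlin". Also be aware that POSIX-style values such as GMT+2 have an inverted sign: GMT+2 means two hours behind UTC, which is why the first sample timestamp (14:35:36 UTC) appears as 12:35:36 in the test output below. You might want to use the tzselect tool to find the correct string for your area.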
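To see how the TZ value changes the result, here is a small sketch that converts the first timestamp from the sample logfile under several settings. It assumes GNU date (for -d "@SECONDS"); epoch_to_hms is a hypothetical helper name, not part of the script.

```shell
# epoch_to_hms ZONE EPOCH: print HH:MM:SS for EPOCH in time zone ZONE.
# Hypothetical helper; assumes GNU date, which accepts -d "@SECONDS".
epoch_to_hms() {
    # ${2%.*} drops the fractional part of the timestamp, if any.
    TZ=$1 date -d "@${2%.*}" '+%T'
}

ts=1260196536.242325                  # first timestamp in the sample logfile

epoch_to_hms UTC "$ts"                # 14:35:36
epoch_to_hms GMT+2 "$ts"              # 12:35:36 (POSIX sign: two hours behind UTC)
epoch_to_hms GMT-2 "$ts"              # 16:35:36 (two hours ahead of UTC)
epoch_to_hms Europe/Berlin "$ts"      # 15:35:36 (CET in December; needs the tz database)
```

An area name such as Europe/Berlin also follows daylight-saving changes, which a fixed offset cannot do.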
In order to test it, I created the following logfile, containing 3 different rows of input:
    time=1260196536.242325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=10ms requests=100 option1=0 option2=0 errors=1 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278
    time=1460246536.244325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=20ms requests=200 option1=0 option2=0 errors=2 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278
    time=1260236536.147325 latency=3:6:7:9:16:(8)ms latency95=11ms latency99=30ms requests=300 option1=0 option2=0 errors=3 throughput=480rps ql=1 rr=0.00% cr=0.00% accRequests=101468 accOption1=0 accOption2=0 accLatency=2:6:7:8:3998:(31)ms accLatency95=11ms accLatency99=649ms accOpenQueuing=1664 accErrors=278
Let's run the test ( script name is sof ):
    $ ./sof logfile parsed_logfile
    $ cat parsed_logfile
    time                 latency99  requests   errors
    12:35:36             10         100        1
    22:02:16             20         200        2
    23:42:16             30         300        3
EDIT:
At the OP's request (see the comments and the follow-up discussion in chat), I edited the script to include the following features:
- Remove the ms suffix from latency99's value.
- Read input from a logfile, line by line, parse and output results to a selected file.
- Include column names only in the first row of output.
- Convert the time value to GMT+2.
Here is an awk script for you. Say the logfile is mc.log and the script is saved as mc.awk; you would then run it with GNU awk like this: awk -f mc.awk mc.log
mc.awk:
    BEGIN {
        OFS = "\t"
        # some "" to align header and values in output
        print "time", "", "latency99", "requests", "errors"
    }

    function getVal(str) {
        # strip leading "key=" and trailing "ms" from str
        gsub(/^.*=/, "", str)
        gsub(/ms$/, "", str)
        return str
    }

    function fmtTime(timeStamp) {
        val = getVal(timeStamp)
        return strftime("%H:%M:%S", val)
    }

    {
        # some "" to align header and values in output
        print fmtTime($1), getVal($4), "", getVal($5), "", getVal($8)
    }
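Note that strftime() formats the timestamp in whatever time zone the awk process runs in. Here is a small sketch of two ways to control that with GNU awk: set TZ in the environment, or pass strftime's optional third argument to request UTC. The helper name gawk_hms is hypothetical.

```shell
# gawk_hms ZONE EPOCH: format EPOCH as HH:MM:SS in time zone ZONE.
# Hypothetical helper; requires GNU awk, since strftime() is not part of POSIX awk.
gawk_hms() {
    TZ=$1 gawk -v ts="$2" 'BEGIN { print strftime("%H:%M:%S", ts) }'
}

gawk_hms GMT-2 1260196536    # 16:35:36 (POSIX sign: two hours ahead of UTC)

# A non-zero third argument makes strftime() format the value as UTC.
gawk -v ts=1260196536 'BEGIN { print strftime("%H:%M:%S", ts, 1) }'    # 14:35:36
```

The same prefix works for the script itself, for example: TZ=Europe/Berlin awk -f mc.awk mc.log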
Here's an awk version (not GNU). Converting the date would require a call to an external program:
    #!/usr/bin/awk -f
    BEGIN {
        FS="([[:alpha:]]+)?[[:blank:]]*[[:alnum:]]+="
        OFS="\t"
        print "time", "latency99", "requests", "errors"
    }
    {
        print $2, $5, $6, $9
    }
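One way to make that external call from portable awk is to pipe a date command into getline. The following is only a sketch under several assumptions: it needs a date that understands -d @SECONDS (GNU date does), it hard-codes UTC, it parses fields by splitting each token on = rather than with the FS regex above, and the names parse_log and to_hms are hypothetical.

```shell
#!/bin/sh
# parse_log [FILE...]: hypothetical wrapper around a portable awk program.
parse_log() {
    awk '
    # to_hms is a hypothetical helper: epoch seconds -> HH:MM:SS in UTC.
    # cmd and out are declared as extra parameters to make them local.
    function to_hms(epoch,    cmd, out) {
        sub(/\..*$/, "", epoch)                  # drop fractional seconds
        if (epoch !~ /^[0-9]+$/) return epoch    # never pass odd input to the shell
        cmd = "TZ=UTC date -d @" epoch " +%T"    # assumes GNU date
        cmd | getline out                        # read the one line date prints
        close(cmd)                               # release the pipe after each record
        return out
    }
    BEGIN {
        OFS = "\t"
        print "time", "latency99", "requests", "errors"
    }
    {
        split("", val)                           # portable way to clear the array
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")
            if (eq == 0) continue                # skip tokens without key=value form
            val[substr($i, 1, eq - 1)] = substr($i, eq + 1)
        }
        sub(/ms$/, "", val["latency99"])         # strip the unit suffix
        print to_hms(val["time"]), val["latency99"], val["requests"], val["errors"]
    }
    ' "$@"
}
```

Run it as parse_log logfile. Spawning one date process per record is slow on large logs, so GNU awk's built-in strftime() is preferable where it is available.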