Calculate time-based metrics (hourly)

Given your posted input file:

    $ cat file
    2013-04-03 08:54:19,989 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
    2013-04-03 08:54:39,389 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
    2013-04-03 08:54:34,979 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>
    2013-04-03 08:55:19,569 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>

This GNU awk script (you are using GNU awk since you set RS to a multi-character string in the script you posted in your question)

    $ cat tst.awk
    {
        date = $1
        time = $2
        guid = gensub(/.*;gt;([^&]+).*/,"\\1","")
        print guid, date, time
    }

will pull out what I THINK is the information you care about:

    $ gawk -f tst.awk file
    904c-be-4e-bbda-3e62 2013-04-03 08:54:19,989
    904c-be-4e-bbda-3e62 2013-04-03 08:54:39,389
    edfc-fr-5e-bced-3443 2013-04-03 08:54:34,979
    edfc-fr-5e-bced-3443 2013-04-03 08:55:19,569
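One caveat on the extraction pattern: it anchors on `;gt;`, which only occurs when the raw log holds doubly escaped entities such as `&amp;gt;`. If the file contains `&gt;` exactly as displayed above, the pattern finds no match and gensub returns the record unchanged. Below is a minimal sketch of an extraction that accepts either form, using the POSIX `match()` and `substr()` functions instead of gensub. The function name `extract_uid`, the shortened sample records, and the assumption that a UId consists only of letters, digits, and dashes are illustrative, not taken from the question.

```shell
#!/bin/sh
# Hypothetical helper: prints "UId date time" for each log record on stdin.
extract_uid() {
    awk '
    {
        # Find the first "gt;" that is followed by UId characters. This hits
        # both "&gt;904c..." and "&amp;gt;904c...", since each ends in "gt;".
        if (match($0, /gt;[0-9A-Za-z-]+/))
            print substr($0, RSTART + 3, RLENGTH - 3), $1, $2
    }'
}

# Shortened sample records (assumption): one singly escaped, one doubly escaped.
out=$(printf '%s\n' \
    '2013-04-03 08:54:19,989 INFO [LOGGER] <event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>' \
    '2013-04-03 08:54:34,979 INFO [LOGGER] <event><body>&amp;lt;UId&amp;gt;edfc-fr-5e-bced-3443&amp;lt;/UId&amp;gt;&amp;lt;</body></event>' \
    | extract_uid)
echo "$out"
```

Both sample records should yield the UId followed by the date and time, in the same three-column layout as the gawk output above.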

The rest is simple math, right? And do it in this awk script - don't go piping the awk output to some goofy shell loop!
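For what that math might look like, here is a rough sketch that turns a date and time pair into milliseconds and subtracts two of them. It keeps the millisecond part and does not depend on the local time zone, because the day count comes from the standard Julian Day Number formula rather than from `mktime()`. The helper name `to_ms` is hypothetical, and the two timestamps are the first pair from the sample output.

```shell
#!/bin/sh
elapsed=$(awk '
# Hypothetical helper: convert "YYYY-MM-DD" and "HH:MM:SS,mmm" to milliseconds
# on a continuous scale. Day count = Julian Day Number (Gregorian calendar).
function to_ms(date, time,    d, t, a, y, m, jdn) {
    split(date, d, "-")
    split(time, t, /[:,]/)          # t[1]=hours t[2]=minutes t[3]=seconds t[4]=millis
    a = int((14 - d[2]) / 12)
    y = d[1] + 4800 - a
    m = d[2] + 12 * a - 3
    jdn = d[3] + int((153 * m + 2) / 5) + 365 * y \
        + int(y / 4) - int(y / 100) + int(y / 400) - 32045
    return ((jdn * 24 + t[1]) * 60 + t[2]) * 60000 + t[3] * 1000 + t[4]
}
BEGIN {
    ms = to_ms("2013-04-03", "08:54:39,389") - to_ms("2013-04-03", "08:54:19,989")
    printf "%d.%03ds\n", int(ms / 1000), ms % 1000
}')
echo "$elapsed"    # prints 19.400s
```

In a real script the two calls would take `$1` and `$2` from the start and end records of each UId instead of literals.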

Extending Ed Morton's solution:

Content of script.awk

    function parse_time (date, time,        newtime) {
        gsub(/-/, " ", date)
        gsub(/:/, " ", time)
        gsub(/,.*/, "", time)
        newtime = date" "time
        return newtime
    }
    (gensub(/.*;gt;([^&]+).*/,"\\1","") in starttime) {
        etime = parse_time($1, $2)
        endtime[gensub(/.*;gt;([^&]+).*/,"\\1","")] = etime
        next
    }
    {
        stime = parse_time($1, $2)
        starttime[gensub(/.*;gt;([^&]+).*/,"\\1","")] = stime
    }
    END {
        for (x in starttime) {
            for (y in endtime) {
                if (x==y) {
                    diff = mktime(endtime[x]) - mktime(starttime[y])
                    diff = sprintf("%dh:%dm:%ds",diff/(60*60),diff%(60*60)/60,diff%60)
                    print x, diff
                    delete starttime[x]
                    delete endtime[y]
                }
            }
        }
    }

Test (with the order of the GUIDs in the log modified):

    $ cat log.file
    2013-04-03 08:54:19,989 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
    2013-04-03 08:54:34,979 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>
    2013-04-03 08:54:39,389 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
    2013-04-03 08:55:19,569 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>

    $ awk -f script.awk log.file
    904c-be-4e-bbda-3e62 0h:0m:20s
    edfc-fr-5e-bced-3443 0h:0m:45s
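The nested loops in the END block of the script above compare every start against every end, but both arrays are keyed by the same UId, so a single membership test can do the pairing. A possible simplification is sketched below: it prints the elapsed time as soon as the second record for a UId arrives, which also makes the output order follow the log instead of the unspecified `for (x in ...)` order. This is a variation, not the script above: it assumes both records of a pair fall on the same calendar day (swap in `mktime()` if they may not), truncates milliseconds as the script above does, and uses hypothetical names (`pair_events`, `secs`) and shortened sample records.

```shell
#!/bin/sh
# Hypothetical helper: reads log records on stdin, prints "UId XhYmZs" per pair.
pair_events() {
    awk '
    function secs(time,    t) {           # whole seconds since midnight
        split(time, t, /[:,]/)
        return t[1] * 3600 + t[2] * 60 + t[3]
    }
    match($0, /gt;[0-9A-Za-z-]+/) {       # record carries a UId
        id = substr($0, RSTART + 3, RLENGTH - 3)
        if (id in start) {                # second sighting: treat as end event
            diff = secs($2) - start[id]
            printf "%s %dh:%dm:%ds\n", id, int(diff / 3600), int(diff % 3600 / 60), diff % 60
            delete start[id]
        } else {
            start[id] = secs($2)          # first sighting: remember the start
        }
    }'
}

# Shortened sample records (assumption), in the same order as log.file.
out=$(printf '%s\n' \
    '2013-04-03 08:54:19,989 INFO [LOGGER] <event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>' \
    '2013-04-03 08:54:34,979 INFO [LOGGER] <event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>' \
    '2013-04-03 08:54:39,389 INFO [LOGGER] <event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>' \
    '2013-04-03 08:55:19,569 INFO [LOGGER] <event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>' \
    | pair_events)
echo "$out"
```

On these four records the sketch should report 0h:0m:20s for the first UId and 0h:0m:45s for the second, matching the test run above.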