JQ: count number of objects per group, for a subset of input json


Assuming your input file is:

cat file
{"modified":"Mon Sep 25 14:20:00 +0000 2018","object_id":1,"class_id":"C"}
{"modified":"Mon Sep 25 14:23:00 +0000 2018","object_id":2,"class_id":"A"}
{"modified":"Mon Sep 25 14:21:00 +0000 2018","object_id":3,"class_id":"B"}
{"modified":"Mon Sep 25 14:22:00 +0000 2018","object_id":4,"class_id":"A"}

You can try the following:

<file jq -s '
  [ .[] |
      (.modified |= (strptime("%a %b %d %H:%M:%S +0000 %Y") | mktime))
  ] |
  sort_by(.modified) |              # sort using converted time
  .[-3:] |                          # take the last 3
  group_by(.class_id) |             # group ids together
  .[] |
  {(.[0].class_id): length}'        # create the object using the id name and table length
{
  "A": 2
}
{
  "B": 1
}

Note that on my system, the %z option of strptime isn't working, so I replaced it with a literal +0000 (which is in any case not used in the time conversion).
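If you want to sanity-check the conversion on its own, a one-liner like the following should work (the timestamp is taken from the sample input; the expected epoch value shown is my own calculation):

echo '"Mon Sep 25 14:20:00 +0000 2018"' | jq 'strptime("%a %b %d %H:%M:%S +0000 %Y") | mktime'
1537885200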


The accepted answer uses the -s command-line option, which requires that the entire input data fit into memory. For very large data sets, this may not be possible.

An alternative has been available since jq 1.5 (released in 2015): the inputs builtin. A memory-efficient solution using inputs is therefore presented here.
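As a minimal illustration of the difference (not from the original answer): with -n, a reduce over inputs consumes one JSON entity at a time, so only the accumulator is held in memory, whereas -s would first read everything into one array:

printf '1 2 3\n' | jq -n 'reduce inputs as $x (0; . + $x)'
6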

The key functionality is encapsulated in the following jq filter:

# Return an array of n items as if by
# [stream] | sort_by(filter) | .[-n:]
def maxn(stream; filter; n):
  def maxn: sort_by(filter) | .[-n:];
  reduce stream as $x ([]; . + [$x] | maxn);
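To get a feel for maxn before applying it to the real data, here is a small self-contained check; the numeric stream and the choice of n are arbitrary:

jq -cn 'def maxn(stream; filter; n):
          def maxn: sort_by(filter) | .[-n:];
          reduce stream as $x ([]; . + [$x] | maxn);
        maxn(1,5,3,2,4; .; 2)'
[4,5]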

A solution to the problem at hand (with N==3) can now be obtained in just three additional lines:

maxn(inputs; .modified | strptime("%a %b %d %H:%M:%S +0000 %Y") | mktime; 3)
| group_by(.class_id)[]
| {(.[0].class_id): length}

Note that this assumes the -n command-line option is used; if it is omitted, the first JSON entity in the input will effectively be ignored.
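For completeness, if the def of maxn together with the three lines above is saved in a file (the name program.jq is just for illustration), the full invocation would be:

jq -n -f program.jq file
{
  "A": 2
}
{
  "B": 1
}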

Large N

For large datasets, if the value of N is also large, it would probably be worth the trouble to tweak the above to use jq's support for binary search (bsearch) instead of sort_by. It might similarly be worthwhile caching the mktime values.
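Neither tweak is spelled out above, so here is one way it might look. This is only a sketch under my own assumptions, not code from the original answer: the accumulator is kept as an array of [key, item] pairs sorted by key, bsearch locates each insertion point, and the filter (e.g. the mktime value) is computed exactly once per item:

# Sketch only: like maxn, but keeps the accumulator sorted and
# caches the computed sort key alongside each item.
def maxn_bsearch(stream; filter; n):
  reduce stream as $x ([];
    ($x | [filter, .]) as $pair
    # bsearch returns the index if found, or (-1 - insertion_point) if not
    | (bsearch($pair) | if . < 0 then -(1 + .) else . end) as $pos
    | .[:$pos] + [$pair] + .[$pos:]   # insert while preserving sorted order
    | .[-n:])                         # keep only the n largest
  | map(.[1]);                        # drop the cached keys

It can be used as a drop-in replacement for maxn in the pipeline above.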