JQ: count number of objects per group, for a subset of input json


Assuming your input file is:

cat file
{"modified":"Mon Sep 25 14:20:00 +0000 2018","object_id":1,"class_id":"C"}
{"modified":"Mon Sep 25 14:23:00 +0000 2018","object_id":2,"class_id":"A"}
{"modified":"Mon Sep 25 14:21:00 +0000 2018","object_id":3,"class_id":"B"}
{"modified":"Mon Sep 25 14:22:00 +0000 2018","object_id":4,"class_id":"A"}

You can try the following:

<file jq -s '
  [ .[] |
      (.modified |= (strptime("%a %b %d %H:%M:%S +0000 %Y") | mktime))
  ] |
  sort_by(.modified) |              # sort using converted time
  .[-3:] |                          # take the last 3
  group_by(.class_id) |             # group ids together
  .[] |
  {(.[0].class_id): length}'        # create the object using the id name and table length
{
  "A": 2
}
{
  "B": 1
}

Note that on my system, the %z option of strptime isn't working, so I replaced it with a literal +0000 (which is in any case not used in the time conversion).
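If you want to sanity-check the conversion on its own, a one-liner like the following should work (the timestamp is taken from the sample input; the expected epoch value shown is my own calculation):

echo '"Mon Sep 25 14:20:00 +0000 2018"' | jq 'strptime("%a %b %d %H:%M:%S +0000 %Y") | mktime'
1537885200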


The accepted answer uses the -s command-line option, which requires that the entire input data fit into memory. For very large data sets, this may not be possible.

An alternative has been available since jq 1.5 (released in 2015): the inputs builtin. A memory-efficient solution using inputs is therefore presented here.
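As a minimal illustration of the difference (not from the original answer): with -n, a reduce over inputs consumes one JSON entity at a time, so only the accumulator is held in memory, whereas -s would first read everything into one array:

printf '1 2 3\n' | jq -n 'reduce inputs as $x (0; . + $x)'
6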

The key functionality is encapsulated in the following jq filter:

# Return an array of n items as if by
# [stream] | sort_by(filter) | .[-n:]
def maxn(stream; filter; n):
  def maxn: sort_by(filter) | .[-n:];
  reduce stream as $x ([]; . + [$x] | maxn);
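To get a feel for maxn before applying it to the real data, here is a small self-contained check; the numeric stream and the choice of n are arbitrary:

jq -cn 'def maxn(stream; filter; n):
          def maxn: sort_by(filter) | .[-n:];
          reduce stream as $x ([]; . + [$x] | maxn);
        maxn(1,5,3,2,4; .; 2)'
[4,5]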

A solution to the problem at hand (with N==3) can now be obtained in just three additional lines:

maxn(inputs; .modified | strptime("%a %b %d %H:%M:%S +0000 %Y") | mktime; 3)
| group_by(.class_id)[]
| {(.[0].class_id): length}

Note that this assumes the -n command-line option is used; if it is omitted, the first JSON entity in the input will effectively be ignored.
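For completeness, if the def of maxn together with the three lines above is saved in a file (the name program.jq is just for illustration), the full invocation would be:

jq -n -f program.jq file
{
  "A": 2
}
{
  "B": 1
}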

Large N

For large datasets, if the value of N is also large, it would probably be worth the trouble to tweak the above to use jq's support for binary search (bsearch) instead of sort_by. It might similarly be worthwhile caching the mktime values.
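Neither tweak is spelled out above, so here is one way it might look. This is only a sketch under my own assumptions, not code from the original answer: the accumulator is kept as an array of [key, item] pairs sorted by key, bsearch locates each insertion point, and the filter (e.g. the mktime value) is computed exactly once per item:

# Sketch only: like maxn, but keeps the accumulator sorted and
# caches the computed sort key alongside each item.
def maxn_bsearch(stream; filter; n):
  reduce stream as $x ([];
    ($x | [filter, .]) as $pair
    # bsearch returns the index if found, or (-1 - insertion_point) if not
    | (bsearch($pair) | if . < 0 then -(1 + .) else . end) as $pos
    | .[:$pos] + [$pair] + .[$pos:]   # insert while preserving sorted order
    | .[-n:])                         # keep only the n largest
  | map(.[1]);                        # drop the cached keys

It can be used as a drop-in replacement for maxn in the pipeline above.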