jq: group and key by property
As a further example of @replay's technique, after many failures using other methods, I finally built a filter that condenses this Wazuh report (excerpted for brevity):
{
  "took" : 228,
  "timed_out" : false,
  "hits" : {
    "total" : { "value" : 2806, "relation" : "eq" },
    "hits" : [
      {
        "_source" : {
          "agent" : { "name" : "100360xx" },
          "data" : {
            "vulnerability" : {
              "severity" : "High",
              "package" : {
                "condition" : "less than 78.0",
                "name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
              }
            }
          }
        }
      },
      {
        "_source" : {
          "agent" : { "name" : "100360xx" },
          "data" : {
            "vulnerability" : {
              "severity" : "High",
              "package" : {
                "condition" : "less than 78.0",
                "name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
              }
            }
          }
        }
      },
      ...
Here is the jq filter I use to produce one object per agent, each mapping the agent name to an array of the names of that agent's vulnerable packages:
jq ' .hits.hits |= unique_by(._source.agent.name, ._source.data.vulnerability.package.name) | .hits.hits | group_by(._source.agent.name)[] | { (.[0]._source.agent.name): [.[]._source.data.vulnerability.package | .name ]}'
Here is an excerpt of the output produced by the filter:
{ "100360xx": [ "Mozilla Firefox 68.11.0 ESR (x64 en-US)", "VLC media player", "Windows 10" ] }
{ "WIN-KD5C4xxx": [ "Windows Server 2019" ] }
{ "fridxxx": [ "java-1.8.0-openjdk", "kernel", "kernel-headers", "kernel-tools", "kernel-tools-libs", "python-perf" ] }
{ "mcd-xxx-xxx": [ "dbus", "fribidi", "gnupg2", "graphite2", ...
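To see what each stage of the filter contributes, here is a minimal sketch run against a small made-up input. The agent names a1 and b2 and the package names pkgA, pkgB and pkgC are hypothetical and not taken from the Wazuh report. The sketch assumes jq is installed and starts from .hits.hits directly, which should give the same result as the |= form when only the hits are needed.

```shell
# Hypothetical sample: agent a1 has a duplicated hit for pkgA plus one for pkgB;
# agent b2 has a single hit for pkgC.
sample='{"hits":{"hits":[
  {"_source":{"agent":{"name":"a1"},"data":{"vulnerability":{"package":{"name":"pkgA"}}}}},
  {"_source":{"agent":{"name":"a1"},"data":{"vulnerability":{"package":{"name":"pkgA"}}}}},
  {"_source":{"agent":{"name":"a1"},"data":{"vulnerability":{"package":{"name":"pkgB"}}}}},
  {"_source":{"agent":{"name":"b2"},"data":{"vulnerability":{"package":{"name":"pkgC"}}}}}
]}}'

result=$(printf '%s' "$sample" | jq -c '
  .hits.hits
  # 1. keep one hit per (agent, package) pair
  | unique_by(._source.agent.name, ._source.data.vulnerability.package.name)
  # 2. split into one group per agent and emit the groups one by one
  | group_by(._source.agent.name)[]
  # 3. build {agent: [package names]} from each group
  | { (.[0]._source.agent.name): map(._source.data.vulnerability.package.name) }
')

printf '%s\n' "$result"
# {"a1":["pkgA","pkgB"]}
# {"b2":["pkgC"]}
```

The result is a stream of separate objects, one per line with -c, rather than a single JSON document.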
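The accepted answer doesn't produce a single valid JSON document. Its output is:
{ "name1": [ "1.1.1.1", "1.1.1.2" ] }
{ "name2": [ "1.1.1.3", "1.1.1.4" ] }
The name1 and name2 objects are each valid JSON on their own, but the output as a whole isn't, because it is two objects back to back rather than one object or array.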
The following jq filter produces the output requested in the question:
group_by(.component) | map({ key: (.[0].component), value: [.[] | .ip] }) | from_entries
Output:
{ "name1": [ "1.1.1.1", "1.1.1.2" ], "name2": [ "1.1.1.3", "1.1.1.4" ]}
Suggestions for simpler approaches are welcome.
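Two possibly shorter variants, offered as a sketch rather than a definitive answer. It assumes the input is an array of objects with component and ip fields; the sample below is reconstructed from the output shown above, not copied from the question. The first variant builds one small object per group and merges them with add; the second skips group_by and accumulates into an object with reduce.

```shell
# Assumed input shape (reconstructed from the output above).
input='[
  {"component":"name1","ip":"1.1.1.1"},
  {"component":"name1","ip":"1.1.1.2"},
  {"component":"name2","ip":"1.1.1.3"},
  {"component":"name2","ip":"1.1.1.4"}
]'

# Variant 1: one {component: [ips]} object per group, merged into one object by add.
merged=$(printf '%s' "$input" |
  jq -c 'group_by(.component) | map({ (.[0].component): map(.ip) }) | add')

# Variant 2: no grouping step; append each ip to the array stored under its component.
reduced=$(printf '%s' "$input" |
  jq -c 'reduce .[] as $x ({}; .[$x.component] += [$x.ip])')

printf '%s\n%s\n' "$merged" "$reduced"
# {"name1":["1.1.1.1","1.1.1.2"],"name2":["1.1.1.3","1.1.1.4"]}
# {"name1":["1.1.1.1","1.1.1.2"],"name2":["1.1.1.3","1.1.1.4"]}
```

One difference to keep in mind: group_by sorts the groups by key, while the reduce variant adds keys in the order they first appear in the input, so the two can differ in key order on unsorted data.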
If human readability is preferred over valid JSON, I'd suggest something like ...
jq -r 'group_by(.component)[] | "IPs for " + .[0].component + ": " + (map(.ip) | tostring)'
... which results in ...
IPs for name1: ["1.1.1.1","1.1.1.2"]
IPs for name2: ["1.1.1.3","1.1.1.4"]
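If the JSON array brackets and quotes are unwanted in the readable form, a small variation is to join the IPs with a separator and use string interpolation. This is only a sketch, again assuming an input array of objects with component and ip fields (the sample is reconstructed from the output above).

```shell
# Assumed input shape (reconstructed from the output above).
input='[
  {"component":"name1","ip":"1.1.1.1"},
  {"component":"name1","ip":"1.1.1.2"},
  {"component":"name2","ip":"1.1.1.3"},
  {"component":"name2","ip":"1.1.1.4"}
]'

# -r prints raw strings; join(", ") turns each group's IP array into plain text.
lines=$(printf '%s' "$input" |
  jq -r 'group_by(.component)[] | "IPs for \(.[0].component): \(map(.ip) | join(", "))"')

printf '%s\n' "$lines"
# IPs for name1: 1.1.1.1, 1.1.1.2
# IPs for name2: 1.1.1.3, 1.1.1.4
```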