
jq: group and key by property


I figured it out myself. I first group by .component and then, for each group, create a new list of IPs keyed by the component of the group's first object:

jq ' group_by(.component)[] | {(.[0].component): [.[] | .ip]}'
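To try the filter end to end, here is a minimal sketch. It assumes jq 1.5 or later is installed and that the question's input is an array of objects with `component` and `ip` fields. The sample data is hypothetical, reconstructed from the outputs quoted later in this thread, and `-c` prints each result compactly on one line.

```shell
#!/usr/bin/env bash
# Hypothetical input: an array of {component, ip} objects
# (reconstructed from the outputs in this thread, not the question's real data)
input='[
  {"component": "name1", "ip": "1.1.1.1"},
  {"component": "name1", "ip": "1.1.1.2"},
  {"component": "name2", "ip": "1.1.1.3"},
  {"component": "name2", "ip": "1.1.1.4"}
]'

# group_by sorts the array by .component and splits it into sub-arrays;
# the trailing [] streams each group, and each group becomes one object
# keyed by the component of its first element.
out=$(printf '%s' "$input" | jq -c 'group_by(.component)[] | {(.[0].component): [.[] | .ip]}')
printf '%s\n' "$out"
```

This prints one compact object per line, i.e. a stream of two separate JSON values rather than a single document.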


As a further example of @replay's technique, after many failures using other methods, I finally built a filter that condenses this Wazuh report (excerpted for brevity):

{
  "took" : 228,
  "timed_out" : false,
  "hits" : {
    "total" : {
      "value" : 2806,
      "relation" : "eq"
    },
    "hits" : [
      {
        "_source" : {
          "agent" : {
            "name" : "100360xx"
          },
          "data" : {
            "vulnerability" : {
              "severity" : "High",
              "package" : {
                "condition" : "less than 78.0",
                "name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
              }
            }
          }
        }
      },
      {
        "_source" : {
          "agent" : {
            "name" : "100360xx"
          },
          "data" : {
            "vulnerability" : {
              "severity" : "High",
              "package" : {
                "condition" : "less than 78.0",
                "name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
              }
            }
          }
        }
      },
      ...

Here is the jq filter I use to produce, for each agent, an object mapping the agent name to an array of the names of that agent's vulnerable packages (the result is a stream of such objects, one per agent):

jq ' .hits.hits |= unique_by(._source.agent.name, ._source.data.vulnerability.package.name) | .hits.hits | group_by(._source.agent.name)[] | { (.[0]._source.agent.name): [.[]._source.data.vulnerability.package | .name ]}'

Here is an excerpt of the output produced by the filter:

{
  "100360xx": [
    "Mozilla Firefox 68.11.0 ESR (x64 en-US)",
    "VLC media player",
    "Windows 10"
  ]
}
{
  "WIN-KD5C4xxx": [
    "Windows Server 2019"
  ]
}
{
  "fridxxx": [
    "java-1.8.0-openjdk",
    "kernel",
    "kernel-headers",
    "kernel-tools",
    "kernel-tools-libs",
    "python-perf"
  ]
}
{
  "mcd-xxx-xxx": [
    "dbus",
    "fribidi",
    "gnupg2",
    "graphite2",
    ...
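The filter above emits a stream of per-agent objects rather than one array. If a single JSON array is wanted, one option, sketched here under the assumption of jq 1.5 or later, is to replace the trailing `[]` plus object construction with `map(...)`. The report below is hypothetical and trimmed to the fields the filter reads, and the `.hits.hits |= unique_by(...) | .hits.hits` step is collapsed into a direct `.hits.hits | unique_by(...)`, which selects the same entries.

```shell
#!/usr/bin/env bash
# Hypothetical, heavily trimmed Wazuh-style report: only the fields the filter reads
report='{"hits": {"hits": [
  {"_source": {"agent": {"name": "100360xx"},
               "data": {"vulnerability": {"package": {"name": "Mozilla Firefox 68.11.0 ESR (x64 en-US)"}}}}},
  {"_source": {"agent": {"name": "100360xx"},
               "data": {"vulnerability": {"package": {"name": "Mozilla Firefox 68.11.0 ESR (x64 en-US)"}}}}},
  {"_source": {"agent": {"name": "WIN-KD5C4xxx"},
               "data": {"vulnerability": {"package": {"name": "Windows Server 2019"}}}}}
]}}'

# unique_by drops duplicate (agent, package) pairs; group_by splits by agent;
# map(...) keeps the per-agent objects inside one JSON array instead of a stream.
out=$(printf '%s' "$report" | jq -c '
  .hits.hits
  | unique_by(._source.agent.name, ._source.data.vulnerability.package.name)
  | group_by(._source.agent.name)
  | map({(.[0]._source.agent.name): map(._source.data.vulnerability.package.name)})')
printf '%s\n' "$out"
```

The duplicated Firefox entry collapses to one, and both agents end up inside a single array.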


The accepted answer doesn't produce a single valid JSON document; its output looks like this:

{
  "name1": [
    "1.1.1.1",
    "1.1.1.2"
  ]
}
{
  "name2": [
    "1.1.1.3",
    "1.1.1.4"
  ]
}

The object keyed by name1 and the one keyed by name2 are each valid JSON, but the output as a whole is a stream of two separate values rather than one JSON document.

The following jq filter produces the desired output as specified in the question:

group_by(.component) | map({ key: (.[0].component), value: [.[] | .ip] }) | from_entries

Output:

{
  "name1": [
    "1.1.1.1",
    "1.1.1.2"
  ],
  "name2": [
    "1.1.1.3",
    "1.1.1.4"
  ]
}
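To reproduce this, here is a minimal sketch that runs the same filter on hypothetical input (reconstructed from the outputs shown above), assuming jq 1.5 or later. `from_entries` takes an array of `{key, value}` objects and merges them into one object, which is what turns the per-group results into a single document.

```shell
#!/usr/bin/env bash
# Hypothetical input reconstructed from the outputs shown above
input='[
  {"component": "name1", "ip": "1.1.1.1"},
  {"component": "name1", "ip": "1.1.1.2"},
  {"component": "name2", "ip": "1.1.1.3"},
  {"component": "name2", "ip": "1.1.1.4"}
]'

# map(...) turns each group into a {key, value} pair;
# from_entries then merges all pairs into a single object.
out=$(printf '%s' "$input" | jq -c 'group_by(.component) | map({ key: (.[0].component), value: [.[] | .ip] }) | from_entries')
printf '%s\n' "$out"
```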

Suggestions for simpler approaches are welcome.
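Two possibly simpler variants, sketched under the assumption of jq 1.5 or later and the same hypothetical input shape: `add` merges an array of single-key objects into one object, so the `{key, value}` step can be skipped; `reduce` builds the object in one pass without `group_by`, so keys appear in first-seen order rather than sorted order.

```shell
#!/usr/bin/env bash
# Hypothetical input reconstructed from the outputs in this thread
input='[
  {"component": "name1", "ip": "1.1.1.1"},
  {"component": "name1", "ip": "1.1.1.2"},
  {"component": "name2", "ip": "1.1.1.3"},
  {"component": "name2", "ip": "1.1.1.4"}
]'

# Variant 1: one single-key object per group, then merge them all with add
out_add=$(printf '%s' "$input" | jq -c 'group_by(.component) | map({(.[0].component): map(.ip)}) | add')

# Variant 2: single pass with reduce; null + [ip] yields [ip] the first time a key is seen
out_reduce=$(printf '%s' "$input" | jq -c 'reduce .[] as $x ({}; .[$x.component] += [$x.ip])')

printf '%s\n%s\n' "$out_add" "$out_reduce"
```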

If human readability is preferred over valid JSON, I'd suggest something like ...

jq -r 'group_by(.component)[] | "IPs for " + .[0].component + ": " + (map(.ip) | tostring)'

... which results in ...

IPs for name1: ["1.1.1.1","1.1.1.2"]
IPs for name2: ["1.1.1.3","1.1.1.4"]
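If the JSON brackets and quotes are unwanted in the readable form, a variant using jq's string interpolation `\(...)` together with `join` could look like this. It is a sketch assuming jq 1.5 or later and hypothetical input reconstructed from the outputs in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical input reconstructed from the outputs in this thread
input='[
  {"component": "name1", "ip": "1.1.1.1"},
  {"component": "name1", "ip": "1.1.1.2"},
  {"component": "name2", "ip": "1.1.1.3"},
  {"component": "name2", "ip": "1.1.1.4"}
]'

# \(...) interpolates a jq expression into a string; join(", ") turns the
# IP array into a plain comma-separated list; -r prints raw strings, one per line.
out=$(printf '%s' "$input" | jq -r 'group_by(.component)[] | "IPs for \(.[0].component): \(map(.ip) | join(", "))"')
printf '%s\n' "$out"
```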