
Parsing json with awk/sed in bash to get key value pair


I would advise that you use jq or another real JSON parser. You can't reliably parse JSON with regular expressions, because JSON nests to arbitrary depth. You could hack something together with awk, but it will break as soon as your input takes a form you didn't anticipate.

So, the answer is, introduce a cheap dependency (jq, or similar tool), and script around that. Unless you're running this script in a router or an embedded computer, chances are you can easily install jq.
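As a rough sketch of what scripting around jq could look like for this question: the sample document below is a cleaned-up, made-up version of the asker's input (real JSON cannot carry the `//needed` comments), and the function name `doc_fields` is hypothetical. It assumes jq is installed and on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: print the four wanted key/value pairs of every
# object in the top-level "documents" array, reading JSON from stdin.
doc_fields() {
    jq -r '
        .documents[] as $d
        | ("title", "description", "id", "createDate") as $k
        | "\($k | tojson):\($d[$k] | tojson)"
    '
}

# Made-up sample shaped like the input in the question.
sample='{"documents":[{"title":"a","description":"b","id":"c",
  "conversation":[{"message":"","id":"d","createDate":"e"}],
  "createDate":"f"}]}'

printf '%s\n' "$sample" | doc_fields
```

Because the filter only looks at the keys of each document object, the `id` and `createDate` nested inside `conversation` are never touched, however the file is indented or wrapped.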


If the structural characters {, }, [ and ] always sit on a line of their own (a closing } or ] may be followed by a comma), this would work:

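#!/usr/bin/awk -f
# Recursively consume lines; "level" is the current nesting depth and
# "end" is the regex for the bracket that closes the current level.
function walk(level, end) {
    while (getline > 0) {
        if (level && $NF ~ end) {
            return
        }
        if ($NF == "{") {
            walk(level + 1, "},?")
        } else if ($NF == "[") {
            walk(level + 1, "],?")
        } else if (level == 3 && match($0, /"(title|description|id|createDate)":"[^"]*"/)) {
            # level 3 = directly inside an object of the "documents" array
            print substr($0, RSTART, RLENGTH)
        }
    }
}
BEGIN {
    walk(0)
    exit
}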

Input:

{
    "documents":
    [
        {
            "title":"a",   //needed
            "description":"b",  //needed
            "id":"c",  //needed
            ....(some more:not useful)....
            "conversation":
            [
                {
                    "message":"",
                    "id":"d",   //not needed
                    .....(some more)....
                    "createDate":"e",   //not needed
                },
                ...(some more messages)....
            ],
            "createDate":"f",  //needed
            ....(many more labels).....
        }
    ],
    ....(some more global attributes)....
}

Output:

"title":"a"
"description":"b"
"id":"c"
"createDate":"f"


Well, if you're going to use a regex to parse JSON, the result will by nature be quick, dirty and heavily reliant on the exact layout of the input file. In that case you could write something that relies on the amount of white space before the key-value pairs you're interested in. Depending on the kind of output you're looking for, you could use something along the lines of:

awk '/^ {12}"title/
/^ {12}"description/
/^ {12}"id/
/^ {12}"createDate/' input_file.json

Not great, but it does the trick on your example input...
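To make the fragility concrete, here is a small sketch that runs the same indentation-based filter over two renderings of identical data: one pretty-printed with the wanted keys at 12 spaces, one compacted onto a single line. The sample data and the function name `grab_fields` are made up for illustration, and the indent is built with `sprintf` rather than `{12}` because some awk implementations do not support interval expressions.

```shell
#!/bin/sh
# Hypothetical helper: print lines whose key starts after exactly
# 12 spaces of indentation and is one of the four wanted names.
grab_fields() {
    awk '
        BEGIN { pad = sprintf("%12s", "") }   # a string of 12 spaces
        $0 ~ ("^" pad "\"(title|description|id|createDate)\"")
    '
}

# Made-up sample, pretty-printed with 4 spaces per nesting level.
pretty='{
    "documents":
    [
        {
            "title":"a",
            "description":"b",
            "id":"c",
            "conversation":
            [
                {
                    "id":"d",
                    "createDate":"e"
                }
            ],
            "createDate":"f"
        }
    ]
}'

# The same data on a single line.
compact='{"documents":[{"title":"a","description":"b","id":"c","conversation":[{"id":"d","createDate":"e"}],"createDate":"f"}]}'

printf '%s\n' "$pretty"  | grab_fields   # prints the four wanted lines
printf '%s\n' "$compact" | grab_fields   # prints nothing
```

The second call finds nothing even though the data is unchanged, which is the kind of breakage to expect whenever the producer of the file changes its formatting.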