Find all occurrences of a key in nested dictionaries and lists Find all occurrences of a key in nested dictionaries and lists python python

Find all occurrences of a key in nested dictionaries and lists


I found this Q/A very interesting, since it provides several different solutions for the same problem. I took all these functions and tested them with a complex dictionary object. I had to take two functions out of the test, because they had to many fail results and they did not support returning lists or dicts as values, which i find essential, since a function should be prepared for almost any data to come.

So i pumped the other functions in 100.000 iterations through the timeit module and output came to following result:

0.11 usec/pass on gen_dict_extract(k,o)- - - - - - - - - - - - - - - - - - - - - - - - - - - - -6.03 usec/pass on find_all_items(k,o)- - - - - - - - - - - - - - - - - - - - - - - - - - - - -0.15 usec/pass on findkeys(k,o)- - - - - - - - - - - - - - - - - - - - - - - - - - - - -1.79 usec/pass on get_recursively(k,o)- - - - - - - - - - - - - - - - - - - - - - - - - - - - -0.14 usec/pass on find(k,o)- - - - - - - - - - - - - - - - - - - - - - - - - - - - -0.36 usec/pass on dict_extract(k,o)- - - - - - - - - - - - - - - - - - - - - - - - - - - - -

All functions had the same needle to search for ('logging') and the same dictionary object, which is constructed like this:

o = { 'temparature': '50',       'logging': {        'handlers': {          'console': {            'formatter': 'simple',             'class': 'logging.StreamHandler',             'stream': 'ext://sys.stdout',             'level': 'DEBUG'          }        },        'loggers': {          'simpleExample': {            'handlers': ['console'],             'propagate': 'no',             'level': 'INFO'          },         'root': {           'handlers': ['console'],            'level': 'DEBUG'         }       },        'version': '1',        'formatters': {         'simple': {           'datefmt': "'%Y-%m-%d %H:%M:%S'",            'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'         }       }     },      'treatment': {'second': 5, 'last': 4, 'first': 4},        'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]]}

All functions delivered the same result, but the time differences are dramatic! The function gen_dict_extract(k,o) is my function adapted from the functions here, actually it is pretty much like the find function from Alfe, with the main difference, that i am checking if the given object has iteritems function, in case strings are passed during recursion:

def gen_dict_extract(key, var):    if hasattr(var,'iteritems'):        for k, v in var.iteritems():            if k == key:                yield v            if isinstance(v, dict):                for result in gen_dict_extract(key, v):                    yield result            elif isinstance(v, list):                for d in v:                    for result in gen_dict_extract(key, d):                        yield result

So this variant is the fastest and safest of the functions here. And find_all_items is incredibly slow and far off the second slowest get_recursivley while the rest, except dict_extract, is close to each other. The functions fun and keyHole only work if you are looking for strings.

Interesting learning aspect here :)


d = { "id" : "abcde",    "key1" : "blah",    "key2" : "blah blah",    "nestedlist" : [     { "id" : "qwerty",        "nestednestedlist" : [         { "id" : "xyz", "keyA" : "blah blah blah" },        { "id" : "fghi", "keyZ" : "blah blah blah" }],        "anothernestednestedlist" : [         { "id" : "asdf", "keyQ" : "blah blah" },        { "id" : "yuiop", "keyW" : "blah" }] } ] } def fun(d):    if 'id' in d:        yield d['id']    for k in d:        if isinstance(d[k], list):            for i in d[k]:                for j in fun(i):                    yield j

>>> list(fun(d))['abcde', 'qwerty', 'xyz', 'fghi', 'asdf', 'yuiop']


d = { "id" : "abcde",    "key1" : "blah",    "key2" : "blah blah",    "nestedlist" : [    { "id" : "qwerty",        "nestednestedlist" : [        { "id" : "xyz", "keyA" : "blah blah blah" },        { "id" : "fghi", "keyZ" : "blah blah blah" }],        "anothernestednestedlist" : [        { "id" : "asdf", "keyQ" : "blah blah" },        { "id" : "yuiop", "keyW" : "blah" }] } ] }def findkeys(node, kv):    if isinstance(node, list):        for i in node:            for x in findkeys(i, kv):               yield x    elif isinstance(node, dict):        if kv in node:            yield node[kv]        for j in node.values():            for x in findkeys(j, kv):                yield xprint(list(findkeys(d, 'id')))