
Python dataclass from a nested dict


I'm the author of dacite - the tool that simplifies creation of data classes from dictionaries.

The library has only one function, from_dict. Here is a quick usage example:

from dataclasses import dataclass

from dacite import from_dict


@dataclass
class User:
    name: str
    age: int
    is_active: bool


data = {
    'name': 'john',
    'age': 30,
    'is_active': True,
}

user = from_dict(data_class=User, data=data)

assert user == User(name='john', age=30, is_active=True)

Moreover, dacite supports the following features:

  • nested structures
  • (basic) type checking
  • optional fields (i.e. typing.Optional)
  • unions
  • collections
  • value casting and transformation
  • remapping of field names

... and it's well tested - 100% code coverage!

To install dacite, simply use pip (or pipenv):

$ pip install dacite


Below is the CPython implementation of asdict, or more specifically, the internal recursive helper function _asdict_inner that it uses:

# Source: https://github.com/python/cpython/blob/master/Lib/dataclasses.py
def _asdict_inner(obj, dict_factory):
    if _is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _asdict_inner(getattr(obj, f.name), dict_factory)
            result.append((f.name, value))
        return dict_factory(result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        # [large block of author comments]
        return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
    elif isinstance(obj, (list, tuple)):
        # [ditto]
        return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_asdict_inner(k, dict_factory),
                          _asdict_inner(v, dict_factory))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

asdict simply calls the above with some assertions, and dict_factory=dict by default.

How can this be adapted to create an output dictionary with the required type-tagging, as mentioned in the comments?
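To make the problem concrete, here is a small sketch of what the standard asdict produces for nested dataclasses. The Point and C classes are assumptions, reconstructed from the printed output in the tests further down; they are not defined in the text itself.

```python
from collections import OrderedDict
from dataclasses import asdict, dataclass
from typing import List


# Assumed example classes (hypothetical reconstruction, see lead-in).
@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])
plain = asdict(c)

# asdict recurses into the nested dataclasses but returns plain dicts:
# nothing in the result records that the items used to be Point objects.
print(plain)  # {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

# dict_factory only changes the mapping type. It is called with a list of
# (name, value) pairs and is not told which dataclass is being converted.
ordered = asdict(c, dict_factory=OrderedDict)
print(type(ordered["mylist"][0]).__name__)
```

Since dict_factory is not passed the dataclass type, a type tag has to be attached inside the recursive helper itself, which is what the adaptation below does.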


1. Adding type information

My attempt involved creating a custom return wrapper inheriting from dict:

class TypeDict(dict):
    def __init__(self, t, *args, **kwargs):
        super(TypeDict, self).__init__(*args, **kwargs)

        if not isinstance(t, type):
            raise TypeError("t must be a type")

        self._type = t

    @property
    def type(self):
        return self._type

Looking at the original code, only the first clause needs to be modified to use this wrapper, as the other clauses only handle containers of dataclasses:

# only use dict for now; easy to add back later
def _todict_inner(obj):
    if is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _todict_inner(getattr(obj, f.name))
            result.append((f.name, value))
        return TypeDict(type(obj), result)

    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        return type(obj)(*[_todict_inner(v) for v in obj])
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_todict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_todict_inner(k), _todict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

Imports:

from dataclasses import dataclass, fields, is_dataclass

# thanks to Patrick Haugh
from typing import *

# deepcopy
import copy

Functions used:

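# copy of the internal function _is_dataclass_instance
def is_dataclass_instance(obj):
    # is_dataclass() is true for both dataclass types and their instances,
    # so exclude types explicitly (obj.type is not an attribute of a
    # dataclass instance and would raise AttributeError)
    return is_dataclass(obj) and not isinstance(obj, type)

# the adapted version of asdict
def todict(obj):
    if not is_dataclass_instance(obj):
        raise TypeError("todict() should be called on dataclass instances")
    return _todict_inner(obj)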

Tests with the example dataclasses:

c = C([Point(0, 0), Point(10, 4)])
print(c)
cd = todict(c)

print(cd)
# {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

print(cd.type)
# <class '__main__.C'>

Results are as expected.


2. Converting back to a dataclass

The recursive routine used by asdict can be re-used for the reverse process, with some relatively minor changes:

def _fromdict_inner(obj):
    # reconstruct the dataclass using the type tag
    if is_dataclass_dict(obj):
        result = {}
        for name, data in obj.items():
            result[name] = _fromdict_inner(data)
        return obj.type(**result)

    # exactly the same as before (without the tuple clause)
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_fromdict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_fromdict_inner(k), _fromdict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)

Functions used:

def is_dataclass_dict(obj):
    return isinstance(obj, TypeDict)

def fromdict(obj):
    if not is_dataclass_dict(obj):
        raise TypeError("fromdict() should be called on TypeDict instances")
    return _fromdict_inner(obj)

Test:

c = C([Point(0, 0), Point(10, 4)])
cd = todict(c)
cf = fromdict(cd)

print(c)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])

print(cf)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])

Again as expected.
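One caveat, sketched below under the assumption that the TypeDict is later serialized: the type tag lives in a Python attribute rather than in the dictionary's contents, so it only survives while the object stays in memory. The snippet repeats a trimmed TypeDict so that it runs on its own, and Point is an assumed example class.

```python
import json
from dataclasses import dataclass


class TypeDict(dict):
    """Trimmed copy of the wrapper above: a dict that remembers a type."""

    def __init__(self, t, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if not isinstance(t, type):
            raise TypeError("t must be a type")
        self._type = t

    @property
    def type(self):
        return self._type


@dataclass
class Point:  # assumed example class
    x: int
    y: int


tagged = TypeDict(Point, [("x", 0), ("y", 0)])

# The tag is not part of the mapping, so the contents equal a plain dict.
assert tagged == {"x": 0, "y": 0}

# A JSON round trip therefore returns an ordinary dict without the tag.
restored = json.loads(json.dumps(tagged))
print(type(restored).__name__)    # dict
print(hasattr(restored, "type"))  # False
```

If the data has to leave the process, the tag would have to be stored inside the dictionary itself (for example under a reserved key), which goes beyond what the code above does.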


All it takes is a five-liner:

import dataclasses

def dataclass_from_dict(klass, d):
    try:
        fieldtypes = {f.name: f.type for f in dataclasses.fields(klass)}
        return klass(**{f: dataclass_from_dict(fieldtypes[f], d[f]) for f in d})
    except Exception:
        return d  # Not a dataclass field

Sample usage:

from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Line:
    a: Point
    b: Point

line = Line(Point(1,2), Point(3,4))
assert line == dataclass_from_dict(Line, asdict(line))

Full code, including to/from json, here at gist: https://gist.github.com/gatopeich/1efd3e1e4269e1e98fae9983bb914f22
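A caveat about the five-liner as shown: a field annotated as List[Point] is not itself a dataclass, so the lookup fails, the exception is swallowed, and the list items stay plain dicts. The sketch below illustrates this and shows one possible variant. The function from_dict_with_lists is a hypothetical name, not part of the gist, and it assumes Python 3.8+ (for typing.get_origin and typing.get_args) and annotations that are real types rather than strings.

```python
import dataclasses
from dataclasses import asdict, dataclass
from typing import List, get_args, get_origin


def dataclass_from_dict(klass, d):
    # The five-liner from above, repeated so this snippet runs on its own.
    try:
        fieldtypes = {f.name: f.type for f in dataclasses.fields(klass)}
        return klass(**{f: dataclass_from_dict(fieldtypes[f], d[f]) for f in d})
    except Exception:
        return d  # Not a dataclass field


def from_dict_with_lists(klass, d):
    """Hypothetical variant that also rebuilds List[SomeDataclass] fields."""
    if dataclasses.is_dataclass(klass) and isinstance(d, dict):
        fieldtypes = {f.name: f.type for f in dataclasses.fields(klass)}
        return klass(**{name: from_dict_with_lists(fieldtypes[name], value)
                        for name, value in d.items()})
    if get_origin(klass) is list and get_args(klass) and isinstance(d, list):
        item_type = get_args(klass)[0]
        return [from_dict_with_lists(item_type, item) for item in d]
    return d  # Anything else is passed through unchanged


@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])

# The five-liner leaves the list items as plain dicts...
print(dataclass_from_dict(C, asdict(c)))
# ...while the variant restores them to Point instances.
print(from_dict_with_lists(C, asdict(c)))
```

Unlike the five-liner, this variant does not swallow exceptions, so an unexpected key in the input raises KeyError instead of silently returning the dictionary.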