Python dataclass from a nested dict
I'm the author of dacite, a tool that simplifies the creation of data classes from dictionaries. The library has only one function, from_dict. Here is a quick example of usage:
from dataclasses import dataclass
from dacite import from_dict

@dataclass
class User:
    name: str
    age: int
    is_active: bool

data = {
    'name': 'john',
    'age': 30,
    'is_active': True,
}

user = from_dict(data_class=User, data=data)

assert user == User(name='john', age=30, is_active=True)
Moreover, dacite supports the following features:
- nested structures
- (basic) type checking
- optional fields (i.e. typing.Optional)
- unions
- collections
- value casting and transformation
- remapping of field names
... and it's well tested - 100% code coverage!
To install dacite, simply use pip (or pipenv):
$ pip install dacite
Below is the CPython implementation of asdict, or more specifically, the internal recursive helper function _asdict_inner that it uses:
# Source: https://github.com/python/cpython/blob/master/Lib/dataclasses.py

def _asdict_inner(obj, dict_factory):
    if _is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _asdict_inner(getattr(obj, f.name), dict_factory)
            result.append((f.name, value))
        return dict_factory(result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        # [large block of author comments]
        return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
    elif isinstance(obj, (list, tuple)):
        # [ditto]
        return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_asdict_inner(k, dict_factory),
                          _asdict_inner(v, dict_factory))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
asdict simply calls the above with some assertions, and dict_factory=dict by default.
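To make the role of dict_factory concrete, here is a minimal sketch using only the standard library. Point and C are hypothetical stand-ins for the example dataclasses, not definitions taken from the original post. Note that the factory only receives a list of (name, value) pairs, so it cannot see which dataclass those pairs came from.

```python
from collections import OrderedDict
from dataclasses import dataclass, asdict
from typing import List


# Hypothetical example dataclasses (assumed for illustration).
@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])

# Default behaviour: every dataclass instance becomes a plain dict.
plain = asdict(c)

# A custom factory is applied recursively, to the outer and nested instances.
ordered = asdict(c, dict_factory=OrderedDict)
```

Both results hold the same keys and values; only the mapping type differs, and neither records the original dataclass type.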
How can this be adapted to create an output dictionary with the required type-tagging, as mentioned in the comments?
1. Adding type information
My attempt involved creating a custom return wrapper inheriting from dict:
class TypeDict(dict):
    def __init__(self, t, *args, **kwargs):
        super(TypeDict, self).__init__(*args, **kwargs)

        if not isinstance(t, type):
            raise TypeError("t must be a type")

        self._type = t

    @property
    def type(self):
        return self._type
Looking at the original code, only the first clause needs to be modified to use this wrapper, as the other clauses only handle containers of dataclasses:
# only use dict for now; easy to add back later
def _todict_inner(obj):
    if is_dataclass_instance(obj):
        result = []
        for f in fields(obj):
            value = _todict_inner(getattr(obj, f.name))
            result.append((f.name, value))
        return TypeDict(type(obj), result)

    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        return type(obj)(*[_todict_inner(v) for v in obj])
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_todict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_todict_inner(k), _todict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
Imports:
from dataclasses import dataclass, fields, is_dataclass

# thanks to Patrick Haugh
from typing import *

# deepcopy
import copy
Functions used:
# equivalent of the internal function _is_dataclass_instance:
# true for dataclass instances, false for dataclass types themselves
def is_dataclass_instance(obj):
    return is_dataclass(obj) and not isinstance(obj, type)

# the adapted version of asdict
def todict(obj):
    if not is_dataclass_instance(obj):
        raise TypeError("todict() should be called on dataclass instances")
    return _todict_inner(obj)
Tests with the example dataclasses:
c = C([Point(0, 0), Point(10, 4)])
print(c)
cd = todict(c)

print(cd)
# {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

print(cd.type)
# <class '__main__.C'>
Results are as expected.
2. Converting back to a dataclass
The recursive routine used by asdict can be re-used for the reverse process, with some relatively minor changes:
def _fromdict_inner(obj):
    # reconstruct the dataclass using the type tag
    if is_dataclass_dict(obj):
        result = {}
        for name, data in obj.items():
            result[name] = _fromdict_inner(data)
        return obj.type(**result)

    # exactly the same as before (without the tuple clause)
    elif isinstance(obj, (list, tuple)):
        return type(obj)(_fromdict_inner(v) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((_fromdict_inner(k), _fromdict_inner(v))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
Functions used:
def is_dataclass_dict(obj):
    return isinstance(obj, TypeDict)

def fromdict(obj):
    if not is_dataclass_dict(obj):
        raise TypeError("fromdict() should be called on TypeDict instances")
    return _fromdict_inner(obj)
Test:
c = C([Point(0, 0), Point(10, 4)])
cd = todict(c)
cf = fromdict(cd)

print(c)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])

print(cf)
# C(mylist=[Point(x=0, y=0), Point(x=10, y=4)])
Again as expected.
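For anyone who wants to run the round trip in one go, here is a condensed, self-contained sketch of both steps. It is a simplification rather than the exact code above: the todict and fromdict bodies are shortened, the namedtuple clause is left out, and Point and C are assumed definitions for the example dataclasses.

```python
import copy
from dataclasses import dataclass, fields, is_dataclass
from typing import List


class TypeDict(dict):
    """A dict that remembers which dataclass it was built from."""

    def __init__(self, t, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if not isinstance(t, type):
            raise TypeError("t must be a type")
        self.type = t


def todict(obj):
    # Dataclass instance: tag the resulting dict with the instance's type.
    if is_dataclass(obj) and not isinstance(obj, type):
        pairs = ((f.name, todict(getattr(obj, f.name))) for f in fields(obj))
        return TypeDict(type(obj), pairs)
    if isinstance(obj, (list, tuple)):
        return type(obj)(todict(v) for v in obj)
    if isinstance(obj, dict):
        return type(obj)((todict(k), todict(v)) for k, v in obj.items())
    return copy.deepcopy(obj)


def fromdict(obj):
    # Tagged dict: rebuild the dataclass recorded in the tag.
    if isinstance(obj, TypeDict):
        return obj.type(**{k: fromdict(v) for k, v in obj.items()})
    if isinstance(obj, (list, tuple)):
        return type(obj)(fromdict(v) for v in obj)
    if isinstance(obj, dict):
        return type(obj)((fromdict(k), fromdict(v)) for k, v in obj.items())
    return copy.deepcopy(obj)


# Assumed example dataclasses (hypothetical definitions).
@dataclass
class Point:
    x: int
    y: int


@dataclass
class C:
    mylist: List[Point]


c = C([Point(0, 0), Point(10, 4)])
cd = todict(c)
cf = fromdict(cd)
```

Nested dicts carry their own tag as well, so cd['mylist'][0].type is Point. The tag is only an attribute of the in-memory object, so it would not survive a copy into a plain dict such as dict(cd).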
All it takes is a five-liner:
import dataclasses  # needed for dataclasses.fields below

def dataclass_from_dict(klass, d):
    try:
        fieldtypes = {f.name: f.type for f in dataclasses.fields(klass)}
        return klass(**{f: dataclass_from_dict(fieldtypes[f], d[f]) for f in d})
    except Exception:
        return d  # Not a dataclass field
Sample usage:
from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Line:
    a: Point
    b: Point

line = Line(Point(1,2), Point(3,4))
assert line == dataclass_from_dict(Line, asdict(line))
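One limitation worth knowing: a field annotated as List[Point] is not itself a dataclass, so the five-liner returns the list of plain dicts unchanged. Below is a possible extension, offered as a sketch rather than part of the original answer. The function name dataclass_from_dict_with_lists and the Path class are hypothetical, and it assumes annotations are real types (not strings, as they would be under from __future__ import annotations).

```python
import dataclasses
from dataclasses import dataclass, asdict
from typing import List, get_args, get_origin


def dataclass_from_dict_with_lists(klass, d):
    """Hypothetical variant of the five-liner that also rebuilds List[X] items."""
    args = get_args(klass)
    # List[X] annotation: rebuild each element using the item type X.
    if get_origin(klass) is list and args and isinstance(d, list):
        return [dataclass_from_dict_with_lists(args[0], item) for item in d]
    # Dataclass annotation: rebuild each field from the matching dict entry.
    if dataclasses.is_dataclass(klass) and isinstance(d, dict):
        fieldtypes = {f.name: f.type for f in dataclasses.fields(klass)}
        return klass(**{
            name: dataclass_from_dict_with_lists(fieldtypes[name], value)
            for name, value in d.items()
        })
    # Anything else (numbers, strings, untyped containers) is returned as-is.
    return d


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Path:  # hypothetical dataclass with a list-of-dataclass field
    name: str
    points: List[Point]


path = Path("p", [Point(1, 2), Point(3, 4)])
restored = dataclass_from_dict_with_lists(Path, asdict(path))
assert restored == path
```

Other generic containers (Dict, Optional, Tuple) would need similar branches, which is where a library such as dacite starts to pay off.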
Full code, including to/from json, here at gist: https://gist.github.com/gatopeich/1efd3e1e4269e1e98fae9983bb914f22