Validating detailed types in Python dataclasses
Instead of checking for type equality, you should use isinstance. But you cannot use a parametrized generic type (typing.List[int]) to do so; you must use the "generic" version (typing.List). So you will be able to check the container type but not the contained types. Parametrized generic types define an __origin__ attribute that you can use for that.
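To make that limitation concrete, here is a minimal sketch, assuming Python 3.7 or later (the variable names are purely illustrative):

```python
import typing

values = [1, 2, 3]

# The bare generic works with isinstance, but only checks the container.
container_ok = isinstance(values, typing.List)

# A parametrized generic is rejected by isinstance with a TypeError.
try:
    isinstance(values, typing.List[int])
    parametrized_ok = True
except TypeError:
    parametrized_ok = False

# __origin__ recovers the runtime class behind the parametrized hint.
origin = typing.List[int].__origin__
origin_ok = isinstance(values, origin)
```

Here container_ok and origin_ok end up True, while parametrized_ok is False because the second isinstance call raises.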
Contrary to Python 3.6, in Python 3.7 most type hints have a useful __origin__ attribute. Compare:
    # Python 3.6
    >>> import typing
    >>> typing.List.__origin__
    >>> typing.List[int].__origin__
    typing.List
and
    # Python 3.7
    >>> import typing
    >>> typing.List.__origin__
    <class 'list'>
    >>> typing.List[int].__origin__
    <class 'list'>
Python 3.8 introduces even better support with the typing.get_origin() introspection function:
    # Python 3.8
    >>> import typing
    >>> typing.get_origin(typing.List)
    <class 'list'>
    >>> typing.get_origin(typing.List[int])
    <class 'list'>
Notable exceptions are typing.Any, typing.Union and typing.ClassVar… Well, anything that is a typing._SpecialForm does not define __origin__. Fortunately:
    >>> isinstance(typing.Union, typing._SpecialForm)
    True
    >>> isinstance(typing.Union[int, str], typing._SpecialForm)
    False
    >>> typing.get_origin(typing.Union[int, str])
    typing.Union
But parametrized types define an __args__ attribute that stores their parameters as a tuple; Python 3.8 introduces the typing.get_args() function to retrieve them:
    # Python 3.7
    >>> typing.Union[int, str].__args__
    (<class 'int'>, <class 'str'>)

    # Python 3.8
    >>> typing.get_args(typing.Union[int, str])
    (<class 'int'>, <class 'str'>)
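The two introspection functions can be combined in a small helper. This is only a sketch, assuming Python 3.8 or later; describe is a hypothetical name introduced here for illustration:

```python
import typing


def describe(hint):
    """Return (origin, args) for a type hint.

    The origin falls back to the hint itself for plain classes such as int,
    for which typing.get_origin returns None.
    """
    origin = typing.get_origin(hint) or hint
    return origin, typing.get_args(hint)


dict_info = describe(typing.Dict[str, int])   # (dict, (str, int))
plain_info = describe(int)                    # (int, ())
optional_info = describe(typing.Optional[int])
```

For typing.Optional[int], the origin is typing.Union and the arguments are int and NoneType, which is what makes the Union handling below possible.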
So we can improve type checking a bit:
    for field_name, field_def in self.__dataclass_fields__.items():
        if isinstance(field_def.type, typing._SpecialForm):
            # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
            continue
        try:
            actual_type = field_def.type.__origin__
        except AttributeError:
            # In case of non-typing types (such as <class 'int'>, for instance)
            actual_type = field_def.type
        # In Python 3.8 one would replace the try/except with
        # actual_type = typing.get_origin(field_def.type) or field_def.type
        if isinstance(actual_type, typing._SpecialForm):
            # case of typing.Union[…] or typing.ClassVar[…]
            actual_type = field_def.type.__args__

        actual_value = getattr(self, field_name)
        if not isinstance(actual_value, actual_type):
            print(f"\t{field_name}: '{type(actual_value)}' instead of '{field_def.type}'")
            ret = False
This is not perfect as it won't account for typing.ClassVar[typing.Union[int, str]] or typing.Optional[typing.List[int]], for instance, but it should get things started.
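For reference, the same idea can be written as a self-contained dataclass. This is a sketch under a few assumptions: it needs Python 3.8 for typing.get_origin, it compares the origin against typing.Union directly rather than testing for typing._SpecialForm, and the names runtime_types, validate and the Point fields are hypothetical, chosen only for the example. Like the loop above, it checks top-level containers only.

```python
import dataclasses
import typing


def runtime_types(hint):
    """Return what isinstance() can check for `hint`, or None to skip the check."""
    if hint is typing.Any:
        return None
    origin = typing.get_origin(hint)  # requires Python 3.8
    if origin is None:
        # Plain class such as int, or a bare special form we cannot check
        return hint if isinstance(hint, type) else None
    if origin is typing.Union:
        # Only handles unions of plain classes, e.g. Union[int, str]
        args = typing.get_args(hint)
        return args if all(isinstance(arg, type) for arg in args) else None
    return origin if isinstance(origin, type) else None


@dataclasses.dataclass
class Point:
    x: float
    tags: typing.List[str]
    label: typing.Union[int, str] = 0

    def validate(self):
        """Return (field name, actual type) pairs for fields of the wrong type."""
        errors = []
        for field in dataclasses.fields(self):
            expected = runtime_types(field.type)
            value = getattr(self, field.name)
            if expected is not None and not isinstance(value, expected):
                errors.append((field.name, type(value)))
        return errors


good = Point(1.5, ["a"], "b").validate()
bad = Point("no", ("a",), 2.0).validate()
```

With these inputs, good is an empty list and bad reports all three fields. Note that isinstance(1, float) is False, so an int passed for x would be reported as well.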
Next is the way to apply this check.
Instead of using __post_init__
, I would go the decorator route: this could be used on anything with type hints, not only dataclasses
:
    import inspect
    import typing
    from contextlib import suppress
    from functools import wraps


    def enforce_types(callable):
        spec = inspect.getfullargspec(callable)

        def check_types(*args, **kwargs):
            parameters = dict(zip(spec.args, args))
            parameters.update(kwargs)
            for name, value in parameters.items():
                with suppress(KeyError):  # Assume un-annotated parameters can be any type
                    type_hint = spec.annotations[name]
                    if isinstance(type_hint, typing._SpecialForm):
                        # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
                        continue
                    try:
                        actual_type = type_hint.__origin__
                    except AttributeError:
                        # In case of non-typing types (such as <class 'int'>, for instance)
                        actual_type = type_hint
                    # In Python 3.8 one would replace the try/except with
                    # actual_type = typing.get_origin(type_hint) or type_hint
                    if isinstance(actual_type, typing._SpecialForm):
                        # case of typing.Union[…] or typing.ClassVar[…]
                        actual_type = type_hint.__args__

                    if not isinstance(value, actual_type):
                        raise TypeError('Unexpected type for \'{}\' (expected {} but found {})'.format(name, type_hint, type(value)))

        def decorate(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                check_types(*args, **kwargs)
                return func(*args, **kwargs)
            return wrapper

        if inspect.isclass(callable):
            callable.__init__ = decorate(callable.__init__)
            return callable

        return decorate(callable)
Usage being:
    @enforce_types
    @dataclasses.dataclass
    class Point:
        x: float
        y: float

    @enforce_types
    def foo(bar: typing.Union[int, str]):
        pass
Apart from validating only some type hints, as discussed in the previous section, this approach still has some drawbacks:

- type hints using strings (class Foo: def __init__(self: 'Foo'): pass) are not taken into account by inspect.getfullargspec: you may want to use typing.get_type_hints and inspect.signature instead;
- a default value which is not the appropriate type is not validated:

      @enforce_types
      def foo(bar: int = None):
          pass

      foo()

  does not raise any TypeError. You may want to use inspect.Signature.bind in conjunction with inspect.BoundArguments.apply_defaults if you want to account for that (thus forcing you to define def foo(bar: typing.Optional[int] = None));
- a variable number of arguments can't be validated, as you would have to define something like def foo(*args: typing.Sequence, **kwargs: typing.Mapping) and, as said at the beginning, we can only validate containers and not contained objects.
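The first two drawbacks can be seen in a short experiment. This is only an illustrative sketch; foo, bar and baz are hypothetical names:

```python
import inspect
import typing


def foo(bar: 'int' = None, baz: str = 'x'):
    pass


# getfullargspec keeps the string annotation exactly as written...
raw = inspect.getfullargspec(foo).annotations

# ...whereas get_type_hints evaluates it to a real type.
hints = typing.get_type_hints(foo)

# bind() maps call arguments to parameter names; apply_defaults() then
# fills in the parameters the caller left out, so defaults can be checked too.
bound = inspect.signature(foo).bind(baz='y')
bound.apply_defaults()
arguments = dict(bound.arguments)
```

After this runs, raw['bar'] is still the string 'int', while arguments contains both bar (set to its default None) and baz, ready to be compared against hints.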
Update
After this answer got some popularity and a library heavily inspired by it got released, the need to address the shortcomings mentioned above became real. So I played a bit more with the typing module and will propose a few findings and a new approach here.
For starters, typing does a great job of detecting when an argument is optional (before Python 3.11; from 3.11 on, get_type_hints returns the annotation unchanged):
    >>> def foo(a: int, b: str, c: typing.List[str] = None):
    ...   pass
    ...
    >>> typing.get_type_hints(foo)
    {'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.Union[typing.List[str], NoneType]}
This is pretty neat and definitely an improvement over inspect.getfullargspec, so better use that instead, as it can also properly handle strings as type hints. But typing.get_type_hints leaves the hint untouched for other kinds of default values:
    >>> def foo(a: int, b: str, c: typing.List[str] = 3):
    ...   pass
    ...
    >>> typing.get_type_hints(foo)
    {'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.List[str]}
So you may still need extra strict checking, even though such cases feel very fishy.
Next is the case of typing hints used as arguments for typing._SpecialForm, such as typing.Optional[typing.List[str]] or typing.Final[typing.Union[typing.Sequence, typing.Mapping]]. Since the __args__ of these typing._SpecialForms is always a tuple, it is possible to recursively find the __origin__ of the hints contained in that tuple. Combined with the above checks, we will then need to filter out any typing._SpecialForm left.
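As a small worked example of that recursion, the sketch below flattens a nested hint into the classes isinstance can check. It assumes Python 3.8 or later, and it deliberately tests the origin against an explicit tuple of wrappers rather than against typing._SpecialForm; flatten_origins and _WRAPPERS are hypothetical names used only here.

```python
import typing

# Special forms whose parameters are themselves type hints to unwrap.
# This explicit tuple is an assumption of the sketch, not an exhaustive list.
_WRAPPERS = (typing.Union, typing.ClassVar, typing.Final)


def flatten_origins(hint):
    """Yield the runtime classes found in `hint`, unwrapping nested special forms."""
    if hint is typing.Any:
        return  # nothing to check
    origin = typing.get_origin(hint)
    if origin in _WRAPPERS:
        for arg in typing.get_args(hint):
            yield from flatten_origins(arg)
    elif isinstance(origin or hint, type):
        yield origin or hint


optional_list = tuple(flatten_origins(typing.Optional[typing.List[str]]))
class_var = tuple(flatten_origins(typing.ClassVar[typing.Union[int, str]]))
```

Here optional_list is (list, NoneType) and class_var is (int, str); either tuple can be passed directly as the second argument of isinstance.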
Proposed improvements:
    import inspect
    import typing
    from functools import wraps


    def _find_type_origin(type_hint):
        if isinstance(type_hint, typing._SpecialForm):
            # case of typing.Any, typing.ClassVar, typing.Final, typing.Literal,
            # typing.NoReturn, typing.Optional, or typing.Union without parameters
            return

        actual_type = typing.get_origin(type_hint) or type_hint  # requires Python 3.8
        if isinstance(actual_type, typing._SpecialForm):
            # case of typing.Union[…] or typing.ClassVar[…] or …
            for origins in map(_find_type_origin, typing.get_args(type_hint)):
                yield from origins
        else:
            yield actual_type


    def _check_types(parameters, hints):
        for name, value in parameters.items():
            type_hint = hints.get(name, typing.Any)
            actual_types = tuple(_find_type_origin(type_hint))
            if actual_types and not isinstance(value, actual_types):
                raise TypeError(
                    f"Expected type '{type_hint}' for argument '{name}'"
                    f" but received type '{type(value)}' instead"
                )


    def enforce_types(callable):
        def decorate(func):
            hints = typing.get_type_hints(func)
            signature = inspect.signature(func)

            @wraps(func)
            def wrapper(*args, **kwargs):
                parameters = dict(zip(signature.parameters, args))
                parameters.update(kwargs)
                _check_types(parameters, hints)

                return func(*args, **kwargs)
            return wrapper

        if inspect.isclass(callable):
            callable.__init__ = decorate(callable.__init__)
            return callable

        return decorate(callable)


    def enforce_strict_types(callable):
        def decorate(func):
            hints = typing.get_type_hints(func)
            signature = inspect.signature(func)

            @wraps(func)
            def wrapper(*args, **kwargs):
                bound = signature.bind(*args, **kwargs)
                bound.apply_defaults()
                parameters = dict(zip(signature.parameters, bound.args))
                parameters.update(bound.kwargs)
                _check_types(parameters, hints)

                return func(*args, **kwargs)
            return wrapper

        if inspect.isclass(callable):
            callable.__init__ = decorate(callable.__init__)
            return callable

        return decorate(callable)
Thanks to @Aran-Fey, who helped me improve this answer.
Just found this question.
pydantic can do full type validation for dataclasses out of the box. (admission: I built pydantic)
Just use pydantic's version of the decorator; the resulting dataclass is completely vanilla.
    from datetime import datetime
    from pydantic.dataclasses import dataclass

    @dataclass
    class User:
        id: int
        name: str = 'John Doe'
        signup_ts: datetime = None

    print(User(id=42, signup_ts='2032-06-21T12:00'))
    """
    User(id=42, name='John Doe', signup_ts=datetime.datetime(2032, 6, 21, 12, 0))
    """

    User(id='not int', signup_ts='2032-06-21T12:00')
The last line will give:
    ...
    pydantic.error_wrappers.ValidationError: 1 validation error
    id
      value is not a valid integer (type=type_error.integer)
I created a tiny Python library for this purpose: https://github.com/tamuhey/dataclass_utils
This library can be applied to dataclasses that hold another dataclass (nested dataclasses) and to nested container types (like Tuple[List[Dict...).