Validating detailed types in python dataclasses


Instead of checking for type equality, you should use isinstance. But you cannot use a parametrized generic type (typing.List[int]) to do so, you must use the "generic" version (typing.List). So you will be able to check for the container type but not the contained types. Parametrized generic types define an __origin__ attribute that you can use for that.
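A minimal sketch of the difference, assuming Python 3.7 or later (the variable names are only for illustration):

```python
import typing

values = [1, 2, 3]

# The bare generic can be used with isinstance ...
print(isinstance(values, typing.List))

# ... but the parametrized form is rejected with a TypeError.
try:
    isinstance(values, typing.List[int])
except TypeError as exc:
    print(f"TypeError: {exc}")

# On Python 3.7+, __origin__ gives back the runtime class to check against.
container_type = typing.List[int].__origin__
print(container_type is list)
print(isinstance(values, container_type))
```

Only the container is checked here: a list of strings would pass just as well.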

Contrary to Python 3.6, in Python 3.7 most type hints have a useful __origin__ attribute. Compare:

    # Python 3.6
    >>> import typing
    >>> typing.List.__origin__
    >>> typing.List[int].__origin__
    typing.List

and

    # Python 3.7
    >>> import typing
    >>> typing.List.__origin__
    <class 'list'>
    >>> typing.List[int].__origin__
    <class 'list'>

Python 3.8 introduces even better support with the typing.get_origin() introspection function:

    # Python 3.8
    >>> import typing
    >>> typing.get_origin(typing.List)
    <class 'list'>
    >>> typing.get_origin(typing.List[int])
    <class 'list'>

Notable exceptions are typing.Any, typing.Union and typing.ClassVar… Well, anything that is a typing._SpecialForm does not define __origin__. Fortunately, their parametrized versions are no longer typing._SpecialForm instances:

    >>> isinstance(typing.Union, typing._SpecialForm)
    True
    >>> isinstance(typing.Union[int, str], typing._SpecialForm)
    False
    >>> typing.get_origin(typing.Union[int, str])
    typing.Union

Parametrized types also define an __args__ attribute that stores their parameters as a tuple; Python 3.8 introduces the typing.get_args() function to retrieve them:

    # Python 3.7
    >>> typing.Union[int, str].__args__
    (<class 'int'>, <class 'str'>)

    # Python 3.8
    >>> typing.get_args(typing.Union[int, str])
    (<class 'int'>, <class 'str'>)

So we can improve type checking a bit:

    for field_name, field_def in self.__dataclass_fields__.items():
        if isinstance(field_def.type, typing._SpecialForm):
            # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
            continue
        try:
            actual_type = field_def.type.__origin__
        except AttributeError:
            # In case of non-typing types (such as <class 'int'>, for instance)
            actual_type = field_def.type
        # In Python 3.8 one would replace the try/except with
        # actual_type = typing.get_origin(field_def.type) or field_def.type
        if isinstance(actual_type, typing._SpecialForm):
            # case of typing.Union[…] or typing.ClassVar[…]
            actual_type = field_def.type.__args__

        actual_value = getattr(self, field_name)
        if not isinstance(actual_value, actual_type):
            print(f"\t{field_name}: '{type(actual_value)}' instead of '{field_def.type}'")
            ret = False

This is not perfect as it won't account for typing.ClassVar[typing.Union[int, str]] or typing.Optional[typing.List[int]] for instance, but it should get things started.
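To see why typing.Optional[typing.List[int]] slips through, it may help to look at what the arguments of such a hint contain. The following is a small sketch assuming Python 3.8+ (for typing.get_args and typing.get_origin); the variable names are illustrative only:

```python
import typing

hint = typing.Optional[typing.List[int]]

# Optional[X] is Union[X, None], so its arguments are (List[int], NoneType).
args = typing.get_args(hint)
print(args)

# The first argument is itself a parametrized generic, so passing the tuple
# straight to isinstance raises instead of returning True or False.
try:
    isinstance([1, 2], args)
    failed = False
except TypeError:
    failed = True
print(failed)

# Resolving each argument to its origin first gives a tuple isinstance accepts.
resolved = tuple(typing.get_origin(arg) or arg for arg in args)
print(isinstance([1, 2], resolved))
print(isinstance(None, resolved))
```

This resolves only one level of nesting; deeper combinations need a recursive walk over the arguments.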


Next is the way to apply this check.

Instead of using __post_init__, I would go the decorator route: this could be used on anything with type hints, not only dataclasses:

    import inspect
    import typing
    from contextlib import suppress
    from functools import wraps


    def enforce_types(callable):
        spec = inspect.getfullargspec(callable)

        def check_types(*args, **kwargs):
            parameters = dict(zip(spec.args, args))
            parameters.update(kwargs)
            for name, value in parameters.items():
                with suppress(KeyError):  # Assume un-annotated parameters can be any type
                    type_hint = spec.annotations[name]
                    if isinstance(type_hint, typing._SpecialForm):
                        # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
                        continue
                    try:
                        actual_type = type_hint.__origin__
                    except AttributeError:
                        # In case of non-typing types (such as <class 'int'>, for instance)
                        actual_type = type_hint
                    # In Python 3.8 one would replace the try/except with
                    # actual_type = typing.get_origin(type_hint) or type_hint
                    if isinstance(actual_type, typing._SpecialForm):
                        # case of typing.Union[…] or typing.ClassVar[…]
                        actual_type = type_hint.__args__

                    if not isinstance(value, actual_type):
                        raise TypeError('Unexpected type for \'{}\' (expected {} but found {})'.format(name, type_hint, type(value)))

        def decorate(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                check_types(*args, **kwargs)
                return func(*args, **kwargs)
            return wrapper

        if inspect.isclass(callable):
            callable.__init__ = decorate(callable.__init__)
            return callable

        return decorate(callable)

Usage being:

    @enforce_types
    @dataclasses.dataclass
    class Point:
        x: float
        y: float

    @enforce_types
    def foo(bar: typing.Union[int, str]):
        pass
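For comparison, the __post_init__ route mentioned earlier could look roughly like the sketch below. It assumes Python 3.8+, adds a hypothetical `label` field to show an Optional hint, and handles only plain classes and one level of Union; hints such as typing.Any or string annotations are not covered:

```python
import dataclasses
import typing


@dataclasses.dataclass
class Point:
    x: float
    y: float
    label: typing.Optional[str] = None  # illustrative extra field

    def __post_init__(self):
        for field in dataclasses.fields(self):
            hint = field.type
            if typing.get_origin(hint) is typing.Union:
                # Union[...] / Optional[...]: check against each member's origin.
                expected = tuple(
                    typing.get_origin(arg) or arg for arg in typing.get_args(hint)
                )
            else:
                expected = typing.get_origin(hint) or hint
            value = getattr(self, field.name)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{field.name}: expected {hint}, got {type(value).__name__}"
                )


Point(1.0, 2.0)            # accepted
Point(1.0, 2.0, "origin")  # accepted
# Point("1", 2.0) would raise TypeError; so would Point(1, 2), since an int
# is not an instance of float.
```

The decorator keeps this logic out of the class body and also works for plain functions, which is why it is preferred here.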

Apart from validating only some type hints, as noted in the previous section, the decorator approach still has some drawbacks:

  • type hints using strings (class Foo: def __init__(self: 'Foo'): pass) are not taken into account by inspect.getfullargspec: you may want to use typing.get_type_hints and inspect.signature instead;

  • a default value which is not the appropriate type is not validated:

        @enforce_types
        def foo(bar: int = None):
            pass

        foo()

    does not raise any TypeError. You may want to use inspect.Signature.bind in conjunction with inspect.BoundArguments.apply_defaults if you want to account for that (thus forcing you to define def foo(bar: typing.Optional[int] = None));

  • variable number of arguments can't be validated as you would have to define something like def foo(*args: typing.Sequence, **kwargs: typing.Mapping) and, as said at the beginning, we can only validate containers and not contained objects.
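The second point can be sketched with inspect.Signature.bind and inspect.BoundArguments.apply_defaults. The decorator name `enforce_defaults_too` is hypothetical, and the check is deliberately limited to plain-class annotations; this is an illustration of the mechanism, not a replacement for the decorator above. The default here is a string rather than None so that the result does not depend on how typing.get_type_hints treats None defaults:

```python
import inspect
import typing
from functools import wraps


def enforce_defaults_too(func):
    """Hypothetical decorator: checks plain-class hints, defaults included."""
    signature = inspect.signature(func)
    hints = typing.get_type_hints(func)  # also resolves string annotations

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()  # fill in the parameters the caller omitted
        for name, value in bound.arguments.items():
            param = signature.parameters[name]
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue  # *args / **kwargs are not checked in this sketch
            hint = hints.get(name)
            # Only plain classes are checked; typing constructs are skipped.
            if isinstance(hint, type) and not isinstance(value, hint):
                raise TypeError(
                    f"Unexpected type for {name!r} "
                    f"(expected {hint} but found {type(value)})"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_defaults_too
def foo(bar: int = "oops"):
    return bar


try:
    foo()  # the default itself has the wrong type
except TypeError as exc:
    print(exc)
```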


Update

After this answer got some popularity and a library heavily inspired by it got released, the need to lift the shortcomings mentioned above is becoming a reality. So I played a bit more with the typing module and will propose a few findings and a new approach here.

For starters, typing is doing a great job of detecting when an argument is optional:

    >>> def foo(a: int, b: str, c: typing.List[str] = None):
    ...   pass
    ...
    >>> typing.get_type_hints(foo)
    {'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.Union[typing.List[str], NoneType]}

This is pretty neat and definitely an improvement over inspect.getfullargspec, so better use that instead, as it can also properly handle strings as type hints. (Since Python 3.11 this implicit Optional is no longer added for a None default; the annotation is returned unchanged.) But typing.get_type_hints does not adjust the hint for other kinds of default values:

    >>> def foo(a: int, b: str, c: typing.List[str] = 3):
    ...   pass
    ...
    >>> typing.get_type_hints(foo)
    {'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.List[str]}

So you may still need extra strict checking, even though such cases feel very fishy.
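The point about string annotations can be illustrated with a short comparison. The function `scale` is made up for this example, and only builtin names are used in the strings so that they resolve regardless of the surrounding namespace:

```python
import inspect
import typing


def scale(value: "float", times: "int" = 2) -> "float":
    return value * times


# inspect.getfullargspec returns the annotations as written: plain strings.
raw = inspect.getfullargspec(scale).annotations
print(raw["times"])

# typing.get_type_hints evaluates them and returns the actual classes.
resolved = typing.get_type_hints(scale)
print(resolved["times"] is int)
print(resolved["value"] is float)
```

Forward references to user-defined classes are resolved the same way, against the namespace of the module that defines the function.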

Next is the case of typing hints used as arguments for typing._SpecialForm, such as typing.Optional[typing.List[str]] or typing.Final[typing.Union[typing.Sequence, typing.Mapping]]. Since the __args__ of these typing._SpecialForms is always a tuple, it is possible to recursively find the __origin__ of the hints contained in that tuple. Combined with the above checks, we will then need to filter any typing._SpecialForm left.

Proposed improvements:

    import inspect
    import typing
    from functools import wraps


    def _find_type_origin(type_hint):
        if isinstance(type_hint, typing._SpecialForm):
            # case of typing.Any, typing.ClassVar, typing.Final, typing.Literal,
            # typing.NoReturn, typing.Optional, or typing.Union without parameters
            return

        actual_type = typing.get_origin(type_hint) or type_hint  # requires Python 3.8
        if isinstance(actual_type, typing._SpecialForm):
            # case of typing.Union[…] or typing.ClassVar[…] or …
            for origins in map(_find_type_origin, typing.get_args(type_hint)):
                yield from origins
        else:
            yield actual_type


    def _check_types(parameters, hints):
        for name, value in parameters.items():
            type_hint = hints.get(name, typing.Any)
            actual_types = tuple(_find_type_origin(type_hint))
            if actual_types and not isinstance(value, actual_types):
                raise TypeError(
                        f"Expected type '{type_hint}' for argument '{name}'"
                        f" but received type '{type(value)}' instead"
                )


    def enforce_types(callable):
        def decorate(func):
            hints = typing.get_type_hints(func)
            signature = inspect.signature(func)

            @wraps(func)
            def wrapper(*args, **kwargs):
                parameters = dict(zip(signature.parameters, args))
                parameters.update(kwargs)
                _check_types(parameters, hints)

                return func(*args, **kwargs)
            return wrapper

        if inspect.isclass(callable):
            callable.__init__ = decorate(callable.__init__)
            return callable

        return decorate(callable)


    def enforce_strict_types(callable):
        def decorate(func):
            hints = typing.get_type_hints(func)
            signature = inspect.signature(func)

            @wraps(func)
            def wrapper(*args, **kwargs):
                bound = signature.bind(*args, **kwargs)
                bound.apply_defaults()
                parameters = dict(zip(signature.parameters, bound.args))
                parameters.update(bound.kwargs)
                _check_types(parameters, hints)

                return func(*args, **kwargs)
            return wrapper

        if inspect.isclass(callable):
            callable.__init__ = decorate(callable.__init__)
            return callable

        return decorate(callable)

Thanks to @Aran-Fey, who helped me improve this answer.


Just found this question.

pydantic can do full type validation for dataclasses out of the box. (admission: I built pydantic)

Just use pydantic's version of the decorator; the resulting dataclass is completely vanilla.

    from datetime import datetime
    from pydantic.dataclasses import dataclass

    @dataclass
    class User:
        id: int
        name: str = 'John Doe'
        signup_ts: datetime = None

    print(User(id=42, signup_ts='2032-06-21T12:00'))
    """
    User(id=42, name='John Doe', signup_ts=datetime.datetime(2032, 6, 21, 12, 0))
    """

    User(id='not int', signup_ts='2032-06-21T12:00')

The last line will give:

    ...
    pydantic.error_wrappers.ValidationError: 1 validation error
    id
      value is not a valid integer (type=type_error.integer)


I created a tiny Python library for this purpose: https://github.com/tamuhey/dataclass_utils

This library can be applied to a dataclass that holds another dataclass (nested dataclass) and to nested container types (like Tuple[List[Dict...).
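The general idea behind validating nested container types can be sketched as a recursive walk over typing.get_origin and typing.get_args. The helper `matches` below is hypothetical and is not taken from the library above; it assumes Python 3.8+ and covers only lists, sets, dicts, tuples and unions:

```python
import typing


def matches(value, hint) -> bool:
    """Hypothetical helper: True if value conforms to hint, contents included."""
    if hint is typing.Any:
        return True
    origin = typing.get_origin(hint)
    args = typing.get_args(hint)
    if origin is None:
        # Plain class such as int or str; anything else is accepted unchecked.
        return isinstance(value, hint) if isinstance(hint, type) else True
    if origin is typing.Union:
        return any(matches(value, arg) for arg in args)
    if not isinstance(value, origin):
        return False
    if not args:
        return True
    if origin in (list, set, frozenset):
        return all(matches(item, args[0]) for item in value)
    if origin is dict:
        key_hint, value_hint = args
        return all(
            matches(k, key_hint) and matches(v, value_hint)
            for k, v in value.items()
        )
    if origin is tuple:
        if len(args) == 2 and args[1] is Ellipsis:
            # Tuple[X, ...]: homogeneous, any length.
            return all(matches(item, args[0]) for item in value)
        return len(value) == len(args) and all(
            matches(item, arg) for item, arg in zip(value, args)
        )
    # Other generics: only the container itself is checked in this sketch.
    return True


nested = typing.Dict[str, typing.List[int]]
print(matches({"a": [1, 2]}, nested))
print(matches({"a": [1, "2"]}, nested))
```

Walking every element makes the check proportional to the size of the data, which is a cost the container-only checks earlier on this page avoid.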