What exactly is a Sequence?
Brief introduction to typing in Python
Skip ahead if you know what structural typing, nominal typing and duck typing are.
I think much of the confusion arises from the fact that typing was a provisional module in versions 3.5 and 3.6, and was still subject to change in versions 3.7 and 3.8. This means there has been a lot of flux in how Python has sought to deal with typing through type annotations.
It also doesn't help that Python is both duck-typed and nominally typed. That is, when accessing an attribute of an object, Python is duck-typed: the object is only checked for the attribute at runtime, and only at the moment the attribute is requested. However, Python also has nominal typing features (e.g. isinstance() and issubclass()). Nominal typing is where one type is declared to be a subclass of another. This can be through inheritance, or with the register() method of ABCMeta.
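To make the contrast concrete, here is a minimal sketch. The classes Duck, Robot, Mallard and Quacker are hypothetical names invented for this illustration, not part of any library.

```python
from abc import ABCMeta


class Duck:
    def quack(self):
        return "Quack"


class Robot:
    # Unrelated to Duck, but happens to have the same method.
    def quack(self):
        return "Beep"


def make_it_quack(thing):
    # Duck typing: the attribute is looked up only here, at call time.
    return thing.quack()


class Quacker(metaclass=ABCMeta):
    """Hypothetical abstract base class used for the nominal checks."""


class Mallard(Quacker):  # nominal subtype through inheritance
    def quack(self):
        return "Quack"


Quacker.register(Robot)  # nominal subtype through registration

duck_results = [make_it_quack(Duck()), make_it_quack(Robot())]
nominal_results = [
    isinstance(Mallard(), Quacker),  # declared via inheritance
    isinstance(Robot(), Quacker),    # declared via register()
    isinstance(Duck(), Quacker),     # has quack(), but was never declared
]
print(duck_results)
print(nominal_results)
```

Both Duck and Robot work with make_it_quack(), but only the classes that were explicitly declared pass the isinstance() check.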
The typing module originally introduced its types using the idea of nominal typing. As of 3.8 it is trying to allow for the more Pythonic structural typing. Structural typing is related to duck typing, except that it is taken into consideration at "compile time" rather than at runtime, for instance when a linter is trying to detect possible type errors, such as passing a dict to a function that only accepts sequences like tuples or lists. With structural typing, a class B should be considered a subtype of A if it implements all the methods of A, regardless of whether it has been declared to be a subtype of A (as in nominal typing).
Answer
A sequence (little s) is a duck type. A sequence is any ordered collection of objects that provides random access to its members. Specifically, if it defines __len__ and __getitem__ and uses integer indices between 0 and n-1, then it is a sequence. A Sequence (big S) is a nominal type. That is, to be a Sequence, a class must be declared as such, either by inheriting from Sequence or by being registered as a subclass.
A numpy array is a sequence, but it is not a Sequence, as it is not registered as a subclass of Sequence. Nor should it be, as it does not implement the full interface promised by Sequence (things like count() and index() are missing).
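The same distinction can be shown without numpy. In this sketch, Squares is a hypothetical class made up for the example: it is a sequence in the duck-typed sense, but not a Sequence.

```python
from collections.abc import Sequence


class Squares:
    """Hypothetical duck-typed sequence: the squares of 0..n-1."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        # Only integer indices 0..n-1 are supported in this sketch.
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index


squares = Squares(4)
items = [squares[i] for i in range(len(squares))]
iterated = list(squares)  # iteration falls back to __getitem__ with 0, 1, 2, ...
is_nominal = isinstance(squares, Sequence)  # never declared, so False
has_count = hasattr(squares, "count")       # the richer interface is missing
print(items, iterated, is_nominal, has_count)
```

Indexing, len() and even iteration work, yet isinstance() says no, because nothing ever declared Squares to be a Sequence.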
It sounds like what you want is a structural type for a sequence (small s). As of 3.8 this is possible by using protocols. Protocols define a set of methods which a class must implement to be considered a subtype of the protocol (à la structural typing).
    from typing import Protocol

    import numpy as np


    class MySequence(Protocol):
        def __getitem__(self, index):
            raise NotImplementedError

        def __len__(self):
            raise NotImplementedError

        def __contains__(self, item):
            raise NotImplementedError

        def __iter__(self):
            raise NotImplementedError


    def f(s: MySequence):
        for i in range(len(s)):
            print(s[i], end=' ')
        print('end')


    f([1, 2, 3, 4])  # should be fine
    arr: np.ndarray = np.arange(5)
    f(arr)  # also fine
    f({})  # might be considered fine! Depends on your type checker
Protocols are fairly new, so not all IDEs/type checkers might support them yet. The IDE I use, PyCharm, does. It doesn't like f({}), but it is happy to consider a numpy array a Sequence (big S) (perhaps not ideal). You can enable runtime checking of protocols by using the runtime_checkable decorator from typing. Be warned: all this does is check, one by one, that each of the protocol's methods can be found on the given object/class. As a result, it can become quite expensive if your protocol has a lot of methods.
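A rough sketch of what the runtime check does, assuming Python 3.8 or later. SizedIndexable is a hypothetical protocol defined only for this example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SizedIndexable(Protocol):
    """Hypothetical protocol: anything with __len__ and __getitem__."""

    def __len__(self) -> int: ...

    def __getitem__(self, index): ...


checks = {
    "list": isinstance([1, 2, 3], SizedIndexable),
    "str": isinstance("abc", SizedIndexable),
    "dict": isinstance({}, SizedIndexable),     # passes: only method names are checked
    "set": isinstance({1, 2}, SizedIndexable),  # fails: set has no __getitem__
    "int": isinstance(3, SizedIndexable),
}
print(checks)
```

Note that a dict passes, since the check only looks for the presence of the methods and knows nothing about integer indexing.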
I think the most practical way to define a sequence in Python is 'A container that supports indexing with integers'.
The Wikipedia definition also holds:
a sequence is an enumerated collection of objects in which repetitions are allowed and order does matter.
To validate if an object is a sequence, I would emulate the logic from the Sequence Protocol:
hasattr(test_obj, "__getitem__") and not isinstance(test_obj, collections.abc.Mapping)
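Wrapped in a function, the check above might look like the sketch below. The name looks_like_sequence is my own, and the result is only a heuristic: strings pass, for example, which may or may not be what you want.

```python
import collections.abc


def looks_like_sequence(obj):
    """Heuristic from the check above: indexable, but not a mapping."""
    return hasattr(obj, "__getitem__") and not isinstance(obj, collections.abc.Mapping)


results = {
    "list": looks_like_sequence([1, 2]),
    "tuple": looks_like_sequence((1, 2)),
    "str": looks_like_sequence("ab"),
    "range": looks_like_sequence(range(3)),
    "dict": looks_like_sequence({"a": 1}),  # has __getitem__, but is a Mapping
    "set": looks_like_sequence({1, 2}),     # no __getitem__ at all
}
print(results)
```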
Per the doc you pasted:
The collections.abc.Sequence abstract base class defines a much richer interface that goes beyond just __getitem__() and __len__(), adding count(), index(), __contains__(), and __reversed__(). Types that implement this expanded interface can be registered explicitly using register().
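As a small illustration of that richer interface: a class that inherits from collections.abc.Sequence and defines only __getitem__() and __len__() gets the remaining methods as mixins. Countdown is a hypothetical class written for this sketch.

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Hypothetical Sequence holding n, n-1, ..., 1."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is not supported in this sketch")
        if index < 0:
            index += self._n
        if not 0 <= index < self._n:
            raise IndexError(index)
        return self._n - index


c = Countdown(5)  # behaves like [5, 4, 3, 2, 1]
# count(), index(), __contains__() and __reversed__() come from the ABC.
summary = (c.count(3), c.index(3), 4 in c, list(reversed(c)))
print(summary)
```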
numpy.ndarray does not implement the Sequence protocol because it does not implement count() or index():
    >>> import numpy
    >>> from collections.abc import Sequence
    >>> arr = numpy.arange(6)
    >>> isinstance(arr, Sequence)
    False
    >>> arr.count(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'numpy.ndarray' object has no attribute 'count'
    >>> arr.index(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'numpy.ndarray' object has no attribute 'index'
Contrast this with a range:
    >>> r = range(6)
    >>> isinstance(r, Sequence)
    True
    >>> r.count(3)
    1
    >>> r.index(3)
    3
If you want to claim that arr is a Sequence you can, by using the register() class method:
    >>> Sequence.register(numpy.ndarray)
    <class 'numpy.ndarray'>
    >>> isinstance(arr, Sequence)
    True
but this is a lie, because it doesn't actually implement the protocol (the register() method doesn't check for that, it just trusts you):
    >>> arr.count(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'numpy.ndarray' object has no attribute 'count'
so doing this may lead to errors if you pass a numpy.ndarray to a function that expects a Sequence.
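Here is a numpy-free sketch of how that failure can surface. Bare and count_of_first are hypothetical names invented for the example, and registering Bare is exactly the kind of false claim described above.

```python
from collections.abc import Sequence


class Bare:
    """Hypothetical class with only __len__ and __getitem__."""

    def __len__(self):
        return 3

    def __getitem__(self, index):
        return (10, 20, 30)[index]


def count_of_first(seq: Sequence) -> int:
    """Hypothetical consumer that relies on the full Sequence interface."""
    return seq.count(seq[0])


Sequence.register(Bare)  # accepted without any verification

b = Bare()
claims_sequence = isinstance(b, Sequence)  # True, because we said so
list_result = count_of_first([1, 1, 2])    # a real Sequence works

try:
    count_of_first(b)
    bare_works = True
except AttributeError:
    # Registration does not add the mixin methods, so count() is missing.
    bare_works = False

print(claims_sequence, list_result, bare_works)
```

Unlike inheriting from Sequence, registering a class does not give it the mixin methods, so the isinstance() check passes while the call still fails.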