What exactly is a Sequence?


Brief introduction to typing in Python

Skip ahead if you know what structural typing, nominal typing and duck typing are.

I think much of the confusion arises from the fact that typing was a provisional module in versions 3.5 and 3.6, and it continued to change in versions 3.7 and 3.8. This means there has been a lot of flux in how Python has sought to deal with typing through type annotations.

It also doesn't help that Python is both duck-typed and nominally typed. When you access an attribute of an object, Python is duck-typed: the object is checked for that attribute only at runtime, and only at the moment the attribute is requested. However, Python also has nominal typing features (e.g. isinstance() and issubclass()). Nominal typing is where one type is declared to be a subclass of another, either through inheritance or with the register() method of ABCMeta.
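To make the two ideas concrete, here is a minimal sketch. All the class names (Duck, Robot, Waterfowl, Goose) and the make_noise function are made up for illustration; only ABC, register() and issubclass() come from Python itself.

```python
from abc import ABC


class Duck:
    def quack(self) -> str:
        return "Quack"


class Robot:
    """Unrelated to Duck, but happens to have a quack() method."""

    def quack(self) -> str:
        return "Beep"


class Waterfowl(ABC):
    """An abstract base class used here only as a nominal label."""


class Goose:
    """Does not inherit from Waterfowl."""


def make_noise(thing) -> str:
    # Duck typing: the attribute lookup happens only when this line runs.
    return thing.quack()


duck_noise = make_noise(Duck())    # works
robot_noise = make_noise(Robot())  # also works, no shared base class needed

try:
    make_noise(42)                 # fails only now, at the point of access
    int_failed = False
except AttributeError:
    int_failed = True

# Nominal typing: the relationship has to be declared.
before = issubclass(Goose, Waterfowl)  # False, nothing declared yet
Waterfowl.register(Goose)              # register() is provided by ABCMeta
after = issubclass(Goose, Waterfowl)   # True, although Goose is unchanged
```

Note that register() changed what issubclass() reports without touching Goose at all, which is exactly what makes it a declaration rather than a check.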

typing originally introduced its types using the idea of nominal typing. As of 3.8 it also allows for the more Pythonic structural typing. Structural typing is related to duck typing, except that it is taken into consideration at "compile time" rather than at runtime, for instance when a linter is trying to detect possible type errors, such as passing a dict to a function that only accepts sequences like tuples or lists. With structural typing, a class B should be considered a subtype of A if it implements all the methods of A, regardless of whether it has been declared to be a subtype of A (as in nominal typing).

Answer

sequences (little s) are a duck type. A sequence is any ordered collection of objects that provides random access to its members. Specifically, if it defines __len__ and __getitem__ and uses integer indices between 0 and n-1 then it is a sequence. A Sequence (big s) is a nominal type. That is, to be a Sequence, a class must be declared as such, either by inheriting from Sequence or being registered as a subclass.

A numpy array is a sequence, but it is not a Sequence as it is not registered as a subclass of Sequence. Nor should it be, as it does not implement the full interface promised by Sequence (things like count() and index() are missing).
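The same distinction can be shown without numpy. The sketch below uses a made-up class, Squares, that defines only __len__ and __getitem__; I am assuming Sequence here means collections.abc.Sequence.

```python
from collections.abc import Sequence


class Squares:
    """Hypothetical class: a duck-typed sequence of the first n squares."""

    def __init__(self, n: int) -> None:
        self._n = n

    def __len__(self) -> int:
        return self._n

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < self._n:
            raise IndexError(index)
        return index * index


sq = Squares(4)

# It behaves like a sequence (little s): length plus integer indexing.
items = [sq[i] for i in range(len(sq))]  # [0, 1, 4, 9]

# Iteration works too: Python falls back to calling __getitem__ with
# 0, 1, 2, ... until IndexError is raised.
iterated = list(sq)                      # [0, 1, 4, 9]

# But it is not a Sequence (big S): it was never declared as one,
# and it lacks the extra methods that Sequence promises.
is_nominal = isinstance(sq, Sequence)    # False
has_count = hasattr(sq, "count")         # False
```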

It sounds like what you want is a structural type for a sequence (small s). As of 3.8 this is possible by using protocols. A protocol defines a set of methods that a class must implement to be considered a subtype of the protocol (a la structural typing).

    from typing import Protocol
    import numpy as np

    class MySequence(Protocol):
        def __getitem__(self, index):
            raise NotImplementedError
        def __len__(self):
            raise NotImplementedError
        def __contains__(self, item):
            raise NotImplementedError
        def __iter__(self):
            raise NotImplementedError

    def f(s: MySequence):
        for i in range(len(s)):
            print(s[i], end=' ')
        print('end')

    f([1, 2, 3, 4]) # should be fine
    arr: np.ndarray = np.arange(5)
    f(arr) # also fine
    f({}) # might be considered fine! Depends on your type checker

Protocols are fairly new, so not all IDEs/type checkers might support them yet. The IDE I use, PyCharm, does. It doesn't like f({}), but it is happy to consider a numpy array a Sequence (big S) (perhaps not ideal). You can enable runtime checking of protocols by using the runtime_checkable decorator from typing. Be warned: all this does is check, one by one, that each of the protocol's methods can be found on the given object/class. As a result, it can become quite expensive if your protocol has a lot of methods.
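A minimal sketch of the runtime check, assuming Python 3.8 or later. The protocol name SizedIndexable is made up for this example; Protocol and runtime_checkable are the real names from typing.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SizedIndexable(Protocol):
    """Hypothetical protocol: anything with __len__ and __getitem__."""

    def __len__(self) -> int: ...

    def __getitem__(self, index): ...


# isinstance() now only looks for the presence of the two methods.
list_ok = isinstance([1, 2, 3], SizedIndexable)  # True
dict_ok = isinstance({}, SizedIndexable)         # True: dict has both methods
int_ok = isinstance(42, SizedIndexable)          # False: int has neither
```

The dict result shows the limitation: the check is purely about method names, so a mapping passes even though it is not indexed by position.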


I think the most practical way to define a sequence in Python is 'A container that supports indexing with integers'.

The Wikipedia definition also holds:

a sequence is an enumerated collection of objects in which repetitions are allowed and order does matter.

To validate if an object is a sequence, I would emulate the logic from the Sequence Protocol:

hasattr(test_obj, "__getitem__") and not isinstance(test_obj, collections.abc.Mapping) 
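Wrapped up as a function, that check might look like the sketch below. The name looks_like_sequence is my own, and this is only a heuristic, not an official test.

```python
from collections.abc import Mapping


def looks_like_sequence(obj) -> bool:
    """Heuristic: supports indexing and is not a mapping."""
    return hasattr(obj, "__getitem__") and not isinstance(obj, Mapping)


results = {
    "list": looks_like_sequence([1, 2, 3]),   # True
    "tuple": looks_like_sequence((1, 2)),     # True
    "range": looks_like_sequence(range(3)),   # True
    "str": looks_like_sequence("abc"),        # True, strings are indexable
    "dict": looks_like_sequence({"a": 1}),    # False, it is a Mapping
    "set": looks_like_sequence({1, 2}),       # False, no __getitem__
}
```

Strings pass this check, so exclude str explicitly if you do not want to treat text as a sequence of characters.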


Per the doc you pasted:

The collections.abc.Sequence abstract base class defines a much richer interface that goes beyond just __getitem__() and __len__(), adding count(), index(), __contains__(), and __reversed__(). Types that implement this expanded interface can be registered explicitly using register().

numpy.ndarray does not implement the Sequence protocol because it does not implement count() or index():

    >>> import numpy
    >>> from collections.abc import Sequence
    >>> arr = numpy.arange(6)
    >>> isinstance(arr, Sequence)
    False
    >>> arr.count(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'numpy.ndarray' object has no attribute 'count'
    >>> arr.index(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'numpy.ndarray' object has no attribute 'index'

Contrast this with a range:

    >>> r = range(6)
    >>> isinstance(r, Sequence)
    True
    >>> r.count(3)
    1
    >>> r.index(3)
    3

If you want to claim that arr is a Sequence you can, by using the register() class method:

    >>> Sequence.register(numpy.ndarray)
    <class 'numpy.ndarray'>
    >>> isinstance(arr, Sequence)
    True

but this is a lie, because it doesn't actually implement the protocol (the register() function doesn't actually check for that, it just trusts you):

    >>> arr.count(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'numpy.ndarray' object has no attribute 'count'

so doing this may lead to errors if you pass a numpy.ndarray to a function that expects a Sequence.
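To see the same effect without numpy, here is a small sketch with two made-up classes, Registered and Inherited. Both define only __len__ and __getitem__. One is registered with collections.abc.Sequence and the other inherits from it, in which case the ABC fills in count() and index() as mixin methods.

```python
from collections.abc import Sequence


class Registered:
    """Hypothetical class: only __len__ and __getitem__, then registered."""

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]


class Inherited(Sequence):
    """Hypothetical class: same two methods, but inherits from Sequence."""

    def __init__(self, data):
        self._data = list(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]


Sequence.register(Registered)  # accepted without any checking

reg = Registered([1, 3, 3])
inh = Inherited([1, 3, 3])

reg_is_seq = isinstance(reg, Sequence)  # True, because we said so
reg_has_count = hasattr(reg, "count")   # False, the promise is not kept

inh_is_seq = isinstance(inh, Sequence)  # True
inh_count = inh.count(3)                # 2, supplied by the Sequence mixin
inh_index = inh.index(3)                # 1, supplied by the Sequence mixin
```

So if you control the class, inheriting from Sequence is the safer way to make the claim, since the missing methods are actually supplied.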