How come regex match objects aren't iterable even though they implement __getitem__?


There are lies, damned lies and then there is Python documentation.

Defining __getitem__ on a class implemented in C is not enough to make it iterable. That is because there are actually two places in PyTypeObject that __getitem__ can be mapped to: the sq_item slot of tp_as_sequence and the mp_subscript slot of tp_as_mapping ([1], [2]).
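For contrast, a class written in Python that defines only __getitem__ is iterable, because CPython fills in both the sequence and the mapping slot for such classes. A minimal sketch (the class name Squares is made up for illustration):

```python
class Squares:
    """Hypothetical class that defines __getitem__ but no __iter__."""

    def __getitem__(self, index):
        # The legacy sequence iterator calls this with 0, 1, 2, ...
        # and stops when IndexError is raised.
        if index >= 5:
            raise IndexError(index)
        return index * index


values = list(Squares())
print(values)  # [0, 1, 4, 9, 16]
```

So the "__getitem__ makes it iterable" rule holds for Python-level classes; the surprise only shows up with C types that fill in just one of the two slots.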

In the source of SRE_Match, tp_as_sequence is initialized to NULL, whereas tp_as_mapping is defined.
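The mapping-style subscript can be seen from Python: a match object accepts string keys (group names) as well as integers, which a sequence slot taking an integer index could not handle. A small sketch, assuming Python 3.6 or later (where match objects support indexing); the pattern and group names are arbitrary:

```python
import re

m = re.match(r"(?P<first>\w+) (?P<second>\w+)", "hello world")

by_index = m[1]        # same as m.group(1)
by_name = m["second"]  # same as m.group("second")

print(by_index, by_name)  # hello world
```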

The iter() built-in function, when called with one argument, calls PyObject_GetIter, which contains the following code:

f = t->tp_iter;
if (f == NULL) {
    if (PySequence_Check(o))
        return PySeqIter_New(o);
    return type_error("'%.200s' object is not iterable", o);
}

It first checks the tp_iter slot (obviously NULL for _SRE_Match objects). Failing that, if PySequence_Check returns true, it returns a new sequence iterator; otherwise a TypeError is raised.
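The effect is easy to observe from Python. The following sketch assumes Python 3.6 or later for the __getitem__ check; the exact type name in the error message varies between versions, so it is not printed here:

```python
import re

m = re.match(r"(\w+) (\w+)", "hello world")

has_getitem = hasattr(m, "__getitem__")
has_iter = hasattr(m, "__iter__")

try:
    iter(m)
    iterable = True
    message = ""
except TypeError as exc:
    # Neither tp_iter nor the sequence fallback applies, so iter() fails.
    iterable = False
    message = str(exc)

print(has_getitem, has_iter, iterable)  # True False False
```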

PySequence_Check first checks whether the object is a dict or a dict subclass, and returns false in that case. Otherwise it returns the value of

s->ob_type->tp_as_sequence &&
    s->ob_type->tp_as_sequence->sq_item != NULL;

and since s->ob_type->tp_as_sequence is NULL for a _SRE_Match instance, 0 is returned, and PyObject_GetIter raises TypeError: '_sre.SRE_Match' object is not iterable.
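The whole decision can be modelled in a few lines of Python. This is only a toy model of the C logic quoted above, not CPython's actual code: TypeSlots, sequence_check and get_iter are hypothetical names, and get_iter returns a label for the branch taken rather than a real iterator.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TypeSlots:
    """Hypothetical stand-in for the PyTypeObject fields discussed here."""
    name: str
    tp_iter: Optional[Callable] = None
    sq_item: Optional[Callable] = None       # models tp_as_sequence->sq_item
    mp_subscript: Optional[Callable] = None  # models tp_as_mapping->mp_subscript
    is_dict: bool = False


def sequence_check(t: TypeSlots) -> bool:
    # Models PySequence_Check: dicts are excluded, otherwise sq_item decides.
    if t.is_dict:
        return False
    return t.sq_item is not None


def get_iter(t: TypeSlots) -> str:
    # Models the branch order in PyObject_GetIter.
    if t.tp_iter is not None:
        return "tp_iter"
    if sequence_check(t):
        return "sequence iterator"
    raise TypeError(f"'{t.name}' object is not iterable")


def dummy_getitem(obj, key):
    return None


# A match type fills only the mapping slot; a Python class with
# __getitem__ gets both slots filled.
match_type = TypeSlots("_sre.SRE_Match", mp_subscript=dummy_getitem)
python_class = TypeSlots("Squares", sq_item=dummy_getitem,
                         mp_subscript=dummy_getitem)

print(get_iter(python_class))  # sequence iterator
try:
    get_iter(match_type)
except TypeError as exc:
    print(exc)  # '_sre.SRE_Match' object is not iterable
```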