How come regex match objects aren't iterable even though they implement __getitem__?
There are lies, damned lies and then there is Python documentation.
Having __getitem__ for a class implemented in C is not enough for it to be iterable. That is because there are actually two places in the PyTypeObject where __getitem__ can be mapped to: tp_as_sequence (its sq_item slot) and tp_as_mapping (its mp_subscript slot). Both have a slot for __getitem__ ([1], [2]).
Looking at the source of SRE_Match, tp_as_sequence is initialized to NULL, whereas tp_as_mapping is defined.
The iter() built-in function, if called with one argument, calls PyObject_GetIter, which has the following code:
    f = t->tp_iter;
    if (f == NULL) {
        if (PySequence_Check(o))
            return PySeqIter_New(o);
        return type_error("'%.200s' object is not iterable", o);
    }
It first checks the tp_iter slot (obviously NULL for _SRE_Match objects). Failing that, if PySequence_Check returns true, it returns a new sequence iterator; otherwise a TypeError is raised.
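To make the sequence-iterator fallback concrete, here is a rough Python model of what the iterator created by PySeqIter_New does: it calls __getitem__ with 0, 1, 2, ... until IndexError is raised. The class `Squares` and the function `legacy_seq_iter` are hypothetical names made up for this illustration, and the model leaves out details of the real C implementation. A class written in Python that defines __getitem__ gets the sequence slot filled in, so iter() accepts it.

```python
class Squares:
    """Hypothetical class that defines only __getitem__ (no __iter__)."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)  # signals the end of the sequence
        return index * index


def legacy_seq_iter(obj):
    """Rough Python model of the fallback sequence iterator (an assumption-level sketch)."""
    index = 0
    while True:
        try:
            item = obj[index]  # same call path as obj.__getitem__(index)
        except IndexError:
            return  # the iterator is exhausted
        yield item
        index += 1


# iter() works here because tp_iter is empty but the sequence check passes.
builtin_result = list(iter(Squares()))
model_result = list(legacy_seq_iter(Squares()))
```

Both lists hold the same items, which is the behaviour a match object would need the sequence slot for.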
PySequence_Check first checks whether the object is a dict or a dict subclass, and returns false in that case. Otherwise it returns the value of
s->ob_type->tp_as_sequence && s->ob_type->tp_as_sequence->sq_item != NULL;
and since s->ob_type->tp_as_sequence was NULL for a _SRE_Match instance, 0 will be returned, and PyObject_GetIter raises TypeError: '_sre.SRE_Match' object is not iterable.
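The end result can be observed from Python itself. This is a small sketch assuming Python 3.6 or newer (where match objects support indexing); the exact type name in the error message varies between versions, so it is captured rather than compared. The variable names are only for illustration.

```python
import re

match = re.match(r"(\w+) (\w+)", "hello world")

# Indexing goes through the mapping slot, so this works.
first_group = match[1]

# iter() finds no tp_iter and no sequence slot, so it raises TypeError.
message = ""
try:
    iter(match)
    iterable = True
except TypeError as exc:
    iterable = False
    message = str(exc)

# groups() returns a plain tuple, which can be iterated as usual.
groups = list(match.groups())
```

If the groups need to be looped over, iterating over match.groups() avoids the problem entirely.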