Functionality of Python `in` vs. `__contains__`
Use the source, Luke!
Let's trace down the `in` operator implementation:
    >>> import dis
    >>> class test(object):
    ...     def __contains__(self, other):
    ...         return True
    ...
    >>> def in_():
    ...     return 1 in test()
    ...
    >>> dis.dis(in_)
      2           0 LOAD_CONST               1 (1)
                  3 LOAD_GLOBAL              0 (test)
                  6 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
                  9 COMPARE_OP               6 (in)
                 12 RETURN_VALUE
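The same disassembly can be inspected programmatically. This is a minimal sketch using only the standard `dis` module; the class name `Test` and the list `membership_ops` are illustrative. The opcode name depends on the CPython version: older releases emit `COMPARE_OP` with an `in` argument, while CPython 3.9 and later emit `CONTAINS_OP`.

```python
import dis


class Test:
    def __contains__(self, other):
        return True


def in_():
    return 1 in Test()


# Collect the opcode names the compiler produced for in_().
opnames = [ins.opname for ins in dis.get_instructions(in_)]

# Keep only the instruction that implements the membership test.
# Which of the two names appears depends on the interpreter version.
membership_ops = [name for name in opnames if name in ("COMPARE_OP", "CONTAINS_OP")]
print(membership_ops)
```

Either way, exactly one instruction carries out the `in` test, and that instruction is where the call into the C implementation starts.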
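As you can see, the `in` operator becomes the `COMPARE_OP` virtual machine instruction (in the CPython version shown here; CPython 3.9 and later use a dedicated `CONTAINS_OP` instruction instead). You can find it in `ceval.c`: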
    TARGET(COMPARE_OP)
        w = POP();
        v = TOP();
        x = cmp_outcome(oparg, v, w);
        Py_DECREF(v);
        Py_DECREF(w);
        SET_TOP(x);
        if (x == NULL) break;
        PREDICT(POP_JUMP_IF_FALSE);
        PREDICT(POP_JUMP_IF_TRUE);
        DISPATCH();
Take a look at one of the `switch` cases in `cmp_outcome()`:
    case PyCmp_IN:
        res = PySequence_Contains(w, v);
        if (res < 0)
            return NULL;
        break;
Here we have the `PySequence_Contains` call:
    int
    PySequence_Contains(PyObject *seq, PyObject *ob)
    {
        Py_ssize_t result;
        PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
        if (sqm != NULL && sqm->sq_contains != NULL)
            return (*sqm->sq_contains)(seq, ob);
        result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
        return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
    }
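That always returns a C `int` used as a boolean (1 or 0, or a negative value on error), so the result of `in` is always `True` or `False`, whatever `__contains__` itself returns.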
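The logic of `PySequence_Contains` can be modelled roughly in Python. This is a sketch, not CPython's code: the function `contains` and the classes `OnlyIter` and `WeirdContains` are hypothetical names introduced for illustration, and it assumes that the value returned by a Python-level `__contains__` is reduced to its truth value.

```python
def contains(container, item):
    """Rough Python-level model of `item in container` (illustrative only)."""
    # Special methods are looked up on the type, not on the instance.
    method = getattr(type(container), "__contains__", None)
    if method is not None:
        # Whatever __contains__ returns is reduced to a truth value.
        return bool(method(container, item))
    # Fallback, as in _PySequence_IterSearch: iterate and compare each
    # element by identity first, then by equality.
    for element in container:
        if element is item or element == item:
            return True
    return False


class OnlyIter:
    """Hypothetical class with no __contains__, only __iter__."""

    def __iter__(self):
        return iter([1, 2, 3])


class WeirdContains:
    """Hypothetical class whose __contains__ returns a non-bool value."""

    def __contains__(self, item):
        return "yes"


print(contains(OnlyIter(), 2), 2 in OnlyIter())
print(contains(WeirdContains(), 0), 0 in WeirdContains())
```

The first branch corresponds to the `sq_contains` slot, the loop to the iteration fallback. In both cases the caller only ever sees `True` or `False`.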
P.S. Thanks to Martijn Pieters for providing the way to find the implementation of the `in` operator.
In the Python reference for `__contains__` it's written that `__contains__` should return `True` or `False`.
If the return value is not a boolean, it's converted to one. Here is proof:
    class MyValue:
        def __bool__(self):
            print("__bool__ function ran")
            return True

    class Dummy:
        def __contains__(self, val):
            return MyValue()
Now run this in the shell:

    >>> dum = Dummy()
    >>> 7 in dum
    __bool__ function ran
    True
And `bool()` of a nonempty list returns `True`.
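To make the list case concrete, here is a small sketch. The class `ListReturning` and its `matches` attribute are hypothetical, made up for this example. It shows that `in` yields a plain `bool`, while calling `__contains__` directly hands back the raw list.

```python
class ListReturning:
    """Hypothetical container whose __contains__ returns a list of matches."""

    def __init__(self, matches):
        self.matches = matches

    def __contains__(self, val):
        # Returns a list, not a bool: empty when nothing matches.
        return [m for m in self.matches if m == val]


box = ListReturning([1, 2, 2, 3])

found = 2 in box              # the nonempty list is converted with bool()
missing = 9 in box            # the empty list is converted with bool()
direct = box.__contains__(2)  # a direct call skips the conversion

print(found, missing, direct)
```

So the two spellings are not interchangeable when `__contains__` returns something other than a boolean.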
Edit:
This is only the documentation for `__contains__`. If you really want to see the precise relation, you should look into the source code; I'm not sure where exactly, but that has already been answered. In the documentation for the comparison methods it's written:
However, these methods can return any value, so if the comparison operator is used in a Boolean context (e.g., in the condition of an `if` statement), Python will call `bool()` on the value to determine if the result is true or false.
So you can guess that something similar happens with `__contains__`.
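The difference between the two cases can be seen side by side. In this sketch the class `Loose` and its return strings are hypothetical. A comparison operator hands back whatever the method returns, and `bool()` is applied only when the result is used in a Boolean context, whereas `in` converts the result immediately.

```python
class Loose:
    """Hypothetical class whose special methods return strings."""

    def __eq__(self, other):
        return "equal-ish"

    def __contains__(self, item):
        return "present-ish"


obj = Loose()

eq_result = (obj == 1)  # the string comes back unchanged
in_result = (1 in obj)  # the string is converted to a bool

print(type(eq_result).__name__, type(in_result).__name__)

# In a Boolean context the comparison result is converted as well.
if obj == 1:
    print("treated as true in an if statement")
```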
This is for anyone reading this who wants to know which one to use: I would say use `__contains__()` instead of `in`, since it came out faster in my test.
For checking this, I did a simple experiment.
import timestartTime = time.time()q = 'abababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababc'print(q.__contains__('c'))#print('c' in q)endTime = time.time()deltaTime = endTime - startTimeprint(deltaTime)
For one run I commented out the `in` line, and for the other run I commented out the `__contains__` line. Here are the results:
    (Using in)
    PS C:\Users\username> & python c:/Users/username/containsvsin.py
    True
    0.0009970664978027344

    (Using __contains__)
    PS C:\Users\username> & python c:/Users/username/Downloads/containsvsin.py
    True
    0.0
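A single `time.time()` measurement of one call sits at the edge of the clock's resolution (the `0.0` above hints at that), and it also includes the cost of `print`. A repeated measurement with the standard `timeit` module gives a steadier comparison. This is a sketch under assumptions: the string `q` is a shortened stand-in for the one above, and the repetition count `n` is an arbitrary choice. Note that the explicit call adds an attribute lookup and a method call that the operator does not need, so it is worth measuring on your own machine and Python version rather than assuming either result.

```python
import timeit

# Assumed stand-in for the long test string: repeated "ab" ending in "c".
q = "ab" * 250 + "c"

n = 100_000  # arbitrary repetition count, large enough to average out noise

# Time only the membership test itself, without print().
t_in = timeit.timeit(lambda: "c" in q, number=n)
t_method = timeit.timeit(lambda: q.__contains__("c"), number=n)

print(f"in:           {t_in / n * 1e9:.1f} ns per call")
print(f"__contains__: {t_method / n * 1e9:.1f} ns per call")
```

Running it several times and comparing the smallest values is a reasonable way to reduce the influence of other processes.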