list comprehension filtering - "the set() trap"

python-3.x

In order to optimize set(list_2), the interpreter would need to prove that list_2 (and every one of its elements) does not change between iterations. That is a hard problem in the general case, and it would not surprise me if the interpreter does not even attempt to tackle it.

On the other hand, a set literal whose elements are all constants cannot change its value between iterations, so the optimization is known to be safe.
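
You can watch the per-iteration rebuild happen. Here is a minimal sketch (tracked_set is a hypothetical instrumented wrapper I'm introducing for illustration, not anything the interpreter provides) that counts how many times the set is actually constructed:

call_count = 0

def tracked_set(iterable):
    # Stand-in for set() that counts constructions.
    global call_count
    call_count += 1
    return set(iterable)

list_1 = list(range(10))
list_2 = [1, 2, 3]

result = [x for x in list_1 if x in tracked_set(list_2)]
print(call_count)  # 10 -- the set is rebuilt once per element of list_1

The condition of the comprehension is evaluated once per element, so the construction happens len(list_1) times.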


From What’s New In Python 3.2:

Python's peephole optimizer now recognizes patterns such as x in {1, 2, 3} as being a test for membership in a set of constants. The optimizer recasts the set as a frozenset and stores the pre-built constant.
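
You can see the stored constant directly by inspecting the compiled code object (a quick sanity check of my own on CPython 3.2+, not part of the quoted text):

>>> f = lambda x: x in {1, 2, 3}
>>> [c for c in f.__code__.co_consts if isinstance(c, frozenset)]
[frozenset({1, 2, 3})]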


So now I'm wondering: why can Python 3.x optimize the set literal so that it is built only once, but not set(list_2)?
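
For what it's worth, the cost difference is easy to measure. A rough timeit sketch (the names list_1 and list_2 are just placeholders; absolute numbers will vary by machine and Python version):

import timeit

setup = "list_1 = list(range(1000)); list_2 = list(range(1000))"

# set(list_2) is re-evaluated for every element of list_1:
# O(len(list_1) * len(list_2)) overall.
rebuilt = timeit.timeit(
    "[x for x in list_1 if x in set(list_2)]", setup=setup, number=100)

# Hoisting the construction by hand makes it O(len(list_1) + len(list_2)).
hoisted = timeit.timeit(
    "s = set(list_2); [x for x in list_1 if x in s]", setup=setup, number=100)

print(rebuilt, hoisted)  # the rebuilt version is slower by orders of magnitude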

No one's mentioned this issue yet: how do you know set([1,2,3]) and {1, 2, 3} are the same thing?

>>> import random
>>> def set(arg):
...     return [random.choice(range(5))]
...
>>> list1 = list(range(5))
>>> [x for x in list1 if x in set(list1)]
[0, 4]
>>> [x for x in list1 if x in set(list1)]
[0]

You can't shadow a literal, but you can shadow set. So before you can consider hoisting, you need to know not just that list1 isn't being modified; you also need to be sure that set is what you think it is. Sometimes you can establish that, either under restrictive conditions at compile time or, more conveniently, at runtime, but it's definitely nontrivial.
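
To make "be sure that set is what you think it is" concrete, here is a sketch of the kind of runtime guard such an optimization would need. This is purely illustrative; CPython does not actually do this:

import builtins

def filter_members(list_1, list_2):
    # Hoisting is only valid if "set" still names the builtin; a shadowed
    # or monkey-patched set could return anything (see the example above).
    if set is builtins.set:
        members = set(list_2)  # safe: build once, reuse every iteration
        return [x for x in list_1 if x in members]
    # Fall back to the literal semantics: call set(list_2) each time.
    return [x for x in list_1 if x in set(list_2)]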

It's kind of funny: whenever optimizations like this are suggested, one pushback is that, as nice as they are, they make it harder to reason about what Python's performance will be like, even algorithmically. Your question provides some evidence for that objection.