Are list comprehensions syntactic sugar for `list(generator expression)` in Python 3?
The two work differently. The list comprehension version takes advantage of the special bytecode `LIST_APPEND`, which calls `PyList_Append` directly for us. Hence it avoids an attribute lookup of `list.append` and a function call at the Python level.

    >>> import dis
    >>> def func_lc():
    ...     [x**2 for x in y]
    ...
    >>> dis.dis(func_lc)
      2           0 LOAD_CONST               1 (<code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>)
                  3 LOAD_CONST               2 ('func_lc.<locals>.<listcomp>')
                  6 MAKE_FUNCTION            0
                  9 LOAD_GLOBAL              0 (y)
                 12 GET_ITER
                 13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 16 POP_TOP
                 17 LOAD_CONST               0 (None)
                 20 RETURN_VALUE
    >>> lc_object = list(dis.get_instructions(func_lc))[0].argval
    >>> lc_object
    <code object <listcomp> at 0x10d3c6780, file "<ipython-input-42-ead395105775>", line 2>
    >>> dis.dis(lc_object)
      2           0 BUILD_LIST               0
                  3 LOAD_FAST                0 (.0)
            >>    6 FOR_ITER                16 (to 25)
                  9 STORE_FAST               1 (x)
                 12 LOAD_FAST                1 (x)
                 15 LOAD_CONST               0 (2)
                 18 BINARY_POWER
                 19 LIST_APPEND              2
                 22 JUMP_ABSOLUTE            6
            >>   25 RETURN_VALUE

On the other hand, the `list()` version simply passes the generator object to list's `__init__` method, which then calls its `extend` method internally. As the object is not a list or tuple, CPython first gets its iterator and then simply adds the items to the list until the iterator is exhausted:

    >>> def func_ge():
    ...     list(x**2 for x in y)
    ...
    >>> dis.dis(func_ge)
      2           0 LOAD_GLOBAL              0 (list)
                  3 LOAD_CONST               1 (<code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>)
                  6 LOAD_CONST               2 ('func_ge.<locals>.<genexpr>')
                  9 MAKE_FUNCTION            0
                 12 LOAD_GLOBAL              1 (y)
                 15 GET_ITER
                 16 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 22 POP_TOP
                 23 LOAD_CONST               0 (None)
                 26 RETURN_VALUE
    >>> ge_object = list(dis.get_instructions(func_ge))[1].argval
    >>> ge_object
    <code object <genexpr> at 0x10cde6ae0, file "<ipython-input-41-f9a53483f10a>", line 2>
    >>> dis.dis(ge_object)
      2           0 LOAD_FAST                0 (.0)
            >>    3 FOR_ITER                15 (to 21)
                  6 STORE_FAST               1 (x)
                  9 LOAD_FAST                1 (x)
                 12 LOAD_CONST               0 (2)
                 15 BINARY_POWER
                 16 YIELD_VALUE
                 17 POP_TOP
                 18 JUMP_ABSOLUTE            3
            >>   21 LOAD_CONST               1 (None)
                 24 RETURN_VALUE

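Offsets and even opcode names in listings like these change between CPython releases, so a more version-tolerant way to reproduce the comparison is to collect opcode names instead of reading full disassemblies. The following is only a sketch: `opnames` is a hypothetical helper (not part of `dis`), and it walks nested code objects because older versions keep the comprehension in a separate `<listcomp>` code object, whereas CPython 3.12+ inlines it into the enclosing function (PEP 709), as far as I understand it.

```python
import dis
import types

def opnames(code):
    """Collect opcode names from a code object and any code objects nested in it."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names

def func_lc():
    return [x**2 for x in range(3)]

def func_ge():
    return list(x**2 for x in range(3))

lc_ops = opnames(func_lc.__code__)
ge_ops = opnames(func_ge.__code__)

# The comprehension appends via a dedicated opcode; the genexp yields each value.
print("LIST_APPEND in comprehension:", "LIST_APPEND" in lc_ops)
print("YIELD_VALUE in list(genexp): ", "YIELD_VALUE" in ge_ops)
```

Both functions return the same list; only the route taken to build it differs.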
Timing comparisons:

    >>> %timeit [x**2 for x in range(10**6)]
    1 loops, best of 3: 453 ms per loop
    >>> %timeit list(x**2 for x in range(10**6))
    1 loops, best of 3: 478 ms per loop
    >>> %%timeit
    out = []
    for x in range(10**6):
        out.append(x**2)
    ...
    1 loops, best of 3: 510 ms per loop

The plain loop is slightly slower because `out.append` is looked up on every iteration. Caching the bound method and timing again:

    >>> %%timeit
    out = []; append = out.append
    for x in range(10**6):
        append(x**2)
    ...
    1 loops, best of 3: 467 ms per loop

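The `%timeit` magics above need IPython. For anyone who wants to repeat the measurement with only the standard library, here is a rough sketch using `timeit.repeat`. The function names and the smaller `N` are my own choices for illustration, and the absolute numbers will depend on the machine and Python version.

```python
import timeit

N = 10**5  # smaller than the 10**6 used above so the script finishes quickly

def with_comprehension():
    return [x**2 for x in range(N)]

def with_list_genexp():
    return list(x**2 for x in range(N))

def with_loop():
    out = []
    for x in range(N):
        out.append(x**2)  # attribute lookup on every iteration
    return out

def with_cached_append():
    out = []
    append = out.append  # look the bound method up once
    for x in range(N):
        append(x**2)
    return out

for fn in (with_comprehension, with_list_genexp, with_loop, with_cached_append):
    # Best of 3 repeats, 5 calls per repeat.
    best = min(timeit.repeat(fn, number=5, repeat=3))
    print(f"{fn.__name__:20s} {best / 5 * 1000:8.2f} ms per call")
```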
Apart from the fact that list comprehensions no longer leak their loop variable, one more difference is that something like this is no longer valid:

    >>> [x**2 for x in 1, 2, 3]  # Python 2
    [1, 4, 9]
    >>> [x**2 for x in 1, 2, 3]  # Python 3
      File "<ipython-input-69-bea9540dd1d6>", line 1
        [x**2 for x in 1, 2, 3]
                        ^
    SyntaxError: invalid syntax
    >>> [x**2 for x in (1, 2, 3)]  # Add parentheses
    [1, 4, 9]
    >>> for x in 1, 2, 3:  # Python 3: For normal loops it still works
    ...     print(x**2)
    ...
    1
    4
    9

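The point about the loop variable no longer leaking can be checked directly. A minimal illustration for Python 3 (the names `x` and `squares` are just for the example):

```python
x = "outer"

squares = [x**2 for x in (1, 2, 3)]  # the loop variable is local to the comprehension
print(squares)  # [1, 4, 9]
print(x)        # outer

for x in 1, 2, 3:  # a plain for statement still rebinds the enclosing name
    pass
print(x)        # 3
```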
Both forms create and call an anonymous function. However, the `list(...)` form creates a generator function and passes the returned generator-iterator to `list`, while with the `[...]` form, the anonymous function builds the list directly with `LIST_APPEND` opcodes.
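As a rough mental model (not what the compiler literally emits), the two anonymous functions behave like the hand-written functions below. The names `listcomp_like` and `genexpr_like` are made up for this sketch, and the explicit `result.append` call stands in for the `LIST_APPEND` opcode.

```python
def listcomp_like(iterable):
    # Roughly what the [...] form does: build the list inside the function.
    result = []
    for x in iterable:
        result.append(x)  # the real bytecode uses LIST_APPEND, not a method call
    return result

def genexpr_like(iterable):
    # Roughly what the (...) form compiles to: a generator function.
    for x in iterable:
        yield x

data = [1, 2, 3]
from_comprehension = listcomp_like(iter(data))
from_generator = list(genexpr_like(iter(data)))  # list() drives the generator-iterator

print(from_comprehension)  # [1, 2, 3]
print(from_generator)      # [1, 2, 3]
```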
The following code gets decompilation output of the anonymous functions for an example comprehension and its corresponding genexp-passed-to-`list`:

    import dis

    def f():
        [x for x in []]

    def g():
        list(x for x in [])

    dis.dis(f.__code__.co_consts[1])
    dis.dis(g.__code__.co_consts[1])

The output for the comprehension is

      4           0 BUILD_LIST               0
                  3 LOAD_FAST                0 (.0)
            >>    6 FOR_ITER                12 (to 21)
                  9 STORE_FAST               1 (x)
                 12 LOAD_FAST                1 (x)
                 15 LIST_APPEND              2
                 18 JUMP_ABSOLUTE            6
            >>   21 RETURN_VALUE

The output for the genexp is

      7           0 LOAD_FAST                0 (.0)
            >>    3 FOR_ITER                11 (to 17)
                  6 STORE_FAST               1 (x)
                  9 LOAD_FAST                1 (x)
                 12 YIELD_VALUE
                 13 POP_TOP
                 14 JUMP_ABSOLUTE            3
            >>   17 LOAD_CONST               0 (None)
                 20 RETURN_VALUE

You can actually show that the two can have different outcomes to prove they are inherently different:

    >>> list(next(iter([])) if x > 3 else x for x in range(10))
    [0, 1, 2, 3]
    >>> [next(iter([])) if x > 3 else x for x in range(10)]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 1, in <listcomp>
    StopIteration

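The expression inside the comprehension is not treated as a generator: the comprehension does not handle the `StopIteration`, whereas the `list` constructor treats it as the end of the generator. Note that this output is from an older Python 3; since Python 3.7 (PEP 479), a `StopIteration` raised inside a generator is converted to a `RuntimeError`, so the `list(...)` form raises instead of silently truncating. The two forms still behave differently, just with different exceptions.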
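Which outcome the `list(...)` form produces depends on the interpreter: before PEP 479 took effect by default (Python 3.7) it returns the truncated list, and afterwards it raises `RuntimeError`. Here is a small sketch that reports what each form does on whatever version runs it; `stop_early` and `outcome` are hypothetical helpers introduced only for this example.

```python
import sys

def stop_early(x):
    # Raises StopIteration for x > 3, otherwise returns x unchanged.
    return next(iter([])) if x > 3 else x

def outcome(build):
    """Run build() and return its result, or the exception type name if it raised."""
    try:
        return build()
    except (StopIteration, RuntimeError) as exc:
        return type(exc).__name__

comp_result = outcome(lambda: [stop_early(x) for x in range(10)])
gen_result = outcome(lambda: list(stop_early(x) for x in range(10)))

print("Python", ".".join(map(str, sys.version_info[:2])))
print("comprehension:", comp_result)
print("list(genexp): ", gen_result)
```

On every Python 3 version the comprehension lets the `StopIteration` escape. Only the genexp result changes across versions.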