
Python Compilation/Interpretation Process


The bytecode is not actually translated to machine code, unless you are using some exotic implementation such as PyPy.

Other than that, you have the description correct. The bytecode is loaded into the Python runtime and interpreted by a virtual machine, which is a piece of code that reads each instruction in the bytecode and executes whatever operation is indicated. You can see this bytecode with the dis module, as follows:

    >>> def fib(n): return n if n < 2 else fib(n - 2) + fib(n - 1)
    ...
    >>> fib(10)
    55
    >>> import dis
    >>> dis.dis(fib)
      1           0 LOAD_FAST                0 (n)
                  3 LOAD_CONST               1 (2)
                  6 COMPARE_OP               0 (<)
                  9 JUMP_IF_FALSE            5 (to 17)
                 12 POP_TOP
                 13 LOAD_FAST                0 (n)
                 16 RETURN_VALUE
            >>   17 POP_TOP
                 18 LOAD_GLOBAL              0 (fib)
                 21 LOAD_FAST                0 (n)
                 24 LOAD_CONST               1 (2)
                 27 BINARY_SUBTRACT
                 28 CALL_FUNCTION            1
                 31 LOAD_GLOBAL              0 (fib)
                 34 LOAD_FAST                0 (n)
                 37 LOAD_CONST               2 (1)
                 40 BINARY_SUBTRACT
                 41 CALL_FUNCTION            1
                 44 BINARY_ADD
                 45 RETURN_VALUE
    >>>

Detailed explanation

It is quite important to understand that the above code is never executed by your CPU; nor is it ever converted into something that is (at least, not on the official C implementation of Python). The CPU executes the virtual machine code, which performs the work indicated by the bytecode instructions. When the interpreter wants to execute the fib function, it reads the instructions one at a time, and does what they tell it to do. It looks at the first instruction, LOAD_FAST 0, and thus grabs parameter 0 (the n passed to fib) from wherever parameters are held and pushes it onto the interpreter's stack (Python's interpreter is a stack machine). On reading the next instruction, LOAD_CONST 1, it grabs constant number 1 from a collection of constants owned by the function, which happens to be the number 2 in this case, and pushes that onto the stack. You can actually see these constants:

    >>> fib.func_code.co_consts
    (None, 2, 1)
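The session above is from Python 2. In Python 3 the func_code attribute is spelled __code__, so a rough equivalent looks like the sketch below. The exact contents of co_consts depend on the interpreter version, so no particular tuple is assumed here.

```python
def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)

# __code__ is the Python 3 name for what Python 2 called func_code.
code_obj = fib.__code__

# Constants owned by the function; contents vary between CPython versions.
print(code_obj.co_consts)

# Names of the local variables, which is where LOAD_FAST looks up its argument.
print(code_obj.co_varnames)
```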

The next instruction, COMPARE_OP 0, tells the interpreter to pop the two topmost stack elements and perform a less-than comparison between them (comparison operator 0 is <), pushing the Boolean result back onto the stack. The fourth instruction, JUMP_IF_FALSE 5, determines, based on the Boolean value, whether to jump forward five bytes (skipping offsets 12 to 16 and landing on offset 17) or continue on with the next instruction. All that verbiage explains the if n < 2 part of the conditional expression in fib. It will be a highly instructive exercise for you to tease out the meaning and behaviour of the rest of the fib bytecode. The only one I'm not sure about is POP_TOP; I'm guessing JUMP_IF_FALSE is defined to leave its Boolean argument on the stack rather than popping it, so it has to be popped explicitly.
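To make the read-an-instruction, do-what-it-says loop concrete, here is a minimal sketch of a toy stack machine. It is not CPython's evaluation loop: the instruction names are borrowed from CPython, but the tuple encoding, the COMPARE_LT name, and the use of list indices as jump targets are all assumptions made for illustration. It also uses a popping jump (like the POP_JUMP_IF_FALSE of later Python versions), so no separate POP_TOP is needed.

```python
# Toy stack machine. Instruction names mirror CPython's, but the encoding
# (tuples, jump targets as list indices) is an assumption made for clarity.

def run(program, consts, local_vars):
    stack = []
    pc = 0  # index of the next instruction to execute
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "LOAD_FAST":
            stack.append(local_vars[arg])      # push a local variable
        elif op == "LOAD_CONST":
            stack.append(consts[arg])          # push a constant
        elif op == "COMPARE_LT":
            right, left = stack.pop(), stack.pop()
            stack.append(left < right)         # push the Boolean result
        elif op == "POP_JUMP_IF_FALSE":
            if not stack.pop():
                pc = arg                       # jump to instruction index arg
        elif op == "BINARY_SUBTRACT":
            right, left = stack.pop(), stack.pop()
            stack.append(left - right)
        elif op == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")

# Hand-written program for: return n if n < 2 else n - 2
CONSTS = (None, 2, 1)
PROGRAM = [
    ("LOAD_FAST", 0),          # 0: push n
    ("LOAD_CONST", 1),         # 1: push 2
    ("COMPARE_LT", None),      # 2: n < 2
    ("POP_JUMP_IF_FALSE", 6),  # 3: skip the "then" branch if false
    ("LOAD_FAST", 0),          # 4: push n
    ("RETURN_VALUE", None),    # 5: return n
    ("LOAD_FAST", 0),          # 6: push n
    ("LOAD_CONST", 1),         # 7: push 2
    ("BINARY_SUBTRACT", None), # 8: n - 2
    ("RETURN_VALUE", None),    # 9: return n - 2
]

print(run(PROGRAM, CONSTS, [1]))   # 1
print(run(PROGRAM, CONSTS, [10]))  # 8
```

The CPU only ever runs the code of run itself; the tuples in PROGRAM are just data that it inspects, which is the same relationship CPython has to its bytecode.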

Even more instructive is to inspect the raw bytecode for fib thus:

    >>> code = fib.func_code.co_code
    >>> code
    '|\x00\x00d\x01\x00j\x00\x00o\x05\x00\x01|\x00\x00S\x01t\x00\x00|\x00\x00d\x01\x00\x18\x83\x01\x00t\x00\x00|\x00\x00d\x02\x00\x18\x83\x01\x00\x17S'
    >>> import opcode
    >>> op = code[0]
    >>> op
    '|'
    >>> op = ord(op)
    >>> op
    124
    >>> opcode.opname[op]
    'LOAD_FAST'
    >>>

Thus you can see that the first byte of the bytecode is the LOAD_FAST instruction. The next pair of bytes, '\x00\x00' (the number 0 in 16 bits) is the argument to LOAD_FAST, and tells the bytecode interpreter to load parameter 0 onto the stack.
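The same inspection can be sketched in Python 3, with two differences worth noting: co_code is a bytes object, so indexing it already yields an integer, and since Python 3.6 every instruction occupies two bytes (opcode, then a one-byte argument) rather than one or three. The opcode numbers, the names, and even which instruction comes first differ between versions, so the output is deliberately not shown.

```python
import opcode

def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)

code = fib.__code__.co_code   # bytes in Python 3 (a str in Python 2)
first_op = code[0]            # already an int, so no ord() is needed
first_name = opcode.opname[first_op]
print(first_op, first_name)

# Walk the raw bytes two at a time: (instruction name, argument byte).
pairs = [(opcode.opname[code[i]], code[i + 1]) for i in range(0, len(code), 2)]
for name, arg in pairs:
    print(name, arg)
```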


To complement Marcelo Cantos's great answer, here is a short column-by-column summary of the disassembled bytecode output.

For example, given this function:

    def f(num):
        if num == 42:
            return True
        return False

This may be disassembled into (Python 3.6):

    (1)|(2)|(3)|(4)|          (5)         |(6)|  (7)
    ---|---|---|---|----------------------|---|-------
      2|   |   |  0|LOAD_FAST             |  0|(num)
       |-->|   |  2|LOAD_CONST            |  1|(42)
       |   |   |  4|COMPARE_OP            |  2|(==)
       |   |   |  6|POP_JUMP_IF_FALSE     | 12|
       |   |   |   |                      |   |
      3|   |   |  8|LOAD_CONST            |  2|(True)
       |   |   | 10|RETURN_VALUE          |   |
       |   |   |   |                      |   |
      4|   |>> | 12|LOAD_CONST            |  3|(False)
       |   |   | 14|RETURN_VALUE          |   |

Each column has a specific purpose:

  1. The corresponding line number in the source code
  2. Optionally indicates the current instruction executed (when the bytecode comes from a frame object for example)
  3. A label which denotes a possible JUMP from an earlier instruction to this one
  4. The address in the bytecode, which corresponds to the byte index (these are multiples of 2 because Python 3.6 uses 2 bytes for each instruction, while the size could vary in previous versions)
  5. The instruction name (also called opname); each one is briefly described in the dis module documentation, and its implementation can be found in ceval.c (the core loop of CPython)
  6. The argument (if any) of the instruction which is used internally by Python to fetch some constants or variables, manage the stack, jump to a specific instruction, etc.
  7. The human-friendly interpretation of the instruction argument
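Most of these columns can also be read programmatically. The sketch below uses dis.get_instructions and prints columns 3 to 7 for the same function; the formatting widths are arbitrary choices, and the instructions you get will differ from the Python 3.6 listing above on other versions. Column 1 is left out because the attribute that carries the line number has changed between Python releases.

```python
import dis

def f(num):
    if num == 42:
        return True
    return False

# One tuple per instruction: jump-target flag, offset, name, raw arg, readable arg.
rows = [
    (ins.is_jump_target, ins.offset, ins.opname, ins.arg, ins.argrepr)
    for ins in dis.get_instructions(f)
]

for is_target, offset, opname, arg, argrepr in rows:
    marker = ">>" if is_target else "  "          # column 3
    arg_text = "" if arg is None else str(arg)    # column 6 (may be absent)
    print(f"{marker} {offset:4d} {opname:<22} {arg_text:>4} {argrepr}")
```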