Why does a large for loop with 10 billion iterations take a much longer time to run in Python than in C?

python-3.x



It's partly because Python bytecode is executed by an interpreter program rather than directly by the CPU, but most of the overhead comes from the memory allocation and deallocation forced by the immutability of integers, which is a consequence of Python's object model, not of its being interpreted.

What's going on is that your C code can modify the numbers in place, but Python numbers are immutable: they never change. That means that when you do a sum, Python has to create a new int object for each new value, and then destroy the old ints once they are no longer used. This is much slower than just modifying a single memory location.
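A quick way to see this for yourself (a minimal sketch; the variable names are just for illustration) is to check the object identities with `id()` — each addition produces a brand-new int object rather than updating the old one:

```python
a = 10**6          # an int object
b = a + 1          # a NEW int object; 'a' itself is untouched
print(id(a) != id(b))  # True: two distinct objects in memory
print(a)               # still 1000000 — ints never change in place
```

In a hot loop, that allocate-new/discard-old cycle happens on every iteration, which is where much of the slowdown comes from.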


There is also the possibility that your C compiler is being clever and, through a chain of optimisations, removes your for loop entirely, computing the result at compile time as if the loop had actually run. If that had happened in your example I'd expect the code to run far faster than it did, but the compiler is allowed to do it.

Python has no such smart compiler. It can't do anything so grand and clever; the reference implementation simply isn't designed to optimise code like that, because doing it reliably in a dynamically-typed language is very hard (though the fact that Python is strongly typed does make it somewhat of a possibility).
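You can confirm that CPython does no loop elimination by inspecting the bytecode with the standard `dis` module (a minimal sketch; the function name is just for illustration) — the loop machinery survives compilation intact:

```python
import dis

def total():
    s = 0
    for i in range(10):
        s += i
    return s

# CPython compiles the loop to FOR_ITER plus jump instructions;
# nothing is folded away, no matter how predictable the result is.
opnames = {ins.opname for ins in dis.Bytecode(total)}
print("FOR_ITER" in opnames)  # True: the loop is still there
```

A C compiler at -O1 would reduce the equivalent function to a single constant; CPython executes every one of those iterations.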


As dmuir noticed, the code can be simplified drastically if the compiler propagates constants correctly. For example, clang -O1 compiles the C code down to this (cf. https://gcc.godbolt.org/z/1ZH8Rm ):

main:                                   # @main
        push    rax
        movabs  rsi, -9877432110
        mov     edi, offset .L.str
        xor     eax, eax
        call    printf
        xor     eax, eax
        pop     rcx
        ret
.L.str:
        .asciz  "Sum is %lld\n"

gcc -O1 produces essentially similar code.

Since this boils down to a single call to printf, the explanation seems to be:

  • The Python compiler is not as smart as the C compiler to optimize this code.
  • Your C compiler takes a long time to compile this 12-line piece of code. Three seconds is way too long given your hardware setup! It only takes 0.15 seconds on my dinky laptop to compile and run the code with all optimisations. Are you compiling it as C++?

Testing the C version with optimisations disabled (-O0) produces this output:

$ time (clang -O0 -o loop10g loop10g.c && ./loop10g)
Sum is -9877432110
real    4m15.352s
user    3m47.232s
sys     0m3.252s

Unoptimized C is still much faster than Python: roughly 255 seconds versus more than 3500.

The Python code is interpreted as bytecode operating on a stack of dynamically typed values: a slowdown factor of 10 to 20 is typical. Furthermore, integer arithmetic automatically switches to bignum mode for large values, which may be the case here, although the penalty should then be even higher.
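The bignum switchover is easy to observe (a minimal sketch; the exact byte counts depend on the CPython build, so the comments only hedge at relative sizes): once a value no longer fits in a machine word, the int object grows and arithmetic gets exact but slower.

```python
import sys

small = 2**30   # fits comfortably in a 64-bit word
big = 2**70     # far beyond any 64-bit word

# Python ints are arbitrary-precision: larger values store more
# internal "digits", so the object itself is physically bigger.
print(sys.getsizeof(big) > sys.getsizeof(small))  # True

# The arithmetic stays exact — no overflow — at the cost of
# multi-word operations instead of a single CPU instruction.
print(big + 1)  # 1180591620717411303425
```

This is why a sum that climbs past 2**63 keeps producing correct results in Python while the C version silently wraps around (hence the negative "Sum is -9877432110" above).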


The answer is very simple. Python is an interpreted language: all the instructions are executed by the interpreter (a special program which runs the script). That is much slower than C code, which is compiled to native machine code.