Python Numpy Data Types Performance Python Numpy Data Types Performance numpy numpy

Python Numpy Data Types Performance


Half precision arithmetic (float16) is something which must be "emulated" by numpy I guess, as there are no corresponding types in the underlying C language (and in the appropriate processor instructions) for it. On the other hand, single precision (float32) and double precision (float64) operations can be done very efficiently using native data types.

As of the good performance for single precision operations: Modern processors have efficient units for vectorized floating point arithmetics (e.g. AVX) as it is also needed for good multimedia performance.


16 bit floating point numbers are not supports by most common CPUs directly (though graphics card vendors are apparently involved in this data type, so I expect GPUs to support it eventually). I expect them to be emulated, in a comparatively slow way. Google tells me that float16 was once hardware-dependent and some people wanted to emulate it for hardware that doesn't support it, though I didn't find anything on whether that actually happened.

32 bit floats, on the other hand, are not only supported natively, you can also vectorize many operations on them with SIMD instruction set extensions, which drastically reduces the overhead for the kind of operation you benchmark. The exception is shuffling data around, but in that case, float32 is on par with int32 and both can use the same SIMD instructions to load and store larger blocks of memory.

While there are also SIMD instructions for integer math, they are less common (e.g. SEE introduced them in a later version than the float versions) and often less sophisticated. My guess is that (your build of) NumPy doesn't have SIMD implementations of the operations that are slower for you. Alternatively, the integer operations may not be as optimized: Floats are used in many easy-to-vectorize applications whose performance matters a lot (e.g. image/media/video en- and decoding), so they may be more optimized.