What does the order parameter in numpy.array() do AKA what is contiguous order?


Let's first unpack what K, A, C and F stand for. I am referring to the implementation details section of this.

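  • C is C-contiguous layout (as in the C language). Mathematically speaking, row major.
  • F is Fortran-contiguous layout. Mathematically speaking, column major.
  • A is any order: Fortran order if the input array is Fortran contiguous (and not also C contiguous), otherwise C order. Generally don't use this.
  • K is keep order: the layout of the input is preserved as closely as possible. Generally don't use this.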
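To make the four options concrete, here is a minimal sketch (the variable names are mine, purely for illustration) that builds the same 2x3 array in C and F order and then inspects how the values sit in memory. It relies on `ravel(order="K")`, which reads elements in the order they occur in memory.

```python
import numpy as np

data = [[1, 2, 3],
        [4, 5, 6]]

c_arr = np.array(data, order="C")  # row major
f_arr = np.array(data, order="F")  # column major

# Indexing gives the same values either way; only the memory layout differs.
same_values = bool((c_arr == f_arr).all())

# ravel(order="K") reads the elements in the order they occur in memory.
c_memory = c_arr.ravel(order="K").tolist()  # [1, 2, 3, 4, 5, 6]
f_memory = f_arr.ravel(order="K").tolist()  # [1, 4, 2, 5, 3, 6]

# "A" and "K" only matter when the input is already an ndarray:
# both keep the Fortran layout of f_arr here.
a_from_f = np.array(f_arr, order="A")
k_from_f = np.array(f_arr, order="K")

print(c_arr.flags["C_CONTIGUOUS"], f_arr.flags["F_CONTIGUOUS"])
print(a_from_f.flags["F_CONTIGUOUS"], k_from_f.flags["F_CONTIGUOUS"])
```

Note that `c_arr[0, 1]` and `f_arr[0, 1]` are both 2. The order only changes which elements are neighbours in memory, not how you index the array.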

From here I can refer you to other answers that address the two following questions: Data Contiguity and Row vs. Column Major Ordering. Row vs. Column Major Ordering is best described by its Wikipedia article. So now let's talk about data contiguity. In Python this is generally not so important, so I'm going to jump to C for a moment here.

In C there are two options for storing a 2D array.

  1. An array of arrays
  2. A flattened array

With the first option, the type of data we are storing in our array is another array. In terms of pointers, we have a block of memory where each value in it is a pointer to another block of memory. In order to find a value at any point we must dereference first the outer array and then the inner array.

With the second option, we have a single block of memory of size rows * columns. We can dereference any index to get its value, but the indices are 1-dimensional. For a row-major layout, a 2D index (row x, column y) can be converted using y + x * width, where width is the number of columns.
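The index conversion can be sketched in Python rather than C. The helper `flat_index` and the variable names below are hypothetical, chosen only for this illustration; the strides calculation at the end shows that a C-ordered NumPy array arrives at the same offset.

```python
import numpy as np

rows, width = 3, 4

# One flat block of rows * width values, as in the second option.
flat = list(range(rows * width))

def flat_index(x, y, width):
    """Row-major offset of the element at row x, column y."""
    return y + x * width

# The same data as a 2D NumPy array (C order is the default for reshape).
grid = np.arange(rows * width).reshape(rows, width)

x, y = 2, 1
offset = flat_index(x, y, width)
value_from_flat = flat[offset]
value_from_grid = int(grid[x, y])

# NumPy stores strides in bytes; dividing by the item size gives the
# same element offset as the formula above.
stride_offset = (x * grid.strides[0] + y * grid.strides[1]) // grid.itemsize

print(offset, value_from_flat, value_from_grid, stride_offset)
```

For a column-major (F) layout the roles swap, and the offset would be x + y * height instead.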

When doing numerical calculations, we strive to use contiguous arrays. The reason for this is cache acceleration, which NumPy relies on. If I want to add the value a to each value in a 2D array, I could move the entire flattened array into the cache if it fits. With an array of arrays, however, each row (or column) is a separate block of memory, so you could only move one of them into the cache at a time. If you want to know more, look up SIMD (single instruction, multiple data).
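If you want to check whether a particular array or view is contiguous before doing heavy numerical work on it, NumPy exposes this through `flags`. The following is a rough sketch with made-up variable names; it only inspects the layout and makes no claim about actual speed, which depends on your hardware and array sizes.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)  # C order, one block

row_view = a[1, :]  # neighbouring elements in memory
col_view = a[:, 1]  # elements are a whole row apart in memory

print(row_view.flags["C_CONTIGUOUS"])  # contiguous slice of the block
print(col_view.flags["C_CONTIGUOUS"])  # strided view, not contiguous

# ascontiguousarray copies a strided view into its own contiguous block.
packed = np.ascontiguousarray(col_view)

# The view still points into a; the packed copy does not.
view_shares = np.shares_memory(col_view, a)
copy_shares = np.shares_memory(packed, a)

# Adding a scalar works on either; the layout is what differs.
shifted = packed + 1.0
```

In a C-ordered array, rows are the contiguous direction, so a single column is exactly the kind of strided access that contiguous layouts are meant to avoid.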