Best practices with reading and operating on fortran ordered arrays with numpy Best practices with reading and operating on fortran ordered arrays with numpy numpy numpy

Best practices with reading and operating on fortran ordered arrays with numpy


Numpy abstracts away the difference between Fortran ordering and C-ordering at the python level. (In fact, you can even have other orderings for >2d arrays with numpy. They're all treated the same at the python level.)

The only time you'll need to worry about C vs F ordering is when you're reading/writing to disk or passing the array to lower-level functions.

A Simple Example

As an example, let's make a simple 3D array in both C order and Fortran order:

In [1]: import numpy as npIn [2]: c_order = np.arange(27).reshape(3,3,3)In [3]: f_order = c_order.copy(order='F')In [4]: c_orderOut[4]: array([[[ 0,  1,  2],        [ 3,  4,  5],        [ 6,  7,  8]],       [[ 9, 10, 11],        [12, 13, 14],        [15, 16, 17]],       [[18, 19, 20],        [21, 22, 23],        [24, 25, 26]]])In [5]: f_orderOut[5]: array([[[ 0,  1,  2],        [ 3,  4,  5],        [ 6,  7,  8]],       [[ 9, 10, 11],        [12, 13, 14],        [15, 16, 17]],       [[18, 19, 20],        [21, 22, 23],        [24, 25, 26]]])

Notice that they both look identical (they are at the level we're interacting with them). How can you tell that they're in different orderings? First off, let's take a look at the flags (pay attention to C_CONTIGUOUS vs F_CONTIGUOUS):

In [6]: c_order.flagsOut[6]:   C_CONTIGUOUS : True  F_CONTIGUOUS : False  OWNDATA : False  WRITEABLE : True  ALIGNED : True  UPDATEIFCOPY : FalseIn [7]: f_order.flagsOut[7]:   C_CONTIGUOUS : False  F_CONTIGUOUS : True  OWNDATA : True  WRITEABLE : True  ALIGNED : True  UPDATEIFCOPY : False

And if you don't trust the flags, you can effectively view the memory order by looking at arr.ravel(order='K'). The order='K' is important. Otherwise, when you call arr.ravel() the output will be in C-order regardless of the memory layout of the array. order='K' uses the memory layout.

In [8]: c_order.ravel(order='K')Out[8]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,       17, 18, 19, 20, 21, 22, 23, 24, 25, 26])In [9]: f_order.ravel(order='K')Out[9]: array([ 0,  9, 18,  3, 12, 21,  6, 15, 24,  1, 10, 19,  4, 13, 22,  7, 16,       25,  2, 11, 20,  5, 14, 23,  8, 17, 26])

The difference is actually represented (and stored) in the strides of the array. Notice that c_order's strides are (72, 24, 8), while f_order's strides are (8, 24, 72).

Just to prove that the indexing works the same way:

In [10]: c_order[0,1,2]Out[10]: 5In [11]: f_order[0,1,2]Out[11]: 5

Reading and Writing

The main place where you'll run into problems with this is when you're reading from or writing to disk. Many file formats expect a particular ordering. I'm guessing that you're working with seismic data formats, and most of them (e.g. Geoprobe .vol's, and I think Petrel's volume format as well) essentially write a binary header and then a Fortran-ordered 3D array to disk.

With that in mind, I'll use a small seismic cube (snippet of some data from my dissertation) as an example.

Both of these are binary arrays of uint8s with a shape of 50x100x198. One is in C-order, while the other is in Fortran-order. c_order.dat f_order.dat

To read them in:

import numpy as npshape = (50, 100, 198)c_order = np.fromfile('c_order.dat', dtype=np.uint8).reshape(shape)f_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape(shape, order='F')assert np.all(c_order == f_order)

Notice that the only difference is specifying the memory layout to reshape. The memory layout of the two arrays is still different (reshape doesn't make a copy), but they're treated identically at the python level.

Just to prove that the files really are written in a different order:

In [1]: np.fromfile('c_order.dat', dtype=np.uint8)[:10]Out[1]: array([132, 142, 107, 204,  37,  37, 217,  37,  82,  60], dtype=uint8)In [2]: np.fromfile('f_order.dat', dtype=np.uint8)[:10]Out[2]: array([132, 129, 140, 138, 110,  88, 110, 124, 142, 139], dtype=uint8)

Let's visualize the result:

def plot(data):    fig, axes = plt.subplots(ncols=3)    for i, ax in enumerate(axes):        slices = [slice(None), slice(None), slice(None)]        slices[i] = data.shape[i] // 2        ax.imshow(data[tuple(slices)].T, cmap='gray_r')    return figplot(c_order).suptitle('C-ordered array')plot(f_order).suptitle('F-ordered array')plt.show()

enter image description hereenter image description here

Notice that we indexed them the same way, and they're displayed identically.

Common Mistakes with IO

First off, let's try reading in the Fortran-ordered file as if it were C-ordered and then take a look at the result (using the plot function above):

wrong_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape(shape)plot(wrong_order)

enter image description here

Not so good!

You mentioned that you're having to use "reversed" indicies. This is probably because you fixed what happened in the figure above by doing something like (note the reversed shape!):

c_order = np.fromfile('c_order.dat', dtype=np.uint8).reshape([50,100,198])rev_f_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape([198,100,50])

Let's visualize what happens:

plot(c_order).suptitle('C-ordered array')plot(rev_f_order).suptitle('Incorrectly read Fortran-ordered array')

enter image description hereenter image description here

Note that the image on the far right (the timeslice) of the first plot matches a transposed version of the image on the far left of the second.

Similarly, print rev_f_order[1,2,3] and print c_order[3,2,1] both yield 140, while indexing them the same way gives a different result.

Basically, this is where your reversed indices come from. Numpy thinks it's a C-ordered array with a different shape. Notice if we look at the flags, they're both C-contiguous in memory:

In [24]: rev_f_order.flagsOut[24]:   C_CONTIGUOUS : True  F_CONTIGUOUS : False  OWNDATA : False  WRITEABLE : True  ALIGNED : True  UPDATEIFCOPY : FalseIn [25]: c_order.flagsOut[25]:   C_CONTIGUOUS : True  F_CONTIGUOUS : False  OWNDATA : False  WRITEABLE : True  ALIGNED : True  UPDATEIFCOPY : False

This is because a fortran-ordered array is equivalent to a C-ordered array with the reverse shape.

Writing to Disk in Fortran-Order

There's an additional wrinkle when writing a numpy array to disk in Fortran-order.

Unless you specify otherwise, the array will be written in C-order regardless of its memory-layout! (There's a clear note about this in the documentation for ndarray.tofile, but it's a common gotcha. The opposite behavior would be incorrect, though, i.m.o.)

Therefore, regardless of the memory layout of an array, to write it to disk in Fortran order, you need to do:

arr.ravel(order='F').tofile('output.dat')

If you're writing it as ascii, the same applies. Use ravel(order='F') and then write out the 1-dimensional result.