How to write a multidimensional array to a text file?


If you want to write it to disk so that it will be easy to read back in as a numpy array, look into numpy.save. Pickling it will work fine, as well, but it's less efficient for large arrays (which yours isn't, so either is perfectly fine).
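As a rough illustration of the numpy.save route, the sketch below writes a 3D array to a .npy file and reads it back with numpy.load. The temporary directory and the file name test.npy are placeholders chosen for this example, not anything from the question.

```python
import os
import tempfile

import numpy as np

# Same kind of test array used elsewhere in this answer
x = np.arange(200).reshape((4, 5, 10))

# Placeholder output location (an assumption); any writable path works
path = os.path.join(tempfile.mkdtemp(), 'test.npy')

# np.save writes a binary .npy file that records dtype and shape
np.save(path, x)

# np.load restores the array with its original shape, no reshape needed
restored = np.load(path)
print(restored.shape)
```

Because the shape and dtype travel with the file, nothing about the array has to be remembered separately.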

If you want it to be human readable, look into numpy.savetxt.

Edit: So, it seems like savetxt isn't quite as great an option for arrays with >2 dimensions... But just to draw everything out to its full conclusion:

I just realized that numpy.savetxt chokes on ndarrays with more than 2 dimensions... This is probably by design, as there's no inherently defined way to indicate additional dimensions in a text file.

E.g. this (a 2D array) works fine:

    import numpy as np
    x = np.arange(20).reshape((4,5))
    np.savetxt('test.txt', x)

The same call fails for a 3D array. Older NumPy versions raised a rather uninformative TypeError: float argument required, not numpy.ndarray; recent versions raise a ValueError saying that a 1D or 2D array was expected:

    import numpy as np
    x = np.arange(200).reshape((4,5,10))
    np.savetxt('test.txt', x)

One workaround is just to break the 3D (or greater) array into 2D slices. E.g.

    x = np.arange(200).reshape((4,5,10))
    with open('test.txt', 'w') as outfile:
        for slice_2d in x:
            np.savetxt(outfile, slice_2d)
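For arrays with more than 3 dimensions, looping over the first axis still yields slices that savetxt rejects. One possible variation, sketched here under the assumption that losing the slice boundaries in the file is acceptable, is to collapse all leading axes into rows with reshape before writing. The 4D test array and the temporary path are made up for this example.

```python
import os
import tempfile

import numpy as np

# A 4D array: iterating over it would yield 3D slices, which savetxt rejects
x = np.arange(120).reshape((2, 3, 4, 5))

# Placeholder output location (an assumption)
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Collapse all leading axes into one, keeping the last axis as columns
flat_2d = x.reshape(-1, x.shape[-1])
np.savetxt(path, flat_2d)

# loadtxt returns the 2D table; the original shape must be supplied to undo it
restored = np.loadtxt(path).reshape(x.shape)
```

This is a different technique from the slice-by-slice loop: the file is a single 2D table, and the original shape has to be known or stored elsewhere.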

However, our goal is to be clearly human readable, while still being easily read back in with numpy.loadtxt. Therefore, we can be a bit more verbose, and differentiate the slices using commented out lines. By default, numpy.loadtxt will ignore any lines that start with # (or whichever character is specified by the comments kwarg). (This looks more verbose than it actually is...)

    import numpy as np

    # Generate some test data
    data = np.arange(200).reshape((4,5,10))

    # Write the array to disk
    with open('test.txt', 'w') as outfile:
        # I'm writing a header here just for the sake of readability
        # Any line starting with "#" will be ignored by numpy.loadtxt
        outfile.write('# Array shape: {0}\n'.format(data.shape))

        # Iterating through an n-dimensional array produces slices along
        # the first axis. This is equivalent to data[i,:,:] in this case
        for data_slice in data:

            # The formatting string indicates that I'm writing out
            # the values in left-justified columns 7 characters in width
            # with 2 decimal places.
            np.savetxt(outfile, data_slice, fmt='%-7.2f')

            # Writing out a break to indicate different slices...
            outfile.write('# New slice\n')

This yields:

    # Array shape: (4, 5, 10)
    0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00
    10.00   11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00   19.00
    20.00   21.00   22.00   23.00   24.00   25.00   26.00   27.00   28.00   29.00
    30.00   31.00   32.00   33.00   34.00   35.00   36.00   37.00   38.00   39.00
    40.00   41.00   42.00   43.00   44.00   45.00   46.00   47.00   48.00   49.00
    # New slice
    50.00   51.00   52.00   53.00   54.00   55.00   56.00   57.00   58.00   59.00
    60.00   61.00   62.00   63.00   64.00   65.00   66.00   67.00   68.00   69.00
    70.00   71.00   72.00   73.00   74.00   75.00   76.00   77.00   78.00   79.00
    80.00   81.00   82.00   83.00   84.00   85.00   86.00   87.00   88.00   89.00
    90.00   91.00   92.00   93.00   94.00   95.00   96.00   97.00   98.00   99.00
    # New slice
    100.00  101.00  102.00  103.00  104.00  105.00  106.00  107.00  108.00  109.00
    110.00  111.00  112.00  113.00  114.00  115.00  116.00  117.00  118.00  119.00
    120.00  121.00  122.00  123.00  124.00  125.00  126.00  127.00  128.00  129.00
    130.00  131.00  132.00  133.00  134.00  135.00  136.00  137.00  138.00  139.00
    140.00  141.00  142.00  143.00  144.00  145.00  146.00  147.00  148.00  149.00
    # New slice
    150.00  151.00  152.00  153.00  154.00  155.00  156.00  157.00  158.00  159.00
    160.00  161.00  162.00  163.00  164.00  165.00  166.00  167.00  168.00  169.00
    170.00  171.00  172.00  173.00  174.00  175.00  176.00  177.00  178.00  179.00
    180.00  181.00  182.00  183.00  184.00  185.00  186.00  187.00  188.00  189.00
    190.00  191.00  192.00  193.00  194.00  195.00  196.00  197.00  198.00  199.00
    # New slice

Reading it back in is very easy, as long as we know the shape of the original array. We can just do numpy.loadtxt('test.txt').reshape((4,5,10)). As an example (You can do this in one line, I'm just being verbose to clarify things):

    # Read the array from disk
    new_data = np.loadtxt('test.txt')

    # Note that this returned a 2D array!
    print(new_data.shape)

    # However, going back to 3D is easy if we know the
    # original shape of the array
    new_data = new_data.reshape((4,5,10))

    # Just to check that they're the same...
    assert np.all(new_data == data)
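If the shape is not known in advance, one option is to recover it from the "# Array shape: ..." header line that the writer put at the top of the file. The sketch below assumes the file was written in exactly that layout; the parsing with ast.literal_eval is an addition for illustration, not part of the original approach, and the temporary path is a placeholder.

```python
import ast
import os
import tempfile

import numpy as np

data = np.arange(200).reshape((4, 5, 10))

# Placeholder output location (an assumption)
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Write the file in the layout described above: shape header, then 2D slices
with open(path, 'w') as outfile:
    outfile.write('# Array shape: {0}\n'.format(data.shape))
    for data_slice in data:
        np.savetxt(outfile, data_slice, fmt='%-7.2f')
        outfile.write('# New slice\n')

# Recover the shape from the first line instead of hard-coding it
with open(path) as infile:
    header = infile.readline()
shape = ast.literal_eval(header.split(':', 1)[1].strip())

# loadtxt skips the comment lines; reshape restores the original dimensions
new_data = np.loadtxt(path).reshape(shape)
```

This only works if the header format stays in sync between the writer and the reader, so it is a convention rather than something loadtxt understands on its own.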


I am not certain if this meets your requirements, given I think you are interested in making the file readable by people, but if that's not a primary concern, just pickle it.

To save it:

    import pickle

    my_data = {'a': [1, 2.0, 3, 4+6j],
               'b': ('string', u'Unicode string'),
               'c': None}

    with open('data.pkl', 'wb') as output:
        pickle.dump(my_data, output)

To read it back:

    import pprint, pickle

    with open('data.pkl', 'rb') as pkl_file:
        data1 = pickle.load(pkl_file)

    pprint.pprint(data1)
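The example above pickles a plain dictionary. The same calls work on a NumPy array directly, as in this minimal sketch; the array, the file name array.pkl, and the temporary directory are placeholders for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

x = np.arange(200).reshape((4, 5, 10))

# Placeholder output location (an assumption)
path = os.path.join(tempfile.mkdtemp(), 'array.pkl')

# Pickle files are binary, so open in 'wb' / 'rb' mode
with open(path, 'wb') as output:
    pickle.dump(x, output, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as pkl_file:
    restored = pickle.load(pkl_file)

# The array comes back with its shape and dtype intact
print(restored.shape)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.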


If you don't need human-readable output, another option you could try is to save the array as a MATLAB .mat file, which stores arrays under named keys. I despise MATLAB, but the fact that I can both read and write a .mat in very few lines is convenient.

Unlike Joe Kington's answer, the benefit of this is that you don't need to know the original shape of the data in the .mat file, i.e. no need to reshape upon reading in. And, unlike using pickle, a .mat file can be read by MATLAB, and probably some other programs/languages as well.

Here is an example:

    import numpy as np
    import scipy.io

    # Some test data
    x = np.arange(200).reshape((4,5,10))

    # Specify the filename of the .mat file
    matfile = 'test_mat.mat'

    # Write the array to the mat file. For this to work, the array must be the value
    # corresponding to a key name of your choice in a dictionary
    scipy.io.savemat(matfile, mdict={'out': x}, oned_as='row')

    # For the above line, I specified the kwarg oned_as since python (2.7 with
    # numpy 1.6.1) throws a FutureWarning.  Here, this isn't really necessary
    # since oned_as is a kwarg for dealing with 1-D arrays.

    # Now load in the data from the .mat that was just saved
    matdata = scipy.io.loadmat(matfile)

    # And just to check if the data is the same:
    assert np.all(x == matdata['out'])

If you forget the key under which the array is stored in the .mat file, you can always do:

    print(matdata.keys())

And of course you can store many arrays using many more keys.
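A small sketch of storing several arrays in one .mat file follows. The key names 'a' and 'b', the second test array, and the temporary path are invented for this example. Note that the dictionary returned by loadmat also contains metadata entries whose names start with a double underscore, so they are filtered out here.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Two test arrays with different shapes and dtypes
a = np.arange(200).reshape((4, 5, 10))
b = np.linspace(0.0, 1.0, 6).reshape((2, 3))

# Placeholder output location (an assumption)
matfile = os.path.join(tempfile.mkdtemp(), 'test_many.mat')

# Each dictionary key becomes a variable name inside the .mat file
scipy.io.savemat(matfile, {'a': a, 'b': b})

matdata = scipy.io.loadmat(matfile)

# Skip loadmat's metadata entries, whose names start with '__'
array_keys = sorted(k for k in matdata if not k.startswith('__'))
print(array_keys)
```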

So yes, it won't be readable with your eyes, but it only takes 2 lines to write and read the data, which I think is a fair trade-off.

Take a look at the docs for scipy.io.savemat and scipy.io.loadmat, and also this tutorial page: scipy.io File IO Tutorial