
efficient way to compress a numpy array (python)


It's not quite as nice as what you'd like, but I think you can do:

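import numpy

# Start with the first job value, then OR in each of the others.
mask = my_array['job'] == 'this'
for condition in ['that', 'other']:
    mask = numpy.logical_or(mask, my_array['job'] == condition)
# Boolean indexing keeps only the rows where mask is True.
selected_array = my_array[mask]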
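To try the mask approach without the original data, here is a small self-contained sketch. The structured array, its `id` field, and the job values are made up for illustration; only the `job` field name comes from the snippet above.

```python
import numpy as np

# Hypothetical structured array standing in for my_array; only 'job' matters here.
my_array = np.array(
    [(1, 'this'), (2, 'that'), (3, 'skip'), (4, 'other'), (5, 'none')],
    dtype=[('id', 'i4'), ('job', 'U10')],
)

wanted = ['this', 'that', 'other']

# Start from the first condition and OR in the remaining ones.
mask = my_array['job'] == wanted[0]
for condition in wanted[1:]:
    mask = np.logical_or(mask, my_array['job'] == condition)

selected_array = my_array[mask]
print(selected_array['id'])  # [1 2 4]
```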


The best way to compress a numpy array is to use PyTables. It is the de facto standard when it comes to handling large amounts of numerical data, and it can store arrays compressed on disk.

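import tables as t

# mode='w' creates the file; the default mode 'r' only opens an existing one.
# PyTables 3.x names shown; older releases used openFile / createArray.
hdf5_file = t.open_file('outfile.hdf5', mode='w')
hdf5_file.create_array ......
hdf5_file.close()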
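For the storage sense of "compress", here is a minimal PyTables sketch that writes an array with a compression filter and reads it back. It assumes PyTables 3.x (snake_case API); the file name, the zlib library, and compression level 5 are arbitrary choices, and the data is synthetic. Compression only applies to chunked storage such as a CArray, which is why it uses `create_carray` rather than `create_array`.

```python
import os
import tempfile

import numpy as np
import tables

# Synthetic example data; any numeric array is stored the same way.
data = np.arange(100_000, dtype=np.float64).reshape(1000, 100)

# zlib at level 5 is an arbitrary choice for this sketch.
filters = tables.Filters(complevel=5, complib='zlib')

path = os.path.join(tempfile.mkdtemp(), 'outfile.hdf5')

# mode='w' creates (or overwrites) the file.
with tables.open_file(path, mode='w') as h5:
    # A CArray is chunked, which is what lets the filters compress it.
    h5.create_carray(h5.root, 'data', obj=data, filters=filters)

# Read the array back to confirm the round trip.
with tables.open_file(path, mode='r') as h5:
    restored = h5.root.data[:]

print(np.array_equal(data, restored))  # True
```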


If you're looking for a numpy-only solution, I don't think you'll get it. Still, although it does lots of work under the covers, consider whether the tabular package might be able to do what you want in a less "ugly" fashion. I'm not sure you'll get more "efficient" without writing a C extension yourself.

By the way, I think this is both efficient enough and pretty enough for just about any real case.

my_array.compress([x in ['this', 'that'] for x in my_array['job']])
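A self-contained version of the one-liner, using a hypothetical structured array in place of `my_array` (the `id` field and the values are invented for the example):

```python
import numpy as np

# Hypothetical stand-in for my_array with a 'job' field.
my_array = np.array(
    [(1, 'this'), (2, 'that'), (3, 'skip')],
    dtype=[('id', 'i4'), ('job', 'U10')],
)

# compress keeps the entries whose condition value is True.
selected = my_array.compress([x in ['this', 'that'] for x in my_array['job']])
print(selected['id'])  # [1 2]
```

`compress` accepts any sequence of booleans, so the list comprehension result can be passed straight in without converting it to an array first.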

As a further step toward making this less ugly and more efficient: in real code you presumably would not hard-code the list in the middle of the expression, so I would build a set instead. A membership test on a set takes roughly constant time, while a list is scanned item by item, so the set wins once there are more than a few items:

job_set = {'this', 'that'}
my_array.compress([x in job_set for x in my_array['job']])
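For comparison, NumPy also has a vectorized membership test, `numpy.isin` (`numpy.in1d` in releases before 1.13), which avoids the Python-level loop. That is a different technique from the set-based one-liner; the sketch below runs both on a hypothetical array. Note that `np.isin` should be given a list or array of wanted values, not a set.

```python
import numpy as np

# Hypothetical stand-in for my_array.
my_array = np.array(
    [(1, 'this'), (2, 'that'), (3, 'skip'), (4, 'other')],
    dtype=[('id', 'i4'), ('job', 'U10')],
)
job_set = {'this', 'that'}

# Set-based version: one Python-level membership test per row.
with_set = my_array.compress([x in job_set for x in my_array['job']])

# Vectorized alternative: np.isin builds the boolean mask in compiled code.
# np.isin treats a set as a single object, so convert it to a list first.
with_isin = my_array[np.isin(my_array['job'], list(job_set))]

print(with_set['id'], with_isin['id'])  # [1 2] [1 2]
```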

If you don't think this is efficient enough, I'd advise benchmarking so you'll have confidence that you're spending your time wisely as you try to make it even more efficient.
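A minimal `timeit` sketch for that kind of benchmark, assuming synthetic data: the array size, the job names, and the number of repetitions are arbitrary, so the timings only describe this setup. It first checks that the candidates select the same rows, since a faster but wrong version is no use.

```python
import timeit

import numpy as np

# Synthetic data; size and job names are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
jobs = np.array(['this', 'that', 'other', 'skip', 'none'])
my_array = np.zeros(100_000, dtype=[('id', 'i4'), ('job', 'U10')])
my_array['id'] = np.arange(my_array.size)
my_array['job'] = rng.choice(jobs, size=my_array.size)

job_set = {'this', 'that'}


def with_list():
    return my_array.compress([x in ['this', 'that'] for x in my_array['job']])


def with_set():
    return my_array.compress([x in job_set for x in my_array['job']])


def with_mask():
    # The logical_or mask approach from the first answer.
    mask = my_array['job'] == 'this'
    mask = np.logical_or(mask, my_array['job'] == 'that')
    return my_array[mask]


# All candidates must select the same rows before their timings mean anything.
assert np.array_equal(with_list()['id'], with_set()['id'])
assert np.array_equal(with_list()['id'], with_mask()['id'])

for fn in (with_list, with_set, with_mask):
    seconds = timeit.timeit(fn, number=5)
    print(f'{fn.__name__}: {seconds / 5:.4f} s per call')
```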