NumPy: 3-byte, 6-byte types (aka uint24, uint48)

python numpy bigdata

I don't believe there's a way to do what you're asking (it would require unaligned access, which is highly inefficient on some architectures). My solution from Reading and storing arbitrary byte length integers from a file might be more efficient at transferring the data to an in-process array:

a = np.memmap("filename", mode='r', dtype=np.dtype('>u1'))e = np.zeros(a.size / 6, np.dtype('>u8'))for i in range(3):    e.view(dtype='>u2')[i + 1::4] = a.view(dtype='>u2')[i::3]

You can get unaligned access using the strides constructor parameter:

e = np.ndarray((a.size - 2) // 6, np.dtype('<u8'), buf, strides=(6,))

However with this each element will overlap with the next, so to actually use it you'd have to mask out the high bytes on access.

python numpy bigdata

There's an answer for this over at: How do I create a Numpy dtype that includes 24 bit integers?

It's a bit ugly, but does exactly what you want: Allows you to index your ndarray like it's got a dtype of <u3 so you can memmap() big data from disk.
You still need to manually apply a bitmask to clear out the fourth overlapping byte, but that can be applied to the sliced (multidimensional) array after access.

The trick is to abuse the 'stride' part of an ndarray, so that indexing works. In order to make it work without it complaining about limits, there's a special trick.

python numpy bigdata

Using the code below you can read integers of any size coded as big or little endian:

def readBigEndian(filename, bytesize):    with (open(filename,"rb")) as f:         str = f.read(bytesize)         while len(str)==bytesize:             int = 0;             for byte in map(ord,str):                 print byte                 int = (int << 8) | byte             yield(int)             str = f.read(bytesize)def readLittleEndian(filename, bytesize):    with (open(filename,"rb")) as f:         str = f.read(bytesize)         while len(str)==bytesize:             int = 0;             shift = 0             for byte in map(ord,str):                 print byte                 int |= byte << shift                 shift += 8             yield(int)             str = f.read(bytesize)for i in readLittleEndian("readint.py",3):    print i

CodeHunter

NumPy: 3-byte, 6-byte types (aka uint24, uint48)

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last