NumPy: 3-byte, 6-byte types (aka uint24, uint48)


I don't believe there's a way to do what you're asking (it would require unaligned access, which is highly inefficient on some architectures). My solution from "Reading and storing arbitrary byte length integers from a file" might be more efficient at transferring the data to an in-process array:

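    import numpy as np

    # Assumes the file length is a multiple of 6 bytes (whole big-endian records).
    a = np.memmap("filename", mode='r', dtype=np.dtype('>u1'))
    e = np.zeros(a.size // 6, np.dtype('>u8'))  # // keeps the length an integer
    # Copy each record's three 16-bit words into the low three words of a uint64.
    for i in range(3):
        e.view(dtype='>u2')[i + 1::4] = a.view(dtype='>u2')[i::3]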
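To try the word-copy idea without a file on disk, here is a minimal sketch that runs the same loop on an in-memory buffer. The helper name `unpack_uint48_be` and the sample values are made up for illustration; unlike the snippet above, it also trims any trailing partial record.

```python
import numpy as np

def unpack_uint48_be(raw):
    """Expand packed big-endian 6-byte unsigned integers into a uint64 array."""
    a = np.frombuffer(raw, dtype=np.uint8)
    n = a.size // 6                  # number of complete 6-byte records
    words = a[:n * 6].view('>u2')    # three 16-bit words per record
    out = np.zeros(n, dtype='>u8')
    out_words = out.view('>u2')      # four 16-bit words per uint64
    for i in range(3):
        # Word 0 of each uint64 stays zero; words 1..3 receive the record.
        out_words[i + 1::4] = words[i::3]
    return out

# Hypothetical sample data: three 48-bit values packed back to back.
values = [1, 0x0102030405AB, 2**48 - 1]
raw = b"".join(v.to_bytes(6, "big") for v in values)
result = unpack_uint48_be(raw)
print(result)
```

The loop runs three times regardless of the number of records, so the per-record work stays inside NumPy.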

You can get unaligned access using the strides constructor parameter:

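    # a is the byte buffer (e.g. the memmap above); each 8-byte element
    # starts 6 bytes after the previous one.
    e = np.ndarray((a.size - 2) // 6, np.dtype('<u8'), a, strides=(6,))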

However, with this approach each 8-byte element overlaps the next one by two bytes, so to actually use the values you have to mask out the two high bytes on access (for little-endian data, as here).
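As a rough sketch of the masking step, assuming little-endian 6-byte records held in an in-memory buffer (the sample values and variable names are made up for illustration):

```python
import numpy as np

# Hypothetical sample data: four 48-bit little-endian values, tightly packed.
values = [1, 0x0102030405AB, 2**48 - 1, 42]
raw = b"".join(v.to_bytes(6, "little") for v in values)
buf = np.frombuffer(raw, dtype=np.uint8)

# Each element reads 8 bytes but starts only 6 bytes after the previous one.
# The "- 2" keeps the last 8-byte read inside the buffer.
n = (buf.size - 2) // 6
overlapped = np.ndarray((n,), dtype='<u8', buffer=buf, strides=(6,))

# The two high bytes belong to the next record, so clear them.
masked = overlapped & np.uint64(0xFFFFFFFFFFFF)
print(masked)
```

Note that with an unpadded buffer the final record is not reachable this way, because reading it as 8 bytes would run two bytes past the end; here `n` is 3 although four records are stored. Padding the buffer with two extra bytes is one way around that.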


There's an answer for this over at "How do I create a Numpy dtype that includes 24 bit integers?"

It's a bit ugly, but it does exactly what you want: it lets you index your ndarray as if it had a dtype of <u3, so you can memmap() big data from disk.
You still need to apply a bitmask manually to clear the fourth, overlapping byte, but that can be done on the sliced (multidimensional) array after access.

The trick is to abuse the strides of an ndarray so that indexing steps through the data three bytes at a time. Getting NumPy to accept this without complaining about limits takes a further trick, described in the linked answer.
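The linked answer is not reproduced here. As one possible sketch of a stride trick for 24-bit data, the following uses `numpy.lib.stride_tricks.as_strided`, which skips the bounds checks of the regular constructor. It assumes little-endian 3-byte records followed by one pad byte, so that the last 4-byte read stays inside the buffer; whether this matches the linked answer's exact method is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Hypothetical sample data: three 24-bit little-endian values plus 1 pad byte.
values = [1, 0x0A0B0C, 2**24 - 1]
raw = b"".join(v.to_bytes(3, "little") for v in values) + b"\x00"

buf = np.frombuffer(raw, dtype=np.uint8)
n = (buf.size - 1) // 3              # records that can be read as 4 bytes

# Start from a one-element uint32 view, then restride it to step 3 bytes.
base = buf[:4].view('<u4')
overlapped = as_strided(base, shape=(n,), strides=(3,), writeable=False)

# The fourth byte of each element belongs to the next record; mask it off.
u24 = overlapped & np.uint32(0xFFFFFF)
print(u24)
```

`as_strided` does not validate that the requested shape and strides stay within the underlying memory, so the padding and the computation of `n` are what keep this safe.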


Using the code below you can read integers of any size, encoded as big- or little-endian:

    def readBigEndian(filename, bytesize):
        """Yield successive big-endian unsigned integers of `bytesize` bytes."""
        with open(filename, "rb") as f:
            chunk = f.read(bytesize)
            while len(chunk) == bytesize:
                value = 0
                for byte in chunk:  # iterating over bytes yields ints in Python 3
                    value = (value << 8) | byte
                yield value
                chunk = f.read(bytesize)


    def readLittleEndian(filename, bytesize):
        """Yield successive little-endian unsigned integers of `bytesize` bytes."""
        with open(filename, "rb") as f:
            chunk = f.read(bytesize)
            while len(chunk) == bytesize:
                value = 0
                shift = 0
                for byte in chunk:
                    value |= byte << shift
                    shift += 8
                yield value
                chunk = f.read(bytesize)


    # Example: read this script itself as a stream of 3-byte integers.
    for i in readLittleEndian("readint.py", 3):
        print(i)
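In Python 3 the byte-by-byte assembly can also be left to the built-in `int.from_bytes`. The sketch below is a variant of the generators above, not a replacement for them; the name `read_ints` and the use of an in-memory stream are choices made for illustration.

```python
import io

def read_ints(stream, bytesize, byteorder="little"):
    """Yield unsigned integers of `bytesize` bytes from a binary stream."""
    while True:
        chunk = stream.read(bytesize)
        if len(chunk) < bytesize:
            break  # stop at end of stream or on a trailing partial record
        yield int.from_bytes(chunk, byteorder)

# Hypothetical sample data: two 3-byte records.
data = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06])
little = list(read_ints(io.BytesIO(data), 3, "little"))
big = list(read_ints(io.BytesIO(data), 3, "big"))
print(little)  # [197121, 394500]
print(big)     # [66051, 263430]
```

Taking an open binary stream rather than a filename keeps the generator usable with files, sockets, or `io.BytesIO` alike.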