c++ optimize array of ints c++ optimize array of ints arrays arrays

c++ optimize array of ints


I see several options for your array compaction.

1. Separate 8-bit and 1-bit arrays

You can split your array into 2 parts: first one stores 8 low-order bits of your original array, second one stores '1' if value does not fit in 8 bits or '0' otherwise. This will take 9 bits per value (same space as in nightcracker's approach, but a little bit simpler). To read value from these two arrays, do the following:

int8_t array8[37*73] = {...};uint16_t array1[(37*73+15)/16] = {...};size_t offset = 37 * x + y;int16_t item = static_cast<int16_t>(array8[offset]); // sign extendint16_t overflow = ((array1[offset/16] >> (offset%16)) & 0x0001) << 7;item ^= overflow;

2. Approximation

If you can approximate your array with some efficiently computed function (like polynomial or exponent), you can store in the array only the difference between your value and the approximation. This may require only 8 bits per value or even less.

3. Delta encoding

If your data is smooth enough, in addition to applying either of previous methods, you can store a shorter table with only part of the data values and other table, containing only differences between all values, absent in the first table, and values from the first table. This requires less bits for each value.

For example, you can store every fifth value and differences for other values:

  Original array: 0 0 1 1 2 2 2 2 2 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6 6 6 7 7 7     Short array: 0         2         3         5         6         6Difference array:   0 1 1 2   0 0 0 1   0 1 1 2   0 0 0 1   0 0 0 0   0 1 1 1

Alternatively, you can use differences from previous value, which requires even less bits per value:

  Original array: 0 0 1 1 2 2 2 2 2 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6 6 6 7 7 7     Short array: 0         2         3         5         6         6     Delta array:   0 1 0 1   0 0 0 1   0 1 0 1   0 0 0 1   0 0 0 0   0 1 0 0

Approach with delta array may be efficiently implemented using bitwise operations if a group of delta values fits exactly in int16_t.


Initialization

For option #2, preprocessor may be used. For other options, preprocessor is possible, but may be not very convenient (preprocessor is not very good to process long value lists). Some combination of preprocessor and variadic templates may be better. Or it may be easier to use some text-processing script.


Update

After looking at the actual data, I can tell some more details. Option #2 (Approximation) is not very convenient for your data. Option #1 seems to be better. Or you can use Mark Ransom's or nightcracker's approach. It doesn't matter, which one - in all cases you save 7 bits out of 16.

Option #3 (Delta encoding) allows to save much more space. It cannot be used directly, because in some cells of the array data changes abruptly. But, as far as I know, these large changes happen at most once for each row. Which may be implemented by one additional column with full data value and one special value in the delta array.

I noticed, that (ignoring these abrupt changes) difference between neighbor values is never more than +/- 32. This requires 6 bits to encode each delta value. This means 6.6 bits per value. 58% compression. About 2400 bytes. (Not much, but a little bit better than 2464K in your comments).

Middle part of the array is much more smooth. You'll need only 5 bits per value to encode it separately. This may save 300..400 bytes more. Probably it's a good idea to split this array into several parts and encode each part differently.


As nightcracker has noted your values will fit into 9 bits. There's an easier way to store those values though. Put the absolute values into a byte array and put the sign bits into a separate packed bit array.

int8_t my_array[37][73] = {{**DATA ABSOLUTE VALUES HERE**}};int8_t my_signs[37][10] = {{**SIGN BITS HERE**}};int16_t my_value = my_array[i][j];if (my_signs[i][j/8] & (1 << j%8))    my_value = -my_value;

This is a 44% reduction in your original table size without too much effort.


I know from experience that visualizing things can help find a good solution to a problem. Since it isn't very clear what your data is actually representing (and so we know nothing/very little about the problem domain) we might not come up with "the best" solution (if one exists at all ofcourse). So I took the liberty and visualized the data; as the saying goes: a picture is worth a 1000 words :-)

I am sorry I do not have a solution (yet) better than the ones already posted but I thought the plot might help someone (or myself) come up with a better solution.

enter image description here