SSE instructions to add all elements of an array [duplicate]

c++ arrays sse simd sse2

If you just want to sum all the elements of an array of 8 bit values, then you need to load the data, unpack it to a wider element size so that the sums cannot overflow, and then add the unpacked elements. You can keep multiple partial sums in a vector throughout the loop and do one final horizontal sum of these partial sums after it. For example:

    #include <emmintrin.h>  // SSE2 intrinsics
    #include <stdint.h>

    uint32_t sum_array(const uint8_t a[], int n)
    {
        const __m128i vk0 = _mm_set1_epi8(0);   // constant vector of all 0s for use with _mm_unpacklo_epi8/_mm_unpackhi_epi8
        const __m128i vk1 = _mm_set1_epi16(1);  // constant vector of all 1s for use with _mm_madd_epi16
        __m128i vsum = _mm_set1_epi32(0);       // initialise vector of four partial 32 bit sums
        uint32_t sum;
        int i;

        for (i = 0; i < n; i += 16)
        {
            // load vector of 8 bit values (the pointer cast is required, since
            // _mm_load_si128 takes a const __m128i *)
            __m128i v = _mm_load_si128((const __m128i *)&a[i]);
            // unpack to two vectors of 16 bit values
            __m128i vl = _mm_unpacklo_epi8(v, vk0);
            __m128i vh = _mm_unpackhi_epi8(v, vk0);
            // widen the 16 bit values and accumulate them into the
            // 32 bit partial sum vector
            vsum = _mm_add_epi32(vsum, _mm_madd_epi16(vl, vk1));
            vsum = _mm_add_epi32(vsum, _mm_madd_epi16(vh, vk1));
        }

        // horizontal add of four 32 bit partial sums and return result
        vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));
        vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));
        sum = (uint32_t)_mm_cvtsi128_si32(vsum);
        return sum;
    }

Note that there is one non-obvious trick in the above code: rather than further unpacking each 16 bit vector to a pair of 32 bit vectors (requiring 4 unpack instructions) and then using four 32 bit adds (another 4 instructions), we use _mm_madd_epi16 (PMADDWD) with a multiplicand of 1. This multiplies each 16 bit element by 1 and adds adjacent pairs into 32 bit elements, so together with _mm_add_epi32 it effectively gives us free unpacking, and we get the same result using 4 instructions instead of 8.
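To make the comparison concrete, here is a minimal sketch of both approaches for a single vector of eight 16 bit values. The helper names accumulate_unpack, accumulate_madd and hsum_epi32 are made up for illustration and are not part of the original answer. The individual 32 bit lanes end up holding different partial sums in the two versions, but the total over all four lanes is the same. Since PMADDWD treats its inputs as signed 16 bit values, this relies on the elements being in the range 0..255 after unpacking from bytes.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// The long way: widen eight 16 bit lanes to 32 bits with two unpacks,
// then add both halves into the accumulator (4 instructions).
static inline __m128i accumulate_unpack(__m128i acc, __m128i v16)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i lo = _mm_unpacklo_epi16(v16, zero);  // lanes 0..3 zero-extended to 32 bits
    __m128i hi = _mm_unpackhi_epi16(v16, zero);  // lanes 4..7 zero-extended to 32 bits
    acc = _mm_add_epi32(acc, lo);
    return _mm_add_epi32(acc, hi);
}

// The PMADDWD way: multiply every 16 bit lane by 1 and let the instruction
// add adjacent pairs into 32 bit lanes (2 instructions).
static inline __m128i accumulate_madd(__m128i acc, __m128i v16)
{
    const __m128i ones = _mm_set1_epi16(1);
    return _mm_add_epi32(acc, _mm_madd_epi16(v16, ones));
}

// Horizontal sum of the four 32 bit lanes, as in the answer's epilogue.
static inline uint32_t hsum_epi32(__m128i v)
{
    v = _mm_add_epi32(v, _mm_srli_si128(v, 8));
    v = _mm_add_epi32(v, _mm_srli_si128(v, 4));
    return static_cast<uint32_t>(_mm_cvtsi128_si32(v));
}
```

Feeding the same input vector and a zero accumulator to both helpers and passing each result through hsum_epi32 should give identical totals.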

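Note also that the input array, a[], needs to be 16 byte aligned, because _mm_load_si128 requires an aligned address, and n should be a multiple of 16, because the loop consumes 16 bytes per iteration and has no code to handle a shorter tail.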
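If those two restrictions are inconvenient, one possible variation (a sketch, not part of the original answer) is to use the unaligned load _mm_loadu_si128 and finish any leftover bytes with a scalar loop. The function name sum_array_any is hypothetical. Whether the unaligned load costs anything measurable depends on the CPU.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Sum n bytes starting at a, with no alignment or length requirements.
uint32_t sum_array_any(const uint8_t *a, std::size_t n)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i ones = _mm_set1_epi16(1);
    __m128i vsum = _mm_setzero_si128();  // four partial 32 bit sums
    std::size_t i = 0;

    // Main loop: only runs while a full 16 byte block remains.
    for (; i + 16 <= n; i += 16)
    {
        // Unaligned load, so a does not have to be 16 byte aligned.
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));
        __m128i vl = _mm_unpacklo_epi8(v, zero);  // low 8 bytes as 16 bit values
        __m128i vh = _mm_unpackhi_epi8(v, zero);  // high 8 bytes as 16 bit values
        vsum = _mm_add_epi32(vsum, _mm_madd_epi16(vl, ones));
        vsum = _mm_add_epi32(vsum, _mm_madd_epi16(vh, ones));
    }

    // Horizontal add of the four partial sums.
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));
    uint32_t sum = static_cast<uint32_t>(_mm_cvtsi128_si32(vsum));

    // Scalar tail: the remaining 0..15 bytes.
    for (; i < n; ++i)
        sum += a[i];

    return sum;
}
```

The main loop is the same as before; only the load instruction and the loop bound change, and the tail loop picks up whatever the vector loop could not cover.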
