Implementing concurrent_vector according to intel blog Implementing concurrent_vector according to intel blog arrays arrays

Implementing concurrent_vector according to intel blog


I would not try to make the segmentsLarge and segmentsSmall a union. Yes this wastes one more pointer. Then the pointer, lets call it just segments can initially point to segmentsSmall.

On the other hand the other methods can always use the same pointer which makes them simpler.

And switching from small to large can be accomplished by one compare exchange of a pointer.

I am not sure how this could be accomplished safely with a union.

The idea would look something like this (note that I used C++11, which the Intel library predates, so they likely did it with their atomic intrinsics).This probably misses quite a few details which I am sure the Intel people have thought more about, so you will likely have to check this against the implementations of all other methods.

#include <atomic>#include <array>#include <cstddef>#include <climits>template<typename T>class concurrent_vector{private:  std::atomic<size_t> size;  std::atomic<T**> segments;  std::array<T*, 3> segmentsSmall;  unsigned lastSegmentIndex = 0;  void switch_to_large()  {    T** segmentsOld = segments;    if( segmentsOld == segmentsSmall.data()) {      // not yet switched      T** segmentsLarge = new T*[sizeof(size_t) * CHAR_BIT];      // note that we leave the original segment allocations alone and just copy the pointers      std::copy(segmentsSmall.begin(), segmentsSmall.end(), segmentsLarge);      for(unsigned i = segmentsSmall.size(); i < numSegments; ++i) {        segmentsLarge[i] = nullptr;      }      // now both the old and the new segments array are valid      if( segments.compare_exchange_strong(segmentsOld, segmentsLarge)) {        // success!        return;      }  else {        // already switched, just clean up        delete[] segmentsLarge;      }    }  }public:  concurrent_vector()  : size(0), segments(segmentsSmall.data())  {    //The initial array is contiguous just for the sake of cache optimization    T* initialContiguousBlock = new T[2 + 4 + 8]; //2^1 + 2^2 + 2^3    segmentsSmall[0] = initialContiguousBlock;    segmentsSmall[1] = initialContiguousBlock + 2;    segmentsSmall[2] = initialContiguousBlock + 2 + 4;  }  void push_back(T& item)  {    if(size > 2 + 4 + 8) {      switch_to_large();    }    // here we may have to allocate more segments atomically    ++size;    //afterwards adds the item to the appropriate slot in the appropriate segment  }};