Why std::u16string is slower than array of char16_t?


Because of the way libc++ implements the small string optimization, every call to operator[] has to check whether the string contents are stored inside the string object itself or on the heap. Because the indexing is wrapped in benchmark::DoNotOptimize, the compiler cannot hoist that check out of the loop, so it is repeated every time a character is accessed. When you access the string data through a pointer, the data is always external to the pointer, so no such check is needed.
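To make the per-access check concrete, here is a minimal sketch of a string with a small-buffer optimization. It is not libc++'s actual layout: the class name SsoString16, the inline capacity of 10 and the helper functions are all made up for illustration. The point is only that operator[] has to pick a buffer on each call, while a pointer obtained once does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified small-string class (NOT libc++'s layout).
class SsoString16 {
public:
    static constexpr std::size_t kInlineCapacity = 10; // assumed value

    SsoString16(const char16_t* src, std::size_t len)
        : size_(len), is_long_(len > kInlineCapacity) {
        // Long strings go to the heap, short ones live inside the object.
        char16_t* dst = is_long_ ? (heap_ = new char16_t[len + 1]) : inline_;
        std::copy(src, src + len, dst);
        dst[len] = u'\0';
    }
    ~SsoString16() { if (is_long_) delete[] heap_; }
    SsoString16(const SsoString16&) = delete;
    SsoString16& operator=(const SsoString16&) = delete;

    // The branch that decides where the characters live.
    const char16_t* data() const { return is_long_ ? heap_ : inline_; }

    // Every indexed access goes through data(), and so through the branch.
    char16_t operator[](std::size_t i) const {
        assert(i <= size_);
        return data()[i];
    }

    std::size_t size() const { return size_; }
    bool is_long() const { return is_long_; }

private:
    std::size_t size_;
    bool is_long_;
    union {
        char16_t inline_[kInlineCapacity + 1];
        char16_t* heap_;
    };
};

// Selects the buffer on every iteration (unless the optimizer hoists it).
unsigned sum_indexed(const SsoString16& s) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        sum += s[i];
    return sum;
}

// Selects the buffer once, then reads through a plain pointer.
unsigned sum_hoisted(const SsoString16& s) {
    const char16_t* p = s.data();
    unsigned sum = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        sum += p[i];
    return sum;
}
```

Without an optimization barrier a compiler will often turn sum_indexed into something very close to sum_hoisted, which would be consistent with the difference only showing up under benchmark::DoNotOptimize.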


Interestingly I am unable to reproduce your results. I can barely detect a difference between the two.

The (incomplete) code I used is shown here:

hol::StdTimer timer;

using index_type = std::size_t;

index_type const N = 100'000'000;
index_type const SIZE = 1024;

static std::u16string s16;
static char16_t const* p16;

int main(int, char** argv)
{
    std::generate_n(std::back_inserter(s16), SIZE,
        []{ return (char)hol::random_number((int)'A', (int)'Z'); });

    p16 = s16.c_str();

    unsigned sum;

    {
        sum = 0;

        timer.start();
        for(index_type n = 0; n < N; ++n)
            for(index_type i = 0; i < SIZE; ++i)
                sum += s16[i];
        timer.stop();

        RESULT("string", sum, timer);
    }

    {
        sum = 0;

        timer.start();
        for(std::size_t n = 0; n < N; ++n)
            for(std::size_t i = 0; i < SIZE; ++i)
                sum += p16[i];
        timer.stop();

        RESULT("array ", sum, timer);
    }
}

Output:

string: (670240768) 17.575232 secs
array : (670240768) 17.546145 secs

Compiler:

GCC 7.1 g++ -std=c++14 -march=native -O3 -D NDEBUG
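Since hol::StdTimer, hol::random_number and RESULT are not shown in the listing above, here is a rough self-contained sketch of the same comparison using only the standard library. The function names (sum_via_string, sum_via_pointer, seconds_taken) are my own, and std::chrono::steady_clock is a stand-in for whatever the original timer does, so treat the numbers it gives as indicative only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Sums all code units `reps` times through std::u16string::operator[].
unsigned sum_via_string(const std::u16string& s, std::size_t reps) {
    unsigned sum = 0;
    for (std::size_t n = 0; n < reps; ++n)
        for (std::size_t i = 0; i < s.size(); ++i)
            sum += s[i];
    return sum;
}

// Sums all code units `reps` times through a raw pointer.
unsigned sum_via_pointer(const char16_t* p, std::size_t size, std::size_t reps) {
    assert(p != nullptr);
    unsigned sum = 0;
    for (std::size_t n = 0; n < reps; ++n)
        for (std::size_t i = 0; i < size; ++i)
            sum += p[i];
    return sum;
}

// Runs `f` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double seconds_taken(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A caller would fill a std::u16string, time sum_via_string(s, reps) and sum_via_pointer(s.c_str(), s.size(), reps) with seconds_taken, and print both sums so the compiler cannot discard the loops.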


With a plain char16_t array you access the elements directly, whereas with a string every access goes through the overloaded operator[]:

reference
operator[](size_type __pos)
{
#ifdef _GLIBCXX_DEBUG_PEDANTIC
    __glibcxx_check_subscript(__pos);
#else
    // as an extension v3 allows s[s.size()] when s is non-const.
    _GLIBCXX_DEBUG_VERIFY(__pos <= this->size(),
        _M_message(__gnu_debug::__msg_subscript_oob)
        ._M_sequence(*this, "this")
        ._M_integer(__pos, "__pos")
        ._M_integer(this->size(), "size"));
#endif
    return _M_base()[__pos];
}

and _M_base() is:

_Base& _M_base() { return *this; }

Now, my guesses are that either:

  1. _M_base() might not get inlined, and then you take a performance hit because every read pays for an extra function call.

or

  2. One of those subscript checks is actually being executed on every access.
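One way to test the second guess is to check which of libstdc++'s checking macros are defined in your build. _GLIBCXX_DEBUG, _GLIBCXX_DEBUG_PEDANTIC and _GLIBCXX_ASSERTIONS are real libstdc++ macros; the helper names below (describe_flag, describe_checks) are made up for this sketch, and on other standard libraries all three will simply report "off".

```cpp
#include <cassert>
#include <string>

// Formats one line of the report, e.g. "_GLIBCXX_DEBUG: off\n".
std::string describe_flag(const char* name, bool on) {
    assert(name != nullptr);
    return std::string(name) + (on ? ": on\n" : ": off\n");
}

// Reports which libstdc++ checking macros were defined when this
// translation unit was compiled (checked after including <string>,
// so macros set by the library's own config header are seen too).
std::string describe_checks() {
    bool debug = false;
    bool pedantic = false;
    bool assertions = false;
#ifdef _GLIBCXX_DEBUG
    debug = true;
#endif
#ifdef _GLIBCXX_DEBUG_PEDANTIC
    pedantic = true;
#endif
#ifdef _GLIBCXX_ASSERTIONS
    assertions = true;
#endif
    return describe_flag("_GLIBCXX_DEBUG", debug)
         + describe_flag("_GLIBCXX_DEBUG_PEDANTIC", pedantic)
         + describe_flag("_GLIBCXX_ASSERTIONS", assertions);
}
```

If everything reports "off" in the benchmark build, the subscript checks are an unlikely culprit. The first guess is better checked by looking at the generated assembly for a call instruction inside the loop.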