Why is std::u16string slower than an array of char16_t?

c++ arrays performance c++11 stl

Because of the way libc++ implements the small string optimization, every element access has to check whether the string contents are stored inside the string object itself or on the heap. Because the indexing is wrapped in benchmark::DoNotOptimize, the compiler cannot hoist that check out of the loop, so it is performed every time a character is accessed. When the string data is accessed through a pointer, the data is always external to the pointer, so no check is required.
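To make the per-access check concrete, here is a minimal sketch of a small-string layout. It is deliberately simplified and is not libc++'s actual implementation: the class TinyU16String, its members, and the inline capacity of 11 are all hypothetical names and values chosen for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical, heavily simplified small-string layout (NOT libc++'s real one).
class TinyU16String {
public:
    static constexpr std::size_t kInlineCapacity = 11;  // assumed value

    TinyU16String(const char16_t* src, std::size_t len) : size_(len) {
        if (len <= kInlineCapacity) {
            // Short string: characters live inside the object itself.
            is_long_ = false;
            std::memcpy(inline_buf_, src, len * sizeof(char16_t));
        } else {
            // Long string: characters live on the heap.
            is_long_ = true;
            heap_ = std::make_unique<char16_t[]>(len);
            std::memcpy(heap_.get(), src, len * sizeof(char16_t));
        }
    }

    // Every call has to decide where the characters live before reading one.
    char16_t operator[](std::size_t i) const {
        return is_long_ ? heap_[i] : inline_buf_[i];
    }

    // The decision is made once here; the returned pointer needs no check.
    const char16_t* data() const {
        return is_long_ ? heap_.get() : inline_buf_;
    }

    std::size_t size() const { return size_; }

private:
    bool is_long_ = false;
    std::size_t size_ = 0;
    char16_t inline_buf_[kInlineCapacity] = {};
    std::unique_ptr<char16_t[]> heap_;
};

// Branches on is_long_ for each element unless the optimizer hoists the test.
unsigned sum_indexed(const TinyU16String& s) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        sum += s[i];
    return sum;
}

// Resolves the storage location once, then reads through a raw pointer.
unsigned sum_raw(const TinyU16String& s) {
    const char16_t* p = s.data();
    unsigned sum = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        sum += p[i];
    return sum;
}
```

Both functions return the same sum. The only difference is where the short-or-long decision is made, which is the cost the answer above attributes to the indexed version when the optimizer is prevented from hoisting it.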


Interestingly, I am unable to reproduce your results. I can barely detect a difference between the two.

The (incomplete) code I used is shown here:

hol::StdTimer timer;

using index_type = std::size_t;

index_type const N = 100'000'000;
index_type const SIZE = 1024;

static std::u16string s16;
static char16_t const* p16;

int main(int, char** argv)
{
    std::generate_n(std::back_inserter(s16), SIZE,
        []{ return (char)hol::random_number((int)'A', (int)'Z'); });

    p16 = s16.c_str();

    unsigned sum;

    {
        sum = 0;

        timer.start();
        for(index_type n = 0; n < N; ++n)
            for(index_type i = 0; i < SIZE; ++i)
                sum += s16[i];
        timer.stop();

        RESULT("string", sum, timer);
    }

    {
        sum = 0;

        timer.start();
        for(std::size_t n = 0; n < N; ++n)
            for(std::size_t i = 0; i < SIZE; ++i)
                sum += p16[i];
        timer.stop();

        RESULT("array ", sum, timer);
    }
}

Output:

string: (670240768) 17.575232 secs
array : (670240768) 17.546145 secs

Compiler:

GCC 7.1

g++ -std=c++14 -march=native -O3 -D NDEBUG
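The listing above relies on hol::StdTimer, hol::random_number and a RESULT macro that are not shown. As a rough stand-in, the same comparison can be sketched with only the standard library. The function names sum_string, sum_array and seconds_taken are hypothetical, and this is not the code that produced the timings above.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Sum every code unit through std::u16string::operator[].
unsigned sum_string(const std::u16string& s, std::size_t repeats) {
    unsigned sum = 0;
    for (std::size_t n = 0; n < repeats; ++n)
        for (std::size_t i = 0; i < s.size(); ++i)
            sum += s[i];
    return sum;
}

// Sum every code unit through a raw pointer to the same data.
unsigned sum_array(const char16_t* p, std::size_t size, std::size_t repeats) {
    unsigned sum = 0;
    for (std::size_t n = 0; n < repeats; ++n)
        for (std::size_t i = 0; i < size; ++i)
            sum += p[i];
    return sum;
}

// Run a callable once and return the elapsed wall-clock time in seconds.
template <typename F>
double seconds_taken(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A caller would time each variant with something like seconds_taken([&]{ result = sum_string(s, repeats); }) and print the sum together with the time, so that the optimizer cannot discard the loops as dead code.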


With a plain char16_t array you access the elements directly, while std::u16string goes through an overloaded operator[]:

reference
operator[](size_type __pos)
{
#ifdef _GLIBCXX_DEBUG_PEDANTIC
    __glibcxx_check_subscript(__pos);
#else
    // as an extension v3 allows s[s.size()] when s is non-const.
    _GLIBCXX_DEBUG_VERIFY(__pos <= this->size(),
        _M_message(__gnu_debug::__msg_subscript_oob)
        ._M_sequence(*this, "this")
        ._M_integer(__pos, "__pos")
        ._M_integer(this->size(), "size"));
#endif
    return _M_base()[__pos];
}

and _M_base() is:

_Base& _M_base() { return *this; }

Now, my guesses are that either:

_M_base() might not get inlined, and then you take a performance hit because every read incurs an extra function call.

One of those subscript checks happens.
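One way to narrow down the second guess is to check whether the translation unit was compiled with libstdc++'s checking macros defined. The sketch below assumes libstdc++; _GLIBCXX_DEBUG and _GLIBCXX_ASSERTIONS are real libstdc++ macros, while the function name is hypothetical. Other standard libraries use different switches, so a false result only says that these two macros are not set.

```cpp
// Reports whether either libstdc++ checking macro is defined for this build.
// Assumption: the standard library in use is libstdc++.
constexpr bool libstdcxx_checks_enabled() {
#if defined(_GLIBCXX_DEBUG) || defined(_GLIBCXX_ASSERTIONS)
    return true;   // subscript checks may be compiled into operator[]
#else
    return false;  // neither macro is defined for this translation unit
#endif
}
```

If this returns false in the benchmark's build, the subscript checks from the debug-mode operator[] quoted above are unlikely to be the cause of the slowdown.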
