Sorting string vectors: plain C vs idiomatic C++11 Sorting string vectors: plain C vs idiomatic C++11 c c

Sorting string vectors: plain C vs idiomatic C++11


The cause is in c++ std io synchronization. The following code:

int main () {    typedef std::vector<std::string> svec;    svec a;    std::string s;    // note    std::ios_base::sync_with_stdio(false);    for (;;) {    getline(std::cin, s);    if (std::cin.eof()) {        if (s != "")            a.push_back(std::move(s));        break;    }        a.push_back(std::move(s));    }    std::sort(a.begin(), a.end());    for (std::string &s : a) {        std::cout << s << "\n";    }}

gives

 real   0m0.106s user   0m0.104s sys    0m0.004s

The C-version gives this:

 real   0m0.167s user   0m0.164s sys    0m0.000s

Edit: As RiaD correct mentioned sync_with_stdio of course is static function, so it enough to call the function once for all std io streams.


You're also using two different I/O libraries. This will completely screw up any timing information, as the C and C++ I/O libs are very different. IOstreams are plain not designed for speed.

In addition, I/O is completely untimable. If the source of I/O data simply happened to be coming in slower one time, one program would appear to be slower regardless of the sort time- for example, if the OS happened to have it in cache for one run but not for another.

You need to time purely the time taken to sort a pre-existing std::vector<std::string>, say.

Oh yeah, and your getl is full of memory leak.


My guess is that you don't measure sorting speed, but memory reallocations. Instead of doing a.push_back one element at a time, try allocating the vector memory upfront like you did with the C program

a.reserve(num_lines);

Depending on whether your compiler uses re-allocations with an expansion factor of 1.5 (VC++) or 2 (g++), you have 29 and 17 re-allocations with 140,190 lines in your file (i.e. log(total lines) / log(expansion factor)).

The comment by R. Martinho Fernandes also hits the nail: use std::chrono::high_resolution_clock::now() around the sort statements in both programs to get the timing differences. This isolates you from memory and IO differences.