Sorting string vectors: plain C vs idiomatic C++11
The cause is in c++ std io synchronization. The following code:
int main () { typedef std::vector<std::string> svec; svec a; std::string s; // note std::ios_base::sync_with_stdio(false); for (;;) { getline(std::cin, s); if (std::cin.eof()) { if (s != "") a.push_back(std::move(s)); break; } a.push_back(std::move(s)); } std::sort(a.begin(), a.end()); for (std::string &s : a) { std::cout << s << "\n"; }}
gives
real 0m0.106s user 0m0.104s sys 0m0.004s
The C-version gives this:
real 0m0.167s user 0m0.164s sys 0m0.000s
Edit: As RiaD correct mentioned sync_with_stdio
of course is static function, so it enough to call the function once for all std io streams.
You're also using two different I/O libraries. This will completely screw up any timing information, as the C and C++ I/O libs are very different. IOstreams are plain not designed for speed.
In addition, I/O is completely untimable. If the source of I/O data simply happened to be coming in slower one time, one program would appear to be slower regardless of the sort time- for example, if the OS happened to have it in cache for one run but not for another.
You need to time purely the time taken to sort a pre-existing std::vector<std::string>
, say.
Oh yeah, and your getl
is full of memory leak.
My guess is that you don't measure sorting speed, but memory reallocations. Instead of doing a.push_back
one element at a time, try allocating the vector memory upfront like you did with the C program
a.reserve(num_lines);
Depending on whether your compiler uses re-allocations with an expansion factor of 1.5
(VC++) or 2
(g++), you have 29
and 17
re-allocations with 140,190
lines in your file (i.e. log(total lines) / log(expansion factor)
).
The comment by R. Martinho Fernandes also hits the nail: use std::chrono::high_resolution_clock::now()
around the sort
statements in both programs to get the timing differences. This isolates you from memory and IO differences.