Handling UTF-8 in C++


Don't use wstring on Linux.

std::wstring VS std::string

Take a look at the first answer. I'm sure it answers your question.

  1. When should I use std::wstring over std::string?

On Linux? Almost never (§).

On Windows? Almost always (§).


The language itself has nothing to do with Unicode or any other character encoding; the choice is tied to the operating system. Windows uses UTF-16 for Unicode support, which implies wide characters (wchar_t is 16 bits wide there) - wchar_t or std::wstring. The Unicode-aware Windows API functions that operate on strings (the ones with a W suffix) take wide-character input.

Unix-based systems such as Mac OS X and Linux use UTF-8 instead. Of course, it is only a matter of how you interpret the bytes in the array, so any encoding, UTF-8 or even UTF-16, can be stored in a plain C char array or a std::string container. This is why you rarely see wstrings in cross-platform code; instead all strings are handled as UTF-8 and re-encoded to UTF-16 when necessary (on Windows).
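To illustrate that a std::string is just a byte container, here is a minimal sketch that counts code points in a UTF-8 string. The function name utf8_length is made up for this example, and the code assumes the input is already valid UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 encoded
// std::string. Every byte that is NOT a continuation byte (10xxxxxx)
// starts a new code point. Assumes the input is valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 bytes of "naïve", size() reports 6 because ï takes two bytes, while utf8_length reports 5. Note that this counts code points, not user-perceived characters: a base letter followed by a combining accent still counts as two.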

There are several ways to handle this somewhat confusing topic. I personally do as described above: use UTF-8 strictly throughout the application, re-encode strings when interacting with the Windows API, and use them directly on Mac OS X. For the re-encoding on Windows I use these great conversion helpers:

C++ UTF-8 Conversion Helpers (on MSDN, available under the Apache License, Version 2.0).
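For readers who want to see what such a re-encoding involves, below is a portable sketch of a UTF-8 to UTF-16 conversion. It is not the API of the MSDN helpers; utf8_to_utf16 is a hypothetical name, and the result is a std::u16string rather than a std::wstring so that the code behaves the same on every platform. On Windows the usual route is the MultiByteToWideChar function with the CP_UTF8 code page instead of hand-written decoding.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 and re-encodes it as UTF-16.
// Throws std::invalid_argument on truncated input, bad lead or
// continuation bytes, surrogate code points and values above U+10FFFF.
// Simplification: overlong encodings are not rejected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("code point out of range");
        }
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The design keeps UTF-8 as the only in-memory encoding and converts at the boundary, which matches the approach described above: the UTF-16 string exists only for the duration of the platform call.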

You can also use Qt's cross-platform QString, which provides conversion functions between UTF-8, UTF-16 and other encodings (ANSI, Latin-1...).

So the answer above is correct: on Unix always use UTF-8 (std::string, char); on Windows use UTF-16 (std::wstring, wchar_t).


Remember that at program startup the "C" locale is selected by default. You probably don't want this if you handle UTF-8. Calling setlocale(LC_CTYPE, "") replaces this default with whatever is defined in the environment (presumably a UTF-8 locale).
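A small sketch of that call, wrapped in a function so the result can be inspected. The name use_environment_ctype is invented for this example; the only library function used is std::setlocale from <clocale>.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: switches LC_CTYPE from the startup "C" locale to
// whatever the environment (LC_ALL, LC_CTYPE, LANG) requests, then
// returns the name of the locale that is actually in effect.
// If the environment names a locale that is not installed, the first
// call returns a null pointer and the previous locale stays active.
std::string use_environment_ctype() {
    std::setlocale(LC_CTYPE, "");
    // Passing a null pointer only queries the current setting.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    return current ? current : "";
}
```

Call it once, early in main, before any code that depends on the character classification or multibyte conversion functions. The returned name depends on the machine: it may be something like a UTF-8 locale, or still "C" if the environment does not request anything else.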