What unicode encoding (UTF-8, UTF-16, other) does Windows use for its Unicode data types?

c++ windows winapi unicode encoding

The Unicode data types in Windows (WCHAR and the strings built from it) hold UTF-16 little-endian in memory, always. But that's not what you're talking about: you're looking at file contents. Windows itself does not specify the encoding of files; it leaves that to individual applications.

The 0xfe 0xff you see at the start of the file is a Byte Order Mark, or BOM. It not only indicates that the file is most probably Unicode, it also tells you which Unicode encoding the file uses.

0xfe 0xff      UTF-16 big-endian
0xff 0xfe      UTF-16 little-endian
0xef 0xbb 0xbf UTF-8
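The table above maps directly onto a few byte comparisons. The following is a minimal sketch, not a complete detector: `detect_bom` is a hypothetical helper name, it only knows the three BOMs listed here, and it makes no attempt to recognise UTF-32 (whose little-endian BOM also starts with 0xff 0xfe).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: names the encoding implied by a leading BOM.
// Returns "unknown" when none of the three BOMs in the table matches.
std::string detect_bom(const unsigned char* data, std::size_t size) {
    // Check the 3-byte UTF-8 BOM first, then the two 2-byte UTF-16 BOMs.
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16 big-endian";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16 little-endian";
    return "unknown";
}
```

You would call it on the first few bytes read from the file in binary mode, and skip the BOM bytes before decoding the rest.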

A file that doesn't have a BOM should be assumed to contain 8-bit characters unless you know how it was written. That still doesn't tell you whether it's UTF-8 or one of the Windows code pages; you'll just have to guess.
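One common way to make that guess, offered here as an assumption rather than anything Windows prescribes, is to test whether the bytes form well-formed UTF-8. Text in a legacy code page that contains accented characters rarely passes by accident. The sketch below uses a hypothetical helper name, `looks_like_utf8`. Note that pure ASCII passes too, and ASCII is valid in both UTF-8 and most code pages, so a `true` result is only a hint.

```cpp
#include <cstddef>

// Hypothetical helper: true if the buffer is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong
// forms, surrogate code points, and values above U+10FFFF.
bool looks_like_utf8(const unsigned char* data, std::size_t size) {
    std::size_t i = 0;
    while (i < size) {
        const unsigned char b = data[i];
        if (b < 0x80) { ++i; continue; }           // plain ASCII byte

        std::size_t extra = 0;                     // continuation bytes expected
        unsigned long cp = 0;                      // decoded code point
        unsigned long min_cp = 0;                  // smallest value allowed for this length
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; min_cp = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min_cp = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min_cp = 0x10000; }
        else return false;                         // invalid lead byte

        if (size - i <= extra) return false;       // sequence runs past the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = data[i + k];
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        i += extra + 1;
    }
    return true;
}
```

If the check fails, the file is almost certainly in some 8-bit code page, though which one still cannot be determined from the bytes alone.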

You can use Notepad as an example of how this is done. If the file has a BOM, Notepad reads it and processes the contents accordingly. Otherwise you must specify the encoding yourself with the "Encoding" dropdown list.

Edit: the reason Windows documentation isn't more specific about the encoding is that Windows was a very early adopter of Unicode, and at the time there was only one encoding, a fixed 16 bits per code point (now known as UCS-2). When 65536 code points were determined to be inadequate, surrogate pairs were invented as a way to extend the range, and UTF-16 was born. Microsoft was already using "Unicode" as the name for its encoding and never changed the terminology.
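To illustrate how a surrogate pair extends the range, here is a small sketch of the UTF-16 arithmetic for a code point above U+FFFF. The function name `to_surrogates` is hypothetical, and the sketch assumes the caller passes a value in the range U+10000 to U+10FFFF without checking it.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: splits a code point in U+10000..U+10FFFF into
// its UTF-16 high (first) and low (second) surrogate code units.
std::pair<std::uint16_t, std::uint16_t> to_surrogates(std::uint32_t cp) {
    const std::uint32_t v = cp - 0x10000;  // leaves a 20-bit value
    return {
        static_cast<std::uint16_t>(0xD800 + (v >> 10)),    // top 10 bits
        static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))   // bottom 10 bits
    };
}
```

For example, U+1F600 becomes the pair 0xD83D, 0xDE00, which is why such a character occupies two WCHAR elements in a Windows string.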
