Why is it that UTF-8 encoding is used when interacting with a UNIX/Linux environment?



Partly because file systems and the system-call interface expect NUL ('\0') bytes to terminate file names, so UTF-16, whose code units routinely contain zero bytes, would not work well. You'd have to modify a lot of code to make that change.


As jonathan-leffler mentions, the prime issue is the ASCII null character. C traditionally expects a string to be NUL-terminated, so standard C string functions will choke on any UTF-16 string containing a byte equal to an ASCII NUL (0x00) — and every character in the Basic Latin range encodes to such a byte in UTF-16. While you can certainly program with wide-character support, UTF-16 is not a suitable external encoding of Unicode in filenames, text files, or environment variables.

Furthermore, UTF-16 and UTF-32 come in both big-endian and little-endian byte orders. To deal with this, you'll either need external metadata such as a MIME type, or a Byte Order Mark (BOM). As one description of the BOM puts it:

Where UTF-8 is used transparently in 8-bit environments, the use of a BOM will interfere with any protocol or file format that expects specific ASCII characters at the beginning, such as the use of "#!" at the beginning of Unix shell scripts.

UTF-16's predecessor, UCS-2, had the same issues, and on top of that lacked surrogate pairs, so it could only represent code points in the Basic Multilingual Plane. UCS-2 should be avoided.


I believe it's mainly the backwards compatibility that UTF-8 gives with ASCII.

For an answer to the 'dangers' question, you need to specify what you mean by 'interacting'. Do you mean interacting with the shell, with libc, or with the kernel proper?