What's the best way to identify Unicode-encoded text files in Windows?

See “How to detect the character encoding of a text-file?” or “How to reliably guess the encoding [...]?”

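UTF-8 can be detected with validation. You can also look for the BOM EF BB BF, but don't rely on it, since many UTF-8 files are written without one.
UTF-16 can be detected by looking for the BOM (FF FE for little-endian, FE FF for big-endian).
UTF-32 can be detected by validation, or by the BOM (FF FE 00 00 for little-endian, 00 00 FE FF for big-endian).
Otherwise, assume the system's ANSI code page.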
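
To make the order of those checks concrete, here is a minimal Python sketch. The function name `guess_encoding` and the `cp1252` fallback are my own assumptions for illustration (the real ANSI code page depends on the machine's locale), and the BOM-less UTF-32 check is only a heuristic.

```python
import codecs

# The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 marks,
# because the UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, ansi_fallback: str = "cp1252") -> str:
    """Return a Python codec name guessed from the raw bytes of a file."""
    # 1. A BOM, when present, settles the question.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    # 2. BOM-less UTF-32: every 4-byte unit must be a valid code point,
    #    which ordinary ASCII or UTF-8 text almost never satisfies.
    if data and len(data) % 4 == 0:
        for name in ("utf-32-le", "utf-32-be"):
            try:
                data.decode(name)
                return name
            except UnicodeDecodeError:
                pass

    # 3. UTF-8 validation (pure ASCII also passes this check).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 4. Nothing matched: fall back to the assumed ANSI code page.
    return ansi_fallback
```

You would call it on the raw bytes, for example `guess_encoding(open(path, "rb").read())`. Note that UTF-16 without a BOM is not handled here and would fall through to the later checks.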

Our codebase doesn't include any non-ASCII chars. I will try to grep for the BOM in files in our codebase. Thanks for the clarification.

Well that makes things a lot simpler. UTF-8 without non-ASCII chars is ASCII.
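
If the codebase is supposed to be pure ASCII, a scan along these lines could flag the exceptions. This is only a sketch: `classify` and `scan_tree` are hypothetical helper names, and it reads every file under the root, so you may want to filter by extension.

```python
import codecs
import os

# All BOMs worth flagging; bytes.startswith accepts a tuple of prefixes.
BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def classify(data: bytes) -> str:
    """Label raw file contents as 'bom', 'ascii', or 'non-ascii'."""
    if data.startswith(BOMS):
        return "bom"
    if data.isascii():  # True when every byte is below 0x80
        return "ascii"
    return "non-ascii"


def scan_tree(root: str):
    """Yield (path, label) for every file under root that is not plain ASCII."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                label = classify(fh.read())
            if label != "ascii":
                yield path, label
```

Something like `for path, label in scan_tree("."): print(label, path)` would then list only the files that need a closer look.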

windows search unicode

Unicode is a standard, not an encoding. Many encodings implement Unicode, including UTF-8, UTF-16, UCS-2, and others. How any of these translates to ASCII depends entirely on which encoding your "different editors" use.

Some editors insert byte-order marks (BOMs) at the start of Unicode files. If your editors do that, you can use the BOMs to detect the encoding.

ANSI is a standards body that has published several encodings for digital character data. The "ANSI" encoding supported in Windows is actually a Windows code page, typically CP-1252 on Western European systems, and not an ANSI standard. (MS-DOS itself used OEM code pages such as CP437.)

Does your codebase include non-ASCII characters? You may have better compatibility using a Unicode encoding rather than an ANSI one or CP-1252.

Actually, if you want to find out in Windows whether a file is Unicode (here meaning UTF-16), simply run findstr on the file for a string you know is in there.

findstr /I /C:"SomeKnownString" file.txt

If the file is UTF-16, it will come back empty. Then, to be sure, run findstr on a single letter or digit you know is in the file:

findstr /I /C:"P" file.txt

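You will probably get many occurrences, and the key is that the characters in the output will be spaced apart, because UTF-16 stores each ASCII character followed by a zero byte. This is a sign the file is Unicode (UTF-16) and not ASCII.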
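
As a rough illustration of why the two searches behave differently, this small Python snippet shows the bytes a byte-oriented search would see. The sample text "Ping" is just an assumption for the demo.

```python
# Encode a sample string the way a UTF-16 little-endian file stores it
# (without the BOM, to keep the output short).
utf16 = "Ping".encode("utf-16-le")
print(utf16)  # b'P\x00i\x00n\x00g\x00'

# The multi-character ASCII byte sequence is not present, because every
# letter is separated from the next by a zero byte ...
multi_char_found = b"Ping" in utf16

# ... but a single character still matches on its own.
single_char_found = b"P" in utf16

print(multi_char_found, single_char_found)  # False True
```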

Hope this helps.

CodeHunter
