encoding of file shell script
I'd just use
file -bi myfile.txt
to determine the character encoding of a particular file.
This relies on an external dependency, but I suspect file is installed on practically all semi-modern distros.
EDIT:
In response to Laurence Gonsalves' comment: -b is the option to be 'brief' (not include the filename) and -i is the shorthand equivalent of --mime, so the most portable way (including Mac OS X) is probably:
file --mime myfile.txt
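If you need just the charset inside a script, you can cut it out of that output. This is only a rough sketch: it assumes file prints its usual "type; charset=..." form, and charset_of is a hypothetical helper name, not something file provides.

```shell
#!/bin/sh
# charset_of is a hypothetical helper, not part of file(1).
# Assumption: `file -b --mime` prints something like "text/plain; charset=utf-8".
charset_of() {
    # Keep only the value after "charset=", up to any following ';'.
    file -b --mime "$1" | sed -n 's/.*charset=\([^;]*\).*/\1/p'
}
```

Calling charset_of myfile.txt then prints only the charset name that file reported.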
There's no way to be 100% certain (unless you're dealing with a file format that internally states its encoding).
Most tools that attempt to make this distinction try to decode the file as UTF-8 (as that's the stricter encoding) and, if that fails, fall back to ISO-8859-1. You can do this "by hand" with iconv, or you can use file:
$ file utf8.txt
utf8.txt: UTF-8 Unicode text
$ file latin1.txt
latin1.txt: ISO-8859 text
Note that ASCII files are both UTF-8 and ISO-8859-1 compatible.
$ file ascii.txt
ascii.txt: ASCII text
Finally: there's no real way to distinguish between ISO-8859-1 and ISO-8859-2, for example, unless you're going to assume it's natural language and use statistical methods. This is probably why file says "ISO-8859".
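For the "by hand" iconv route mentioned earlier, something like the following might do. It is a minimal sketch, assuming your iconv exits non-zero on invalid input (the common implementations do); guess_encoding is a hypothetical name, and the ISO-8859-1 answer is only a fallback guess, not a detection.

```shell
#!/bin/sh
# guess_encoding is a hypothetical helper name.
# Assumption: iconv exits non-zero when the input is not valid in the -f encoding.
guess_encoding() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        echo "UTF-8"        # plain ASCII files land here too
    else
        echo "ISO-8859-1"   # fallback guess; could equally be ISO-8859-2 etc.
    fi
}
```

Because decoding as ISO-8859-1 does not fail for any byte sequence, there is nothing to test in the fallback branch, which is why the UTF-8 check has to come first.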