How can I find encoding of a file via a script on Linux?

How can I find encoding of a file via a script on Linux?


It sounds like you're looking for enca. It can guess and even convert between encodings. Just look at the man page.

Or, failing that, use file -i (Linux) or file -I (OS X). That will output MIME-type information for the file, which will also include the character-set encoding. I found a man-page for it, too :)
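For scripting, you usually want just the charset rather than the whole MIME string. Here is a minimal sketch of one way to pull it out, assuming the output looks like "text/plain; charset=utf-8"; the function name charset_from_mime is made up for this example and is not part of file:

```shell
#!/bin/sh
# charset_from_mime (hypothetical helper): print the charset part of a MIME
# string such as "text/plain; charset=utf-8", the format printed by `file -bi`.
charset_from_mime() {
    case $1 in
        # Strip everything up to and including the last "charset=".
        *charset=*) printf '%s\n' "${1##*charset=}" ;;
        # No charset field in the string: signal failure.
        *) return 1 ;;
    esac
}

# Usage (assumes file(1) is installed): print the encoding of each argument.
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(charset_from_mime "$(file -bi -- "$f")")"
done
```

If your version of file supports it, file -b --mime-encoding prints only the charset, which makes the parsing step unnecessary.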


file -bi <file name>

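If you'd like to do this for a bunch of files (here skipping any path that contains "Eliminate", and coping with spaces in file names):

find . -type f ! -path '*Eliminate*' -exec sh -c 'for f; do printf "%s -- %s\n" "$f" "$(file -bi "$f")"; done' sh {} +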


uchardet - An encoding detector library ported from Mozilla.

Usage:

~> uchardet file.java
UTF-8

Various Linux distributions (Debian, Ubuntu, openSUSE, Arch Linux via pacman, etc.) provide binaries.
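Since uchardet is not installed everywhere, a script may want to fall back to file. A rough sketch, assuming at least one of the two tools is on the PATH; detect_encoding is a hypothetical name chosen for this example:

```shell
#!/bin/sh
# detect_encoding (hypothetical helper): print a best-guess encoding for the
# file given as $1, preferring uchardet and falling back to file(1).
detect_encoding() {
    if command -v uchardet >/dev/null 2>&1; then
        uchardet "$1"
    elif command -v file >/dev/null 2>&1; then
        # --mime-encoding prints only the charset; -b drops the file name.
        file -b --mime-encoding -- "$1"
    else
        echo "no encoding detector found" >&2
        return 127
    fi
}
```

Note that the two tools do not spell encoding names the same way (uchardet prints UTF-8, file prints utf-8), so normalize the result before comparing it against a fixed string.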