Convert the charset of an entire project to UTF-8


In your project's root directory, use find(1) to list all *.php files and combine that with recode(1) to convert those files in place:

find . -type f -name '*.php' -exec recode windows1252..utf8 \{} \;

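As an alternative to recode(1), you could also use iconv(1) to do the conversion. Be careful with in-place use, though: pointing -o at the input file can truncate or corrupt larger files with some iconv versions, so it is safer to write to a temporary file and move it over the original (for usage with the above find command: -exec sh -c 'iconv -f windows-1252 -t utf-8 "$1" > "$1.tmp" && mv "$1.tmp" "$1"' sh \{} \;).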

You need to have either recode or iconv installed for the above to work. Both should be easily installable via a package manager on most modern systems.


To convert a single file using Python (since I was asked...)

import codecs

with codecs.open(filename_in, 'r', 'windows-1252') as fin:
    with codecs.open(filename_out, 'w', 'utf-8') as fout:
        for line in fin:
            fout.write(line)

It is also possible to encode to UTF-8 in memory, without writing anything to a file (in Python 3 the result is a bytes object rather than a string):

utf8_line = line.encode('utf-8')
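
For completeness, here is a minimal sketch of the full in-memory round trip, assuming Python 3 and input bytes that really are windows-1252. The helper name recode_bytes is made up for this example and is not part of any library.

```python
def recode_bytes(data: bytes, source_encoding: str = "windows-1252") -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8.

    recode_bytes is a hypothetical helper for illustration only.
    """
    text = data.decode(source_encoding)  # bytes -> str
    return text.encode("utf-8")          # str -> UTF-8 bytes


# In windows-1252, 0xE9 is "é" and 0x80 is the euro sign.
raw = b"caf\xe9 \x80 5"
utf8_bytes = recode_bytes(raw)
print(utf8_bytes.decode("utf-8"))
```

The decode step is where the source encoding matters: if the bytes are not actually windows-1252, the result will silently contain the wrong characters.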


I had a similar case, but not all of my files were encoded in ISO-8859. Some were ASCII or already UTF-8. Using a bare find ... -exec iconv ... screwed up my git repo and I had to reclone it.

Here is what I used to avoid wrong conversions:

find . -type f | while IFS= read -r f; do file -b "$f" | grep -q ISO-8859 && iconv -f ISO-8859-1 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"; done
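
The same "only convert what needs converting" idea can be sketched in Python. Note that this swaps the file(1) check for a different test: a file is left alone if it already decodes as strict UTF-8 (plain ASCII passes too). The function names, the default .php suffix filter, the ISO-8859-1 default, and the choice to skip .git are all assumptions made for this example, not something taken from the commands above.

```python
import os


def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_tree(root, source_encoding="iso-8859-1", suffixes=(".php",)):
    """Convert non-UTF-8 files under root in place; return the paths changed.

    Hypothetical helper: only files whose names end in one of `suffixes`
    are considered, so binary files are not touched by accident.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: never descend into git's internal directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in sorted(filenames):
            if not name.endswith(tuple(suffixes)):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            if looks_like_utf8(data):
                continue  # ASCII or already UTF-8: leave untouched
            text = data.decode(source_encoding)
            # Write to a temporary file first, then swap it into place.
            tmp_path = path + ".tmp"
            with open(tmp_path, "wb") as f:
                f.write(text.encode("utf-8"))
            os.replace(tmp_path, path)
            changed.append(path)
    return changed
```

Calling convert_tree(".") from the project root would return the list of files that were rewritten. Because a UTF-8 file is never decoded as ISO-8859-1, running it a second time should change nothing. It is still worth committing or backing up first.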