Convert the charset of an entire project to UTF-8
In your project's root directory, use find(1) to list all *.php
files and combine that with recode(1) to convert those files in place:
find . -type f -name '*.php' -exec recode windows1252..utf8 \{} \;
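If recode isn't available, the same walk can be sketched in Python. This is a minimal sketch, not a drop-in replacement: `convert_tree` is a hypothetical helper name, and it assumes every matching file really is Windows-1252. Python's `windows-1252` codec rejects the five byte values that code page leaves undefined, so a stray one raises `UnicodeDecodeError` instead of being silently converted.

```python
from pathlib import Path


def convert_tree(root, pattern="*.php", src_encoding="windows-1252"):
    """Re-encode every file under root matching pattern from src_encoding to UTF-8.

    Returns the list of paths that were rewritten.
    """
    converted = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()          # whole file is read before any write
        text = data.decode(src_encoding)  # raises UnicodeDecodeError on undefined bytes
        path.write_bytes(text.encode("utf-8"))
        converted.append(path)
    return converted
```

Run from the project root, `convert_tree(".")` returns the files it rewrote. Because each file is read completely before it is overwritten, input and output can safely be the same path.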
As an alternative to recode(1), you could also use iconv(1) to do the conversion. For use with the find command above, the iconv invocation is:
iconv -f windows-1252 -t utf-8 -o \{} \{}
You need to have either recode or iconv installed for the above to work. Both should be easily installable via a package manager on most modern systems.
To convert a single file using Python (since I was asked...)
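# filename_in / filename_out are the source and destination paths
# newline='' keeps the original line endings (e.g. CRLF) unchanged
with open(filename_in, 'r', encoding='windows-1252', newline='') as fin, \
        open(filename_out, 'w', encoding='utf-8', newline='') as fout:
    for line in fin:  # decoded from Windows-1252 on read, encoded as UTF-8 on write
        fout.write(line)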
It is also possible to encode a line to UTF-8 in memory without writing it to a file; in Python 3 the result is a bytes object, not a str:
utf8_line = line.encode('utf-8')
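Because `encode` hands back bytes, the full round trip is easiest to see on a literal. A small illustration, assuming Windows-1252 input (the byte values are chosen just for the example):

```python
# Bytes as they might appear in a Windows-1252 file: "café" plus the euro sign (0x80)
legacy = b"caf\xe9 \x80"

text = legacy.decode("windows-1252")  # bytes -> str
utf8 = text.encode("utf-8")           # str -> bytes

print(text)        # café €
print(utf8)        # b'caf\xc3\xa9 \xe2\x82\xac'
print(type(utf8))  # <class 'bytes'>
```

The single-byte `\xe9` and `\x80` become the multi-byte UTF-8 sequences for é and €.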
I had a similar case, but not all of my files were encoded in ISO-8859; some were encoded in ASCII or UTF-8. Using a bare find ... -exec iconv ... screwed up my git repo and I had to re-clone it.
Here is what I used to avoid wrong conversions:
find . -type f -not -path './.git/*' -print0 | while IFS= read -r -d '' f; do
  file -b "$f" | grep -q ISO-8859 && iconv -f ISO-8859-1 -t UTF-8 -o "$f" "$f"; done
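The same guard can be sketched in Python using a different test than file(1): attempt a strict UTF-8 decode and convert only when it fails (pure ASCII is valid UTF-8, so it is skipped as well). This is a sketch under stated assumptions: `convert_non_utf8` is a hypothetical name, anything that is not valid UTF-8 is assumed to be ISO-8859-1, and because a failed UTF-8 decode alone would also flag binary files, it only touches the listed suffixes and skips `.git` and any file containing NUL bytes.

```python
from pathlib import Path


def convert_non_utf8(root, suffixes=(".php",), fallback="iso-8859-1"):
    """Convert files that are not valid UTF-8 from `fallback` to UTF-8.

    ASCII and UTF-8 files decode cleanly and are left untouched, as is anything
    under a .git directory or containing NUL bytes (assumed to be binary).
    """
    base = Path(root)
    converted = []
    for path in sorted(base.rglob("*")):
        if not path.is_file() or ".git" in path.relative_to(base).parts:
            continue
        if path.suffix not in suffixes:
            continue
        data = path.read_bytes()
        if b"\x00" in data:
            continue
        try:
            data.decode("utf-8")  # already ASCII or UTF-8: leave it alone
            continue
        except UnicodeDecodeError:
            pass
        # ISO-8859-1 maps every byte to a character, so this decode cannot fail
        path.write_bytes(data.decode(fallback).encode("utf-8"))
        converted.append(path)
    return converted
```

Running it twice is harmless: after the first pass every converted file is valid UTF-8 and is skipped.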