How to replace all '0xa0' chars with a ' ' in a bunch of text files?


OK, first point: your output file is set up to encode text written to it as UTF-8 automatically, so don't call encode('utf-8') on the strings you pass to the write() method; doing so would encode the data twice.

So the first thing to try is to simply use the following in your inner loop:

writer.write(line)
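A minimal sketch of that point (the output file name and the sample line are made up for illustration):

```python
import codecs

# Hypothetical output file name for illustration.
writer = codecs.open('output.txt', 'w', 'utf-8')

# The stream encodes on write; pass plain unicode text, not pre-encoded bytes.
line = u'caf\xe9\xa0here'
writer.write(line)  # no explicit .encode('utf-8') call needed
writer.close()
```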

If that doesn't work, then the problem is almost certainly the fact that, as others have noted, you aren't decoding your input file properly.

Taking a wild guess and assuming that your input files are encoded in cp1252, you could try as a quick test the following in the inner loop:

for line in codecs.open(infile, 'r', 'cp1252'):
    writer.write(line)

Minor point: 'wtr' is a nonsensical mode string ('w' for writing and 'r' for reading conflict with each other). Simplify it to either 'wt' or even just 'w'.


Did you omit some code there? You're reading into line but trying to re-encode line2.

In any case, you're going to have to tell Python what encoding the input file is; if you don't know, then you'll have to open it raw and perform substitutions without help of a codec.
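One hedged sketch of that raw approach: operate on bytes and swap the 0xa0 byte directly. This is safe for single-byte encodings like cp1252 or latin-1, but not for multi-byte encodings such as UTF-8, where 0xa0 can be the trailing byte of a longer sequence (the file name and sample bytes below are made up):

```python
# Hypothetical file name for illustration.
fname = 'mystery.txt'

# Write sample bytes so the sketch is runnable.
with open(fname, 'wb') as f:
    f.write(b'foo\xa0bar')

# Read raw bytes, replace the 0xa0 byte with an ASCII space, write back.
with open(fname, 'rb') as f:
    data = f.read()
with open(fname, 'wb') as f:
    f.write(data.replace(b'\xa0', b' '))
```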


Please be serious - a simple replace() operation will do the job:

line = line.replace(u'\xa0', u' ')
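Applied to a whole directory of files, a sketch (the 'docs' directory name and the cp1252 guess for the input encoding are assumptions):

```python
import codecs
import glob
import os

# Hypothetical setup: a directory of cp1252-encoded text files.
os.makedirs('docs', exist_ok=True)
with open(os.path.join('docs', 'sample.txt'), 'wb') as f:
    f.write(b'price:\xa0100')

# Replace every U+00A0 (decoded from byte 0xa0) with a plain space.
for path in glob.glob(os.path.join('docs', '*.txt')):
    with codecs.open(path, 'r', 'cp1252') as f:
        text = f.read()
    with codecs.open(path, 'w', 'cp1252') as f:
        f.write(text.replace(u'\xa0', u' '))
```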

In addition, the codecs.open() constructor supports an 'errors' parameter for handling conversion errors; see the codecs module documentation for the available policies ('strict', 'ignore', 'replace', and so on).
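For example, a sketch using errors='replace', which substitutes U+FFFD for undecodable bytes instead of raising UnicodeDecodeError (the file name and contents are made up; a lone 0xa0 byte is invalid in UTF-8):

```python
import codecs

# Hypothetical file containing a byte sequence that is invalid as UTF-8.
with open('messy.txt', 'wb') as f:
    f.write(b'ok\xa0ok')

# errors='replace' maps undecodable bytes to U+FFFD rather than raising.
with codecs.open('messy.txt', 'r', 'utf-8', errors='replace') as f:
    text = f.read()
```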