
How to remove non UTF-8 characters from text file


This command:

iconv -f utf-8 -t utf-8 -c file.txt

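writes a cleaned copy of file.txt to standard output, dropping every byte sequence that is not valid UTF-8. Redirect the output to a new file (not to file.txt itself) to keep the result.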

-f is the source format
-t is the target format
-c skips any invalid sequence
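As a quick way to try the command, here is a minimal sketch. The file names sample.txt and sample.clean.txt are made up for illustration, and the sample contains the byte 0xFF, which never occurs in valid UTF-8.

```shell
# Build a sample file with one invalid byte (octal 377 = 0xFF) in the middle.
printf 'abc\377def\n' > sample.txt

# Drop invalid sequences and write the result to a new file.
# Some iconv builds exit non-zero even with -c when they dropped bytes,
# so the exit status is deliberately ignored here.
iconv -f utf-8 -t utf-8 -c sample.txt > sample.clean.txt || true

# Show what survived.
cat sample.clean.txt
```

Writing to a separate file matters: redirecting onto the input file would truncate it before iconv gets to read it.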


Whatever method you use has to read the file byte by byte and understand how UTF-8 builds each character from a sequence of bytes. The simplest approach is to use an editor that will read anything but writes only valid UTF-8 characters. TextPad is one choice.
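To make the byte-wise view concrete, here is a rough sketch using standard tools instead of an editor. od dumps the raw bytes, and iconv run without -c stops at the first invalid sequence, so it can double as a validity check. The helper name is_valid_utf8 and the file names good.txt and bad.txt are hypothetical.

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# Without -c, iconv exits non-zero at the first invalid sequence.
is_valid_utf8() {
    iconv -f utf-8 -t utf-8 "$1" > /dev/null 2>&1
}

# "é" is encoded as the two bytes C3 A9 (octal 303 251).
# A lone C3 followed by a newline is a truncated, invalid sequence.
printf 'caf\303\251\n' > good.txt
printf 'caf\303\n' > bad.txt

# Dump the raw bytes in hex to see how each character is built.
od -An -tx1 good.txt
od -An -tx1 bad.txt

is_valid_utf8 good.txt && echo "good.txt: valid"
is_valid_utf8 bad.txt || echo "bad.txt: invalid"
```

A check like this only reports whether a file is clean. Removing the bad bytes still needs iconv -c or an editor as described above.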