How to remove all of the diacritics from a file?
If you check the man page of the tool iconv:
//TRANSLIT
When the string "//TRANSLIT" is appended to --to-code, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similarly looking characters.
So we could do:
kent$ cat test1
Replace ā, á, ǎ, and à with a.
Replace ē, é, ě, and è with e.
Replace ī, í, ǐ, and ì with i.
Replace ō, ó, ǒ, and ò with o.
Replace ū, ú, ǔ, and ù with u.
Replace ǖ, ǘ, ǚ, and ǜ with ü.
Replace Ā, Á, Ǎ, and À with A.
Replace Ē, É, Ě, and È with E.
Replace Ī, Í, Ǐ, and Ì with I.
Replace Ō, Ó, Ǒ, and Ò with O.
Replace Ū, Ú, Ǔ, and Ù with U.
Replace Ǖ, Ǘ, Ǚ, and Ǜ with U.

kent$ iconv -f utf8 -t ascii//TRANSLIT test1
Replace a, a, a, and a with a.
Replace e, e, e, and e with e.
Replace i, i, i, and i with i.
Replace o, o, o, and o with o.
Replace u, u, u, and u with u.
Replace u, u, u, and u with u.
Replace A, A, A, and A with A.
Replace E, E, E, and E with E.
Replace I, I, I, and I with I.
Replace O, O, O, and O with O.
Replace U, U, U, and U with U.
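If iconv is not available, or its transliteration behaves differently on your system, a rough alternative is sketched below in Python. Note that this is a different technique from iconv's //TRANSLIT: it decomposes each character with Unicode NFD normalization and drops the combining marks. The function name strip_diacritics is just an illustrative choice, not part of any library.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD).

    Characters with no canonical decomposition (for example "ø" or "ß")
    are left unchanged, unlike iconv's //TRANSLIT approximation.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for non-combining characters
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_diacritics("Replace ā, á, ǎ, and à with a."))
    # prints: Replace a, a, a, and a with a.
```

To process a whole file, you could read it with open(path, encoding="utf-8"), pass the contents through the function, and write the result to a new file.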