"for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte
As Mark Ransom suggested, I looked for the file's actual encoding. It turned out to be
ISO-8859-1, so replacing
open("u.item", encoding="utf-8") with
open("u.item", encoding="ISO-8859-1") solves the problem.
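To illustrate the fix above, here is a minimal sketch. The helper name read_lines and the sample file are made up for the example (the sample just stands in for u.item); only the encoding argument to open comes from the answer.

```python
import os
import tempfile


def read_lines(path, encoding="ISO-8859-1"):
    """Return the lines of a text file, decoded with the given encoding."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Hypothetical sample data: one record containing a byte (0xe9) that is
# valid ISO-8859-1 but is not valid UTF-8 in this position.
sample = "1|Les Misérables (1995)\n".encode("ISO-8859-1")
sample_path = os.path.join(tempfile.mkdtemp(), "sample.item")
with open(sample_path, "wb") as f:
    f.write(sample)

lines = read_lines(sample_path)
print(lines[0])
```

Reading the same file with encoding="utf-8" raises the UnicodeDecodeError from the question, because the 0xe9 byte is not followed by valid UTF-8 continuation bytes.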
The following also worked for me. ISO-8859-1 avoids a lot of these decode errors, particularly when working with Speech Recognition APIs.
file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")
Your file doesn't actually contain UTF-8 encoded data; it contains some other encoding. Figure out what that encoding is and use it in the open call.
In Windows-1252 encoding, for example, the byte 0xe9 would be the character é.
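The point about the 0xe9 byte can be checked directly. This is a small sketch with made-up sample bytes: it shows that the same bytes fail as UTF-8, decode cleanly as Windows-1252 (cp1252 in Python), and that ISO-8859-1 never raises but can give a different character than Windows-1252 for some bytes.

```python
raw = b"caf\xe9"  # sample bytes; 0xe9 is not valid UTF-8 here

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Both single-byte encodings map 0xe9 to the same character.
as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("iso-8859-1")

# They disagree on 0x80: a euro sign in Windows-1252, but a control
# character in ISO-8859-1. Decoding "succeeds" either way, so a wrong
# guess shows up as wrong text rather than as an exception.
euro_cp1252 = b"\x80".decode("cp1252")
euro_latin1 = b"\x80".decode("iso-8859-1")

print(utf8_ok, as_cp1252, as_latin1, euro_cp1252 == euro_latin1)
```

So ISO-8859-1 making the error go away does not by itself prove it is the right encoding; it is worth checking that the decoded text looks correct.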