R tm package invalid input in 'utf8towcs'


None of the above answers worked for me. The only way I found to work around this problem was to remove all non-graphical characters (see the [:graph:] class in http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html).

The code is this simple:

library(stringr)  # provides str_replace_all
usableText <- str_replace_all(tweets$text, "[^[:graph:]]", " ")
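
If stringr is not installed, the same idea can be sketched in base R with gsub. This is only an illustration on a made-up string (raw_text and usable_text are hypothetical names). It uses control characters on purpose, because whether emoji count as graphical can depend on the regex engine and the locale.

```r
# Hypothetical sample: a tab and a control byte mixed into ordinary text.
raw_text <- "good\tmorning\001everyone"

# Replace every character outside the POSIX [:graph:] class with a space.
# Ordinary spaces are also outside [:graph:], so they are simply replaced
# by spaces again and the text stays readable.
usable_text <- gsub("[^[:graph:]]", " ", raw_text)

print(usable_text)
```

The result can then be passed on to the corpus-building step in place of the original text.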


This is from the tm FAQ (http://tm.r-forge.r-project.org/faq.html):

tm_map(yourCorpus, function(x) iconv(enc2utf8(x), sub = "byte"))

It will replace non-convertible bytes in yourCorpus with strings showing their hex codes.

I hope this helps; for me it does.
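
To see what sub = "byte" does outside of a corpus, here is a small sketch on a plain character vector. The sample string is made up, and the explicit "latin1" to "ASCII" conversion is my assumption so that the result does not depend on the session locale (the FAQ line relies on the native encoding instead).

```r
# Hypothetical input: a Latin-1 string in which the byte 0xE7 (c-cedilla)
# has no ASCII equivalent.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

# With sub = "byte", each non-convertible byte is written out as its hex
# code in angle brackets instead of making the conversion fail.
marked <- iconv(x, "latin1", "ASCII", sub = "byte")

print(marked)
```

The hex markers make it easy to spot afterwards which documents contained problematic bytes.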


I think it is clear by now that the problem is caused by the emojis, which tolower is not able to handle.

# to remove emojis (sub = "" drops the non-convertible bytes; without it, affected strings become NA)
dataSet <- iconv(dataSet, 'UTF-8', 'ASCII', sub = "")
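
One caveat worth checking on your own data: called without a sub argument, iconv returns NA for any element that contains a character it cannot convert, rather than dropping just that character. A minimal sketch with a made-up vector (dropped and cleaned are hypothetical names) of how the two calls differ:

```r
# Hypothetical tweets; the first one ends with an emoji (U+1F600).
dataSet <- c("Great day \U0001F600", "Plain text")

# No sub: the element containing the emoji is expected to come back as NA.
dropped <- iconv(dataSet, "UTF-8", "ASCII")

# sub = "": only the non-convertible bytes are removed, the rest is kept.
cleaned <- iconv(dataSet, "UTF-8", "ASCII", sub = "")

print(dropped)
print(cleaned)

# tolower can now be applied to the cleaned, ASCII-only text.
print(tolower(cleaned))
```

Keep in mind that this also strips accented letters and any other non-ASCII characters, not only emojis.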