How to detect string byte encoding?
If your files are in either cp1252 or utf-8, then there is an easy way.
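import logging
import os

def force_decode(string, codecs=('utf8', 'cp1252')):
    # Try each codec in order and return the first successful decoding.
    for i in codecs:
        try:
            return string.decode(i)
        except UnicodeDecodeError:
            pass
    # No codec worked; log the raw value (the function then returns None).
    logging.warning("cannot decode url %s" % ([string]))

for item in os.listdir(rootPath):
    # Convert to Unicode
    if isinstance(item, str):
        item = force_decode(item)
    print item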
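The snippet above is Python 2, where `str` is a byte string. Here is a rough sketch of the same idea for Python 3, assuming you start from a `bytes` object (for example, something read from a file opened in binary mode). The explicit `return None` and the sample inputs are my additions for illustration.

```python
import logging

def force_decode(data, codecs=("utf-8", "cp1252")):
    """Return data decoded with the first codec that works, or None."""
    for codec in codecs:
        try:
            return data.decode(codec)
        except UnicodeDecodeError:
            continue
    # Nothing matched; log the raw bytes so the caller can investigate.
    logging.warning("cannot decode %r", data)
    return None

# The same text encoded two different ways decodes to the same string.
print(force_decode("caf\u00e9".encode("utf-8")))
print(force_decode("caf\u00e9".encode("cp1252")))
```

The order of the codecs matters. utf-8 is strict and rejects most byte sequences that are not really utf-8, while cp1252 accepts almost anything, so utf-8 should be tried first.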
Otherwise, there is a charset detection library.
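One such library is `chardet`. That is my assumption, since the answer does not name one. It is a third-party package (`pip install chardet`), and a minimal sketch of using it might look like this. The `detect_encoding` helper is a hypothetical name, and the result is only a statistical guess, especially for short inputs.

```python
def detect_encoding(data):
    """Guess the encoding of a bytes object; None if unknown or unavailable."""
    try:
        import chardet  # third-party, assumed to be installed
    except ImportError:
        return None
    # chardet.detect returns a dict with 'encoding' and 'confidence' keys.
    result = chardet.detect(data)
    return result["encoding"]

raw = "Les \u00e9l\u00e8ves sont arriv\u00e9s \u00e0 l'\u00e9cole.".encode("cp1252")
encoding = detect_encoding(raw)
if encoding is not None:
    print(raw.decode(encoding, errors="replace"))
```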