Reading utf-8 characters from a gzip file in python
I don't see why this should be so hard.
What are you doing exactly? Please explain "eventually it reads an invalid character".
It should be as simple as:
import gzipfp = gzip.open('foo.gz')contents = fp.read() # contents now has the uncompressed bytes of foo.gzfp.close()u_str = contents.decode('utf-8') # u_str is now a unicode string
EDITED
This answer works for Python2
in Python3
, please see @SeppoEnarvi 's answer at https://stackoverflow.com/a/19794943/610569 (it uses the rt
mode for gzip.open
.