Reading utf-8 characters from a gzip file in python

This is possible since Python 3.3:

import gzipgzip.open('file.gz', 'rt', encoding='utf-8')

Notice that gzip.open() requires you to explicitly specify text mode ('t').

python file-io utf-8 gzip

I don't see why this should be so hard.

What are you doing exactly? Please explain "eventually it reads an invalid character".

It should be as simple as:

import gzipfp = gzip.open('foo.gz')contents = fp.read() # contents now has the uncompressed bytes of foo.gzfp.close()u_str = contents.decode('utf-8') # u_str is now a unicode string

EDITED

This answer works for Python2 in Python3, please see @SeppoEnarvi 's answer at https://stackoverflow.com/a/19794943/610569 (it uses the rt mode for gzip.open.

python file-io utf-8 gzip

Maybe

import codecszf = gzip.open(fname, 'rb')reader = codecs.getreader("utf-8")contents = reader( zf )for line in contents:    pass

CodeHunter

Reading utf-8 characters from a gzip file in python

EDITED

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last