Parsing HTTP Response in Python
When I printed response.read() I noticed that b was prepended to the string (e.g. b'{"a":1,..). The "b" stands for bytes and serves as a declaration for the type of the object you're handling. Since I knew that a string could be converted to a dict by using json.loads('string'), I just had to convert the bytes type to a string type. I did this by decoding the response as UTF-8 with decode('utf-8'). Once it was a string, my problem was solved and I was easily able to iterate over the dict.
I don't know if this is the fastest or most 'pythonic' way of writing this, but it works, and there's always time later for optimization and improvement! Full code for my solution:
    from urllib.request import urlopen
    import json

    # Get the dataset
    url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
    response = urlopen(url)

    # Convert bytes to string type and string type to dict
    string = response.read().decode('utf-8')
    json_obj = json.loads(string)

    print(json_obj['source_name'])  # prints the string with 'source_name' key
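The bytes-to-string-to-dict steps can also be tried without a network call. This is a minimal sketch: the `raw` payload is a made-up stand-in for whatever `response.read()` would return, not data from the Quandl API.

```python
import json

# Hypothetical payload standing in for the bytes returned by response.read()
raw = b'{"a": 1, "b": [2, 3]}'

text = raw.decode('utf-8')   # bytes -> str
data = json.loads(text)      # str -> dict

# Once decoded and parsed, the result behaves like any other dict
for key, value in data.items():
    print(key, value)
```

Printing `raw` shows the `b'...'` prefix, while printing `text` does not, which is the difference described above.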
You can also use Python's requests library instead.

    import requests

    url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
    response = requests.get(url)
    data = response.json()  # named "data" to avoid shadowing the built-in dict

Now you can manipulate "data" like a Python dictionary.
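json works with Unicode text in Python 3 (the JSON format itself is defined only in terms of Unicode text), and therefore you need to decode the bytes received in the HTTP response. (Since Python 3.6, json.loads() also accepts bytes encoded as UTF-8, UTF-16, or UTF-32, but other charsets still have to be decoded explicitly.) r.headers.get_content_charset('utf-8') gets you the character encoding, falling back to 'utf-8' if the response does not declare one: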
    #!/usr/bin/env python3
    import io
    import json
    from urllib.request import urlopen

    with urlopen('https://httpbin.org/get') as r, \
         io.TextIOWrapper(r, encoding=r.headers.get_content_charset('utf-8')) as file:
        result = json.load(file)
    print(result['headers']['User-Agent'])
It is not necessary to use io.TextIOWrapper here:
    #!/usr/bin/env python3
    import json
    from urllib.request import urlopen

    with urlopen('https://httpbin.org/get') as r:
        result = json.loads(r.read().decode(r.headers.get_content_charset('utf-8')))
    print(result['headers']['User-Agent'])
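To see what get_content_charset() does with different Content-Type headers, here is a rough offline sketch. It builds an email.message.Message by hand, since the headers object returned by urlopen() is a subclass of that class. The `charset_of` helper and the sample payload are hypothetical, made up for illustration.

```python
import json
from email.message import Message


def charset_of(content_type, default='utf-8'):
    """Hypothetical helper: read the charset from a Content-Type value."""
    msg = Message()
    msg['Content-Type'] = content_type
    # Returns the charset parameter lowercased, or `default` if it is absent
    return msg.get_content_charset(default)


no_charset = charset_of('application/json')
latin1 = charset_of('application/json; charset=ISO-8859-1')
print(no_charset, latin1)

# Made-up body encoded as Latin-1, decoded using the declared charset
body = '{"name": "café"}'.encode('latin-1')
result = json.loads(body.decode(latin1))
print(result['name'])
```

Decoding this body as UTF-8 would raise UnicodeDecodeError, which is why reading the charset from the headers is safer than hardcoding 'utf-8'.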