Efficient looping through large JSON files
Well, don't call `response.json()` over and over again unnecessarily.
Instead of
```python
for observation in response.json()['data']:
    fullGroupName = response.json()['full_name']
```
do
```python
data = response.json()
for observation in data['data']:
    fullGroupName = data['full_name']
```
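To see why caching the parsed result matters, you can time the two patterns directly on a synthetic payload (a stand-in for one API response, since no server is involved here); the exact numbers are machine-dependent, but parsing once should be dramatically faster:

```python
import json
import time

# Synthetic payload standing in for one API response.
payload = json.dumps({"full_name": "HFAMDB", "data": list(range(1000))})

# Pattern 1: parse once, then loop over the cached result.
start = time.perf_counter()
data = json.loads(payload)
for observation in data["data"]:
    full_group_name = data["full_name"]
once = time.perf_counter() - start

# Pattern 2: re-parse the whole payload on every iteration.
start = time.perf_counter()
for observation in json.loads(payload)["data"]:
    full_group_name = json.loads(payload)["full_name"]
every_time = time.perf_counter() - start

print(f"parse once: {once:.4f} s, parse per iteration: {every_time:.4f} s")
```

The second loop re-parses the entire document on every iteration, so its cost grows with the square of the payload size, which is exactly what the profile below shows on the real data.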
After this change, the whole thing takes about 33 seconds on my PC, and pretty much all of that is spent in the requests themselves. Maybe you could speed that up further by using parallel requests, if that's OK with the site.
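If the site does allow it, the parallel-request idea can be sketched with `concurrent.futures` from the standard library. Here `fetch` is a hypothetical stand-in for the real `requests.get(url).json()` call, simulating network latency with a sleep so the example runs offline:

```python
import concurrent.futures
import time

def fetch(url):
    """Stand-in for requests.get(url).json(); simulates network latency."""
    time.sleep(0.1)
    return {"url": url, "data": []}

# Hypothetical URL list in the style of the API in question.
urls = [f"http://dw.euro.who.int/api/v3/data_sets/HFAMDB/HFAMDB_{i}"
        for i in range(10)]

start = time.perf_counter()
# The thread pool overlaps the waits, so 10 requests take roughly as long
# as the slowest one instead of the sum of all of them.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

print(f"{len(results)} responses in {elapsed:.2f} s")
```

Threads work well here because the loop is I/O-bound; be considerate with `max_workers` so you don't hammer the server.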
Although Stefan Pochmann has already answered your question, I think it's worth mentioning how you could have figured out the problem yourself.
One way would be to use a profiler, for example Python's cProfile, which is included in the standard library.
Assuming that your script is called `slow_download.py`, you can limit the range in your loop to, for example, `range(32, 33)` and execute it in the following way:
```shell
python3 -m cProfile -s cumtime slow_download.py
```
The `-s cumtime` option sorts the calls by cumulative time.
The result would be:
```
http://dw.euro.who.int/api/v3/data_sets/HFAMDB/HFAMDB_832
         222056 function calls (219492 primitive calls) in 395.444 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    122/1    0.005    0.000  395.444  395.444 {built-in method builtins.exec}
        1   49.771   49.771  395.444  395.444 py2.py:1(<module>)
     9010    0.111    0.000  343.904    0.038 models.py:782(json)
     9010    0.078    0.000  332.900    0.037 __init__.py:271(loads)
     9010    0.091    0.000  332.801    0.037 decoder.py:334(decode)
     9010  332.607    0.037  332.607    0.037 decoder.py:345(raw_decode)
      ...
```
This clearly suggests that the problem is related to `json()` and the methods it calls: `loads()` and `raw_decode()`.
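If you'd rather keep the profiling inside the script instead of on the command line, `cProfile` and `pstats` give the same sorted view programmatically. This is a minimal sketch with a toy function standing in for the download loop; it re-parses the same string repeatedly, mimicking the repeated `response.json()` calls:

```python
import cProfile
import io
import json
import pstats

def parse_many():
    """Toy workload: re-parse the same JSON string, like calling
    response.json() on every loop iteration."""
    payload = '{"data": [1, 2, 3], "full_name": "HFAMDB"}'
    for _ in range(10_000):
        json.loads(payload)

profiler = cProfile.Profile()
profiler.enable()
parse_many()
profiler.disable()

stream = io.StringIO()
# Equivalent of -s cumtime on the command line: sort by cumulative time,
# and print only the top five entries.
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Just like in the command-line profile, `loads()` and the decoder internals dominate the cumulative-time column, which points you straight at the redundant parsing.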