Efficient looping through large JSON files


Well, don't call response.json() over and over and over again unnecessarily.

Instead of

  for observation in response.json()['data']:
      fullGroupName = response.json()['full_name']

do

  data = response.json()
  for observation in data['data']:
      fullGroupName = data['full_name']

After this change, the whole thing takes about 33 seconds on my PC, and pretty much all of that is spent on the requests. You could perhaps speed that up further with parallel requests, if that's OK for the site.
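If you do try parallel requests, here is a minimal sketch of the idea using a thread pool from the standard library; it assumes the pages follow the HFAMDB_<id> URL pattern visible in the profiler output further down, and the ID range and worker count are placeholders, not values from the original script:

  import requests
  from concurrent.futures import ThreadPoolExecutor

  def fetch(data_set_id):
      # Assumed URL pattern; adjust to however the original script builds its URLs.
      url = f'http://dw.euro.who.int/api/v3/data_sets/HFAMDB/HFAMDB_{data_set_id}'
      return requests.get(url).json()  # parse the JSON exactly once

  data_set_ids = range(32, 34)  # placeholder; use the real range of data set IDs
  with ThreadPoolExecutor(max_workers=8) as executor:
      for data in executor.map(fetch, data_set_ids):
          fullGroupName = data['full_name']
          for observation in data['data']:
              pass  # process each observation here

Because the work is almost entirely network I/O, threads are sufficient here; there is no need for multiprocessing.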


Although Stefan Pochmann has already answered your question, I think it's worth mentioning how you could have figured out the problem yourself.

One way would be to use a profiler, for example Python's cProfile, which is included in the standard library.

Assuming that your script is called slow_download.py, you can limit the range in your loop to, for example, range(32, 33) and execute it in the following way:

python3 -m cProfile -s cumtime slow_download.py

The -s cumtime option sorts the calls by cumulative time.

The result would be:

  http://dw.euro.who.int/api/v3/data_sets/HFAMDB/HFAMDB_832
           222056 function calls (219492 primitive calls) in 395.444 seconds

     Ordered by: cumulative time

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      122/1    0.005    0.000  395.444  395.444 {built-in method builtins.exec}
          1   49.771   49.771  395.444  395.444 py2.py:1(<module>)
       9010    0.111    0.000  343.904    0.038 models.py:782(json)
       9010    0.078    0.000  332.900    0.037 __init__.py:271(loads)
       9010    0.091    0.000  332.801    0.037 decoder.py:334(decode)
       9010  332.607    0.037  332.607    0.037 decoder.py:345(raw_decode)
        ...

This clearly suggests that the problem lies in json() and the methods it calls: loads() and raw_decode().
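As a side note (not part of the original answer), you can also write the profile to a file and inspect it interactively with pstats, which is likewise in the standard library:

  python3 -m cProfile -o profile.out slow_download.py

and then, in a Python session:

  import pstats

  # Load the saved profile and show the ten most expensive calls by cumulative time.
  stats = pstats.Stats('profile.out')
  stats.sort_stats('cumtime').print_stats(10)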


If the data is really large, dump it into MongoDB and query whatever you want efficiently.
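A minimal sketch of that approach with pymongo, assuming a local MongoDB instance; the database and collection names ('hfamdb', 'observations') are hypothetical, and the example URL is taken from the profiler output above:

  import requests
  from pymongo import MongoClient

  client = MongoClient('mongodb://localhost:27017')   # assumes a local MongoDB instance
  collection = client['hfamdb']['observations']       # hypothetical database/collection names

  data = requests.get('http://dw.euro.who.int/api/v3/data_sets/HFAMDB/HFAMDB_832').json()
  collection.insert_many(data['data'])                # store the parsed observations once

  # Later, query the stored documents instead of re-downloading and re-parsing the JSON.
  for observation in collection.find({}):
      print(observation)

An index on whatever field you filter by (via collection.create_index) keeps those queries fast as the data grows.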