
MemoryError when loading a JSON file


500MB of JSON data does not result in 500MB of memory usage. It will result in a multiple of that. Exactly by what factor depends on the data, but a factor of 10 - 25 is not uncommon.

For example, the following simple JSON string of 14 characters (bytes on disk) results in a Python object that is almost 25 times larger (Python 3.6b3):

>>> import json
>>> from sys import getsizeof
>>> j = '{"foo": "bar"}'
>>> len(j)
14
>>> p = json.loads(j)
>>> getsizeof(p) + sum(getsizeof(k) + getsizeof(v) for k, v in p.items())
344
>>> 344 / 14
24.571428571428573

That's because Python objects require some overhead; instances track the number of references to them, what type they are, and their attributes (if the type supports attributes) or their contents (in the case of containers).

If you are using the built-in json library to load that file, it has to build larger and larger objects from the contents as they are parsed, and at some point your OS will refuse to provide more memory. That won't happen at 32GB, because there is a per-process limit on how much memory can be used, so it is more likely to happen around 4GB. At that point all the objects already created are freed again, so in the end the reported memory use may not have changed that much.

The solution is to either break up that large JSON file into smaller subsets, or to use an event-driven JSON parser like ijson.
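Breaking the data up often means storing one record per line ("JSON Lines"), so that each line can be parsed and discarded independently and peak memory stays near the size of a single record. A minimal sketch (the file name and record fields here are made up for illustration):

```python
import json
import os
import tempfile

# Hypothetical input data: one JSON record per line ("JSON Lines").
records = [{"user": "a", "amount": 3}, {"user": "b", "amount": 5},
           {"user": "a", "amount": 2}]
path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
with open(path, "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")

# Process the file line by line: only one decoded record is
# alive at a time, regardless of how large the file is.
totals = {}
with open(path) as f:
    for line in f:
        record = json.loads(line)
        totals[record["user"]] = totals.get(record["user"], 0) + record["amount"]

print(totals)  # {'a': 5, 'b': 5}
```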

An event-driven JSON parser doesn't create Python objects for the whole file, only for the currently parsed item, and notifies your code for each item it created with an event (like 'starting an array, here is a string, now starting a mapping, this is the end of the mapping, etc.). You can then decide what data you need and keep, and what to ignore. Anything you ignore is discarded again and memory use is kept low.
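For real work you would use ijson itself (it parses incrementally without loading the text), but the item-at-a-time idea can be sketched with just the standard library using `json.JSONDecoder.raw_decode`, which decodes one value from a string and tells you where it stopped. The helper name below is made up for illustration:

```python
import json

def iter_array_items(text):
    """Yield the items of a top-level JSON array one at a time.

    A stdlib sketch of the event-driven idea: only the item
    currently being yielded is a live Python object; anything
    the caller ignores is discarded again.
    """
    decoder = json.JSONDecoder()
    idx = text.index('[') + 1
    while True:
        # Skip whitespace and the commas between items.
        while idx < len(text) and text[idx] in ' \t\r\n,':
            idx += 1
        if idx >= len(text) or text[idx] == ']':
            return
        item, idx = decoder.raw_decode(text, idx)
        yield item

data = '[{"name": "a", "size": 1}, {"name": "b", "size": 2}]'
# Keep only the field we need from each item; the rest is dropped.
sizes = [item["size"] for item in iter_array_items(data)]
print(sizes)  # [1, 2]
```

Note that this sketch still holds the raw text in memory; ijson goes further and reads the bytes from the file incrementally as well.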


So, I will explain how I finally solved this problem. The first answer will work, but you have to know that loading elements one by one with ijson will take very long... and by the end, you still do not have the whole file loaded.

So, the important information is that Windows limits your memory per process to 2 or 4 GB, depending on which Windows you use (32- or 64-bit). If you use Python(x,y), that will be 2 GB (it only exists in 32-bit). Either way, that is very, very low!
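One quick way to check whether your interpreter is a 32-bit build (and therefore subject to the ~2GB address-space limit) is to look at the pointer size:

```python
import struct
import sys

# On a 32-bit Python build, pointers are 4 bytes and sys.maxsize
# is 2**31 - 1; on a 64-bit build, pointers are 8 bytes.
bits = struct.calcsize("P") * 8
print(f"This Python is {bits}-bit")
print("sys.maxsize =", sys.maxsize)
```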

I solved this problem by installing a virtual Linux machine inside my Windows, and it works. Here are the main steps to do so:

  1. Install VirtualBox
  2. Install Ubuntu (for example)
  3. Install a scientific Python distribution on it (one that includes SciPy, for example)
  4. Create a shared folder between the two "computers" (you will find tutorials on Google)
  5. Run your code on your Ubuntu "computer": it should work ;)

NB: Do not forget to allocate sufficient RAM and disk space to your virtual machine.

This works for me. I no longer have this "memory error" problem.

I am posting this answer here from there.