Append list of Python dictionaries to a file without loading it

You can use json to dump the dicts, one per line. Each line is then a single JSON object. You lose the outer list, but you can add records with a simple append to the existing file.

import json

def append_record(record):
    with open('my_file', 'a') as f:
        json.dump(record, f)
        f.write('\n')  # one record per line; avoid os.linesep, which can write \r\r\n in text mode on Windows

# demonstrate a program writing multiple records
for i in range(10):
    my_dict = {'number': i}
    append_record(my_dict)

The list can be assembled later

with open('my_file') as f:
    my_list = [json.loads(line) for line in f]

The file looks like

{"number": 0}{"number": 1}{"number": 2}{"number": 3}{"number": 4}{"number": 5}{"number": 6}{"number": 7}{"number": 8}{"number": 9}


If the file must remain valid JSON throughout, it can be done as follows:

import json

with open(filepath, mode="r+") as file:
    file.seek(0, 2)
    position = file.tell() - 1
    file.seek(position)
    file.write(",{}]".format(json.dumps(dictionary)))

This opens the file for both reading and writing. It then seeks to the end of the file (zero bytes from the end) to find the end position relative to the beginning of the file, and steps back one byte, which in a JSON file is expected to hold the closing ]. Finally, it appends a new dictionary to the structure, overwriting the last character of the file and keeping it valid JSON. It does not read the file into memory. Tested with both ANSI and UTF-8 encoded files in Python 3.4.3, with small and huge (5 GB) dummy files.

A variation, if you also have the os module imported:

import os, json

with open(filepath, mode="r+") as file:
    file.seek(os.stat(filepath).st_size - 1)
    file.write(",{}]".format(json.dumps(dictionary)))

It uses the file's byte length to seek directly to the position one byte before the end (the same position as in the previous example).
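Both snippets assume the JSON array already exists on disk. A helper along these lines (a sketch; append_to_json_array is an illustrative name, and it assumes that an existing file ends with ] and already holds at least one element) also covers the first write:

import json
import os

def append_to_json_array(filepath, record):
    if not os.path.exists(filepath) or os.stat(filepath).st_size == 0:
        # first record: create a one-element JSON array
        with open(filepath, 'w') as f:
            json.dump([record], f)
        return
    with open(filepath, 'r+') as f:
        f.seek(os.stat(filepath).st_size - 1)  # position of the closing ]
        f.write(',{}]'.format(json.dumps(record)))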


If the goal is to never actually load the file, json is not really the right approach. You could use a memory-mapped file instead: a memmap array can open the file and build an array "on-disk" without loading anything into memory.

Create a memory-mapped array of dicts:

>>> import numpy as np
>>> a = np.memmap('mydict.dat', dtype=object, mode='w+', shape=(4,))
>>> a[0] = {'name':"Joe", 'data':[1,2,3,4]}
>>> a[1] = {'name':"Guido", 'data':[1,3,3,5]}
>>> a[2] = {'name':"Fernando", 'data':[4,2,6,9]}
>>> a[3] = {'name':"Jill", 'data':[9,1,9,0]}
>>> a.flush()
>>> del a

Now read the array, without loading the file:

>>> a = np.memmap('mydict.dat', dtype=object, mode='r')

Converting to a list does pull the contents into memory, but that's not required -- you can work with the array on-disk without loading it.

>>> a.tolist()
[{'data': [1, 2, 3, 4], 'name': 'Joe'}, {'data': [1, 3, 3, 5], 'name': 'Guido'}, {'data': [4, 2, 6, 9], 'name': 'Fernando'}, {'data': [9, 1, 9, 0], 'name': 'Jill'}]
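Under the same setup, individual records can also be indexed directly instead of converting the whole array:

>>> a[2]['name']
'Fernando'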

Creating a memory-mapped array that can index a file takes a negligible amount of time, regardless of the size of the file (e.g. 100 GB).