psycopg2: insert multiple rows with one query psycopg2: insert multiple rows with one query python python

psycopg2: insert multiple rows with one query


I built a program that inserts multiple lines to a server that was located in another city.

I found out that using this method was about 10 times faster than executemany. In my case tup is a tuple containing about 2000 rows. It took about 10 seconds when using this method:

args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)cur.execute("INSERT INTO table VALUES " + args_str) 

and 2 minutes when using this method:

cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)


New execute_values method in Psycopg 2.7:

data = [(1,'x'), (2,'y')]insert_query = 'insert into t (a, b) values %s'psycopg2.extras.execute_values (    cursor, insert_query, data, template=None, page_size=100)

The pythonic way of doing it in Psycopg 2.6:

data = [(1,'x'), (2,'y')]records_list_template = ','.join(['%s'] * len(data))insert_query = 'insert into t (a, b) values {}'.format(records_list_template)cursor.execute(insert_query, data)

Explanation: If the data to be inserted is given as a list of tuples like in

data = [(1,'x'), (2,'y')]

then it is already in the exact required format as

  1. the values syntax of the insert clause expects a list of records as in

    insert into t (a, b) values (1, 'x'),(2, 'y')

  2. Psycopg adapts a Python tuple to a Postgresql record.

The only necessary work is to provide a records list template to be filled by psycopg

# We use the data list to be sure of the template lengthrecords_list_template = ','.join(['%s'] * len(data))

and place it in the insert query

insert_query = 'insert into t (a, b) values {}'.format(records_list_template)

Printing the insert_query outputs

insert into t (a, b) values %s,%s

Now to the usual Psycopg arguments substitution

cursor.execute(insert_query, data)

Or just testing what will be sent to the server

print (cursor.mogrify(insert_query, data).decode('utf8'))

Output:

insert into t (a, b) values (1, 'x'),(2, 'y')


Update with psycopg2 2.7:

The classic executemany() is about 60 times slower than @ant32 's implementation (called "folded") as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com

This implementation was added to psycopg2 in version 2.7 and is called execute_values():

from psycopg2.extras import execute_valuesexecute_values(cur,    "INSERT INTO test (id, v1, v2) VALUES %s",    [(1, 2, 3), (4, 5, 6), (7, 8, 9)])

Previous Answer:

To insert multiple rows, using the multirow VALUES syntax with execute() is about 10x faster than using psycopg2 executemany(). Indeed, executemany() just runs many individual INSERT statements.

@ant32 's code works perfectly in Python 2. But in Python 3, cursor.mogrify() returns bytes, cursor.execute() takes either bytes or strings, and ','.join() expects str instance.

So in Python 3 you may need to modify @ant32 's code, by adding .decode('utf-8'):

args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)cur.execute("INSERT INTO table VALUES " + args_str)

Or by using bytes (with b'' or b"") only:

args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)cur.execute(b"INSERT INTO table VALUES " + args_bytes)