Improving Postgres psycopg2 query performance for Python to the same level of Java's JDBC driver Improving Postgres psycopg2 query performance for Python to the same level of Java's JDBC driver postgresql postgresql

Improving Postgres psycopg2 query performance for Python to the same level of Java's JDBC driver


This is not an answer out of the box, with all client/db stuff you may need to do some work to determine exactly what is amiss

backup postgresql.conf changing

log_min_duration_statement to 0 log_destination = 'csvlog'              # Valid values are combinations of      logging_collector = on                # Enable capturing of stderr and csvlog log_directory = 'pg_log'                # directory where log files are written,log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,        debug_print_parse = ondebug_print_rewritten = ondebug_print_plan output = onlog_min_messages = info (debug1 for all server versions prior to 8.4)

Stop and restart your database server ( reload may not pick up the changes )Reproduce your tests ensuring that the server time and client times match and that you record the start times etc.

copy the log file off an import into editor of your choice (excel or another spreadsheet can be useful for getting advance manipulation for sql & plans etc)

now examine the timings from the server side and note:

  • is the sql reported on the server the same in each case

  • if the same you should have the same timings

  • is the client generating a cursor rather than passing sql

  • is one driver doing a lot of casting/converting between character sets or implicit converting of other types such as dates or timestamps.

and so on

The plan data will be included for completeness, this may inform if there are gross differences in the SQL submitted by the clients.


The stuff below is probably aiming above and beyond what you have in mind or what is deemed acceptable in your environment, but I'll put the option on the table just in case.

  1. Is the destination of every SELECT in your test.sql truly a simple |-separated results file?
  2. Is non-portability (Postgres-specificity) acceptable?
  3. Is your backend Postgres 8.2 or newer?
  4. Will the script run on the same host as the database backend, or would it be acceptable to generate the |-separated results file(s) from within the backend (e.g. to a share?)

If the answer to all of the above questions is yes, then you can transform your SELECT ... statements to COPY ( SELECT ... ) TO E'path-to-results-file' WITH DELIMITER '|' NULL E'\\N'.


An alternative could be to use ODBC. This is assuming that Python ODBC driver performs well.

PostgreSQL has ODBC drivers for both Windows and Linux.