How to create a large pandas dataframe from an SQL query without running out of memory?



As mentioned in a comment, starting from pandas 0.15 you can pass a chunksize option to read_sql to read and process the query results chunk by chunk:

sql = "SELECT * FROM My_Table"for chunk in pd.read_sql_query(sql , engine, chunksize=5):    print(chunk)

Reference: http://pandas.pydata.org/pandas-docs/version/0.15.2/io.html#querying
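The real memory win comes from reducing each chunk as it arrives instead of keeping them all around. Here is a minimal sketch of that pattern; the database URL, table name, and "value_column" are illustrative assumptions, not part of the original answer:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection; replace with your own database URL.
engine = create_engine("sqlite:///my_database.db")

total_rows = 0
running_sum = 0.0

# Each iteration yields a DataFrame of at most 50,000 rows, so peak
# memory stays at roughly one chunk rather than the whole table.
for chunk in pd.read_sql_query("SELECT * FROM My_Table", engine, chunksize=50000):
    total_rows += len(chunk)
    running_sum += chunk["value_column"].sum()  # assumed column name
    # chunk goes out of scope here and can be garbage-collected

print(total_rows, running_sum)
```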


Update: Make sure to check out the accepted answer above, as pandas now has built-in support for chunked loading via the chunksize option of read_sql.

You could simply try to read the input table chunk-wise and assemble your full dataframe from the individual pieces afterwards, like this:

```python
import pandas as pd

chunk_size = 10000
offset = 0
dfs = []

# cnxn is assumed to be an open database connection or SQLAlchemy engine.
while True:
    # ORDER BY must come before LIMIT/OFFSET for the SQL to be valid.
    sql = "SELECT * FROM MyTable ORDER BY ID LIMIT %d OFFSET %d" % (chunk_size, offset)
    # pd.read_sql replaces the deprecated psql.read_frame from older pandas versions.
    dfs.append(pd.read_sql(sql, cnxn))
    offset += chunk_size
    # A chunk shorter than chunk_size means we have reached the end of the table.
    if len(dfs[-1]) < chunk_size:
        break

full_df = pd.concat(dfs)
```

It might also be that the whole dataframe is simply too large to fit in memory; in that case you will have no option but to restrict the number of rows or columns you're selecting.
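That restriction is cheapest when you push it into the query itself, so the unwanted data never reaches Python at all. A small sketch of this idea; the database URL, table, column names, and WHERE clause are assumptions for illustration:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection; replace with your own database URL.
engine = create_engine("sqlite:///my_database.db")

sql = """
    SELECT id, name, amount   -- project only the columns you need
    FROM MyTable
    WHERE amount > 0          -- filter rows on the database side
"""
df = pd.read_sql(sql, engine)

# Downcasting numeric columns can shrink the in-memory footprint further.
df["amount"] = pd.to_numeric(df["amount"], downcast="float")
```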


Code solution and remarks.

```python
import pandas as pd

# Create an empty list to collect the chunks
dfl = []

# Create an empty dataframe
dfs = pd.DataFrame()

# Start chunking
for chunk in pd.read_sql(query, con=conct, chunksize=10000000):
    # Append each chunk of the SQL result set to the list
    dfl.append(chunk)

# Concatenate the list of chunks into a single dataframe
dfs = pd.concat(dfl, ignore_index=True)
```

However, my memory profiling shows that even though memory is released after each chunk is extracted, the list keeps growing and holds on to that memory, so there is no net gain in free RAM.
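If the goal is only to bound peak memory, one workaround is to stream each chunk to disk and drop it immediately instead of accumulating a list. This is a sketch, not something tested against the setup above; the file name and chunk size are assumptions, and query/conct come from the snippet above:

```python
import pandas as pd

first = True
for chunk in pd.read_sql(query, con=conct, chunksize=100000):
    # Append each chunk to an on-disk CSV, writing the header only once,
    # then let the chunk be garbage-collected.
    chunk.to_csv("result.csv", mode="a", header=first, index=False)
    first = False
# Peak memory is now roughly one chunk, at the cost of a disk round-trip.
```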

Would love to hear what the author / others have to say.