
How do I read a large csv file with pandas?


The error shows that the machine does not have enough memory to read the entire CSV into a DataFrame at one time. Assuming you do not need the entire dataset in memory all at one time, one way to avoid the problem would be to process the CSV in chunks (by specifying the chunksize parameter):

chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)

The chunksize parameter specifies the number of rows per chunk. (The last chunk may contain fewer than chunksize rows, of course.)


pandas >= 1.2

read_csv with chunksize returns a context manager, to be used like so:

chunksize = 10 ** 6
with pd.read_csv(filename, chunksize=chunksize) as reader:
    for chunk in reader:
        process(chunk)

See GH38225


Chunking shouldn't always be the first port of call for this problem.

  1. Is the file large due to repeated non-numeric data or unwanted columns?

    If so, you can sometimes see massive memory savings by reading columns in as categories and selecting only the required columns via the pd.read_csv usecols parameter (see the first sketch after this list).

  2. Does your workflow require slicing, manipulating, exporting?

    If so, you can use dask.dataframe to slice, perform your calculations and export iteratively. Chunking is performed silently by dask, which also supports a subset of the pandas API.

  3. If all else fails, read line by line via chunks.

    Chunk via pandas (as shown above) or via the csv library as a last resort (see the csv sketch after this list).
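
A minimal sketch of point 1, assuming a hypothetical file large.csv with columns id, state and value, where state is a repeated low-cardinality string (the filename and column names are placeholders for your own data):

import pandas as pd

# Keep only the needed columns and store the repeated string column as a category.
df = pd.read_csv(
    'large.csv',
    usecols=['id', 'state', 'value'],
    dtype={'state': 'category'},
)
print(df.memory_usage(deep=True))  # compare against a plain read_csv to see the savings

And for point 3, a rough generator that chunks with the standard-library csv module instead of pandas; the path, chunk size and the process_rows callable are again placeholders:

import csv

def csv_chunks(path, chunksize=10 ** 6):
    # Yield (header, rows) pairs, at most chunksize rows at a time.
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunksize:
                yield header, chunk
                chunk = []
        if chunk:
            yield header, chunk

for header, rows in csv_chunks('large.csv'):
    process_rows(header, rows)  # process_rows stands in for your own processing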


For large data I recommend you use the library dask, e.g.:

# Dataframes implement the Pandas API
import dask.dataframe as dd
df = dd.read_csv('s3://.../2018-*-*.csv')

You can read more in the dask documentation.
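
A short sketch of the lazy workflow, assuming local files matching 2018-*.csv with hypothetical month and value columns:

import dask.dataframe as dd

# Lazily point at many CSVs; nothing is read until compute() is called.
df = dd.read_csv('2018-*.csv')

# Slice and aggregate with the familiar pandas-style API
# ('month' and 'value' are placeholder column names).
monthly_mean = df[df['value'] > 0].groupby('month')['value'].mean()
print(monthly_mean.compute())  # runs the whole pipeline chunk by chunk

# Export iteratively, one partition per output file.
df.to_csv('out-*.csv', index=False)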

Another great alternative is to use modin, because its functionality is nearly identical to pandas yet it leverages distributed dataframe libraries such as dask under the hood.
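
A minimal sketch, assuming modin is installed together with one of its engines (e.g. Ray or Dask); the filename is a placeholder:

# Swap the import and keep the rest of the pandas code unchanged.
import modin.pandas as pd

df = pd.read_csv('large.csv')  # 'large.csv' stands in for your own file
print(df.head())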