Is there a chunksize argument for read_excel in pandas? [duplicate]

python pandas tqdm

If you want to add a progress indicator, you could use the .tell() method of file objects. It is not very accurate, of course, but it may be accurate enough for your users to estimate how long a coffee break they can take :-)

So here is the plan: open your Excel file with open() and pass the resulting file object to pd.read_excel. According to the docs this should be possible, and I just verified it with a simple example for an xlsx file.

At the beginning, determine the size of the file, e.g. like this:

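    import io

    # fp is the binary file object returned by open() for the xlsx file
    fp.seek(0, io.SEEK_END)  # set the file cursor to the end of the file
    fp_len = fp.tell()       # the position at the end is the file size in bytes
    fp.seek(0, io.SEEK_SET)  # set the file cursor back to the beginning of the file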

With this setup, you have two possibilities:

1. Either create a thread that updates the progress bar from time to time by calling fp.tell() on the file object you opened for the xlsx file, or
2. create your own wrapper that provides the methods pandas needs to read the data (at least a read method) and updates the progress bar synchronously, so you don't need an extra thread. Your class would just pass the method calls on to the actual file object. In that sense it is comparable to a proxy object.
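Option 1 could be sketched roughly as follows. This is only a minimal illustration: the helper name start_progress_poller, the report callback, and the polling interval are assumptions made for the example, not part of pandas or tqdm.

```python
import threading


def start_progress_poller(fp, total, report, interval=0.5):
    """Poll fp.tell() from a background thread and pass it to report().

    Hypothetical helper: `report` is any callable taking (position, total).
    Returns a function that stops the poller and waits for it to finish.
    """
    stop = threading.Event()

    def poll():
        # Event.wait() returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            report(fp.tell(), total)
        # One final report so the last position is always delivered.
        report(fp.tell(), total)

    worker = threading.Thread(target=poll, daemon=True)
    worker.start()

    def finish():
        stop.set()
        worker.join()

    return finish
```

You would call start_progress_poller(fp, fp_len, callback) right before pd.read_excel(fp) and call the returned function right after it, while fp is still open. With tqdm, the callback could, for instance, set the bar's n attribute to the reported position and call refresh().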

I have to admit that option 2 is kind of dirty. But I'm convinced that both methods would work, because I just verified that pd.read_excel really can read from a file object (io.BufferedReader), including xlsx files, which are, as far as I know, zipped files. The indicator just won't be very accurate, because the file pointer might not move linearly with time, depending on things like fluctuations in the compression ratio (some parts of the file might compress better than others).
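A minimal sketch of the wrapper idea (option 2) might look like this. The class name ProgressReader and the on_progress callback are made up for the illustration, and it assumes that forwarding every other attribute to the wrapped file is enough for pandas and the zip reader underneath; depending on the pandas version, more methods may have to be implemented explicitly.

```python
import io


class ProgressReader:
    """Proxy around a binary file object that reports the read position.

    Hypothetical helper: on_progress is any callable taking (position, total).
    """

    def __init__(self, fp, on_progress):
        self._fp = fp
        self._on_progress = on_progress
        # Measure the total size once, then rewind to the beginning.
        fp.seek(0, io.SEEK_END)
        self._total = fp.tell()
        fp.seek(0, io.SEEK_SET)

    def read(self, size=-1):
        data = self._fp.read(size)
        # Report synchronously after every read, so no extra thread is needed.
        self._on_progress(self._fp.tell(), self._total)
        return data

    def __getattr__(self, name):
        # Forward everything else (seek, tell, seekable, ...) to the real file.
        return getattr(self._fp, name)
```

The wrapper would then be passed in place of the plain file object, e.g. pd.read_excel(ProgressReader(fp, callback)). Since an xlsx file is a zip archive, the reader may jump around in the file, so the reported position is not guaranteed to grow monotonically.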


The best you can do is use pandas.read_excel with the skiprows (skips rows from the top of the file) and skipfooter (skips rows from the bottom; spelled skip_footer in older pandas versions) arguments. However, this will load the whole file into memory first and then parse only the required rows.
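As a rough sketch, chunk-like reading could be emulated by combining skiprows with the nrows argument of read_excel. The helper names chunk_windows and read_excel_in_chunks are invented for this example, and it assumes that the first row is the header and that the number of data rows is already known. Every call parses the file again, so this limits the size of each DataFrame but does not make reading faster.

```python
from typing import Iterator, Tuple


def chunk_windows(total_rows: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, nrows) pairs that cover total_rows data rows."""
    for offset in range(0, total_rows, chunk_size):
        yield offset, min(chunk_size, total_rows - offset)


def read_excel_in_chunks(path, total_rows, chunk_size, **kwargs):
    """Yield DataFrames of at most chunk_size rows (hypothetical helper)."""
    # Imported here so chunk_windows can be used without pandas installed.
    import pandas as pd

    for offset, nrows in chunk_windows(total_rows, chunk_size):
        # Row 0 is kept as the header; rows 1..offset were already returned.
        yield pd.read_excel(
            path, skiprows=range(1, offset + 1), nrows=nrows, **kwargs
        )
```

For a sheet with 10 data rows and a chunk size of 4, the windows are (0, 4), (4, 4) and (8, 2).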


That parameter was there, but it never did anything, so it was removed. See this issue on GitHub.

You need to take a different approach to do that, as the others have pointed out.
