
Is there a chunksize argument for read_excel in pandas? [duplicate]


If you want to add a progress indicator, you could use the tell() method of file objects. That's of course not perfectly accurate, but it may give your users enough accuracy to estimate how long a coffee break they can take :-)

So here is the plan: open your Excel file in binary mode with open() and pass the resulting file object to pd.read_excel. According to the docs this is supported, and I just verified it with a simple example for an xlsx file.

At the beginning, determine the size of the file, for example:

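```python
import io

# fp is the binary file object returned by open(..., "rb") for your xlsx file
fp.seek(0, io.SEEK_END)  # set the file cursor to the end of the file
fp_len = fp.tell()       # cursor position at the end = file size in bytes
fp.seek(0, io.SEEK_SET)  # set the file cursor back to the beginning of the file
```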

With this setup, you have two possibilities:

  1. Either create a thread that updates the progress bar from time to time by calling fp.tell() on the file object you opened for the xlsx file, or
  2. create your own wrapper that provides the methods pandas needs to read the data (at least a read method; since xlsx files are zip archives, seek and tell are needed as well) and updates the progress bar synchronously, so you don't need an extra thread. Your class would just pass the method calls on to the actual file object. In that sense, you could compare it to a proxy object.
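Option 1 could look roughly like the following minimal sketch. It assumes fp is the binary file object and fp_len the size computed above; start_progress_thread, _poll_progress and the report callback are hypothetical names, with report standing in for whatever updates your progress bar.

```python
import threading


def _poll_progress(fp, total_bytes, stop_event, report, interval):
    # Runs in the background thread until stop_event is set
    while not stop_event.is_set():
        # Cursor position relative to the file size approximates the progress
        fraction = min(fp.tell() / total_bytes, 1.0) if total_bytes else 1.0
        report(fraction)
        stop_event.wait(interval)  # sleep, but wake up early when stop is requested


def start_progress_thread(fp, total_bytes, report, interval=0.5):
    """Start a daemon thread that calls report(fraction) every `interval` seconds.

    Hypothetical helper. Returns (stop_event, thread); call stop_event.set()
    and thread.join() once pd.read_excel(fp) has returned.
    """
    stop_event = threading.Event()
    thread = threading.Thread(
        target=_poll_progress,
        args=(fp, total_bytes, stop_event, report, interval),
        daemon=True,
    )
    thread.start()
    return stop_event, thread
```

You would call start_progress_thread(fp, fp_len, report) right before pd.read_excel(fp), then stop.set() and thread.join() after it returns and before closing fp, because tell() raises an error on a closed file.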

I have to admit that option 2 is kind of dirty. But I'm convinced that both methods would work, because I just verified that pd.read_excel really can read from a file object (io.BufferedReader), including xlsx files, which are zip archives. The progress estimate just won't be very accurate, because the file pointer might not advance linearly with time, depending on things like fluctuations in the compression ratio (some parts of the file may compress better than others).
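Option 2, the proxy object, might be sketched as follows. ProgressReader is a hypothetical class name: it forwards read() to the wrapped file, reports fp.tell() / total_bytes after each read, and delegates every other attribute (seek, tell, seekable, name, ...) to the wrapped file via __getattr__. This is a starting point under those assumptions, not something tested against every pandas and openpyxl version.

```python
class ProgressReader:
    """Hypothetical proxy that reports read progress of a wrapped binary file.

    read() is forwarded to the real file object and followed by
    report(fraction); every other attribute is looked up on the wrapped file.
    """

    def __init__(self, fp, total_bytes, report):
        self._fp = fp
        self._total = total_bytes
        self._report = report

    def read(self, size=-1):
        data = self._fp.read(size)
        if self._total:
            # Cursor position relative to the file size approximates the progress
            self._report(min(self._fp.tell() / self._total, 1.0))
        return data

    def __iter__(self):
        # Dunder methods are looked up on the class, so delegate explicitly
        return iter(self._fp)

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself (seek, tell, seekable, ...)
        return getattr(self._fp, name)
```

With fp and fp_len from above, you would then call pd.read_excel(ProgressReader(fp, fp_len, report)). The updates happen synchronously on the thread that reads the file, which is the point of option 2.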


The best you can do is use pandas.read_excel with the skiprows (skips rows at the top of the file) and skipfooter (skips rows at the bottom; called skip_footer in older pandas versions) arguments. However, this still loads the whole file into memory first and then parses only the required rows.
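Along those lines, a sheet can be read in blocks roughly like this. It is a minimal sketch, not a real chunksize replacement: read_excel_in_chunks is a hypothetical helper, and instead of skipfooter it uses the nrows parameter (assuming pandas 0.23 or later, where read_excel accepts it) together with a list-like skiprows that keeps the header row. It also assumes the first row is the header and that there are no completely empty rows inside the data, since those would shift the row offsets. Each iteration calls pd.read_excel on the whole file again, so this limits the size of each DataFrame you hold, not the parsing work.

```python
import pandas as pd


def read_excel_in_chunks(path, chunk_rows=1000, sheet_name=0):
    """Yield DataFrames with at most `chunk_rows` data rows each (hypothetical helper).

    Assumes row 0 of the sheet is the header and no data row is completely empty.
    """
    start = 0  # number of data rows already returned
    while True:
        chunk = pd.read_excel(
            path,
            sheet_name=sheet_name,
            # Keep row 0 (the header) and skip the data rows returned earlier
            skiprows=range(1, start + 1),
            nrows=chunk_rows,
        )
        if chunk.empty:
            break  # nothing left after the skipped rows
        yield chunk
        start += len(chunk)
```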


The chunksize parameter used to exist in read_excel, but it never did anything, so it was removed. See this issue on GitHub.

You need to take a different approach to do that, as the others have pointed out.