How do I make a progress bar for loading pandas DataFrame from a large xlsx file?
This will not work. pd.read_excel blocks until the file is read, and there is no way to get information about its progress from this function during execution.
It would work for read operations that you can do chunk-wise, like

    chunks = []
    for chunk in pd.read_csv(..., chunksize=1000):
        update_progressbar()
        chunks.append(chunk)
But as far as I understand, tqdm also needs the total number of chunks in advance to show a percentage and a time estimate, so for a proper progress report you would need to read the full file first.
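For CSV files, the total can be obtained with a cheap line count instead of a full parse. The following is a minimal sketch, not a tested recipe: the function name `read_csv_with_progress` is hypothetical, and it assumes the file has exactly one header line and no newlines embedded inside quoted fields.

```python
import pandas as pd
from tqdm import tqdm


def read_csv_with_progress(path, chunksize=1000):
    """Read a CSV in chunks, advancing a tqdm bar by the rows in each chunk."""
    # Cheap first pass: count data rows without parsing them.
    # Assumption: one header line, no newlines inside quoted fields.
    with open(path, "rb") as f:
        total_rows = sum(1 for _ in f) - 1

    chunks = []
    with tqdm(total=total_rows, unit="rows") as bar:
        for chunk in pd.read_csv(path, chunksize=chunksize):
            chunks.append(chunk)
            bar.update(len(chunk))  # advance by the number of rows just read
    return pd.concat(chunks, ignore_index=True)
```

The line count still reads the file once, but it is usually much faster than parsing it. This approach does not carry over to pd.read_excel, which has no chunksize option.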
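DISCLAIMER: This works only with the xlrd engine and is not thoroughly tested! Note that xlrd 2.0 and later no longer read .xlsx files, so this requires an older xlrd version.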
How does it work? We monkey-patch the xlrd.xlsx.X12Sheet.own_process_stream method, which is responsible for loading sheets from a file-like stream. We supply our own stream, which wraps the original one and updates our progress bar on every read. Each sheet has its own progress bar.
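The stream-wrapping idea can be tried in isolation, without xlrd. This is only an illustrative sketch: `ProgressReader` is a hypothetical name, and an in-memory `io.BytesIO` stands in for the sheet's stream inside the xlsx archive.

```python
import io
from tqdm import tqdm


class ProgressReader(io.RawIOBase):
    """Raw stream that forwards reads to `raw` and reports bytes read to `bar`."""

    def __init__(self, raw, bar):
        self.raw = raw
        self.bar = bar

    def readable(self):
        return True

    def readinto(self, b):
        # Delegate the actual read, then advance the bar by the bytes received.
        n = self.raw.readinto(b)
        if n:
            self.bar.update(n)
        return n


payload = b"x" * 1_000_000          # stand-in for a sheet's bytes
source = io.BytesIO(payload)
with tqdm(total=len(payload), unit="B", unit_scale=True) as bar:
    reader = ProgressReader(source, bar)
    data = b""
    while True:
        block = reader.read(64 * 1024)   # RawIOBase.read() calls readinto()
        if not block:
            break
        data += block
```

Any consumer that reads from `reader` drives the bar as a side effect, which is what makes it possible to hand such a wrapper to a parser that knows nothing about progress reporting.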
When we want the progress bar, we use the load_with_progressbar() context manager and then call pd.read_excel('<FILE.xlsx>').
    import pandas as pd
    import xlrd
    from tqdm import tqdm
    from io import RawIOBase
    from contextlib import contextmanager


    class progress_reader(RawIOBase):
        """Wraps a stream and updates a tqdm bar with every read."""

        def __init__(self, zf, bar):
            self.bar = bar
            self.zf = zf

        def readinto(self, b):
            n = self.zf.readinto(b)
            self.bar.update(n=n)
            return n


    @contextmanager
    def load_with_progressbar():

        def my_get_sheet(self, zf, *other, **kwargs):
            # One bar per sheet, sized by the sheet's uncompressed size
            with tqdm(total=zf._orig_file_size) as bar:
                sheet = _tmp(self, progress_reader(zf, bar), **kwargs)
            return sheet

        # Keep the original method so it can be restored afterwards
        _tmp = xlrd.xlsx.X12Sheet.own_process_stream

        try:
            xlrd.xlsx.X12Sheet.own_process_stream = my_get_sheet
            yield
        finally:
            xlrd.xlsx.X12Sheet.own_process_stream = _tmp


    with load_with_progressbar():
        df = pd.read_excel('sample2.xlsx')

    print(df)
[Screenshot of the progress bar]
This might help people with a similar problem. For example, you can wrap the reads in a tqdm loop:
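    import pandas as pd
    from tqdm import tqdm

    # One progress step per read: the first sheet (index 0), then two named sheets.
    # The bar advances only after each read has finished.
    sheets = [0, 'Sheet1', 'Sheet2']
    frames = []
    for sheet in tqdm(sheets, ncols=100, desc="Loading data.."):
        frames.append(pd.read_excel("some_file.xlsx", sheet_name=sheet, header=None))
    df, LC_data, FC_data = frames
    print("------Loading is completed ------")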
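If progress within a single sheet is wanted, one option is to bypass pd.read_excel and iterate over the rows with openpyxl in read-only mode. This is a rough sketch under stated assumptions, not a drop-in replacement: the function name `read_sheet_with_progress` is hypothetical, the first row is assumed to hold the column names, and no type conversion beyond what openpyxl returns is attempted.

```python
import pandas as pd
from openpyxl import load_workbook
from tqdm import tqdm


def read_sheet_with_progress(path, sheet_name=None):
    """Load one worksheet row by row, showing per-row progress."""
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.active
        # max_row comes from the sheet's stored dimensions and can be None
        # for some files; tqdm then shows a running count without a percentage.
        rows = list(tqdm(ws.iter_rows(values_only=True),
                         total=ws.max_row, unit="rows"))
    finally:
        wb.close()
    # Assumption: the first row holds the column names.
    return pd.DataFrame(rows[1:], columns=list(rows[0]))
```

Calling `read_sheet_with_progress("some_file.xlsx", "Sheet1")` would then show a bar that advances while the sheet is being read, rather than only after it has finished.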