ExcelFile Vs. read_excel in pandas

python excel pandas

There's no particular difference beyond the syntax. Technically, ExcelFile is a class and read_excel is a function. In either case, the actual parsing is handled by the _parse_excel method defined within ExcelFile.

In earlier versions of pandas, read_excel consisted entirely of a single statement (other than comments):

return ExcelFile(path_or_buf, kind=kind).parse(sheetname=sheetname,
                                               kind=kind, **kwds)

And ExcelFile.parse didn't do much more than call ExcelFile._parse_excel.

In recent versions of pandas, read_excel ensures that it has an ExcelFile object (and creates one if it doesn't), and then calls the _parse_excel method directly:

if not isinstance(io, ExcelFile):
    io = ExcelFile(io, engine=engine)
return io._parse_excel(...)

and with the updated (and unified) parameter handling, ExcelFile.parse really is just the single statement:

return self._parse_excel(...)

That is why the docs for ExcelFile.parse now say

Equivalent to read_excel(ExcelFile, ...) See the read_excel docstring for more info on accepted parameters
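To make the delegation concrete, here is a toy, dependency-free model of it. ToyExcelFile and toy_read_excel are hypothetical stand-ins, not pandas source code. The counter only records how often a workbook is "opened", which is the step that costs time in real use.

```python
# Hypothetical toy model of the delegation described above; NOT pandas source.
# ToyExcelFile stands in for pd.ExcelFile, toy_read_excel for pd.read_excel.


class ToyExcelFile:
    open_count = 0  # class-level counter: how many workbooks were "opened"

    def __init__(self, path):
        # In the real library, loading the workbook here is the expensive step.
        ToyExcelFile.open_count += 1
        self.path = path
        self.sheet_names = ["Sheet1", "Sheet2"]  # hypothetical fixed sheets

    def _parse_excel(self, sheet_name):
        # The single place where "parsing" happens.
        return f"{self.path}:{sheet_name}"

    def parse(self, sheet_name):
        # Mirrors ExcelFile.parse: a one-line delegation.
        return self._parse_excel(sheet_name)


def toy_read_excel(io, sheet_name):
    # Mirrors read_excel: make sure we have a ToyExcelFile, creating one if needed.
    if not isinstance(io, ToyExcelFile):
        io = ToyExcelFile(io)
    return io._parse_excel(sheet_name)


# Passing a path inside the loop re-opens the workbook on every iteration...
ToyExcelFile.open_count = 0
for name in ["Sheet1", "Sheet2"]:
    toy_read_excel("book.xlsx", name)
print(ToyExcelFile.open_count)  # 2

# ...whereas opening once and passing the object opens it a single time,
# whether you then call toy_read_excel or .parse.
ToyExcelFile.open_count = 0
xl = ToyExcelFile("book.xlsx")
for name in xl.sheet_names:
    toy_read_excel(xl, name)
    xl.parse(name)
print(ToyExcelFile.open_count)  # 1
```

The point of the model is that the expensive work lives in the constructor, so what matters is how many times you construct the object, not which of the two entry points you call.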

As for another answer which claims that ExcelFile.parse is faster in a loop, that really just comes down to whether you are creating the ExcelFile object from scratch every time. You could certainly create your ExcelFile once, outside the loop, and pass that to read_excel inside your loop:

import pandas as pd

xl = pd.ExcelFile(path)  # open the workbook once
for name in xl.sheet_names:
    df = pd.read_excel(xl, name)

This would be equivalent to

xl = pd.ExcelFile(path)
for name in xl.sheet_names:
    df = xl.parse(name)
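One way to check the equivalence yourself is to build a small workbook in memory and read every sheet through both entry points. This sketch assumes a reasonably recent pandas with openpyxl installed (pandas uses it to write and read .xlsx files); the sheet names and data are made up for illustration.

```python
import io

import pandas as pd

# Build a small two-sheet workbook in memory (assumes openpyxl is installed).
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_excel(writer, sheet_name="first", index=False)
    pd.DataFrame({"x": ["p", "q"]}).to_excel(writer, sheet_name="second", index=False)
buf.seek(0)

# Open the workbook once, then read each sheet through both entry points.
xl = pd.ExcelFile(buf, engine="openpyxl")
via_read_excel = {name: pd.read_excel(xl, name) for name in xl.sheet_names}
via_parse = {name: xl.parse(name) for name in xl.sheet_names}

for name in xl.sheet_names:
    # The same DataFrame comes back regardless of which entry point was used.
    assert via_read_excel[name].equals(via_parse[name])
print(xl.sheet_names)  # ['first', 'second']
xl.close()
```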

If your loop involves different paths (in other words, you are reading many different workbooks, not just multiple sheets within a single workbook), then you can't get around having to create a brand-new ExcelFile instance for each path anyway, and then once again, both ExcelFile.parse and read_excel will be equivalent (and equally slow).
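For the many-workbooks case, a sketch of the best you can do is to open each workbook exactly once and pull all of its sheets while it is open. read_workbooks is a hypothetical helper, not a pandas function; the demo writes two throwaway workbooks to a temporary directory and assumes openpyxl is installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_workbooks(paths):
    """Hypothetical helper: return {path: {sheet_name: DataFrame}}, opening each workbook once."""
    result = {}
    for path in paths:
        # One ExcelFile per workbook is unavoidable; the with-block closes it afterwards.
        with pd.ExcelFile(path) as xl:
            result[path] = {name: xl.parse(name) for name in xl.sheet_names}
    return result


# Hypothetical demo data: two small two-sheet workbooks in a temporary directory.
tmp = Path(tempfile.mkdtemp())
paths = []
for i in range(2):
    p = tmp / f"book{i}.xlsx"
    with pd.ExcelWriter(p) as writer:
        pd.DataFrame({"n": [i]}).to_excel(writer, sheet_name="data", index=False)
        pd.DataFrame({"m": [i * 10]}).to_excel(writer, sheet_name="extra", index=False)
    paths.append(p)

frames = read_workbooks(paths)
```

For a single workbook, pd.read_excel(path, sheet_name=None) returns the same kind of {sheet_name: DataFrame} dictionary in one call.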

python excel pandas

ExcelFile.parse is faster.

Suppose you are reading DataFrames in a loop. With ExcelFile.parse you just pass the ExcelFile object (xl in your case), so the workbook is loaded only once and you reuse it to get your DataFrames. With read_excel you pass the path instead of the ExcelFile object, so the workbook is loaded again on every call. That gets slow if your workbook has lots of sheets and tens of thousands of rows.

python excel pandas

I believe pandas's first Excel implementation used the two-step process, and the one-step read_excel was added later. The first was probably left in because people were already using it.
