Is it possible to get an Excel document's row count without loading the entire document into memory?

python openpyxl

Adding on to what Hubro said, apparently get_highest_row() has been deprecated. Using the max_row and max_column properties returns the row and column count. For example:

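    from openpyxl import load_workbook

    # read_only=True streams the sheet; older releases spelled this use_iterators=True.
    wb = load_workbook(path, read_only=True)
    sheet = wb.worksheets[0]

    row_count = sheet.max_row
    column_count = sheet.max_column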


The solution suggested in this answer has been deprecated, and might no longer work.

Taking a look at the source code of OpenPyXL (IterableWorksheet) I've figured out how to get the column and row count from an iterator worksheet:

    wb = load_workbook(path, use_iterators=True)
    sheet = wb.worksheets[0]

    row_count = sheet.get_highest_row() - 1
    column_count = letter_to_index(sheet.get_highest_column()) + 1

IterableWorksheet.get_highest_column returns a string with the column letter that you can see in Excel, e.g. "A", "B", "C" etc. Therefore I've also written a function to translate the column letter to a zero-based index:

    def letter_to_index(letter):
        """Converts a column letter, e.g. "A", "B", "AA", "BC" etc. to a zero-based
        column index.

        A becomes 0, B becomes 1, Z becomes 25, AA becomes 26 etc.

        Args:
            letter (str): The column index letter.

        Returns:
            The column index as an integer.
        """
        letter = letter.upper()
        result = 0

        for index, char in enumerate(reversed(letter)):
            # Get the ASCII number of the letter and subtract 64 so that A
            # corresponds to 1.
            num = ord(char) - 64

            # Multiply the number with 26 to the power of `index` to get the correct
            # value of the letter based on its index in the string.
            final_num = (26 ** index) * num

            result += final_num

        # Subtract 1 from the result to make it zero-based before returning.
        return result - 1
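As a cross-check on the conversion above, here is a minimal sketch of the same mapping done left to right instead of with powers of 26. The name `column_letter_to_index` is hypothetical and only used for illustration. Recent openpyxl releases also provide `openpyxl.utils.column_index_from_string`, which returns a one-based index, so that may be the simpler choice if it is available in your version.

```python
def column_letter_to_index(letter):
    """Convert "A", "Z", "AA", ... to a zero-based column index."""
    result = 0
    for char in letter.upper():
        if not "A" <= char <= "Z":
            raise ValueError("not a column letter: %r" % letter)
        # Shift what has been read so far one place left in base 26, then
        # add the current letter's value (A=1 ... Z=26).
        result = result * 26 + (ord(char) - ord("A") + 1)
    # Subtract 1 to make the result zero-based.
    return result - 1


for name in ("A", "B", "Z", "AA", "BC"):
    print(name, column_letter_to_index(name))
```

This should agree with the examples in the docstring above (A is 0, B is 1, Z is 25, AA is 26).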

I still haven't figured out how to get the column sizes though, so I've decided to use a fixed-width font and automatically scaled columns in my application.


This might be extremely convoluted and I might be missing the obvious, but since OpenPyXL does not fill in column_dimensions for iterable worksheets (see my comment above), the only way I can see of finding the column widths without loading everything is to parse the XML directly:

    from xml.etree.ElementTree import iterparse
    from openpyxl import load_workbook

    wb = load_workbook("/path/to/workbook.xlsx", use_iterators=True)
    ws = wb.worksheets[0]
    xml = ws._xml_source
    xml.seek(0)

    for _, x in iterparse(xml):
        # Strip the XML namespace, e.g. "{...}col" -> "col".
        name = x.tag.split("}")[-1]
        if name == "col":
            print("Column %(max)s: Width: %(width)s" % x.attrib)  # width = x.attrib["width"]
        if name == "cols":
            print("break before reading the rest of the file")
            break
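The same direct-XML idea can also answer the original row count question without OpenPyXL at all. An .xlsx file is a zip archive, and a sheet's XML usually starts with a `<dimension ref="A1:C10"/>` element ahead of the cell data. Below is a rough sketch using only the standard library. The function names and the default part path `xl/worksheets/sheet1.xml` are assumptions for illustration; the real part name is listed in the workbook's own XML. The `dimension` element is optional and written by whatever produced the file, so it can be missing or stale.

```python
import re
import zipfile
from xml.etree.ElementTree import iterparse


def sheet_dimension(xlsx_file, sheet_path="xl/worksheets/sheet1.xml"):
    """Return the sheet's <dimension ref="..."> value, or None if absent."""
    with zipfile.ZipFile(xlsx_file) as archive:
        with archive.open(sheet_path) as sheet_xml:
            # "start" events fire as soon as an opening tag has been read,
            # so the attributes are available without parsing any rows.
            for _, element in iterparse(sheet_xml, events=("start",)):
                name = element.tag.split("}")[-1]
                if name == "dimension":
                    return element.get("ref")
                if name == "sheetData":
                    # Cell data begins here; stop instead of reading it.
                    return None
    return None


def row_count_from_ref(ref):
    """Extract the last row number from a range such as "A1:C10"."""
    last_cell = ref.split(":")[-1]
    match = re.search(r"(\d+)$", last_cell)
    return int(match.group(1)) if match else None
```

Usage would look like `row_count_from_ref(sheet_dimension("/path/to/workbook.xlsx"))`, after checking that `sheet_dimension` did not return None.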
