Numpy.float64 changes when writing to Excel (.xlsx)

python pandas numpy xlsxwriter xlwt

I also tried this with Openpyxl (.xlsx format) and Xlwt (.xls format) as engines. While the former produced the same erroneous result as xlsxwriter, Xlwt was actually working as expected and wrote the float according to the exact variable value.

The difference is that .xls is a binary file format and the 64 bit representation of the the IEEE 754 double is written exactly to the file and can be read back to the same 64 bits.

The .xlsx file format however is a collection of text XML files in a zip container. As such doubles are written as a string representation of the double (using a format like '%.16g') and read in by converting that string representation back to a double. That is essentially a lossey process for doubles since there isn't an exact string representation for a vast majority of IEEE 754 numbers.

For example if you take the numpy number in your example and format it with different precisions you will get different representations:

>>> '%.16g' % f[0]'1054.487585785468'>>> '%.17g' % f[0]'1054.4875857854684'>>> '%.18g' % f[0]'1054.48758578546835'

You can also demonstrate this yourself by pasting 1054.4875857854684 into a cell in Excel, saving the file and examining the output:

So for a file like this:

You would get something like this:

$ unzip numpy.xlsx -d numpy$ xmllint --format numpy/xl/worksheets/sheet1.xml | grep 1054        <v>1054.4875857854599</v>

This is more or less what you are seeing when you read the file back in using Pandas.

python pandas numpy xlsxwriter xlwt

After some digging in Pandas and XlsxWriter, I essentially found two conversion steps from numpy.float64 to the .xlsx file:

1) numpy.float64 => float (no loss of fidelity) in pandas/io/excel.py

def _conv_value(val):    # Convert numpy types to Python types for the Excel writers.    if com.is_integer(val):        val = int(val)    elif com.is_float(val):        val = float(val)    elif com.is_bool(val):        val = bool(val)    elif isinstance(val, Period):        val = "%s" % val    elif com.is_list_like(val):        val = str(val)    return val

2) float => string (attr += ' %s="%s"' % (key, value)). this is where precision is changed (in xlswriter/xmlwriter.py)

def _xml_number_element(self, number, attributes=[]):    # Optimised tag writer for <c> cell number elements in the inner loop.    attr = ''    for key, value in attributes:        value = self._escape_attributes(value)        attr += ' %s="%s"' % (key, value)    self.fh.write("""<c%s><v>%.15g</v></c>""" % (attr, number))

So the serialisation (step 2) is where the the precision is changed. I guess, since xls is a binary format, the float would be directly written, without conversion.

CodeHunter

Numpy.float64 changes when writing to Excel (.xlsx)

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last