Numpy.float64 changes when writing to Excel (.xlsx) Numpy.float64 changes when writing to Excel (.xlsx) pandas pandas

Numpy.float64 changes when writing to Excel (.xlsx)


I also tried this with Openpyxl (.xlsx format) and Xlwt (.xls format) as engines. While the former produced the same erroneous result as xlsxwriter, Xlwt was actually working as expected and wrote the float according to the exact variable value.

The difference is that .xls is a binary file format and the 64 bit representation of the the IEEE 754 double is written exactly to the file and can be read back to the same 64 bits.

The .xlsx file format however is a collection of text XML files in a zip container. As such doubles are written as a string representation of the double (using a format like '%.16g') and read in by converting that string representation back to a double. That is essentially a lossey process for doubles since there isn't an exact string representation for a vast majority of IEEE 754 numbers.

For example if you take the numpy number in your example and format it with different precisions you will get different representations:

>>> '%.16g' % f[0]'1054.487585785468'>>> '%.17g' % f[0]'1054.4875857854684'>>> '%.18g' % f[0]'1054.48758578546835'

You can also demonstrate this yourself by pasting 1054.4875857854684 into a cell in Excel, saving the file and examining the output:

So for a file like this:

enter image description here

You would get something like this:

$ unzip numpy.xlsx -d numpy$ xmllint --format numpy/xl/worksheets/sheet1.xml | grep 1054        <v>1054.4875857854599</v>

This is more or less what you are seeing when you read the file back in using Pandas.


After some digging in Pandas and XlsxWriter, I essentially found two conversion steps from numpy.float64 to the .xlsx file:

1) numpy.float64 => float (no loss of fidelity) in pandas/io/excel.py

def _conv_value(val):    # Convert numpy types to Python types for the Excel writers.    if com.is_integer(val):        val = int(val)    elif com.is_float(val):        val = float(val)    elif com.is_bool(val):        val = bool(val)    elif isinstance(val, Period):        val = "%s" % val    elif com.is_list_like(val):        val = str(val)    return val

2) float => string (attr += ' %s="%s"' % (key, value)). this is where precision is changed (in xlswriter/xmlwriter.py)

def _xml_number_element(self, number, attributes=[]):    # Optimised tag writer for <c> cell number elements in the inner loop.    attr = ''    for key, value in attributes:        value = self._escape_attributes(value)        attr += ' %s="%s"' % (key, value)    self.fh.write("""<c%s><v>%.15g</v></c>""" % (attr, number))

So the serialisation (step 2) is where the the precision is changed. I guess, since xls is a binary format, the float would be directly written, without conversion.