Exporting ints with missing values to csv in Pandas

Using float_format='%.12g' inside the to_csv function solved a similar problem for me. It keeps up to 12 significant digits for legitimate floats, but drops the decimals for ints being forced to floats by the presence of NaNs:

In [4]: df
Out[4]:
       a    b
i_1  1    2.0
i_2  3    NaN
i_3  5.9  6.0

In [5]: df.to_csv('file.csv', float_format='%.12g')

Output is:

   , a,  b
i_1, 1,  2
i_2, 3, 
i_3, 5.9, 6
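As a self-contained sketch of the same trick (the DataFrame mirrors the one above; io.StringIO is used here only to capture the output instead of writing a file):

```python
import io

import numpy as np
import pandas as pd

# An int column ('b') forced to float64 by the presence of a NaN
df = pd.DataFrame({"a": [1, 3, 5.9], "b": [2, np.nan, 6]},
                  index=["i_1", "i_2", "i_3"])

# '%.12g' keeps up to 12 significant digits for real floats,
# but renders whole numbers like 2.0 as a plain "2"
buf = io.StringIO()
df.to_csv(buf, float_format='%.12g')
print(buf.getvalue())
```

Note how 2.0 and 6.0 come out as 2 and 6, while the NaN becomes an empty field (to_csv's default na_rep is the empty string).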


This snippet does what you want and should be relatively efficient.

import numpy as np
import pandas as pd

EPSILON = 1e-9

def _lost_precision(s):
    """
    The total amount of precision lost over Series `s`
    during conversion to int64 dtype
    """
    try:
        return (s - s.fillna(0).astype(np.int64)).sum()
    except ValueError:
        return np.nan

def _nansafe_integer_convert(s):
    """
    Convert Series `s` to an object type with `np.nan`
    represented as an empty string ""
    """
    if _lost_precision(s) < EPSILON:
        # Here's where the magic happens
        as_object = s.fillna(0).astype(np.int64).astype(object)
        as_object[s.isnull()] = ""
        return as_object
    else:
        return s

def nansafe_to_csv(df, *args, **kwargs):
    """
    Write `df` to a csv file, allowing for missing values
    in integer columns

    Uses `_lost_precision` to test whether a column can be
    converted to an integer data type without losing precision.
    Missing values in integer columns are represented as empty
    fields in the resulting csv.
    """
    df.apply(_nansafe_integer_convert).to_csv(*args, **kwargs)

We can test this with a simple DataFrame which should cover all bases:

In [75]: df = pd.DataFrame([[1,2, 3.1, "i"],[3,np.nan, 4.0, "j"],[5,6, 7.1, "k"]],
   ....:                   columns=["a","b", "c", "d"],
   ....:                   index=["i_1","i_2","i_3"])

In [76]: df
Out[76]:
     a   b    c  d
i_1  1   2  3.1  i
i_2  3 NaN  4.0  j
i_3  5   6  7.1  k

In [77]: nansafe_to_csv(df, 'deleteme.csv', index=False)

Which produces the following csv file:

a,b,c,d
1,2,3.1,i
3,,4.0,j
5,6,7.1,k
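The core trick inside _nansafe_integer_convert can be exercised on its own: round-trip the NaN-holding float column through int64, then widen to object so the missing slots can hold empty strings. A minimal sketch (the Series here is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan, 6.0])   # ints coerced to float by the NaN

# fillna(0) makes the int64 cast safe; object dtype lets us mix
# ints and empty strings in one column
as_object = s.fillna(0).astype(np.int64).astype(object)
as_object[s.isnull()] = ""
print(as_object.tolist())
```

The placeholder 0 written by fillna is immediately overwritten with "" wherever the original value was missing, so no fake data survives into the csv.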


I'm expanding the sample data here to make sure this handles the situations you are dealing with:

df = pd.DataFrame([[1.1,2,9.9,44,1.0],
                   [3.3,np.nan,4.4,22,3.0],
                   [5.5,8,np.nan,66,4.0]],
                  columns=list('abcde'),
                  index=["i_1","i_2","i_3"])

       a   b    c   d  e
i_1  1.1   2  9.9  44  1
i_2  3.3 NaN  4.4  22  3
i_3  5.5   8  NaN  66  4

df.dtypes
a    float64
b    float64
c    float64
d      int64
e    float64

I think if you want a general solution, it's going to have to be explicitly coded, since pandas doesn't allow NaNs in int columns. What I do below is check for integer values (we can't simply check the dtype, because columns containing NaNs will have been recast to float); if a column holds only integer values, I convert it to a string format and replace 'NAN' with '' (empty). Of course, this is not how you want to store the integers, except as a final step before outputting.

for col in df.columns:
    if any( df[col].isnull() ):
        tmp = df[col][ df[col].notnull() ]
        if all( tmp.astype(int).astype(float) == tmp.astype(float) ):
            df[col] = df[col].map('{:.0F}'.format).replace('NAN','')

df.to_csv('x.csv')
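The integer-value check at the heart of that loop can be illustrated on a single column (the Series here is made up, but matches the shape of column b above):

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan, 8.0], index=["i_1", "i_2", "i_3"])

# Drop the NaNs, then test whether every remaining value survives
# an int round-trip unchanged
tmp = s[s.notnull()]
is_integer_valued = all(tmp.astype(int).astype(float) == tmp.astype(float))

if is_integer_valued:
    # '{:.0F}' renders 2.0 as '2' and np.nan as 'NAN', which we blank out
    s = s.map('{:.0F}'.format).replace('NAN', '')
print(s.tolist())
```

The uppercase 'F' presentation type matters here: it formats NaN as 'NAN', which is what the replace call looks for.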

Here's the output file, and also what it looks like if you read it back into pandas (although the purpose of this is presumably to read it into other numerical packages):

%more x.csv
,a,b,c,d,e
i_1,1.1,2,9.9,44,1.0
i_2,3.3,,4.4,22,3.0
i_3,5.5,8,,66,4.0

pd.read_csv('x.csv')
  Unnamed: 0    a   b    c   d  e
0        i_1  1.1   2  9.9  44  1
1        i_2  3.3 NaN  4.4  22  3
2        i_3  5.5   8  NaN  66  4