How to stop Pandas DataFrame from converting int to float for no reason?


df.loc["rowX"] = int(0) works and solves the problem posed in the question. Surprisingly, df.loc["rowX",:] = int(0) does not.

df.loc["rowX"] = int(0) makes it possible to populate an empty DataFrame while preserving the desired dtype, but only for an entire row at a time.

df.loc["rowX"] = [np.int64(0), np.int64(1)] works.
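
The behaviour described here was observed on pandas 0.24.2 and may differ on other versions, so one defensive option is to cast back to the intended dtype after each whole-row insert. The following is a minimal sketch under that assumption. The helper name add_int_row and its dtype parameter are hypothetical, not part of pandas, and the cast is only lossless when the inserted values are whole numbers.

```python
import numpy as np
import pandas as pd


def add_int_row(df, label, values, dtype=np.int64):
    """Insert one full row by label, then cast every column to `dtype`.

    Hypothetical helper (not a pandas API). Assumes `values` is a scalar
    or a list with one whole-number entry per column.
    """
    df.loc[label] = values    # whole-row, label-based assignment
    return df.astype(dtype)   # cast back in case any column was upcast


df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
df = add_int_row(df, "rowX", [np.int64(0), np.int64(1)])
df = add_int_row(df, "rowY", 0)
print(df.dtypes)
```

The cast is a no-op when the dtype was already preserved, so the helper costs little on versions where the plain assignment behaves as described.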

.loc[] is appropriate for label-based assignment, per https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html (note that the 0.24 doc does not depict .loc[] for inserting new rows).

The doc shows .loc[] being used to add rows by assignment in a column-sensitive way, but only where the DataFrame is already populated with data.

But it gets weird when slicing on the empty frame.

    import pandas as pd
    import numpy as np
    import sys

    print(sys.version)
    print(pd.__version__)

    print("int dtypes preserved")
    # append on populated DataFrame
    df = pd.DataFrame([[0, 0], [1,1]], index=['a', 'b'], columns=["col1", "col2"])
    df.loc["c"] = np.int64(0)
    # slice existing rows
    df.loc["a":"c"] = np.int64(1)
    df.loc["a":"c", "col1":"col2":1] = np.int64(2)
    print(df.dtypes)

    # no selection AND no data, remains np.int64 if defined as such
    df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
    df.loc[:, "col1":"col2":1] = np.int64(0)
    df.loc[:,:] = np.int64(0)
    print(df.dtypes)

    # and works if no index but data
    df = pd.DataFrame([[0, 0], [1,1]], columns=["col1", "col2"])
    df.loc[:,"col1":"col2":1] = np.int64(0)
    print(df.dtypes)

    # the surprise... label based insertion for the entire row does not convert to float
    df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
    df.loc["a"] = np.int64(0)
    print(df.dtypes)

    # a surprise because referring to all columns, as above, does convert to float
    print("unexpectedly converted to float dtypes")
    df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
    df.loc["a", "col1":"col2"] = np.int64(0)
    print(df.dtypes)

Output:

    3.7.2 (default, Mar 19 2019, 10:33:22) [Clang 10.0.0 (clang-1000.11.45.5)]
    0.24.2
    int dtypes preserved
    col1    int64
    col2    int64
    dtype: object
    col1    int64
    col2    int64
    dtype: object
    col1    int64
    col2    int64
    dtype: object
    col1    int64
    col2    int64
    dtype: object
    unexpectedly converted to float dtypes
    col1    float64
    col2    float64
    dtype: object
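
One plausible explanation for the float64 result in the last case, offered as an assumption rather than something the output proves: growing an integer column by a new label first creates a missing value, and int64 cannot store NaN, so pandas upcasts to float64. The sketch below only demonstrates that general upcasting rule using Series.reindex, and shows that the nullable "Int64" extension dtype keeps integers when a value is missing. It does not trace what .loc[] does internally.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1], index=["a", "b"], dtype=np.int64)

# Reindexing to a label that does not exist yet introduces NaN,
# and int64 cannot hold NaN, so the result is upcast to float64.
grown = s.reindex(["a", "b", "c"])
print(grown.dtype)     # float64

# The nullable extension dtype "Int64" (capital I) stores a missing
# marker instead of NaN, so the integer dtype survives the reindex.
nullable = s.astype("Int64").reindex(["a", "b", "c"])
print(nullable.dtype)  # Int64
```

If missing values are acceptable in the frame, declaring columns as "Int64" instead of np.int64 may sidestep the upcast, at the cost of working with an extension dtype.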