Assign values to SparseArray in Pandas?
It is frustrating to not be able to insert directly in sparse format with .loc[]. I'm afraid I only have a workaround.
Since the question was originally posted (and as of version 0.25), pandas has deprecated SparseDataFrame. Instead, it provides a data type (SparseDtype) that can be applied to individual Series within a DataFrame. In other words, it is no longer "all or nothing". You can:
- convert a few columns in your DataFrame to dense format while keeping the others sparse,
- insert your data with .loc[] in the dense columns,
- and then convert these columns back to sparse.
This is obviously a lot less memory intensive than converting the entire DataFrame to dense.
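To make the memory argument concrete, here is a minimal sketch. The 10,000 x 5 frame and its column names are made-up example data, not from the question. It densifies a single column while the others keep their SparseDtype, then compares `memory_usage()` totals against a fully dense copy. Exact byte counts will vary with the pandas version and with how many values are actually stored.

```python
import numpy as np
import pandas as pd

# Made-up example data: 10,000 rows x 5 columns with a single stored value,
# every column converted to Sparse[float64, nan]
data = np.full((10_000, 5), np.nan)
data[0, 0] = 1.0
df = pd.DataFrame(data, columns=list("ABCDE")).astype(pd.SparseDtype(float))

# Densify only column "A"; columns "B" to "E" keep their SparseDtype
partly_dense = df.copy()
partly_dense["A"] = partly_dense["A"].sparse.to_dense()

# Densify the whole frame, for comparison
fully_dense = df.sparse.to_dense()

# Total bytes used by the column data (index excluded)
sparse_bytes = df.memory_usage(index=False).sum()
partly_bytes = partly_dense.memory_usage(index=False).sum()
dense_bytes = fully_dense.memory_usage(index=False).sum()

print(partly_dense.dtypes)
print(sparse_bytes, partly_bytes, dense_bytes)
```

On data this sparse, the partly dense frame should land between the two extremes, because only one column's values are materialized as a full float64 array.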
Here is a very simple function to illustrate what I mean:
```python
import pandas as pd


def sp_loc(df, index, columns, val):
    """Insert data in a DataFrame with SparseDtype format.

    Only applicable for pandas version >= 0.25.

    Args
    ----
    df : DataFrame with series formatted with pd.SparseDtype
    index : str, or list, or slice object
        Same as one would use as first argument of .loc[]
    columns : str, list, or slice
        Same as one would normally use as second argument of .loc[]
    val : values to insert

    Returns
    -------
    df : DataFrame
        Modified DataFrame
    """
    # df[slice] selects rows, not columns, so turn a column slice
    # into an explicit list of column labels first
    if isinstance(columns, slice):
        columns = list(df.loc[:, columns].columns)
    # Save the original sparse format for reuse later
    spdtypes = df.dtypes[columns]
    # Convert the concerned Series to dense format
    df[columns] = df[columns].sparse.to_dense()
    # Do a normal insertion with .loc[]
    df.loc[index, columns] = val
    # Back to the original sparse format
    df[columns] = df[columns].astype(spdtypes)
    return df
```
Simple usage example:
```python
import pandas as pd

# SPARSE DATAFRAME DEFINITION
df1 = pd.DataFrame(index=['a', 'b', 'c'], columns=['I', 'J'], dtype=float)
df1.loc['a', 'J'] = 0.42
df1 = df1.astype(pd.SparseDtype(float))
#     |  I  |  J
# ----+-----+------
#  a  | nan | 0.42
#  b  | nan | nan
#  c  | nan | nan

df1.dtypes
# I    Sparse[float64, nan]
# J    Sparse[float64, nan]

df1.sparse.density
# 0.16666666666666666

# INSERTION
df1 = sp_loc(df1, ['a', 'b'], 'I', [-1, 1])
#     |  I  |  J
# ----+-----+------
#  a  | -1  | 0.42
#  b  |  1  | nan
#  c  | nan | nan

df1.sparse.density
# 0.5
```