Assign values to SparseArray in Pandas?


It is frustrating to not be able to insert directly in sparse format with .loc[]. I'm afraid I only have a workaround.

Since the question was originally posted, pandas (as of version 0.25) has deprecated SparseDataFrame. Sparsity is now expressed as a data type (SparseDtype) that can be applied to individual Series within a DataFrame. In other words, it is no longer "all or nothing". You can:

  • convert a few columns in your DataFrame to dense format while keeping the others sparse,
  • insert your data with .loc[] in the dense columns,
  • and then convert these columns back to sparse.

This is obviously a lot less memory intensive than converting the entire DataFrame to dense.
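To get a rough feel for the difference, here is a small sketch that compares the memory footprint of a fully sparse frame, the same frame with a single column made dense, and a fully dense copy. The frame size, the column names and the number of filled cells are made up for illustration, and the exact byte counts will depend on your pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100,000 rows, three columns that are almost entirely NaN
n_rows = 100_000
rng = np.random.default_rng(0)
data = {}
for name in ["I", "J", "K"]:
    col = np.full(n_rows, np.nan)
    filled = rng.choice(n_rows, size=100, replace=False)
    col[filled] = rng.random(100)
    data[name] = col

dense = pd.DataFrame(data)
sparse = dense.astype(pd.SparseDtype(float, np.nan))

# Convert only column "I" to dense; "J" and "K" stay sparse
partial = sparse.copy()
partial["I"] = partial["I"].sparse.to_dense()

sparse_bytes = sparse.memory_usage(index=False, deep=True).sum()
partial_bytes = partial.memory_usage(index=False, deep=True).sum()
dense_bytes = dense.memory_usage(index=False, deep=True).sum()

print(f"all sparse:       {sparse_bytes:>10,d} bytes")
print(f"one dense column: {partial_bytes:>10,d} bytes")
print(f"all dense:        {dense_bytes:>10,d} bytes")
```

The partially dense frame only pays the dense cost for the column you are about to write to, which is the point of doing the conversion column by column.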

Here is a very simple function to illustrate what I mean:

    def sp_loc(df, index, columns, val):
        """ Insert data in a DataFrame with SparseDtype format

        Only applicable for pandas version > 0.25

        Args
        ----
        df : DataFrame with series formatted with pd.SparseDtype
        index: str, or list, or slice object
            Same as one would use as first argument of .loc[]
        columns: str, list, or slice
            Same as one would normally use as second argument of .loc[]
        val: insert values

        Returns
        -------
        df: DataFrame
            Modified DataFrame
        """
        # Save the original sparse format for reuse later
        spdtypes = df.dtypes[columns]

        # Convert concerned Series to dense format
        df[columns] = df[columns].sparse.to_dense()

        # Do a normal insertion with .loc[]
        df.loc[index, columns] = val

        # Back to the original sparse format
        df[columns] = df[columns].astype(spdtypes)

        return df

Simple usage example:

    import pandas as pd

    # DEFINE THE SPARSE DATAFRAME
    df1 = pd.DataFrame(index=['a', 'b', 'c'], columns=['I', 'J'])
    df1.loc['a', 'J'] = 0.42
    df1 = df1.astype(pd.SparseDtype(float))
    #     |   I |      J
    # ----+-----+--------
    #  a  | nan |   0.42
    #  b  | nan | nan
    #  c  | nan | nan

    df1.dtypes
    # I    Sparse[float64, nan]
    # J    Sparse[float64, nan]

    # density is a property, not a method
    df1.sparse.density
    # 0.16666666666666666

    # INSERTION
    df1 = sp_loc(df1, ['a', 'b'], 'I', [-1, 1])
    #     |   I |      J
    # ----+-----+--------
    #  a  |  -1 |   0.42
    #  b  |   1 | nan
    #  c  | nan | nan

    df1.sparse.density
    # 0.5
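Saving the dtypes before the round trip matters when the columns do not all share the same sparse format. The following is only a sketch of that idea, written as a per-column loop rather than with the helper above; the frame, the column contents and the inserted values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "I" is float with NaN as fill value, "J" is int with 0 as fill value
df = pd.DataFrame(
    {
        "I": pd.arrays.SparseArray([np.nan, np.nan, np.nan]),
        "J": pd.arrays.SparseArray([0, 0, 7], fill_value=0),
    },
    index=["a", "b", "c"],
)

# Remember each column's SparseDtype (subtype and fill value)
saved = df.dtypes.to_dict()

# Convert the columns to dense, one at a time
for col in saved:
    df[col] = df[col].sparse.to_dense()

# Ordinary .loc[] insertions on the dense columns
df.loc["a", "I"] = -1.0
df.loc["b", "J"] = 5

# Restore the original sparse dtype of each column
for col, dtype in saved.items():
    df[col] = df[col].astype(dtype)

print(df.dtypes)
print(df.sparse.to_dense())
```

Each column comes back with its own fill value, so the zeros in "J" and the NaNs in "I" are again the values that are not stored.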