Assign values to SparseArray in Pandas?

python pandas sparse-matrix

It is frustrating to not be able to insert directly in sparse format with .loc[]. I'm afraid I only have a workaround.

Since this question was originally posted, pandas (as of version 0.25) has deprecated SparseDataFrame. Instead, it provides a data type (SparseDtype) that can be applied to individual Series within a DataFrame. In other words, sparsity is no longer "all or nothing". You can:

convert a few columns in your DataFrame to dense format while keeping the others sparse,
insert your data with .loc[] in the dense columns,
and then convert these columns back to sparse.

This is obviously a lot less memory intensive than converting the entire DataFrame to dense.
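To make the per-column idea concrete, here is a minimal sketch. The data is hypothetical (two all-NaN columns of 1000 rows) and the exact byte counts will depend on your pandas version; it only shows that one column can be densified while the other stays sparse, and that the dense column is the one that costs memory.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two columns of 1000 NaN values, both stored sparsely
n = 1000
df = pd.DataFrame({
    "I": pd.arrays.SparseArray(np.full(n, np.nan)),
    "J": pd.arrays.SparseArray(np.full(n, np.nan)),
})

# Densify only column "I"; column "J" keeps its sparse dtype
df["I"] = df["I"].sparse.to_dense()

print(df.dtypes)

# Per-column memory footprint in bytes (index excluded)
mem = df.memory_usage(index=False)
print(mem)
```

Only the column you are about to write to pays the dense cost; everything else stays as it was.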

Here is a very simple function to illustrate what I mean:

def sp_loc(df, index, columns, val):
    """ Insert data in a DataFrame with SparseDtype format

    Only applicable for pandas version >= 0.25

    Args
    ----
    df : DataFrame with series formatted with pd.SparseDtype
    index: str, or list, or slice object
        Same as one would use as first argument of .loc[]
    columns: str or list
        Column label(s) to write to; they must currently be sparse
    val: insert values

    Returns
    -------
    df: DataFrame
        Modified DataFrame
    """
    # Save the original sparse format for reuse later
    spdtypes = df.dtypes[columns]

    # Convert concerned Series to dense format
    df[columns] = df[columns].sparse.to_dense()

    # Do a normal insertion with .loc[]
    df.loc[index, columns] = val

    # Back to the original sparse format
    df[columns] = df[columns].astype(spdtypes)

    return df

Simple usage example:

import pandas as pd

# SPARSE DATAFRAME DEFINITION
df1 = pd.DataFrame(index=['a', 'b', 'c'], columns=['I', 'J'])
df1.loc['a', 'J'] = 0.42
df1 = df1.astype(pd.SparseDtype(float))
#     |   I |      J
# ----+-----+--------
# a   | nan |   0.42
# b   | nan | nan
# c   | nan | nan

df1.dtypes
# I    Sparse[float64, nan]
# J    Sparse[float64, nan]

df1.sparse.density
# 0.16666666666666666

# INSERTION
df1 = sp_loc(df1, ['a', 'b'], 'I', [-1, 1])
#     |   I |      J
# ----+-----+--------
#  a  |  -1 |   0.42
#  b  |   1 | nan
#  c  | nan | nan

df1.sparse.density  # density is a property, not a method
# 0.5
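The same dense round trip should also work for several columns at once. The sketch below inlines the three steps for a list of columns. The frame `df` and the list `cols` are hypothetical names, and it assumes a recent pandas in which assigning a DataFrame to `df[cols]` replaces those columns rather than writing into the existing sparse arrays.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame, fully sparse, with a single stored value
df = pd.DataFrame(
    {"I": [np.nan, np.nan, np.nan], "J": [0.42, np.nan, np.nan]},
    index=["a", "b", "c"],
).astype(pd.SparseDtype(float))

cols = ["I", "J"]

# 1. Remember the sparse dtypes (a Series mapping column name -> dtype)
spdtypes = df.dtypes[cols]

# 2. Densify the selected columns and insert with .loc[] as usual
df[cols] = df[cols].sparse.to_dense()
df.loc["b", cols] = [1.0, 2.0]

# 3. Convert the selected columns back to their original sparse dtypes
df[cols] = df[cols].astype(spdtypes)

print(df.dtypes)
print(df.sparse.density)
```

If every column ends up in `cols`, this is no cheaper than densifying the whole DataFrame, so it only pays off when the list is short relative to the frame.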

