Python Pandas dataframe subtract cumulative column Python Pandas dataframe subtract cumulative column pandas pandas

Python Pandas dataframe subtract cumulative column


You can use substract by sub with shifted column:

Last fill NaN by first value in Raw data.

df['Processed data'] = df['Raw data'].sub(df['Raw data'].shift())df['Processed data'].iloc[0] = df['Raw data'].iloc[0]print (df)                     Raw data  Processed data2016-11-23 10:00:00      48.6            48.62016-11-23 11:00:00     158.7           110.12016-11-23 12:00:00     377.8           219.12016-11-23 13:00:00     591.7           213.92016-11-23 14:00:00     748.5           156.82016-11-23 15:00:00     848.2            99.7


I think you can try this one, i found it easy :Will create a new column with subtracted value.

df['processed_data'] = df['Raw_data'].diff(1)


You can do this with join in Pandas, which has the advantage of handling cases with more columns (which are not themselves unique).

Assume you have a DataFrame like

timestep                fid        cumul_value2016-11-23 10:00:00     1          48.6 2016-11-23 11:00:00     1          158.7 2016-11-23 12:00:00     1          377.8 2016-11-23 13:00:00     1          591.7 2016-11-23 14:00:00     1          748.5 2016-11-23 15:00:00     1          848.2 2016-11-23 10:00:00     2          88.6 2016-11-23 11:00:00     2          758.7 ...2016-11-23 12:00:00     5          577.8 2016-11-23 13:00:00     5          691.7 2016-11-23 14:00:00     5          348.5 2016-11-23 15:00:00     5          148.2 

where fid represents another parameter, over whose values cumul_value varies. You want to obtain a column value from the column cumul_value such that value(fid,timestep) = cumul_value(fid,timestep) - cumul_value(fid,timestep - 1) for each fid.

onestep = timedelta(hours=1)df['prev_timestep'] = df['timestep'] - onestepdf_cumul = df[['id','fid','timestep','cumul_value']]        .set_index(['timestep','fid'])df_val = df.join(df_cumul,on=['prev_timestep','fid'],rsuffix='_prev')df_val['value'] = df_val['cumul_value'] - df_val['cumul_value_prev']df_val = df_val.drop(['prev_timestep','cumul_value_prev','cumul_value','id_prev'],axis=1)

Finish up by handling the first timestep which is likely a special case(call it t0)

df_t0 = df_cumul[df_cumul['timestep'] == t0]df_val.loc[df_val.index.isin(df_t0.index),'value'] = df_t0['cumul_value']