Python Pandas dataframe subtract cumulative column
You can use substract by sub
with shift
ed column:
Last fill NaN
by first value in Raw data
.
df['Processed data'] = df['Raw data'].sub(df['Raw data'].shift())df['Processed data'].iloc[0] = df['Raw data'].iloc[0]print (df) Raw data Processed data2016-11-23 10:00:00 48.6 48.62016-11-23 11:00:00 158.7 110.12016-11-23 12:00:00 377.8 219.12016-11-23 13:00:00 591.7 213.92016-11-23 14:00:00 748.5 156.82016-11-23 15:00:00 848.2 99.7
I think you can try this one, i found it easy :Will create a new column with subtracted value.
df['processed_data'] = df['Raw_data'].diff(1)
You can do this with join
in Pandas, which has the advantage of handling cases with more columns (which are not themselves unique).
Assume you have a DataFrame like
timestep fid cumul_value2016-11-23 10:00:00 1 48.6 2016-11-23 11:00:00 1 158.7 2016-11-23 12:00:00 1 377.8 2016-11-23 13:00:00 1 591.7 2016-11-23 14:00:00 1 748.5 2016-11-23 15:00:00 1 848.2 2016-11-23 10:00:00 2 88.6 2016-11-23 11:00:00 2 758.7 ...2016-11-23 12:00:00 5 577.8 2016-11-23 13:00:00 5 691.7 2016-11-23 14:00:00 5 348.5 2016-11-23 15:00:00 5 148.2
where fid
represents another parameter, over whose values cumul_value
varies. You want to obtain a column value
from the column cumul_value
such that value(fid,timestep) = cumul_value(fid,timestep) - cumul_value(fid,timestep - 1)
for each fid
.
onestep = timedelta(hours=1)df['prev_timestep'] = df['timestep'] - onestepdf_cumul = df[['id','fid','timestep','cumul_value']] .set_index(['timestep','fid'])df_val = df.join(df_cumul,on=['prev_timestep','fid'],rsuffix='_prev')df_val['value'] = df_val['cumul_value'] - df_val['cumul_value_prev']df_val = df_val.drop(['prev_timestep','cumul_value_prev','cumul_value','id_prev'],axis=1)
Finish up by handling the first timestep which is likely a special case(call it t0
)
df_t0 = df_cumul[df_cumul['timestep'] == t0]df_val.loc[df_val.index.isin(df_t0.index),'value'] = df_t0['cumul_value']