
Pandas: conditional shift


I am using a new column here to record the condition df2['close_time'] < df2['open_time']:

# New[i] is True when row i's close_time falls before row i+1's open_time,
# i.e. row i's value is already "closed" when the next row opens.
df['New']=((df.open_time-df.close_time.shift()).dt.days>0).shift(-1)
# Per-user running sum of closed values, shifted down one row so each row
# only sees what was closed before it opened.
s=df.groupby('user').apply(lambda x : (x['value']*x['New']).cumsum().shift()).reset_index(level=0,drop=True)
# Rows whose previous row was not flagged as closed get no cumulative sum.
s.loc[~(df.New.shift()==True)]=np.nan
df['Cumsum']=s
df
Out[1043]: 
   user  open_time close_time  value    New  Cumsum
0     1 2017-01-01 2017-03-01      5  False     NaN
1     1 2017-01-02 2017-02-01      6   True     NaN
2     1 2017-02-03 2017-02-05      7   True       6
3     1 2017-02-07 2017-04-01      3  False      13
4     2 2017-01-01 2017-02-01     15   True     NaN
5     2 2017-03-01 2017-04-01      3    NaN      15

Update: since the OP updated the question (data from Gabriel A):

# For each row, collect the user's full arrays of close_times and values...
df['New']=df.user.map(df.groupby('user').close_time.apply(lambda x: np.array(x)))
df['New1']=df.user.map(df.groupby('user').value.apply(lambda x: np.array(x)))
# ...mark which close_times fall before this row's open_time...
df['New2']=[[x>m for m in y] for x,y in zip(df['open_time'],df['New'])]
# ...and sum the corresponding values.
df['Yourtarget']=list(map(sum,df['New2']*df['New1'].values))
df.drop(['New','New1','New2'],axis=1)
Out[1376]: 
   user  open_time close_time  value  Yourtarget
0     1 2016-12-30 2016-12-31      1           0
1     1 2017-01-01 2017-03-01      5           1
2     1 2017-01-02 2017-02-01      6           1
3     1 2017-02-03 2017-02-05      7           7
4     1 2017-02-07 2017-04-01      3          14
5     1 2017-09-07 2017-09-11      1          22
6     2 2018-01-01 2018-02-01     15           0
7     2 2018-03-01 2018-04-01      3          15


I made a modification to your test case that I think you should include. This solution does handle your edit.

import pandas as pd
import numpy as np

df = pd.read_csv("cond_shift.csv")
df

input:

   user  open_time   close_time  value
0  1     12/30/2016  12/31/2016  1
1  1     1/1/2017    3/1/2017    5
2  1     1/2/2017    2/1/2017    6
3  1     2/3/2017    2/5/2017    7
4  1     2/7/2017    4/1/2017    3
5  1     9/7/2017    9/11/2017   1
6  2     1/1/2018    2/1/2018    15
7  2     3/1/2018    4/1/2018    3
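(If you don't have the CSV on hand, an equivalent frame can be built inline; this is just a sketch of the same test data shown above:)

import pandas as pd

# Same test data as cond_shift.csv, built inline.
df = pd.DataFrame({
    'user':       [1, 1, 1, 1, 1, 1, 2, 2],
    'open_time':  ['12/30/2016', '1/1/2017', '1/2/2017', '2/3/2017',
                   '2/7/2017', '9/7/2017', '1/1/2018', '3/1/2018'],
    'close_time': ['12/31/2016', '3/1/2017', '2/1/2017', '2/5/2017',
                   '4/1/2017', '9/11/2017', '2/1/2018', '4/1/2018'],
    'value':      [1, 5, 6, 7, 3, 1, 15, 3],
})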

create columns to shift:

df["open_time"] = pd.to_datetime(df["open_time"])df["close_time"] = pd.to_datetime(df["close_time"])df.sort_values(['user','close_time'],inplace=True)df['close_cumsum']=df.groupby('user')['value'].cumsum()df.sort_values(['user','open_time'],inplace=True)df   user open_time   close_time  value   close_cumsum0   1   2016-12-30  2016-12-31  1       11   1   2017-01-01  2017-03-01  5       192   1   2017-01-02  2017-02-01  6       73   1   2017-02-03  2017-02-05  7       144   1   2017-02-07  2017-04-01  3       225   1   2017-09-07  2017-09-11  1       236   2   2018-01-01  2018-02-01  15      157   2   2018-03-01  2018-04-01  3       18

Shift columns (explanation below):

df["cumulated_closed_value"] = df.groupby("user")["close_cumsum"].transform("shift")condition = ~(df.groupby("user")['close_time'].transform("shift") < df["open_time"])df.loc[ condition,"cumulated_closed_value" ] = Nonedf["cumulated_closed_value"] =df.groupby("user")["cumulated_closed_value"].fillna(method="ffill").fillna(0)dfuser    open_time   close_time  value   close_cumsum    cumulated_closed_value0   1   2016-12-30  2016-12-31  1       1               0.01   1   2017-01-01  2017-03-01  5       19              1.02   1   2017-01-02  2017-02-01  6       7               1.03   1   2017-02-03  2017-02-05  7       14              7.04   1   2017-02-07  2017-04-01  3       22              14.05   1   2017-09-07  2017-09-11  1       23              22.06   2   2018-01-01  2018-02-01  15      15              0.07   2   2018-03-01  2018-04-01  3       18              15.0

All of this has been written in such a way that it's done across all users. I believe the logic is easier to follow if you focus on one user at a time (see the single-user sketch after this list):

  • Assume no events happen at the same time. This is the same thing as shifting the cumulative sum down one row.
  • Remove events that happen at the same time as other events.
  • Fill in the missing values with a forward fill.
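For example, here is a minimal sketch of those three steps for a single user, using the user-1 rows from the original question (the frame name `one` is just for illustration):

import pandas as pd
import numpy as np

# Single-user frame: the user-1 rows from the original question.
one = pd.DataFrame({
    'open_time':  pd.to_datetime(['2017-01-01', '2017-01-02', '2017-02-03', '2017-02-07']),
    'close_time': pd.to_datetime(['2017-03-01', '2017-02-01', '2017-02-05', '2017-04-01']),
    'value':      [5, 6, 7, 3],
})

# Step 1: cumulative sum in close_time order, then shift it down one row.
one = one.sort_values('close_time')
one['close_cumsum'] = one['value'].cumsum()
one = one.sort_values('open_time')
one['cumulated_closed_value'] = one['close_cumsum'].shift()

# Step 2: blank out rows where the previous close_time is not strictly
# before this row's open_time (the intervals overlap).
overlap = ~(one['close_time'].shift() < one['open_time'])
one.loc[overlap, 'cumulated_closed_value'] = np.nan

# Step 3: forward-fill, then treat "nothing closed yet" as 0.
one['cumulated_closed_value'] = one['cumulated_closed_value'].ffill().fillna(0)
print(one)  # cumulated_closed_value: 0.0, 0.0, 6.0, 13.0

The grouped version above does exactly this, with groupby("user") supplying the per-user boundaries.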

I would still thoroughly test this before you use it. Time intervals are weird and there are a lot of edge cases.


(Note: @wen's answer seems fine to me, so I'm not sure if the OP is looking for something more or something different. In any event, here's an alternate approach using merge_asof that should also work well.)

First reshape the dataframes as follows:

lookup = ( df[['close_time','value','user']].set_index(['user','close_time'])
           .sort_index().groupby('user').cumsum().reset_index(0) )
df = df.set_index('open_time').sort_index()

The idea with "lookup" is simply to sort by "close_time" and then take a (grouped) cumulative sum:

            user  value
close_time             
2017-02-01     1      6
2017-02-05     1     13
2017-03-01     1     18
2017-04-01     1     21
2017-09-11     1     22
2018-02-01     2     15
2018-04-01     2     18

For "df" we just take a subset of the original dataframe:

            user close_time  value
open_time                         
2017-01-01     1 2017-03-01      5
2017-01-02     1 2017-02-01      6
2017-02-03     1 2017-02-05      7
2017-02-07     1 2017-04-01      3
2017-09-07     1 2017-09-11      1
2018-01-01     2 2018-02-01     15
2018-03-01     2 2018-04-01      3

From here, you conceptually just want to merge the two datasets on "user" and "open_time"/"close_time", but the complicating factor is that we don't want an exact match on the time, but rather a sort of "nearest" match.

For these sorts of merges you can use merge_asof, which is a great tool for various non-exact matches (including 'nearest', 'backward', and 'forward'). Unfortunately, due to the inclusion of groupby, it's necessary to also loop over the users, but it's still pretty simple code to read:

# pd.DataFrame.append was removed in pandas 2.0, so collect the per-user
# merges in a list and concatenate at the end.
frames = []
for u in df['user'].unique():
    frames.append( pd.merge_asof( df[df.user==u], lookup[lookup.user==u],
                                  left_index=True, right_index=True,
                                  direction='backward' ) )
df_merged = pd.concat(frames)
df_merged.drop('user_y',axis=1).rename({'value_y':'close_cumsum'},axis=1)

Results:

            user_x close_time  value_x  close_cumsum
open_time                                           
2017-01-01       1 2017-03-01        5           NaN
2017-01-02       1 2017-02-01        6           NaN
2017-02-03       1 2017-02-05        7           6.0
2017-02-07       1 2017-04-01        3          13.0
2017-09-07       1 2017-09-11        1          21.0
2018-01-01       2 2018-02-01       15           NaN
2018-03-01       2 2018-04-01        3          15.0
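As a side note, merge_asof also has a by= parameter that does the per-group matching itself, which avoids the explicit loop over users. A minimal sketch, assuming the reshaped df and lookup from above (both frames must be sorted by their time key for merge_asof):

# Move the time indexes back into columns so we can use the on=/by= form.
left = df.reset_index().sort_values('open_time')
right = ( lookup.rename(columns={'value': 'close_cumsum'})
                .reset_index().sort_values('close_time') )

df_merged = pd.merge_asof(
    left, right,
    left_on='open_time', right_on='close_time',
    by='user',                  # match rows only within the same user
    direction='backward',
    suffixes=('', '_lookup'),   # right's close_time becomes close_time_lookup
)

This should produce the same close_cumsum column as the loop, though I've only checked it against the data shown here.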