Shift time series with missing dates in Pandas
In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007], 'value':[5,10,8,72,12,13] })

In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ]

In [590]: df
Out[590]:
   date  value  previous_value
0  2000      5             NaN
1  2001     10               5
2  2003      8             NaN
3  2004     72               8
4  2005     12              72
5  2007     13             NaN
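The same idea can be written as a standalone script. This is only a sketch of the approach above: it swaps the boolean indexing for `Series.where`, and the name `is_consecutive` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [2000, 2001, 2003, 2004, 2005, 2007],
    "value": [5, 10, 8, 72, 12, 13],
})

# True where the previous row is exactly one year earlier.
# The first row has no predecessor, so diff() is NaN there and eq(1) is False.
is_consecutive = df["date"].diff().eq(1)

# Keep the shifted value only for consecutive years; everything else becomes NaN.
df["previous_value"] = df["value"].shift().where(is_consecutive)

print(df)
```

Rows that follow a gap (2003 and 2007 here) end up with NaN instead of picking up the value from two years earlier.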
Also see "Using shift() with unevenly spaced data" for a time series approach using resample().
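For reference, a resample-based version might look roughly like the sketch below. It is not taken from the linked answer. It assumes each integer year can be treated as a timestamp at the start of that year, and the names `s`, `regular` and `previous` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [2000, 2001, 2003, 2004, 2005, 2007],
    "value": [5, 10, 8, 72, 12, 13],
})

# Assumption: interpret each integer year as January 1st of that year.
year_index = pd.DatetimeIndex(pd.to_datetime(df["date"].astype(str), format="%Y"))
s = pd.Series(df["value"].to_numpy(), index=year_index)

# Put the data on a regular year-start grid; missing years (2002, 2006) become NaN rows.
regular = s.resample("YS").asfreq()

# On a regular grid, a plain shift always moves by exactly one year.
previous = regular.shift()

# Look the shifted values up again for the years that were originally present.
df["previous_value"] = previous.reindex(s.index).to_numpy()

print(df)
```

The trade-off is that the intermediate series is as long as the full date range, which can matter if the data is sparse over a long period.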
Your example doesn't look like real time series data with timestamps. Let's take another example with the missing date 2020-01-03:
df = pd.DataFrame({"val": [10, 20, 30, 40, 50]}, index=pd.date_range("2020-01-01", "2020-01-05"))
df.drop(pd.Timestamp('2020-01-03'), inplace=True)

            val
2020-01-01   10
2020-01-02   20
2020-01-04   40
2020-01-05   50
To shift by one day you can set the freq parameter to 'D':
df.shift(1, freq='D')
Output:
            val
2020-01-02   10
2020-01-03   20
2020-01-05   40
2020-01-06   50
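To see why freq matters, it can help to compare the two kinds of shift side by side. A small sketch, with `positional` and `by_time` as made-up names:

```python
import pandas as pd

df = pd.DataFrame(
    {"val": [10, 20, 30, 40, 50]},
    index=pd.date_range("2020-01-01", "2020-01-05"),
)
df = df.drop(pd.Timestamp("2020-01-03"))

# Without freq: values move down one row, the index stays as it is.
# 2020-01-04 receives the value from 2020-01-02, which is two days earlier.
positional = df.shift(1)

# With freq: the index labels move forward one day, the values stay put.
by_time = df.shift(1, freq="D")

print(positional)
print(by_time)
```

The positional shift silently ignores the gap, while the freq-based shift keeps every value attached to "its date plus one day".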
To combine original data with the shifted one you can merge both tables:
df.merge(df.shift(1, freq='D'), left_index=True, right_index=True, how='left', suffixes=('', '_previous'))
Output:
            val  val_previous
2020-01-01   10           NaN
2020-01-02   20          10.0
2020-01-04   40           NaN
2020-01-05   50          40.0
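Put together as one script, the steps above might look like this. It is a sketch that uses `join` with `add_suffix` in place of `merge`, which should give the same result for an index-on-index left join. The names `shifted` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"val": [10, 20, 30, 40, 50]},
    index=pd.date_range("2020-01-01", "2020-01-05"),
)
df = df.drop(pd.Timestamp("2020-01-03"))

# Move every index label forward by one day; the values are untouched.
shifted = df.shift(1, freq="D")

# Left join on the index: dates with no row one day earlier get NaN.
result = df.join(shifted.add_suffix("_previous"))

print(result)
```

Because the join is a left join, the extra dates created by the shift (2020-01-03 and 2020-01-06) do not show up in the result.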
You can find other offset aliases in the pandas time series documentation.