Extrapolate Pandas DataFrame
Extrapolating a DataFrame with a DatetimeIndex
This can be done in two steps:
- Extend the DatetimeIndex
- Extrapolate the data
Extend the Index
Overwrite df with a new DataFrame whose data is reindexed onto a new, extended index built from the original index's start, number of periods, and frequency. This allows the original df to come from anywhere, as in the csv example case. Conveniently, the added rows get filled with NaNs!
    import pandas as pd

    # Fake DataFrame for example (could come from anywhere)
    X1 = range(10)
    X2 = [x**2 for x in X1]
    # 'M' is the month-end alias; pandas 2.2 and later spell it 'ME'
    df = pd.DataFrame({'x1': X1, 'x2': X2},
                      index=pd.date_range('20130101', periods=10, freq='M'))

    # Number of months to extend
    extend = 5

    # Extend the index first, based on the original index
    df = pd.DataFrame(
        data=df,
        index=pd.date_range(
            start=df.index[0],
            periods=len(df.index) + extend,
            freq=df.index.freq
        )
    )

    # Display
    print(df)
                 x1   x2
    2013-01-31    0    0
    2013-02-28    1    1
    2013-03-31    2    4
    2013-04-30    3    9
    2013-05-31    4   16
    2013-06-30    5   25
    2013-07-31    6   36
    2013-08-31    7   49
    2013-09-30    8   64
    2013-10-31    9   81
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31  NaN  NaN
    2014-02-28  NaN  NaN
    2014-03-31  NaN  NaN
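One thing to watch for in the csv case: an index parsed from a file usually has no freq attached, so df.index.freq is None. Below is a minimal sketch of one way to handle that, assuming the dates are regularly spaced so that pd.infer_freq can recover the frequency. The CSV text and the column name `date` are made up for illustration, and df.reindex is used here as an alternative to rebuilding the DataFrame.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """date,x1,x2
2013-01-31,0,0
2013-02-28,1,1
2013-03-31,2,4
2013-04-30,3,9
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# An index parsed from CSV carries no freq, so infer it from the dates
freq = df.index.freq if df.index.freq is not None else pd.infer_freq(df.index)

# Number of extra periods to append
extend = 2
new_index = pd.date_range(start=df.index[0], periods=len(df) + extend, freq=freq)

# reindex keeps the existing rows and fills the new ones with NaN
df = df.reindex(new_index)
print(df)
```

If the dates are irregular, pd.infer_freq returns None and the frequency has to be supplied by hand.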
Extrapolate the data
Most extrapolators require numeric inputs rather than dates. The dates can be set aside temporarily like this:

    # Temporarily remove dates and make index numeric
    di = df.index
    df = df.reset_index(drop=True)
See this answer for how to extrapolate the values of each column of a DataFrame with a 3rd order polynomial.
Snippet from answer:
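    # Relies on names set up in the linked answer and not shown here:
    # func (the polynomial model), guess (initial parameters),
    # fit_df (presumably df without its NaN rows) and col_params (a dict).
    # curve_fit comes from scipy.optimize.

    # Curve fit each column
    for col in fit_df.columns:
        # Get x & y
        x = fit_df.index.astype(float).values
        y = fit_df[col].values
        # Curve fit column and get curve parameters
        params = curve_fit(func, x, y, guess)
        # Store optimized parameters
        col_params[col] = params[0]

    # Extrapolate each column
    for col in df.columns:
        # Get the index values for NaNs in the column
        missing = pd.isnull(df[col])
        x = df[missing].index.astype(float).values
        # Extrapolate those points with the fitted function
        # (.loc avoids the chained assignment df[col][x] = ...)
        df.loc[missing, col] = func(x, *col_params[col])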
Once the columns are extrapolated, put the dates back:

    # Put date index back
    df.index = di

    # Display
    print(df)
                x1   x2
    2013-01-31   0    0
    2013-02-28   1    1
    2013-03-31   2    4
    2013-04-30   3    9
    2013-05-31   4   16
    2013-06-30   5   25
    2013-07-31   6   36
    2013-08-31   7   49
    2013-09-30   8   64
    2013-10-31   9   81
    2013-11-30  10  100
    2013-12-31  11  121
    2014-01-31  12  144
    2014-02-28  13  169
    2014-03-31  14  196
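For reference, the two steps can also be put together in one self-contained sketch. This is not the code from the linked answer: it swaps scipy's curve_fit for numpy.polyfit, which is enough when the model is a plain polynomial. The function name `extrapolate_polynomial` and its parameters are hypothetical, and row positions are used as the x axis, which assumes the index is regularly spaced.

```python
import numpy as np
import pandas as pd


def extrapolate_polynomial(df, extend, degree=3):
    """Append `extend` periods to df and fill them from a per-column polynomial fit."""
    # Step 1: extend the DatetimeIndex; the new rows are filled with NaN
    new_index = pd.date_range(start=df.index[0], periods=len(df) + extend,
                              freq=df.index.freq)
    out = df.reindex(new_index).astype(float)

    # Step 2: use row positions 0, 1, 2, ... as the numeric x axis
    x = np.arange(len(out), dtype=float)
    for col in out.columns:
        known = out[col].notna().to_numpy()
        # Least-squares polynomial fit on the known points
        coeffs = np.polyfit(x[known], out.loc[known, col].to_numpy(), degree)
        # Evaluate the fitted polynomial at the missing positions only
        out.loc[~known, col] = np.polyval(coeffs, x[~known])
    return out


# Same fake data as above; MonthEnd() is the offset object behind the month-end alias
index = pd.date_range("2013-01-31", periods=10, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"x1": range(10), "x2": [x**2 for x in range(10)]}, index=index)

result = extrapolate_polynomial(df, extend=5)
print(result.round(2))
```

Because the dates never leave the index in this version, there is no need to remove them and put them back afterwards. As with any polynomial fit, values far beyond the known range should be treated with caution.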