Using pandas TimeStamp with scikit-learn


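You can convert the timestamp column to a plain integer (or float). For a datetime64[ns] column, the result is the number of nanoseconds since the Unix epoch:

    test_df['date'] = test_df['date'].astype('int64')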
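Raw epoch counts are very large numbers, so it can help to rescale them before handing them to a model. The following is a minimal sketch of two options, using made-up sample dates; the column names `epoch_seconds` and `days_since_start` are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'date' comes from the snippet above.
test_df = pd.DataFrame(
    {"date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"])}
)

# Option 1: whole seconds since the Unix epoch.
# Subtracting the epoch and floor-dividing by a one-second Timedelta gives the
# same result regardless of the column's internal resolution.
epoch = pd.Timestamp("1970-01-01")
test_df["epoch_seconds"] = (test_df["date"] - epoch) // pd.Timedelta(seconds=1)

# Option 2: days elapsed since the earliest date in the column,
# which is a smaller and easier-to-interpret numeric feature.
test_df["days_since_start"] = (test_df["date"] - test_df["date"].min()).dt.days

print(test_df)
```

Which scale is appropriate depends on the model; for linear models, a small range such as days since the first observation is usually easier to work with than nanosecond counts.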


You want to fit on X and y, where X holds the features (two or more) and y is the target. Use your DatetimeIndex as the time axis of a series, not as a feature. In my example, I select earthquakes with mag > 7 and calculate the elapsed days between consecutive quakes. The elapsed days, depth, latitude, and longitude are then fed to a linear regression model.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # df is assumed to have a DatetimeIndex and the columns
    # mag, latitude, longitude and depth.
    events = df[df.mag > 7].sort_index().copy()  # dates ascending

    # Timestamp of the previous quake and the time elapsed since it.
    # The first row has no predecessor, so its elapsed time is set to 0 days.
    timestamps = events.index.to_series()
    events['previous'] = timestamps.shift(1)
    events['time_delta'] = timestamps.diff()
    events['elapsed_days'] = events['time_delta'].dt.days.fillna(0)

    X = events[['latitude', 'longitude', 'elapsed_days', 'depth']]
    y = np.nan_to_num(events['mag'])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    # Fit on the training split only, so the test rows stay unseen.
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)

    # Predicted magnitude against elapsed days (test rows are not sorted,
    # so draw markers rather than a connected line).
    fig, ax = plt.subplots()
    ax.plot(X_test['elapsed_days'], y_pred, 'o')
    ax.set_title('Magnitude Prediction')
    plt.show()

    # Magnitude and elapsed days over time, on twin y-axes.
    fig, ax = plt.subplots()
    ax.plot(events.index, np.nan_to_num(events['mag']))
    ax.tick_params(axis='x', rotation=90)
    ax.legend(['Magnitude'], loc='upper left')
    twin_ax = ax.twinx()
    twin_ax.plot(events.index, events['elapsed_days'], color='red')
    twin_ax.legend(['Elapsed Days'], loc='upper right')
    plt.show()
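To try the elapsed-days idea without the earthquake dataset, here is a self-contained sketch on synthetic data. All values are randomly generated stand-ins (so the fitted coefficients and score carry no meaning), the column names mirror the ones used above, and the model is fitted on the training split so the test rows stay unseen.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the earthquake data (all values are made up).
rng = np.random.default_rng(0)
n = 50
day_offsets = np.sort(rng.choice(np.arange(1, 5000), size=n, replace=False))
dates = pd.Timestamp("2000-01-01") + pd.to_timedelta(day_offsets, unit="D")

events = pd.DataFrame(
    {
        "mag": rng.uniform(7.0, 9.0, size=n),
        "depth": rng.uniform(5.0, 600.0, size=n),
        "latitude": rng.uniform(-60.0, 60.0, size=n),
        "longitude": rng.uniform(-180.0, 180.0, size=n),
    },
    index=dates,
).sort_index()

# Elapsed days since the previous event; the first event has no
# predecessor, so its value is filled with 0.
events["elapsed_days"] = events.index.to_series().diff().dt.days.fillna(0)

# The DatetimeIndex itself is not a feature; only the derived gap is.
X = events[["latitude", "longitude", "elapsed_days", "depth"]]
y = events["mag"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

lr = LinearRegression().fit(X_train, y_train)
y_pred = lr.predict(X_test)

# R^2 on the held-out rows; expected to be poor here because the
# synthetic target is pure noise.
r2 = lr.score(X_test, y_test)
print(f"held-out R^2: {r2:.3f}")
```

The point of the sketch is the feature construction: the timestamps never enter X directly, only the numeric gap derived from them does.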