Getting the average of a certain hour on weekdays over several years in a pandas dataframe
Note: Now that Series has the dt accessor, it matters less whether the date is the index, though Date/Time still needs to be of dtype datetime64.
Update: You can do the groupby more directly (without the lambda):
In [21]: df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()
Out[21]:
                     Value
Date/Time Date/Time
2010      0             60
          1             50
          2             52
          3             49

In [22]: res = df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()

In [23]: res.index.names = ["year", "hour"]

In [24]: res
Out[24]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49
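The question title also asks for weekdays only, which the session above does not filter on. A minimal sketch of one way to add that, assuming a Date/Time column of dtype datetime64 and a numeric Value column; the sample rows are made up for illustration, and dt.dayofweek counts Monday as 0, so values below 5 are weekdays:

```python
import pandas as pd

# Hypothetical sample data (not from the question): a few readings
# across two years, including one Saturday that should be dropped.
df = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2010-01-01 00:00",  # Friday
        "2010-01-02 00:00",  # Saturday, filtered out below
        "2010-01-04 00:00",  # Monday
        "2010-01-04 01:00",  # Monday
        "2011-01-03 00:00",  # Monday
    ]),
    "Value": [60, 999, 40, 50, 70],
})

# Keep Monday (0) through Friday (4) only.
weekdays = df[df["Date/Time"].dt.dayofweek < 5]

# Group by year and hour; renaming the keys names the index levels directly.
res = weekdays.groupby([
    weekdays["Date/Time"].dt.year.rename("year"),
    weekdays["Date/Time"].dt.hour.rename("hour"),
])["Value"].mean()

# Yearly averages for a single hour of the day (here midnight).
midnight = res.xs(0, level="hour")
```

Selecting the Value column before calling mean keeps the datetime column itself out of the aggregation.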
If it's a datetime64 index you can do:
In [31]: df1.groupby([df1.index.year, df1.index.hour]).mean()
Out[31]:
        Value
2010 0     60
     1     50
     2     52
     3     49
Old answer (will be slower):
Assuming Date/Time was the index* you can use a mapping function in the groupby:
In [11]: year_hour_means = df1.groupby(lambda x: (x.year, x.hour)).mean()

In [12]: year_hour_means
Out[12]:
           Value
(2010, 0)     60
(2010, 1)     50
(2010, 2)     52
(2010, 3)     49
For a more useful index, you could then create a MultiIndex from the tuples:
In [13]: year_hour_means.index = pd.MultiIndex.from_tuples(year_hour_means.index, names=['year', 'hour'])

In [14]: year_hour_means
Out[14]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49
* If not, first use set_index:
df1 = df.set_index('Date/Time')
If your dates are parsed as datetimes (see dateutil.parser for automatic parsing options) and set as the index, you can use pandas resample as below:
year_hour_means = df.resample('h').mean()
(older pandas versions wrote this as df.resample('H', how='mean')). This keeps your data indexed by datetime, with one row per calendar hour, which may help with whatever you do with the data down the line.
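Note that resample alone yields one mean per calendar hour, not one per hour of the day. A rough sketch of how the two steps might be combined, assuming a DatetimeIndex and a recent pandas where the hourly alias is spelled "h"; the sample readings are invented for illustration:

```python
import pandas as pd

# Hypothetical irregular readings (not from the question), indexed by datetime.
df = pd.DataFrame(
    {"Value": [60, 40, 50, 70]},
    index=pd.to_datetime([
        "2010-01-04 00:10",
        "2010-01-04 00:50",
        "2010-01-04 01:30",
        "2010-01-05 00:20",
    ]),
)

# One row per calendar hour; hours with no readings come out as NaN
# and are dropped here.
hourly = df.resample("h").mean().dropna()

# Collapse calendar hours onto the hour of the day.
# This is a mean of hourly means, not a mean over all raw readings.
by_hour = hourly.groupby(hourly.index.hour)["Value"].mean()
```

If every raw reading should carry equal weight, grouping the original frame by its index hour directly avoids the intermediate averaging.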