Getting the average of a certain hour on weekdays over several years in a pandas dataframe


Note: Now that Series have the dt accessor, it matters less whether the date is the index, though the Date/Time column still needs to have a datetime64 dtype.
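If the column was read in as strings, it has to be converted before the dt accessor is usable. A minimal sketch, assuming a frame with the question's Date/Time and Value columns (the sample values here are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Date/Time": ["2010-01-01 00:00", "2010-01-01 01:00", "2010-01-02 00:00"],
    "Value": [60, 50, 62],
})

# The column holds plain strings at this point, so the .dt accessor
# is not available until it is converted to datetime64.
df["Date/Time"] = pd.to_datetime(df["Date/Time"])

# Now the datetime components can be read directly off the column.
hours = df["Date/Time"].dt.hour
years = df["Date/Time"].dt.year
```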

Update: You can do the groupby more directly (without the lambda):

In [21]: df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()
Out[21]:
                     Value
Date/Time Date/Time
2010      0             60
          1             50
          2             52
          3             49

In [22]: res = df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()

In [23]: res.index.names = ["year", "hour"]

In [24]: res
Out[24]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49
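Since the question asks about weekdays specifically, the same groupby can be run on a filtered frame. This is only a sketch: the sample rows are invented, and the dayofweek filter and the selection of the Value column are my additions rather than part of the session above.

```python
import pandas as pd

# Hypothetical sample data spanning two years; 2010-01-02 is a Saturday.
df = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2010-01-01 00:00",  # Friday
        "2010-01-01 01:00",  # Friday
        "2010-01-02 00:00",  # Saturday, dropped by the filter below
        "2010-01-04 00:00",  # Monday
        "2011-01-03 00:00",  # Monday
    ]),
    "Value": [60, 50, 100, 70, 40],
})

# dayofweek runs from Monday=0 to Sunday=6, so < 5 keeps Monday to Friday.
weekdays = df[df["Date/Time"].dt.dayofweek < 5]

# Group by (year, hour) and average only the Value column.
res = weekdays.groupby(
    [weekdays["Date/Time"].dt.year, weekdays["Date/Time"].dt.hour]
)["Value"].mean()
res.index.names = ["year", "hour"]

# res.loc[(2010, 0)] is 65.0: the mean of 60 and 70, with the Saturday row excluded.
```

To look at one particular hour across all years, something like res.xs(0, level="hour") picks that hour out of the result.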

If it's a datetime64 index you can do:

In [31]: df1.groupby([df1.index.year, df1.index.hour]).mean()
Out[31]:
        Value
2010 0     60
     1     50
     2     52
     3     49
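For completeness, a self-contained version of the index-based variant. The data is made up, and naming the index levels afterwards is optional:

```python
import pandas as pd

# Hypothetical sample data with a DatetimeIndex instead of a Date/Time column.
idx = pd.to_datetime([
    "2010-01-01 00:00",
    "2010-01-01 01:00",
    "2010-01-04 00:00",
    "2011-01-03 00:00",
])
df1 = pd.DataFrame({"Value": [60, 50, 70, 40]}, index=idx)

# A DatetimeIndex exposes year/hour directly, with no .dt accessor needed.
means = df1.groupby([df1.index.year, df1.index.hour]).mean()

# The resulting levels are unnamed, so label them for readability.
means.index.names = ["year", "hour"]
```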

Old answer (will be slower):

Assuming Date/Time is the index*, you can use a mapping function in the groupby:

In [11]: year_hour_means = df1.groupby(lambda x: (x.year, x.hour)).mean()

In [12]: year_hour_means
Out[12]:
           Value
(2010, 0)     60
(2010, 1)     50
(2010, 2)     52
(2010, 3)     49

For a more useful index, you could then create a MultiIndex from the tuples:

In [13]: year_hour_means.index = pd.MultiIndex.from_tuples(year_hour_means.index,
                                                           names=['year', 'hour'])

In [14]: year_hour_means
Out[14]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49

* if not, then first use set_index:

df1 = df.set_index('Date/Time')
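Putting the old approach together end to end, as a sketch with invented sample data. The list() around the index is my own precaution, since the exact index type returned by the groupby may differ between pandas versions:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2010-01-01 00:00",
        "2010-01-01 01:00",
        "2010-01-04 00:00",
        "2011-01-03 00:00",
    ]),
    "Value": [60, 50, 70, 40],
})

# Move the timestamps into the index so groupby can map over them.
df1 = df.set_index("Date/Time")

# The function is called on each index label (a Timestamp) and the
# returned (year, hour) tuple is used as the group key.
year_hour_means = df1.groupby(lambda ts: (ts.year, ts.hour)).mean()

# Rebuild the index as a named two-level MultiIndex.
year_hour_means.index = pd.MultiIndex.from_tuples(
    list(year_hour_means.index), names=["year", "hour"]
)
```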


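If your Date/Time column is in a datetime format (see pd.to_datetime, or dateutil.parser for automatic parsing options) and is set as the index, you can use pandas resample as below:

year_hour_means = df.resample('H').mean()

which keeps a datetime index on the result, with one row per calendar hour rather than one per (year, hour) pair. This may help with whatever you are going to be doing with your data down the line. The how='mean' keyword from older pandas versions has since been removed in favour of chaining .mean(), and recent versions spell the hourly alias 'h'.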
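A minimal runnable sketch of the resample approach, written with the chained .mean() call. The sample readings are invented, and I have used the lowercase 'h' alias on the assumption that a reasonably recent pandas is installed:

```python
import pandas as pd

# Hypothetical sample data with several readings inside the same hour.
df = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2010-01-01 00:10",
        "2010-01-01 00:40",
        "2010-01-01 01:20",
    ]),
    "Value": [60, 40, 52],
})

# resample needs a datetime-like index, so set it first,
# then average the readings that fall into each hourly bin.
hourly = df.set_index("Date/Time")["Value"].resample("h").mean()

# The result is still indexed by timestamps, one per hour:
# 2010-01-01 00:00 -> 50.0 and 2010-01-01 01:00 -> 52.0
```

Because the result keeps its DatetimeIndex, the year and hour are still available through hourly.index.year and hourly.index.hour for any further grouping.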