Python & Pandas - Group by day and count for each day
You can use dt.floor
for convert to date
s and then value_counts
or groupby
with size
:
df = (pd.to_datetime(df['date & time of connection']) .dt.floor('d') .value_counts() .rename_axis('date') .reset_index(name='count'))print (df) date count0 2017-06-23 61 2017-06-21 52 2017-06-19 33 2017-06-22 34 2017-06-20 2
Or:
s = pd.to_datetime(df['date & time of connection'])df = s.groupby(s.dt.floor('d')).size().reset_index(name='count')print (df) date & time of connection count0 2017-06-19 31 2017-06-20 22 2017-06-21 53 2017-06-22 34 2017-06-23 6
Timings:
np.random.seed(1542)N = 220000a = np.unique(np.random.randint(N, size=int(N/2)))df = pd.DataFrame(pd.date_range('2000-01-01', freq='37T', periods=N)).drop(a)df.columns = ['date & time of connection']df['date & time of connection'] = df['date & time of connection'].dt.strftime('%d/%m/%Y %H:%M:%S')print (df.head()) In [193]: %%timeit ...: df['date & time of connection']=pd.to_datetime(df['date & time of connection']) ...: df1 = df.groupby(by=df['date & time of connection'].dt.date).count() ...: 539 ms ± 45.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)In [194]: %%timeit ...: df1 = (pd.to_datetime(df['date & time of connection']) ...: .dt.floor('d') ...: .value_counts() ...: .rename_axis('date') ...: .reset_index(name='count')) ...: 12.4 ms ± 350 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)In [195]: %%timeit ...: s = pd.to_datetime(df['date & time of connection']) ...: df2 = s.groupby(s.dt.floor('d')).size().reset_index(name='count') ...: 17.7 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
To make sure your columns in in date format.
df['date & time of connection']=pd.to_datetime(df['date & time of connection'])
Then you can group the data by date and do a count:
df.groupby(by=df['date & time of connection'].dt.date).count()Out[10]: date & time of connectiondate & time of connection 2017-06-19 32017-06-20 22017-06-21 52017-06-22 32017-06-23 6
Hey I found easy way to do this with resample.
# Set the date column as index column.df = df.set_index('your_date_column')# Make countsdf_counts = df.your_date_column.resample('D').count()
Although your column name is long and contains spaces, which makes me a little cringy. I would use dashes instead of spaces.