Python & Pandas - Group by day and count for each day Python & Pandas - Group by day and count for each day python-3.x python-3.x

Python & Pandas - Group by day and count for each day


You can use dt.floor for convert to dates and then value_counts or groupby with size:

df = (pd.to_datetime(df['date & time of connection'])       .dt.floor('d')       .value_counts()       .rename_axis('date')       .reset_index(name='count'))print (df)        date  count0 2017-06-23      61 2017-06-21      52 2017-06-19      33 2017-06-22      34 2017-06-20      2

Or:

s = pd.to_datetime(df['date & time of connection'])df = s.groupby(s.dt.floor('d')).size().reset_index(name='count')print (df)  date & time of connection  count0                2017-06-19      31                2017-06-20      22                2017-06-21      53                2017-06-22      34                2017-06-23      6

Timings:

np.random.seed(1542)N = 220000a = np.unique(np.random.randint(N, size=int(N/2)))df = pd.DataFrame(pd.date_range('2000-01-01', freq='37T', periods=N)).drop(a)df.columns = ['date & time of connection']df['date & time of connection'] = df['date & time of connection'].dt.strftime('%d/%m/%Y %H:%M:%S')print (df.head()) In [193]: %%timeit     ...: df['date & time of connection']=pd.to_datetime(df['date & time of connection'])     ...: df1 = df.groupby(by=df['date & time of connection'].dt.date).count()     ...: 539 ms ± 45.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)In [194]: %%timeit     ...: df1 = (pd.to_datetime(df['date & time of connection'])     ...:        .dt.floor('d')     ...:        .value_counts()     ...:        .rename_axis('date')     ...:        .reset_index(name='count'))     ...: 12.4 ms ± 350 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)In [195]: %%timeit     ...: s = pd.to_datetime(df['date & time of connection'])     ...: df2 = s.groupby(s.dt.floor('d')).size().reset_index(name='count')     ...: 17.7 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


To make sure your columns in in date format.

df['date & time of connection']=pd.to_datetime(df['date & time of connection'])

Then you can group the data by date and do a count:

df.groupby(by=df['date & time of connection'].dt.date).count()Out[10]:                            date & time of connectiondate & time of connection                           2017-06-19                                         32017-06-20                                         22017-06-21                                         52017-06-22                                         32017-06-23                                         6


Hey I found easy way to do this with resample.

# Set the date column as index column.df = df.set_index('your_date_column')# Make countsdf_counts = df.your_date_column.resample('D').count() 

Although your column name is long and contains spaces, which makes me a little cringy. I would use dashes instead of spaces.