MySQL: group by consecutive days and count groups MySQL: group by consecutive days and count groups mysql mysql

MySQL: group by consecutive days and count groups


You can find the first day of each visit by finding checkins where there was no checkin the day before.

select count(distinct date(start_of_visit.datetime))from checkin start_of_visitleft join checkin previous_day    on start_of_visit.user = previous_day.user    and start_of_visit.city = previous_day.city    and date(start_of_visit.datetime) - interval 1 day = date(previous_day.datetime)where previous_day.id is null

There are several important parts to this query.

First, each checkin is joined to any checkin from the previous day. But since it's an outer join, if there was no checkin the previous day the right side of the join will have NULL results. The WHERE filtering happens after the join, so it keeps only those checkins from the left side where there are none from the right side. LEFT OUTER JOIN/WHERE IS NULL is really handy for finding where things aren't.

Then it counts distinct checkin dates to make sure it doesn't double-count if the user checked in multiple times on the first day of the visit. (I actually added that part on edit, when I spotted the possible error.)

Edit: I just re-read your proposed query for the first question. Your query would get you the number of checkins on a given date, instead of a count of dates. I think you want something like this instead:

select count(distinct date(datetime))from checkinwhere user='some user' and city='some city'


Try to apply this code to your task -

CREATE TABLE visits(  user_id INT(11) NOT NULL,  dt DATETIME DEFAULT NULL);INSERT INTO visits VALUES   (1, '2011-06-30 12:11:46'),  (1, '2011-07-01 13:16:34'),  (1, '2011-07-01 15:22:45'),  (1, '2011-07-01 22:35:00'),  (1, '2011-07-02 13:45:12'),  (1, '2011-08-01 00:11:45'),  (1, '2011-08-05 17:14:34'),  (1, '2011-08-05 18:11:46'),  (1, '2011-08-06 20:22:12'),  (2, '2011-08-30 16:13:34'),  (2, '2011-08-31 16:13:41');SET @i = 0;SET @last_dt = NULL;SET @last_user = NULL;SELECT v.user_id,  COUNT(DISTINCT(DATE(dt))) number_of_days,  MAX(days) number_of_visitsFROM  (SELECT user_id, dt        @i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,        @last_dt := DATE(dt),        @last_user := user_id   FROM     visits   ORDER BY     user_id, dt  ) vGROUP BY  v.user_id;----------------Output:+---------+----------------+------------------+| user_id | number_of_days | number_of_visits |+---------+----------------+------------------+|       1 |              6 |                3 ||       2 |              2 |                1 |+---------+----------------+------------------+

Explanation:

To understand how it works let's check the subquery, here it is.

SET @i = 0;SET @last_dt = NULL;SET @last_user = NULL;SELECT user_id, dt,        @i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,        @last_dt := DATE(dt) lt,        @last_user := user_id luFROM  visitsORDER BY  user_id, dt;

As you see the query returns all rows and performs ranking for the number of visits. This is known ranking method based on variables, note that rows are ordered by user and date fields. This query calculates user visits, and outputs next data set where days column provides rank for the number of visits -

+---------+---------------------+------+------------+----+| user_id | dt                  | days | lt         | lu |+---------+---------------------+------+------------+----+|       1 | 2011-06-30 12:11:46 |    1 | 2011-06-30 |  1 ||       1 | 2011-07-01 13:16:34 |    1 | 2011-07-01 |  1 ||       1 | 2011-07-01 15:22:45 |    1 | 2011-07-01 |  1 ||       1 | 2011-07-01 22:35:00 |    1 | 2011-07-01 |  1 ||       1 | 2011-07-02 13:45:12 |    1 | 2011-07-02 |  1 ||       1 | 2011-08-01 00:11:45 |    2 | 2011-08-01 |  1 ||       1 | 2011-08-05 17:14:34 |    3 | 2011-08-05 |  1 ||       1 | 2011-08-05 18:11:46 |    3 | 2011-08-05 |  1 ||       1 | 2011-08-06 20:22:12 |    3 | 2011-08-06 |  1 ||       2 | 2011-08-30 16:13:34 |    1 | 2011-08-30 |  2 ||       2 | 2011-08-31 16:13:41 |    1 | 2011-08-31 |  2 |+---------+---------------------+------+------------+----+

Then we group this data set by user and use aggregate functions:'COUNT(DISTINCT(DATE(dt)))' - counts the number of days'MAX(days)' - the number of visits, it is a maximum value for the days field from our subquery.

That is all;)


As data sample provided by Devart, the inner "PreQuery" works with sql variables. By defaulting the @LUser to a -1 (probable non-existent user ID), the IF() test checks for any difference between last user and current. As soon as a new user, it gets a value of 1... Additionally, if the last date is more than 1 day from the new date of check-in, it gets a value of 1. Then, the subsequent columns reset the @LUser and @LDate to the value of the incoming record just tested against for the next cycle. Then, the outer query just sums them up and counts them for the final correct results per the Devart data set of

User ID    Distinct Visits   Total Days1           3                 92           1                 2select PreQuery.User_ID,       sum( PreQuery.NextVisit ) as DistinctVisits,       count(*) as TotalDays   from      (  select v.user_id,               if( @LUser <> v.User_ID OR @LDate < ( date( v.dt ) - Interval 1 day ), 1, 0 ) as NextVisit,               @LUser := v.user_id,               @LDate := date( v.dt )            from                Visits v,               ( select @LUser := -1, @LDate := date(now()) ) AtVars             order by               v.user_id,               v.dt  ) PreQuery    group by        PreQuery.User_ID