PostgreSQL - Getting statistical data


You should look into aggregate functions (min, max, count, avg), which go hand in hand with GROUP BY. For date-based aggregations, date_trunc is also useful.

For example, this will return the number of rows per day:

SELECT date_trunc('day', date_time) AS day_start,
       COUNT(id) AS user_count
    FROM tb_user
    GROUP BY date_trunc('day', date_time);

You can then do the daily average using something like this (with a CTE):

WITH daily_count AS (
    SELECT date_trunc('day', date_time) AS day_start,
           COUNT(id) AS user_count
        FROM tb_user
        GROUP BY date_trunc('day', date_time)
)
SELECT AVG(user_count) FROM daily_count;
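One caveat with the AVG query above: days with no rows at all produce no group, so they are left out of the average. If you want empty days to count as zero, something along these lines might work. It is only a sketch: it assumes the same tb_user(id, date_time) table, that id is never NULL, and the names bounds, days and gs are made up for the example.

```sql
-- Sketch: daily average that counts days without any rows as 0.
WITH bounds AS (
    -- First and last calendar day present in the table
    SELECT MIN(date_time)::date AS first_day,
           MAX(date_time)::date AS last_day
        FROM tb_user
),
days AS (
    -- One row per calendar day between the two bounds (inclusive)
    SELECT gs.day_ts::date AS day_start
        FROM bounds
        CROSS JOIN LATERAL generate_series(
            bounds.first_day::timestamp,
            bounds.last_day::timestamp,
            INTERVAL '1 day') AS gs(day_ts)
),
daily_count AS (
    -- LEFT JOIN keeps empty days; COUNT(u.id) is 0 for them
    SELECT d.day_start,
           COUNT(u.id) AS user_count
        FROM days d
        LEFT JOIN tb_user u ON u.date_time::date = d.day_start
        GROUP BY d.day_start
)
SELECT AVG(user_count) AS avg_per_day FROM daily_count;
```

The join on u.date_time::date is easy to read but will not use a plain index on date_time, so on a large table a range condition may be preferable.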

Use 'week' instead of 'day' for weekly counts, 'month' for monthly counts, and so on (see the date_trunc documentation).
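For instance, the per-week version of the first query would look roughly like this (same assumed tb_user table; the ORDER BY is my addition, just to get the weeks in chronological order):

```sql
-- Sketch: number of rows per week.
SELECT date_trunc('week', date_time) AS week_start,
       COUNT(id) AS user_count
    FROM tb_user
    GROUP BY date_trunc('week', date_time)
    ORDER BY week_start;
```

Note that date_trunc('week', ...) truncates to the Monday that starts the week, so week_start is always a Monday.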

EDIT (following your comment): to average over the days up to and including 5 January 2012, i.e. strictly before the 6th:

WITH daily_count AS (
    SELECT date_trunc('day', date_time) AS day_start,
           COUNT(id) AS user_count
        FROM tb_user
        WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06')
        GROUP BY date_trunc('day', date_time)
)
SELECT SUM(user_count) / (DATE('2012-01-06') - DATE('2012-01-01'))
    FROM daily_count;

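What's above is over-complicated in this case. This should give you the same result in a single query. The ::numeric cast matters: COUNT returns a bigint and subtracting two dates gives an integer, so without the cast the division would be an integer division and the average would be truncated.

SELECT COUNT(id)::numeric / (DATE('2012-01-06') - DATE('2012-01-01'))
    FROM tb_user
    WHERE date_time >= DATE('2012-01-01') AND date_time < DATE('2012-01-06');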

EDIT 2: After your edit, I guess what you're after is just a single global average for the entire period of existence of your database, rather than groups by month/week/day.

This should give you the average number of rows per day:

WITH total_min_max AS (
    SELECT COUNT(id) AS total_visits,
           MIN(date_time) AS first_date_time,
           MAX(date_time) AS last_date_time
        FROM tb_user
)
-- ::numeric avoids integer division (COUNT returns a bigint)
SELECT total_visits::numeric / ((last_date_time::date - first_date_time::date) + 1) AS users_per_day
    FROM total_min_max;

(If there have been no recent visits, I would replace last_date_time with NOW() so that the average covers the period up to now rather than up to the last visit.)
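A sketch of that NOW() variant, again assuming the tb_user(id, date_time) table (total_min is just a name I picked for the CTE):

```sql
-- Sketch: average rows per day from the first row up to today.
WITH total_min AS (
    SELECT COUNT(id) AS total_visits,
           MIN(date_time) AS first_date_time
        FROM tb_user
)
-- NOW()::date is today's date; +1 makes the day range inclusive.
-- ::numeric avoids integer division (COUNT returns a bigint).
SELECT total_visits::numeric
           / ((NOW()::date - first_date_time::date) + 1) AS users_per_day
    FROM total_min;
```

If the table is empty, first_date_time is NULL and the query returns NULL rather than failing.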

Then, for daily, weekly, and "monthly":

WITH total_min_max AS (
    SELECT COUNT(id) AS total_visits,
           MIN(date_time) AS first_date_time,
           MAX(date_time) AS last_date_time
        FROM tb_user
),
daily_avg AS (
    -- ::numeric avoids integer division (COUNT returns a bigint)
    SELECT total_visits::numeric / ((last_date_time::date - first_date_time::date) + 1) AS users_per_day
        FROM total_min_max
)
SELECT users_per_day,
       (users_per_day * 7) AS users_per_week,
       (users_per_day * 30) AS users_per_month
    FROM daily_avg;

That said, be careful about the conclusions you draw from such statistics, especially if you want to see how the numbers change over time.

I would also normalise the data per day rather than assuming 30 days in a month (if not per hour, because not all days have 24 hours). Say you have 10 visits per day in Jan 2011 and 10 visits per day in Feb 2011. That gives you 310 visits in Jan and 280 visits in Feb. If you don't pay attention, you could think you've had almost a 10% drop in the number of visitors and that something went wrong in Feb, when really this isn't the case.
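To illustrate the per-day normalisation, here is a rough sketch that divides each month's count by the actual number of days in that month. It assumes the same tb_user(id, date_time) table; monthly_count, days_in_month and visits_per_day are names invented for the example.

```sql
-- Sketch: visits per day for each calendar month, using the real month length.
WITH monthly_count AS (
    SELECT date_trunc('month', date_time) AS month_start,
           COUNT(id) AS visits
        FROM tb_user
        GROUP BY date_trunc('month', date_time)
)
SELECT month_start,
       visits,
       -- First day of next month minus first day of this month = days in month
       (month_start + INTERVAL '1 month')::date - month_start::date AS days_in_month,
       -- ::numeric avoids integer division (COUNT returns a bigint)
       visits::numeric
           / ((month_start + INTERVAL '1 month')::date - month_start::date) AS visits_per_day
    FROM monthly_count
    ORDER BY month_start;
```

With the Jan/Feb 2011 figures above, both months would come out at 10 visits per day. The first and the current month are usually only partly covered by the data, so their figures will look too low unless you divide by the number of days actually covered.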