Optimal way to create a histogram/frequency distribution in Oracle?
If your createtime
were a date column, this would be trivial:
SELECT TO_CHAR(CREATE_TIME, 'DAY:HH24'), COUNT(*) FROM EVENTS GROUP BY TO_CHAR(CREATE_TIME, 'DAY:HH24');
As it is, casting the createtime
column isn't too hard:
select TO_CHAR( TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000), 'DAY:HH24') AS BUCKET, COUNT(*) FROM EVENTS WHERE createtime between 1305504000000 and 1306108800000 group by TO_CHAR( TO_DATE('19700101', 'YYYYMMDD') + createtime / 86400000), 'DAY:HH24') order by 1
If, alternatively, you're looking for the fencepost values (for example, where do I go from the first decile (0-10%) to the next (11-20%), you'd do something like:
select min(createtime) over (partition by decile) as decile_start, max(createtime) over (partition by decile) as decile_end, decile from (select createtime, ntile (10) over (order by createtime asc) as decile from events where createtime between 1305504000000 and 1306108800000 )
I'm unfamiliar with Oracle's date functions, but I'm pretty certain there's an equivalent way of writing this Postgres statement:
select date_trunc('hour', stamp), count(*)from your_datagroup by date_trunc('hour', stamp)order by date_trunc('hour', stamp)
Pretty much the same response as Adam, but I would prefer to keep the period_start as a time field so it is easier to filter further if needed:
withevents as( select rownum eventkey, round(dbms_random.value(1305504000000, 1306108800000)) createtime from dual connect by level <= 1000 )select trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH') period_start, count(*) numeventsfrom eventswhere createtime between 1305504000000 and 1306108800000group by trunc(timestamp '1970-01-01 00:00:00' + numtodsinterval(createtime/1000, 'second'), 'HH')order by period_start