
Rails: How to build statistics per day/month/year, or how database-agnostic SQL functions are missing (e.g. STRFTIME, DATE_FORMAT, DATE_TRUNC)


I ended up writing my own gem. Check it out and feel free to contribute: https://github.com/lakim/sql_funk

It allows you to make calls like:

Subscriber.count_by("created_at", :group_by => "day")


You speak of some pretty difficult problems that Rails, unfortunately, completely overlooks. The ActiveRecord::Calculations docs are written like they're all you ever need, but databases can do much more advanced things. As Donal Fellows mentioned in his comment, the problem is much trickier than it seems.

I've developed a Rails application over the last two years that makes heavy use of aggregation, and I've tried a few different approaches to the problem. I unfortunately don't have the luxury of ignoring things like daylight saving time because the statistics are "only trends". The calculations I generate are tested by my customers to exact specifications.

To expand upon the problem a bit, I think you'll find that your current solution of grouping by dates is inadequate. It seems like a natural option to use STRFTIME. The primary problem is that it doesn't let you group by arbitrary time periods. If you want to do aggregation by year, month, day, hour, and/or minute, STRFTIME will work fine. If not, you'll find yourself looking for another solution. Another huge problem is that of aggregation upon aggregation. Say, for example, you want to group by month, but you want to do it starting from the 15th of every month. How would you do it using STRFTIME? You'd have to group by each day and then by month, and then somehow account for the starting offset of the 15th day of each month. The final straw is that grouping by STRFTIME necessitates grouping by a string value, which you'll find very slow when performing aggregation upon aggregation.
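For reference, the string-based grouping being critiqued looks something like this against SQLite (just a sketch, reusing the Subscriber model from the example above):

Subscriber.group("STRFTIME('%Y-%m-%d', created_at)").count
# returns a hash keyed by the formatted date string, one entry per day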

The most performant and best-designed solution I've come to is one based upon integer time periods. Here is an excerpt from one of my MySQL queries:

SELECT
  field1, field2, field3,
  CEIL((UNIX_TIMESTAMP(CONVERT_TZ(date, '+0:00', @@session.time_zone)) + :begin_offset) / :time_interval) AS time_period
FROM
  some_table
GROUP BY
  time_period

In this case, :time_interval is the number of seconds in the grouping period (e.g. 86400 for daily) and :begin_offset is the number of seconds to offset the period start. The CONVERT_TZ() business accounts for the way MySQL interprets dates. MySQL always assumes that the date field is in the MySQL local time zone. But because I store times in UTC, I must convert it from UTC to the session time zone if I want the UNIX_TIMESTAMP() function to give me a correct response. The time period ends up being an integer that describes the number of time intervals since the start of Unix time. This solution is much more flexible because it lets you group by arbitrary periods and doesn't require aggregation upon aggregation.
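As a rough sketch of how such a query might be driven from Rails (the Subscriber model, the created_at column, and the COUNT(*) are my own illustrative assumptions; only the CEIL/UNIX_TIMESTAMP expression comes from the query above):

# Seconds per grouping period (daily here) and the offset of the period start.
time_interval = 86400
begin_offset  = 0

rows = Subscriber.find_by_sql([<<-SQL, { time_interval: time_interval, begin_offset: begin_offset }])
  SELECT
    COUNT(*) AS total,
    CEIL((UNIX_TIMESTAMP(CONVERT_TZ(created_at, '+0:00', @@session.time_zone)) + :begin_offset) / :time_interval) AS time_period
  FROM subscribers
  GROUP BY time_period
SQL

rows.first["time_period"]  # the period index computed by the query above
rows.first["total"]        # count of rows in that period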

Now, to get to my real point. For a robust solution, I'd recommend that you consider not using Rails at all to generate these queries. The biggest issue is that the performance characteristics and subtleties of aggregation are different across the databases. You might find one design that works well in your development environment but not in production, or vice-versa. You'll jump through a lot of hoops to get Rails to play nicely with both databases in query construction.

Instead, I'd recommend that you generate database-specific views in your chosen database and bring those along to the correct environment. Try to model the view as you would any other ActiveRecord table (IDs and all), and of course make the fields in the view identical across databases. Because these statistics are read-only queries, you can use a model to back them and pretend they're full-fledged tables. Just raise an exception if somebody tries to save, create, update, or destroy.
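A minimal sketch of what such a view-backed, read-only model could look like (the class name, view name, and day column are hypothetical):

class SubscriberDailyStat < ActiveRecord::Base
  self.table_name = "subscriber_daily_stats"  # a database view, not a real table

  # With readonly? returning true, ActiveRecord raises
  # ActiveRecord::ReadOnlyRecord on any attempt to save, update, or destroy.
  def readonly?
    true
  end
end

# Reads work just like a regular table:
SubscriberDailyStat.where("day >= ?", 30.days.ago).order(:day)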

Not only will you get simplified model management by doing things the Rails way, you'll also find that you can write unit tests for your aggregation features in ways you wouldn't dream of in pure SQL. And if you decide to switch databases, you'll have to rewrite those views, but your tests will tell you where you're wrong and make life so much easier.
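For example, a test against such a view might look roughly like this (Minitest-style; the model, columns, and expected numbers are purely illustrative):

class SubscriberDailyStatTest < ActiveSupport::TestCase
  test "counts subscribers per day" do
    Subscriber.create!(created_at: Time.utc(2013, 1, 15, 10))
    Subscriber.create!(created_at: Time.utc(2013, 1, 15, 23))
    Subscriber.create!(created_at: Time.utc(2013, 1, 16, 1))

    assert_equal [2, 1], SubscriberDailyStat.order(:day).map(&:total)
  end
end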


I just released a gem that allows you to do this easily with MySQL. https://github.com/ankane/groupdate
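Usage is roughly (assuming a User model with a created_at column):

User.group_by_day(:created_at).count
# similar helpers exist for other periods, e.g. group_by_week and group_by_month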

You should really try to run MySQL in development, too. Your development and production environments should be as close as possible: there's less of a chance for something to work in development and totally break in production.