When to use R, when to use SQL?
I usually do the data manipulations in SQL until the data I want is in a single table, and then, I do the rest in R.Only when there is a performance issuedo I start to move some of the computations to the database.This is already what you are doing.
Computations involving timestamps often become unreadable in SQL(the "analytic functions", similar to ddply
, are supposed to simplify this,but I think they are not available in MySQL).
However, your example can probably be written entirely in SQL as follows (not tested).
-- Join the tables and compute the maximumCREATE VIEW t1 ASSELECT a.id AS a_id, a.group AS a_group, b.id AS b_id, a.time AS a_time, a.time - MAX(b.time) AS dtFROM a, b, abWHERE a.id = ab.a_id AND b.id = ab.b_idAND b.time < a.timeGROUP BY a.id, a.group, b.id;-- Extract the desired rowsCREATE VIEW t2 AS SELECT t1.*FROM t1, (SELECT group, MIN(dt) AS min_dt FROM t1) XWHERE t1.a_id = X.a_id AND t1.b_id = X.b_id AND t1.a_group = X.a.group;