Translating SQL joins on foreign keys to R data.table syntax
Good question. Note the following (admittedly buried) in ?data.table
:
When
i
is adata.table
,x
must have a key.i
is joined tox
using the key and the rows inx
that match are returned. An equi-join is performed between each column ini
to each column inx
's key. The match is a binary search in compiled C in O(log n) time. Ifi
has less columns thanx
's key then many rows ofx
may match to each row ofi
. Ifi
has more columns thanx
's key, the columns ofi
not involved in the join are included in the result. Ifi
also has a key, it isi
's key columns that are used to match tox
's key columns and a binary merge of the two tables is carried out.
So, the key here is that i
doesn't have to be keyed. Only x
must be keyed.
X2 <- data.table(id = 11:15, y_id = c(14,14,11,12,12), key="id") id y_id[1,] 11 14[2,] 12 14[3,] 13 11[4,] 14 12[5,] 15 12Y2 <- data.table(id = 11:15, b = letters[1:5], key="id") id b[1,] 11 a[2,] 12 b[3,] 13 c[4,] 14 d[5,] 15 eY2[J(X2$y_id)] # binary search for each item of (unsorted and unkeyed) i id b[1,] 14 d[2,] 14 d[3,] 11 a[4,] 12 b[5,] 12 b
or,
Y2[SJ(X2$y_id)] # binary merge of keyed i, see ?SJ id b[1,] 11 a[2,] 12 b[3,] 12 b[4,] 14 d[5,] 14 didentical(Y2[J(X2$y_id)], Y2[X2$y_id])[1] FALSE