Translating SQL joins on foreign keys to R data.table syntax Translating SQL joins on foreign keys to R data.table syntax r r

Translating SQL joins on foreign keys to R data.table syntax


Good question. Note the following (admittedly buried) in ?data.table :

When i is a data.table, x must have a key. i is joined to x using the key and the rows in x that match are returned. An equi-join is performed between each column in i to each column in x's key. The match is a binary search in compiled C in O(log n) time. If i has less columns than x's key then many rows of x may match to each row of i. If i has more columns than x's key, the columns of i not involved in the join are included in the result. If i also has a key, it is i's key columns that are used to match to x's key columns and a binary merge of the two tables is carried out.

So, the key here is that i doesn't have to be keyed. Only x must be keyed.

X2 <- data.table(id = 11:15, y_id = c(14,14,11,12,12), key="id")     id y_id[1,] 11   14[2,] 12   14[3,] 13   11[4,] 14   12[5,] 15   12Y2 <- data.table(id = 11:15, b = letters[1:5], key="id")     id b[1,] 11 a[2,] 12 b[3,] 13 c[4,] 14 d[5,] 15 eY2[J(X2$y_id)]  # binary search for each item of (unsorted and unkeyed) i     id b[1,] 14 d[2,] 14 d[3,] 11 a[4,] 12 b[5,] 12 b

or,

Y2[SJ(X2$y_id)]  # binary merge of keyed i, see ?SJ     id b[1,] 11 a[2,] 12 b[3,] 12 b[4,] 14 d[5,] 14 didentical(Y2[J(X2$y_id)], Y2[X2$y_id])[1] FALSE