Select first and last row from grouped data
Not dplyr, but it's much more direct using data.table:
    library(data.table)
    setDT(df)
    df[ df[order(id, stopSequence), .I[c(1L,.N)], by=id]$V1 ]
    #    id stopId stopSequence
    # 1:  1      a            1
    # 2:  1      c            3
    # 3:  2      b            1
    # 4:  2      c            4
    # 5:  3      b            1
    # 6:  3      a            3
More detailed explanation:
    # 1) get row numbers of first/last observations from each group
    #    * basically, we sort the table by id/stopSequence, then,
    #      grouping by id, name the row numbers of the first/last
    #      observations for each id; this operation produces
    #      a data.table, so we pull the row numbers out of its column
    #    * .I is data.table shorthand for the row number
    #    * here, to be maximally explicit, I've named the variable V1
    #      as row_num to give other readers of my code a clearer
    #      understanding of what operation is producing what variable
    first_last = df[order(id, stopSequence), .(row_num = .I[c(1L,.N)]), by=id]
    idx = first_last$row_num

    # 2) extract rows by number
    df[idx]
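To try this end to end, here is a minimal self-contained sketch. The input table is an assumption: it is made up to be consistent with the output shown above, and the rows that are neither first nor last in each group are purely illustrative.

```r
library(data.table)

# Hypothetical input (an assumption, chosen to match the output above).
# Note that stopSequence is deliberately not sorted within each id.
df <- data.table(
  id           = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  stopId       = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  stopSequence = c(1, 2, 3, 3, 1, 4, 3, 1, 2)
)

# Row numbers (positions in df) of the first and last stop per id,
# where "first" and "last" are determined by stopSequence
idx <- df[order(id, stopSequence), .I[c(1L, .N)], by = id]$V1

# Extract those rows from the original table
result <- df[idx]
print(result)
```

One thing to keep in mind: for a group with a single row, `c(1L, .N)` picks that row twice, so it would appear twice in the result; wrapping it as `unique(c(1L, .N))` is one way to avoid that.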
Be sure to check out the Getting Started wiki to get the data.table basics covered.