
Select first and last row from grouped data


There is probably a faster way:

    df %>%
      group_by(id) %>%
      arrange(stopSequence) %>%
      filter(row_number()==1 | row_number()==n())


Just for completeness: You can pass slice a vector of indices:

df %>% arrange(stopSequence) %>% group_by(id) %>% slice(c(1,n()))

which gives

      id stopId stopSequence
    1  1      a            1
    2  1      c            3
    3  2      b            1
    4  2      c            4
    5  3      b            1
    6  3      a            3
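For a reproducible check, here is a minimal sketch of the `slice` approach. The sample `df` is an assumption (the data is not given above); its values were chosen so that the result matches the output shown. The sketch also illustrates one caveat: for a group with a single row, `c(1, n())` is `c(1, 1)`, so `slice` returns that row twice unless the indices are wrapped in `unique()`.

```r
library(dplyr)

# Assumed sample data (hypothetical), chosen to reproduce the output shown above.
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  stopId = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  stopSequence = c(1, 2, 3, 3, 1, 4, 3, 1, 2)
)

# Sort by stopSequence, group by id, then keep positions 1 and n() in each group.
res <- df %>%
  arrange(stopSequence) %>%
  group_by(id) %>%
  slice(c(1, n())) %>%
  ungroup()

# Edge case: a group with only one row.
one <- data.frame(id = 4, stopId = "a", stopSequence = 1)

# c(1, n()) evaluates to c(1, 1), so the single row is returned twice.
dup <- one %>% group_by(id) %>% slice(c(1, n())) %>% ungroup()

# unique() collapses the repeated index, so the row is returned once.
once <- one %>% group_by(id) %>% slice(unique(c(1, n()))) %>% ungroup()
```

The `filter(row_number()==1 | row_number()==n())` variant does not have this duplication, because each row is tested only once.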


Not dplyr, but it's much more direct using data.table:

    library(data.table)
    setDT(df)
    df[ df[order(id, stopSequence), .I[c(1L,.N)], by=id]$V1 ]
    #    id stopId stopSequence
    # 1:  1      a            1
    # 2:  1      c            3
    # 3:  2      b            1
    # 4:  2      c            4
    # 5:  3      b            1
    # 6:  3      a            3

More detailed explanation:

    # 1) get row numbers of first/last observations from each group
    #    * basically, we sort the table by id/stopSequence, then,
    #      grouping by id, name the row numbers of the first/last
    #      observations for each id; since this operation produces
    #      a data.table, we then pull the row numbers out as a column
    #    * .I is data.table shorthand for the row number
    #    * here, to be maximally explicit, I've named the variable
    #      row_num (instead of the default V1) to give other readers
    #      of my code a clearer understanding of what operation is
    #      producing what variable
    first_last = df[order(id, stopSequence), .(row_num = .I[c(1L,.N)]), by=id]
    idx = first_last$row_num

    # 2) extract rows by number
    df[idx]
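The same two-step idea (collect the row numbers of the first and last observation per group, then index by them) can also be sketched in base R. This is only an illustrative analogue, not the data.table method itself, and the sample `df` is again an assumption chosen to match the output shown earlier.

```r
# Assumed sample data (hypothetical), consistent with the output shown earlier.
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  stopId = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  stopSequence = c(1, 2, 3, 3, 1, 4, 3, 1, 2)
)

# 1) row numbers of df, ordered by id and then stopSequence
ord <- order(df$id, df$stopSequence)

#    split those row numbers by id (order within each group is preserved),
#    then keep the first and last row number of every group
by_id <- split(ord, df$id[ord])
idx <- unlist(
  lapply(by_id, function(rows) rows[c(1, length(rows))]),
  use.names = FALSE
)

# 2) extract rows by number
first_last <- df[idx, ]
```

Here `idx` plays the role of `first_last$row_num` in the data.table version: it holds row numbers of the original table, so the final subsetting step is identical in spirit.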

Be sure to check out the data.table Getting Started wiki to get the basics covered.