How to calculate the number of occurrences of a given character in each row of a column of strings?
The stringr package provides the str_count function, which seems to do what you're interested in:
    # Load your example data
    q.data <- data.frame(number = 1:3, string = c("greatgreat", "magic", "not"), stringsAsFactors = FALSE)
    library(stringr)

    # Count the number of 'a's in each element of string
    q.data$number.of.a <- str_count(q.data$string, "a")
    q.data
    #   number     string number.of.a
    # 1      1 greatgreat           2
    # 2      2      magic           1
    # 3      3        not           0
    nchar(as.character(q.data$string)) - nchar(gsub("a", "", q.data$string))
    # [1] 2 1 0
Notice that I coerce the factor variable to character before passing it to nchar. The regex functions appear to do that coercion internally.
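To make the coercion point concrete, here is a minimal base R sketch that wraps the length-difference idea in a small helper. The name count_char and the use of fixed = TRUE (so the pattern is treated as a literal string rather than a regex) are my own assumptions for illustration, not part of the original answers.

```r
# Hypothetical helper (not from the original answers): count occurrences of a
# literal substring `pattern` in each element of `x`, using base R only.
count_char <- function(x, pattern) {
  # Coerce explicitly so that factor columns are handled before nchar() sees them.
  x <- as.character(x)
  # Delete every literal occurrence; the drop in length divided by the
  # pattern length is the number of (non-overlapping) occurrences.
  (nchar(x) - nchar(gsub(pattern, "", x, fixed = TRUE))) / nchar(pattern)
}

strings <- factor(c("greatgreat", "magic", "not"))
counts <- count_char(strings, "a")
counts
```

Because the result is divided by nchar(pattern), the same helper should also work for multi-character patterns such as "great", as long as overlapping matches are not a concern.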
Here are benchmark results (with the test data scaled up to 3000 rows):
    library(rbenchmark)  # provides benchmark()

    q.data <- q.data[rep(1:NROW(q.data), 1000), ]
    str(q.data)
    # 'data.frame': 3000 obs. of 3 variables:
    #  $ number     : int 1 2 3 1 2 3 1 2 3 1 ...
    #  $ string     : Factor w/ 3 levels "greatgreat","magic",..: 1 2 3 1 2 3 1 2 3 1 ...
    #  $ number.of.a: int 2 1 0 2 1 0 2 1 0 2 ...

    benchmark(
      Dason = { q.data$number.of.a <- str_count(as.character(q.data$string), "a") },
      Tim   = { resT <- sapply(as.character(q.data$string), function(x, letter = "a") {
                  sum(unlist(strsplit(x, split = "")) == letter)
                }) },
      DWin  = { resW <- nchar(as.character(q.data$string)) - nchar(gsub("a", "", q.data$string)) },
      Josh  = { x <- sapply(regmatches(q.data$string, gregexpr("g", q.data$string)), length) },
      replications = 100)
    #-----------------------
    #    test replications elapsed  relative user.self sys.self user.child sys.child
    # 1 Dason          100   4.173  9.959427     2.985    1.204          0         0
    # 3  DWin          100   0.419  1.000000     0.417    0.003          0         0
    # 4  Josh          100  18.635 44.474940    17.883    0.827          0         0
    # 2   Tim          100   3.705  8.842482     3.646    0.072          0         0
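Before reading much into the timings, it may be worth checking that the approaches agree on the counts. The sketch below does that for the three base R variants on the small example vector; the variable names are mine, and the regmatches variant searches for "a" here, whereas the Josh entry in the benchmark above searches for "g".

```r
strings <- c("greatgreat", "magic", "not")

# Split each string into single characters and count matches (the "Tim" approach).
by_split <- sapply(strings, function(x) sum(strsplit(x, split = "")[[1]] == "a"),
                   USE.NAMES = FALSE)

# Difference in length after deleting the character (the "DWin" approach).
by_nchar <- nchar(strings) - nchar(gsub("a", "", strings))

# Count regex matches per string (the "Josh" approach, here searching for "a").
by_regmatches <- sapply(regmatches(strings, gregexpr("a", strings)), length)

# Stop with an error if any two approaches disagree.
stopifnot(all(by_split == by_nchar), all(by_nchar == by_regmatches))
by_nchar
```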