Understand the `Reduce` function
`Reduce` takes a binary function and a list of data items and successively applies the function to the list elements in a recursive fashion. For example:
Reduce(intersect,list(a,b,c))
is the same as
intersect(intersect(a,b),c)
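A quick way to check this equivalence is to run both forms on small inputs. The vectors `x`, `y` and `z` below are hypothetical examples, not data from the question:

```r
# Hypothetical example vectors (not from the original question)
x <- c("geneA", "geneB", "geneC")
y <- c("geneB", "geneC", "geneD")
z <- c("geneC", "geneD", "geneE")

# Fold intersect() over the list from left to right
folded <- Reduce(intersect, list(x, y, z))

# The same computation written out as nested calls
nested <- intersect(intersect(x, y), z)

folded
# [1] "geneC"
identical(folded, nested)
# [1] TRUE
```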
However, I don't think that construct will help you here as it will only return those elements that are common to all vectors.
To count the number of vectors that a gene appears in you could do the following:
vlist <- list(v1,v2,v3,v4,v5)
addmargins(table(gene=unlist(vlist),
                 vec=rep(paste0("v",1:5),times=sapply(vlist,length))),
           2,list(Count=function(x) sum(x[x>0])))
#        vec
# gene    v1 v2 v3 v4 v5 Count
#   geneA  1  1  0  1  0     3
#   geneB  1  0  0  0  1     2
#   geneC  0  1  0  0  1     2
#   geneD  0  0  1  0  0     1
#   geneE  0  0  1  1  0     2
A nice way to see what `Reduce()` is doing is to run it with its argument `accumulate=TRUE`. When `accumulate=TRUE`, it will return a vector or list in which each element shows its state after processing the first n elements of the list in `x`. Here are a couple of examples:
Reduce(`*`, x=list(5,4,3,2), accumulate=TRUE)
# [1] 5 20 60 120

i2 <- seq(0,100,by=2)
i3 <- seq(0,100,by=3)
i5 <- seq(0,100,by=5)
Reduce(intersect, x=list(i2,i3,i5), accumulate=TRUE)
# [[1]]
# [1] 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36
# [20] 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74
# [39] 76 78 80 82 84 86 88 90 92 94 96 98 100
#
# [[2]]
# [1] 0 6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96
#
# [[3]]
# [1] 0 30 60 90
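To make the accumulated states more concrete, the same result can be reproduced with an explicit loop. This is only a sketch: `accumulate_left` is a hypothetical helper written for illustration, not part of base R, and it assumes a non-empty list and no initial value.

```r
# Hypothetical helper: a hand-written left fold that keeps every
# intermediate state, mimicking Reduce(f, xs, accumulate = TRUE).
# Assumes xs is a non-empty list.
accumulate_left <- function(f, xs) {
  states <- vector("list", length(xs))
  states[[1]] <- xs[[1]]                       # state after the first element
  for (i in seq_along(xs)[-1]) {
    states[[i]] <- f(states[[i - 1]], xs[[i]]) # combine previous state with next element
  }
  states
}

# Reduce() simplifies length-one results to a vector, so unlist() here
unlist(accumulate_left(`*`, list(5, 4, 3, 2)))
# [1]   5  20  60 120
```

Each entry is simply the previous entry combined with the next list element, which is why the last entry equals the plain `Reduce()` result.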
Assuming the input values given at the end of this answer, the expression
Reduce(intersect,list(a,b,c,d,e))
## character(0)
gives the genes that are present in all vectors, not the genes that are present in at least two vectors. It means:
intersect(intersect(intersect(intersect(a, b), c), d), e)
## character(0)
If we want the genes that are in at least two vectors:
L <- list(a, b, c, d, e)
u <- unlist(lapply(L, unique)) # or: Reduce(c, lapply(L, unique))
tab <- table(u)
names(tab[tab > 1])
## [1] "geneA" "geneB" "geneC" "geneE"
or
sort(unique(u[duplicated(u)]))
## [1] "geneA" "geneB" "geneC" "geneE"
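If the threshold might change, the `table()` approach generalises from "at least two" to "at least k". The following is a minimal sketch; the function name `genes_in_at_least` and its argument `k` are hypothetical, and `vecs` simply restates the five gene vectors used in this answer:

```r
# Hypothetical helper: names of genes present in at least k of the vectors
genes_in_at_least <- function(vectors, k = 2) {
  # unique() within each vector so a gene counts at most once per vector
  u <- unlist(lapply(vectors, unique))
  tab <- table(u)
  names(tab)[as.vector(tab) >= k]
}

vecs <- list(c("geneA", "geneB"), c("geneA", "geneC"), c("geneD", "geneE"),
             c("geneA", "geneE"), c("geneB", "geneC"))

genes_in_at_least(vecs, 2)
## [1] "geneA" "geneB" "geneC" "geneE"
genes_in_at_least(vecs, 3)
## [1] "geneA"
```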
Note: We used:
a <- c("geneA","geneB")
b <- c("geneA","geneC")
c <- c("geneD","geneE")
d <- c("geneA","geneE")
e <- c("geneB","geneC")