Permute all unique enumerations of a vector in R
EDIT: Here's a faster answer; again based on the ideas of Louisa Grey and Bryce Wagner, but with faster R code thanks to better use of matrix indexing. It's quite a bit faster than my original:
> ddd <- c(1,0,3,4,1,0,0,3,0,4)> system.time(up1 <- uniqueperm(d)) user system elapsed 0.183 0.000 0.186 > system.time(up2 <- uniqueperm2(d)) user system elapsed 0.037 0.000 0.038
And the code:
uniqueperm2 <- function(d) { dat <- factor(d) N <- length(dat) n <- tabulate(dat) ng <- length(n) if(ng==1) return(d) a <- N-c(0,cumsum(n))[-(ng+1)] foo <- lapply(1:ng, function(i) matrix(combn(a[i],n[i]),nrow=n[i])) out <- matrix(NA, nrow=N, ncol=prod(sapply(foo, ncol))) xxx <- c(0,cumsum(sapply(foo, nrow))) xxx <- cbind(xxx[-length(xxx)]+1, xxx[-1]) miss <- matrix(1:N,ncol=1) for(i in seq_len(length(foo)-1)) { l1 <- foo[[i]] nn <- ncol(miss) miss <- matrix(rep(miss, ncol(l1)), nrow=nrow(miss)) k <- (rep(0:(ncol(miss)-1), each=nrow(l1)))*nrow(miss) + l1[,rep(1:ncol(l1), each=nn)] out[xxx[i,1]:xxx[i,2],] <- matrix(miss[k], ncol=ncol(miss)) miss <- matrix(miss[-k], ncol=ncol(miss)) } k <- length(foo) out[xxx[k,1]:xxx[k,2],] <- miss out <- out[rank(as.numeric(dat), ties="first"),] foo <- cbind(as.vector(out), as.vector(col(out))) out[foo] <- d t(out)}
It doesn't return the same order, but after sorting, the results are identical.
up1a <- up1[do.call(order, as.data.frame(up1)),]up2a <- up2[do.call(order, as.data.frame(up2)),]identical(up1a, up2a)
For my first attempt, see the edit history.
The following function (which implements the classic formula for repeated permutations just like you did manually in your question) seems quite fast to me:
upermn <- function(x) { n <- length(x) duplicates <- as.numeric(table(x)) factorial(n) / prod(factorial(duplicates))}
It does compute n!
but not like permn
function which generates all permutations first.
See it in action:
> dat <- c(1,0,3,4,1,0,0,3,0,4)> upermn(dat)[1] 18900> system.time(uperm(dat)) user system elapsed 0.000 0.000 0.001
UPDATE: I have just realized that the question was about generating all unique permutations not just specifying the number of them - sorry for that!
You could improve the unique(perm(...))
part with specifying unique permutations for one less element and later adding the uniqe elements in front of them. Well, my explanation may fail, so let the source speak:
uperm <- function(x) {u <- unique(x) # unique values of the vectorresult <- x # let's start the result matrix with the vectorfor (i in 1:length(u)) { v <- x[-which(x==u[i])[1]] # leave the first occurance of duplicated values result <- rbind(result, cbind(u[i], do.call(rbind, unique(permn(v)))))}return(result)}
This way you could gain some speed. I was lazy to run the code on the vector you provided (took so much time), here is a small comparison on a smaller vector:
> dat <- c(1,0,3,4,1,0,0)> system.time(unique(permn(dat))) user system elapsed 0.264 0.000 0.268 > system.time(uperm(dat)) user system elapsed 0.147 0.000 0.150
I think you could gain a lot more by rewriting this function to be recursive!
UPDATE (again): I have tried to make up a recursive function with my limited knowledge:
uperm <- function(x) { u <- sort(unique(x)) l <- length(u) if (l == length(x)) { return(do.call(rbind,permn(x))) } if (l == 1) return(x) result <- matrix(NA, upermn(x), length(x)) index <- 1 for (i in 1:l) { v <- x[-which(x==u[i])[1]] newindex <- upermn(v) if (table(x)[i] == 1) { result[index:(index+newindex-1),] <- cbind(u[i], do.call(rbind, unique(permn(v)))) } else { result[index:(index+newindex-1),] <- cbind(u[i], uperm(v)) } index <- index+newindex } return(result)}
Which has a great gain:
> system.time(unique(permn(c(1,0,3,4,1,0,0,3,0)))) user system elapsed 22.808 0.103 23.241 > system.time(uperm(c(1,0,3,4,1,0,0,3,0))) user system elapsed 4.613 0.003 4.645
Please report back if this would work for you!
One option that hasn't been mentioned here is the allPerm
function from the multicool
package. It can be used pretty easily to get all the unique permutations:
library(multicool)perms <- allPerm(initMC(dat))dim(perms)# [1] 18900 10head(perms)# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]# [1,] 4 4 3 3 1 1 0 0 0 0# [2,] 0 4 4 3 3 1 1 0 0 0# [3,] 4 0 4 3 3 1 1 0 0 0# [4,] 4 4 0 3 3 1 1 0 0 0# [5,] 3 4 4 0 3 1 1 0 0 0# [6,] 4 3 4 0 3 1 1 0 0 0
In benchmarking I found it to be faster on dat
than the solutions from the OP and daroczig but slower than the solution from Aaron.