k-means clustering in R on very large, sparse matrix?


The bigmemory package (now a family of packages; see their website) uses k-means as a running example of extended analytics on large data. See in particular the sub-package biganalytics, which contains the k-means function bigkmeans.
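A minimal sketch of what that looks like, assuming bigmemory and biganalytics are installed. The toy data and the parameter values are made up for illustration. Note that a big.matrix is stored densely, so this helps with size rather than with sparsity; for data larger than RAM you would presumably use a file-backed big.matrix instead of the in-memory one created here.

```r
library(bigmemory)
library(biganalytics)

set.seed(1)

# Toy data (illustrative only): two groups of 50 points in 2 dimensions
x <- rbind(matrix(rnorm(100, mean = 0),  ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))

# Convert the ordinary matrix to an in-memory big.matrix
bx <- as.big.matrix(x)

# k-means on the big.matrix; iter.max and nstart are arbitrary choices here
fit <- bigkmeans(bx, centers = 2, iter.max = 20, nstart = 3)

fit$size      # number of points per cluster
fit$centers   # cluster centers, one row per cluster
```

The returned object is modelled on what base kmeans() returns, so fit$cluster holds the cluster assignment of each row.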


Please check:

library(foreign)
?read.arff
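For reference, read.arff reads a file in Weka's ARFF format into a data frame. A small round-trip sketch, using the built-in iris data and a temporary file purely as stand-ins for your own file:

```r
library(foreign)

# Write a built-in data set to a temporary ARFF file (stand-in for real data)
path <- tempfile(fileext = ".arff")
write.arff(iris, file = path)

# Read it back as a data frame
dat <- read.arff(path)
str(dat)
```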

Cheers.


sparcl performs sparse hierarchical clustering and sparse k-means clustering. This should work for matrices that R can handle, i.e. ones that fit into memory.

http://cran.r-project.org/web/packages/sparcl/sparcl.pdf
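A rough sketch of sparse k-means with sparcl, assuming the package is installed. The data, K and wbounds below are made-up illustration values. One caveat worth knowing: "sparse" in sparcl refers to sparsity in the feature weights (only some columns end up driving the clustering), not to sparse matrix storage, so the input here is an ordinary dense matrix.

```r
library(sparcl)

set.seed(1)

# Toy data (illustrative only): 60 observations, 20 features;
# only the first 5 features separate the two groups
x <- matrix(rnorm(60 * 20), ncol = 20)
x[1:30, 1:5] <- x[1:30, 1:5] + 4

# wbounds is the L1 bound on the feature weights; 3 is an arbitrary choice
fit <- KMeansSparseCluster(x, K = 2, wbounds = 3, silent = TRUE)

fit[[1]]$Cs   # cluster assignment of each observation
fit[[1]]$ws   # feature weights; uninformative features tend towards zero
```

The result is a list with one element per wbounds value, which is why the single fit is accessed as fit[[1]]. The package also provides KMeansSparseCluster.permute for choosing wbounds.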

==

For really big matrices, I would try Apache Spark with its sparse matrices and MLlib, though I do not know how experimental that still is:

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Matrices$

https://spark.apache.org/docs/latest/mllib-clustering.html