Kmeans without knowing the number of clusters? [duplicate]

python machine-learning data-mining k-means

In essence, you pick a subset of your data and cluster it into k clusters, and you ask how well it clusters, compared with the rest of the data: Are you assigning data points to the same cluster memberships, or are they falling into different clusters?

If the memberships are roughly the same, the data fit well into k clusters. Otherwise, you try a different k.

Also, you could do PCA (principal component analysis) to reduce your 50 dimensions to some more tractable number. If a PCA run suggests that most of your variance is coming from, say, 4 out of the 50 dimensions, then you can pick k on that basis, to explore how the four cluster memberships are assigned.

python machine-learning data-mining k-means

Take a look at this wikipedia page on determining the number of clusters in a data set.

Also you might want to try Agglomerative hierarchical clustering out. This approach does not need to know the number of clusters, it will incrementally form clusters of cluster till only one exists. This technique also exists in SciPy (scipy.cluster.hierarchy).

python machine-learning data-mining k-means

One interesting approach is that of evidence accumulation by Fred and Jain. This is based on combining multiple runs of k-means with a large number of clusters, aggregating them into an overall solution. Nice aspects of the approach include that the number of clusters is determined in the process and that the final clusters don't have to be spherical.

CodeHunter

Kmeans without knowing the number of clusters? [duplicate]

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last