Understanding "score" returned by scikit-learn KMeans Understanding "score" returned by scikit-learn KMeans python python

Understanding "score" returned by scikit-learn KMeans


In the documentation it says:

Returns:    score : float
    Opposite of the value of X on the K-means objective.

To understand what that means you need to have a look at the k-means algorithm. What k-means essentially does is find cluster centers that minimize the sum of squared distances between data samples and their assigned cluster centers.

It is a two-step process, where (a) each data sample is assigned to its closest cluster center, and (b) each cluster center is moved to the mean of all samples assigned to it. These steps are repeated until a stopping criterion (max iterations / minimum change between the last two iterations) is met.

As you can see, some distance remains between the data samples and their assigned cluster centers, and the objective of the minimization is exactly that quantity (the sum of all squared distances).
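To make that concrete, here is a minimal sketch in plain NumPy (with made-up 2-D data and hand-picked centers, purely for illustration) that performs the assignment step and evaluates the sum of squared distances that k-means tries to minimize:

import numpy as np

# toy data: two loose blobs in 2-D (made-up numbers, just for illustration)
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])

# two candidate cluster centers
centers = np.array([[1.0, 1.0], [5.0, 5.0]])

# step (a): assign each sample to its closest center
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
labels = dists.argmin(axis=1)

# step (b) would move each center to the mean of its assigned samples;
# the k-means objective is the sum of squared distances after assignment
objective = ((X - centers[labels]) ** 2).sum()
print(objective)  # a small positive number; k-means tries to minimize this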

You naturally get large distances if there is a lot of variety in your data samples, or if the number of data samples is significantly higher than the number of clusters, which in your case is only two. Conversely, if all data samples were identical, you would always get a zero distance regardless of the number of clusters, as illustrated below.
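A quick sanity check of that last point, as a sketch with made-up data (note that scikit-learn may emit a warning when fitting identical points, since fewer distinct clusters exist than requested):

import numpy as np
from sklearn.cluster import KMeans

# all samples identical -> every point coincides with its cluster center
X_same = np.full((10, 3), 7.0)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_same)
print(km.inertia_)       # 0.0
print(km.score(X_same))  # -0.0 (the opposite of the objective)

# more spread-out data -> larger objective, more negative score
X_spread = np.random.RandomState(0).normal(scale=10.0, size=(100, 3))
km2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_spread)
print(km2.inertia_)         # large positive value
print(km2.score(X_spread))  # large negative value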

From the documentation I would expect that all values are negative, though. If you observe both negative and positive values, maybe there is more to the score than that.

I wonder how you got the idea of clustering into two clusters though.


The wording chosen by the documentation is a bit confusing. It says "Opposite of the value of X on the K-means objective." It simply means the negative of the K-means objective.

K-Means Objective

The objective in K-means is to reduce the sum of squares of the distances of points from their respective cluster centroids. It goes by other names such as the J squared error function, J-score, or within-cluster sum of squares. This value indicates how internally coherent the clusters are (the lower, the better).

The value of the objective function can be obtained directly from the following attribute:

model.inertia_
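Tying the two answers together, a small sketch with made-up data showing that score on the training data is simply the negative of inertia_:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(42).rand(50, 2)  # made-up data for illustration
model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(model.inertia_)  # within-cluster sum of squares (the objective)
print(model.score(X))  # same magnitude, opposite sign
print(np.isclose(model.score(X), -model.inertia_))  # True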


ypnos is right; you can find some detail here: https://github.com/scikit-learn/scikit-learn/blob/51a765a/sklearn/cluster/k_means_.py#L893

inertia : float
    Sum of distances of samples to their closest cluster center.