How to set k-Means clustering labels from highest to lowest with Python?

python sorting numpy scikit-learn k-means

Transforming the labels through a lookup table is a straightforward way to achieve what you want.

To begin with I generate some mock data:

```python
import numpy as np

np.random.seed(1000)
n = 38
# One row per apartment; columns hold morning, afternoon and night consumption.
X_morning = np.random.uniform(low=.02, high=.18, size=n)
X_afternoon = np.random.uniform(low=.05, high=.20, size=n)
X_night = np.random.uniform(low=.025, high=.175, size=n)
X = np.vstack([X_morning, X_afternoon, X_night]).T
```

Then I perform clustering on the data:

```python
from sklearn.cluster import KMeans

k = 4
kmeans = KMeans(n_clusters=k, random_state=0).fit(X)
```

And finally I use NumPy's argsort to create a lookup table like this:

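```python
# Original labels ordered by the sum of their centroid coordinates (ascending).
idx = np.argsort(kmeans.cluster_centers_.sum(axis=1))
# lut[old_label] gives the new, rank-ordered label.
lut = np.zeros_like(idx)
lut[idx] = np.arange(k)
```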
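`lut` is the inverse of the permutation `idx`: position `idx[i]` receives rank `i`. A minimal NumPy-only check, hard-coding the centroid sums from the sample run below for illustration (no clustering is performed here), shows that `np.argsort(idx)` builds the same table:

```python
import numpy as np

# Centroid sums copied from the sample run (hard-coded for illustration).
sums = np.array([0.3214523, 0.40877735, 0.26911353, 0.25234873])
k = len(sums)

# idx[i] is the original label of the cluster with the i-th smallest sum.
idx = np.argsort(sums)

# Scatter ranks 0..k-1 into the positions named by idx (inverse permutation).
lut = np.zeros_like(idx)
lut[idx] = np.arange(k)

# Taking argsort of a permutation also yields its inverse.
lut_alt = np.argsort(idx)

print(idx)      # [3 2 0 1]
print(lut)      # [2 3 1 0]
print(lut_alt)  # [2 3 1 0]
```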

Sample run:

```
In [70]: kmeans.cluster_centers_.sum(axis=1)
Out[70]: array([ 0.3214523 ,  0.40877735,  0.26911353,  0.25234873])

In [71]: idx
Out[71]: array([3, 2, 0, 1], dtype=int64)

In [72]: lut
Out[72]: array([2, 3, 1, 0], dtype=int64)

In [73]: kmeans.labels_
Out[73]: array([1, 3, 1, ..., 0, 1, 0])

In [74]: lut[kmeans.labels_]
Out[74]: array([3, 0, 3, ..., 2, 3, 2], dtype=int64)
```

`idx` lists the original cluster labels ordered from lowest to highest consumption level. The apartments for which `lut[kmeans.labels_]` is 0 / 3 belong to the cluster with the lowest / highest consumption level.
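The question asks for labels from highest to lowest, whereas the lookup table above assigns 0 to the lowest consumption. A minimal sketch, again hard-coding the sample centroid sums and a few of the sample labels for illustration, negates the sums before sorting so that label 0 becomes the highest-consumption cluster (equivalently, `k - 1 - lut`):

```python
import numpy as np

sums = np.array([0.3214523, 0.40877735, 0.26911353, 0.25234873])
labels = np.array([1, 3, 1, 0, 1, 0])  # a few raw KMeans labels (illustrative)
k = len(sums)

# Sort centroids by descending sum: idx_desc[0] is the highest-consumption cluster.
idx_desc = np.argsort(-sums)

# Inverse permutation: maps each original label to its descending rank.
lut_desc = np.zeros_like(idx_desc)
lut_desc[idx_desc] = np.arange(k)

# Same result by flipping the ascending table.
lut_asc = np.argsort(np.argsort(sums))
assert np.array_equal(lut_desc, k - 1 - lut_asc)

print(lut_desc)          # [1 0 2 3]
print(lut_desc[labels])  # [0 3 0 1 0 1]
```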


Sorting the centroids by their vector magnitude may be a better option, since the sorted model can then be used to predict labels for new data. Here is my implementation from my repo:

```python
import numpy as np
from sklearn.cluster import KMeans


def sorted_cluster(x, model=None):
    """Fit `model` on `x`, then relabel clusters by ascending centroid magnitude."""
    if model is None:
        model = KMeans()
    model = sorted_cluster_centers_(model, x)
    model = sorted_labels_(model, x)
    return model


def sorted_cluster_centers_(model, x):
    model.fit(x)
    # Euclidean norm (magnitude) of each centroid.
    magnitude = np.linalg.norm(model.cluster_centers_, axis=1)
    idx_argsort = np.argsort(magnitude)
    # Reorder the centroids so that label 0 has the smallest magnitude.
    model.cluster_centers_ = model.cluster_centers_[idx_argsort]
    return model


def sorted_labels_(sorted_model, x):
    # Recompute the training labels against the reordered centroids.
    sorted_model.labels_ = sorted_model.predict(x)
    return sorted_model
```

Example:

```python
import numpy as np

arr = np.vstack([
    100 + np.random.random((2, 3)),
    np.random.random((2, 3)),
    5 + np.random.random((3, 3)),
    10 + np.random.random((2, 3)),
])
print('Data:')
print(arr)

cluster = KMeans(n_clusters=4)
print('\n Without sort:')
cluster.fit(arr)
print(cluster.cluster_centers_)
print(cluster.labels_)
print(cluster.predict([[5, 5, 5], [1, 1, 1]]))

print('\n With sort:')
cluster = sorted_cluster(arr, cluster)
print(cluster.cluster_centers_)
print(cluster.labels_)
print(cluster.predict([[5, 5, 5], [1, 1, 1]]))
```

Output:

```
Data:
[[100.52656263 100.57376566 100.63087757]
 [100.70144046 100.94095196 100.57095386]
 [  0.21284187   0.75623797   0.77349013]
 [  0.28241023   0.89878796   0.27965047]
 [  5.14328748   5.37025887   5.26064209]
 [  5.21030632   5.09597417   5.29507699]
 [  5.81531591   5.11629056   5.78542656]
 [ 10.25686526  10.64181304  10.45651994]
 [ 10.14153211  10.28765705  10.20653228]]

 Without sort:
[[ 10.19919868  10.46473505  10.33152611]
 [100.61400155 100.75735881 100.60091572]
 [  0.24762605   0.82751296   0.5265703 ]
 [  5.38963657   5.19417453   5.44704855]]
[1 1 2 2 3 3 3 0 0]
[3 2]

 With sort:
[[  0.24762605   0.82751296   0.5265703 ]
 [  5.38963657   5.19417453   5.44704855]
 [ 10.19919868  10.46473505  10.33152611]
 [100.61400155 100.75735881 100.60091572]]
[3 3 0 0 1 1 1 2 2]
[1 0]
```
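Overwriting `cluster_centers_` and `labels_` works here because, as the output above shows, `predict` assigns points to the nearest entry of the reordered `cluster_centers_`. It does, however, mutate fitted attributes. An alternative sketch (an assumption, not the code from the repo mentioned above) leaves the model untouched and applies a magnitude-based lookup table to whatever labels it returns, e.g. `lut[model.predict(X_new)]`. The helper name `magnitude_lut` is hypothetical, and the centers below are rounded from the "Without sort" output:

```python
import numpy as np


def magnitude_lut(centers):
    """Hypothetical helper: map each original cluster label to its rank
    by Euclidean norm of the centroid (0 = smallest magnitude)."""
    order = np.argsort(np.linalg.norm(centers, axis=1))
    lut = np.empty_like(order)
    lut[order] = np.arange(len(order))
    return lut


# Unsorted centers from the "Without sort" output (rounded).
centers = np.array([
    [10.199, 10.465, 10.332],
    [100.614, 100.757, 100.601],
    [0.248, 0.828, 0.527],
    [5.390, 5.194, 5.447],
])
raw_labels = np.array([1, 1, 2, 2, 3, 3, 3, 0, 0])  # "Without sort" labels_
raw_pred = np.array([3, 2])  # "Without sort" predict([[5,5,5],[1,1,1]])

lut = magnitude_lut(centers)
print(lut)              # [2 3 0 1]
print(lut[raw_labels])  # [3 3 0 0 1 1 1 2 2]
print(lut[raw_pred])    # [1 0]
```

The relabelled results match the "With sort" labels and predictions, without refitting or modifying the model.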
