How to set k-Means clustering labels from highest to lowest with Python?


Transforming the labels through a lookup table is a straightforward way to achieve what you want.

To begin with I generate some mock data:

    import numpy as np

    np.random.seed(1000)
    n = 38
    X_morning = np.random.uniform(low=.02, high=.18, size=n)
    X_afternoon = np.random.uniform(low=.05, high=.20, size=n)
    X_night = np.random.uniform(low=.025, high=.175, size=n)
    X = np.vstack([X_morning, X_afternoon, X_night]).T

Then I cluster the data:

    from sklearn.cluster import KMeans

    k = 4
    kmeans = KMeans(n_clusters=k, random_state=0).fit(X)

And finally I use NumPy's argsort to create a lookup table like this:

    idx = np.argsort(kmeans.cluster_centers_.sum(axis=1))
    lut = np.zeros_like(idx)
    lut[idx] = np.arange(k)

Sample run:

    In [70]: kmeans.cluster_centers_.sum(axis=1)
    Out[70]: array([ 0.3214523 ,  0.40877735,  0.26911353,  0.25234873])

    In [71]: idx
    Out[71]: array([3, 2, 0, 1], dtype=int64)

    In [72]: lut
    Out[72]: array([2, 3, 1, 0], dtype=int64)

    In [73]: kmeans.labels_
    Out[73]: array([1, 3, 1, ..., 0, 1, 0])

    In [74]: lut[kmeans.labels_]
    Out[74]: array([3, 0, 3, ..., 2, 3, 2], dtype=int64)

idx lists the original cluster labels ordered from lowest to highest consumption level. The apartments for which lut[kmeans.labels_] is 0 belong to the cluster with the lowest consumption level, and those for which it is 3 belong to the cluster with the highest.
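Since the question asks for labels running from highest to lowest, the same lookup-table idea can be reversed. The following is only a sketch: the helper name rank_lut and its descending flag are made up for illustration, and the inputs are the center sums and the visible label entries from the sample run rather than a fitted model.

```python
import numpy as np

def rank_lut(scores, descending=False):
    """Build a lookup table where lut[old_label] is the rank of that cluster's score."""
    scores = np.asarray(scores)
    idx = np.argsort(scores)          # old labels ordered from lowest to highest score
    if descending:
        idx = idx[::-1]               # old labels ordered from highest to lowest score
    lut = np.empty_like(idx)
    lut[idx] = np.arange(len(idx))    # invert the permutation
    return lut

# Values copied from the sample run (cluster_centers_.sum(axis=1))
center_sums = np.array([0.3214523, 0.40877735, 0.26911353, 0.25234873])
# Only the six label entries visible in the sample run
labels = np.array([1, 3, 1, 0, 1, 0])

ascending = rank_lut(center_sums)                     # 0 = lowest consumption
descending = rank_lut(center_sums, descending=True)   # 0 = highest consumption

print(ascending[labels])
print(descending[labels])
```

With descending=True, label 0 marks the cluster with the highest consumption and label k - 1 the lowest.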


Sorting the centroids by their vector magnitude may be a better option, since the same model can then be used to predict labels for other data. Here is the implementation from my repo:

    import numpy as np
    from sklearn.cluster import KMeans

    def sorted_cluster(x, model=None):
        # Fall back to a default KMeans model if none is given
        if model is None:
            model = KMeans()
        model = sorted_cluster_centers_(model, x)
        model = sorted_labels_(model, x)
        return model

    def sorted_cluster_centers_(model, x):
        # Fit the model, then reorder the centroids by ascending Euclidean norm
        model.fit(x)
        magnitude = []
        for center in model.cluster_centers_:
            magnitude.append(np.sqrt(center.dot(center)))
        idx_argsort = np.argsort(magnitude)
        model.cluster_centers_ = model.cluster_centers_[idx_argsort]
        return model

    def sorted_labels_(sorted_model, x):
        # Recompute the labels so that they match the reordered centroids
        sorted_model.labels_ = sorted_model.predict(x)
        return sorted_model

Example:

    import numpy as np

    arr = np.vstack([
        100 + np.random.random((2,3)),
        np.random.random((2,3)),
        5 + np.random.random((3,3)),
        10 + np.random.random((2,3))
    ])
    print('Data:')
    print(arr)

    cluster = KMeans(n_clusters=4)

    print('\n Without sort:')
    cluster.fit(arr)
    print(cluster.cluster_centers_)
    print(cluster.labels_)
    print(cluster.predict([[5,5,5],[1,1,1]]))

    print('\n With sort:')
    cluster = sorted_cluster(arr, cluster)
    print(cluster.cluster_centers_)
    print(cluster.labels_)
    print(cluster.predict([[5,5,5],[1,1,1]]))

Output:

    Data:
    [[100.52656263 100.57376566 100.63087757]
     [100.70144046 100.94095196 100.57095386]
     [  0.21284187   0.75623797   0.77349013]
     [  0.28241023   0.89878796   0.27965047]
     [  5.14328748   5.37025887   5.26064209]
     [  5.21030632   5.09597417   5.29507699]
     [  5.81531591   5.11629056   5.78542656]
     [ 10.25686526  10.64181304  10.45651994]
     [ 10.14153211  10.28765705  10.20653228]]

     Without sort:
    [[ 10.19919868  10.46473505  10.33152611]
     [100.61400155 100.75735881 100.60091572]
     [  0.24762605   0.82751296   0.5265703 ]
     [  5.38963657   5.19417453   5.44704855]]
    [1 1 2 2 3 3 3 0 0]
    [3 2]

     With sort:
    [[  0.24762605   0.82751296   0.5265703 ]
     [  5.38963657   5.19417453   5.44704855]
     [ 10.19919868  10.46473505  10.33152611]
     [100.61400155 100.75735881 100.60091572]]
    [3 3 0 0 1 1 1 2 2]
    [1 0]
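If you would rather not overwrite the fitted model's attributes, the magnitude ordering can also be expressed as a lookup table applied to the raw labels. This is a rough sketch, not the repo's code: norm_order_lut is a hypothetical helper, the centers are rounded from the "Without sort" output above, and raw_labels are the unsorted labels printed there.

```python
import numpy as np

def norm_order_lut(centers):
    """Return (idx, lut): idx orders centers by ascending Euclidean norm,
    lut maps each original label to its position in that order."""
    norms = np.linalg.norm(centers, axis=1)   # one norm per centroid
    idx = np.argsort(norms)
    lut = np.empty_like(idx)
    lut[idx] = np.arange(len(idx))            # invert the permutation
    return idx, lut

# Centroids rounded from the "Without sort" output (illustrative values)
centers = np.array([
    [10.2, 10.5, 10.3],
    [100.6, 100.8, 100.6],
    [0.2, 0.8, 0.5],
    [5.4, 5.2, 5.4],
])
# Labels as printed in the "Without sort" output
raw_labels = np.array([1, 1, 2, 2, 3, 3, 3, 0, 0])

idx, lut = norm_order_lut(centers)
sorted_centers = centers[idx]       # centroids in ascending-norm order
sorted_labels = lut[raw_labels]     # labels renumbered to match

print(sorted_labels)
```

The same lut can be applied to the result of predict on new data, so the fitted model itself stays untouched.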