Sklearn : Mean Distance from Centroid of each cluster

python numpy scikit-learn cluster-analysis k-means

Here's one way. To use a distance metric other than Euclidean, substitute it for the distance calculation inside k_mean_distance().

For each cluster, calculate the distance between every data point assigned to it and the cluster center, then return the mean value.

Function for distance calculation:

    import numpy as np

    def k_mean_distance(data, cx, cy, i_centroid, cluster_labels):
        # Calculate Euclidean distance for each data point assigned to centroid
        distances = [np.sqrt((x - cx)**2 + (y - cy)**2) for (x, y) in data[cluster_labels == i_centroid]]
        # return the mean value
        return np.mean(distances)

And for each centroid, use the function to get the mean distance:

    total_distance = []
    for i, (cx, cy) in enumerate(centroids):
        # Function from above
        mean_distance = k_mean_distance(data, cx, cy, i, cluster_labels)
        total_distance.append(mean_distance)

So, in the context of your question:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def k_mean_distance(data, cx, cy, i_centroid, cluster_labels):
        distances = [np.sqrt((x - cx)**2 + (y - cy)**2) for (x, y) in data[cluster_labels == i_centroid]]
        return np.mean(distances)

    # array_convt is the input data from the question
    t_data = PCA(n_components=2).fit_transform(array_convt)
    k_means = KMeans()
    clusters = k_means.fit_predict(t_data)
    centroids = k_means.cluster_centers_

    c_mean_distances = []
    for i, (cx, cy) in enumerate(centroids):
        mean_distance = k_mean_distance(t_data, cx, cy, i, clusters)
        c_mean_distances.append(mean_distance)

If you plot the results with plt.plot(c_mean_distances), you should see the mean distance for each cluster, one value per centroid.
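As noted at the top of this answer, the Euclidean distance can be swapped for another metric. Here is a minimal sketch of what that could look like, using NumPy only. The function name mean_distance_to_centroid and its metric argument are made up for this illustration and are not part of scikit-learn. Keep in mind that KMeans itself still assigns points to clusters using Euclidean distance, so a different metric here only changes how the spread is reported.

```python
import numpy as np

def mean_distance_to_centroid(points, centroid, metric="euclidean"):
    """Mean distance from each row of `points` to `centroid` (hypothetical helper)."""
    diff = points - centroid
    if metric == "euclidean":
        distances = np.sqrt((diff ** 2).sum(axis=1))
    elif metric == "manhattan":
        distances = np.abs(diff).sum(axis=1)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    return distances.mean()

# Illustrative values only: two points and a centroid at the origin.
points = np.array([[0.0, 0.0], [3.0, 4.0]])
centroid = np.array([0.0, 0.0])

euclid = mean_distance_to_centroid(points, centroid)
manhattan = mean_distance_to_centroid(points, centroid, metric="manhattan")
print(euclid, manhattan)
```

Because the helper works on whole rows, it also handles data with more than two features.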


alphaleonis gave a nice answer. For the general case of n dimensions, here are the changes needed to that answer:

    import numpy as np

    def k_mean_distance(data, centroid, i_centroid, cluster_labels):
        # Calculate Euclidean distance for each data point assigned to centroid
        distances = [np.linalg.norm(x - centroid) for x in data[cluster_labels == i_centroid]]
        # return the mean value
        return np.mean(distances)

    c_mean_distances = []
    for i, cent_features in enumerate(centroids):
        # Pass the coordinates of the current centroid, not the whole centroid array
        mean_distance = k_mean_distance(emb_matrix, cent_features, i, kmeans_clusters)
        c_mean_distances.append(mean_distance)


You can use the following attribute of KMeans:

cluster_centers_ : array, [n_clusters, n_features]

For every point, find the cluster it belongs to with predict(X), which returns the index of the cluster, and then calculate the distance from the point to that cluster's center.
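A minimal sketch of that approach, assuming a small made-up data set X and two clusters (both chosen only for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated groups (illustrative values only).
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# predict() returns, for each point, the index of its nearest centroid.
labels = kmeans.predict(X)

# Look up each point's own centroid, then take the Euclidean distance to it.
own_centroids = kmeans.cluster_centers_[labels]
distances = np.linalg.norm(X - own_centroids, axis=1)

# Average the distances within each cluster.
mean_distances = [distances[labels == k].mean() for k in range(kmeans.n_clusters)]
print(mean_distances)
```

Indexing cluster_centers_ with the label array avoids an explicit Python loop over points and works for any number of features.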
