Estimation of number of Clusters via gap statistics and prediction strength


Your graph already shows the correct value, 3. Let me explain.

[Image: the plot, with the inflection point marked in red]

  • As you increase the number of clusters, your distance metric (for example, the within-cluster sum of squares) keeps decreasing. That is why you assumed the correct value was 10. If you went beyond 10, the metric would decrease even further, so its minimum should not be the decision criterion.
  • Instead, find the inflection point (marked in red here). It is the point where the slope flattens out. You might want to read up on elbow curves.
  • Putting these two points together, the inflection point is at 3, which is also the correct solution.
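
To make the elbow idea concrete, here is a minimal sketch. It assumes scikit-learn's `KMeans` with its `inertia_` attribute as the distance metric, uses synthetic data from `make_blobs` rather than your data, and picks the elbow as the point of largest second difference, which is just one simple heuristic and not the only way to locate it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (illustrative only,
# not the data from the question).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [5, 9]],
    cluster_std=1.0,
    random_state=0,
)

# Fit K-means for k = 1..10 and record the within-cluster sum of squares.
ks = list(range(1, 11))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# Second difference of the curve: entry i belongs to k = ks[i + 1].
# The elbow is taken as the k where the curve bends most sharply.
curvature = np.diff(inertias, n=2)
elbow_k = ks[int(np.argmax(curvature)) + 1]

print("inertia per k:", [round(v, 1) for v in inertias])
print("estimated elbow:", elbow_k)
```

Plotting `ks` against `inertias` gives the elbow curve itself. The inertia is smallest at k = 10, yet the bend sits at the true number of groups, which is the point the list above is making.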

Hope this helps


You could take a look at this code and adapt the output plot format to your needs:

```python
# coding: utf-8
# K-means clustering implementation in Python

# Load the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import datasets

# Load the Iris dataset
iris = datasets.load_iris()

# Put the Iris data into pandas DataFrames
x = pd.DataFrame(iris.data)
x.columns = ['Sepal_Length', 'Sepal_width', 'Petal_Length', 'Petal_width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']

# Create a K-Means object that groups the data into 3 clusters
model = KMeans(n_clusters=3)

# Fit the model to the Iris data
model.fit(x)

# Plot the raw data points
plt.scatter(x.Petal_Length, x.Petal_width)
plt.show()

colormap = np.array(['Red', 'green', 'blue'])

# Plot the unaltered dataset (flowers colored by their true labels)
plt.scatter(x.Petal_Length, x.Petal_width, c=colormap[y.Targets], s=40)
plt.title('True classification')
plt.show()

# Plot the clusters formed by K-Means
plt.scatter(x.Petal_Length, x.Petal_width, c=colormap[model.labels_], s=40)
plt.title('K-means classification')
plt.show()
```

[Images: Output 1, Output 2]