Clustering cosine similarity matrix


You can do this with spectral clustering. You can use a ready-made implementation, such as the one in sklearn, or implement it yourself; the algorithm is fairly simple.

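Here is a piece of code doing it in Python using sklearn:

    import numpy as np
    from sklearn.cluster import SpectralClustering

    # A plain ndarray; NumPy no longer recommends np.matrix.
    mat = np.array([[1., .1, .6, .4],
                    [.1, 1., .1, .2],
                    [.6, .1, 1., .7],
                    [.4, .2, .7, 1.]])
    SpectralClustering(2).fit_predict(mat)
    # array([0, 1, 0, 0], dtype=int32)
    # (cluster ids are arbitrary, so 0 and 1 may come out swapped)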

As you can see, it returns the clustering you mentioned.
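
One thing worth checking: by default, SpectralClustering uses affinity='rbf', so it treats each row of the input as a feature vector and builds its own affinity matrix from it. If you would rather have the cosine similarities used directly as the affinities, the affinity='precomputed' option does that. A minimal sketch, assuming the same 4x4 matrix as above; the random_state value is an arbitrary choice added only for reproducibility:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Same cosine similarity matrix as in the example above.
similarity = np.array([[1., .1, .6, .4],
                       [.1, 1., .1, .2],
                       [.6, .1, 1., .7],
                       [.4, .2, .7, 1.]])

# affinity='precomputed' makes sklearn use the matrix itself as the affinity
# matrix instead of computing an RBF kernel on its rows.
model = SpectralClustering(n_clusters=2, affinity='precomputed', random_state=0)
labels = model.fit_predict(similarity)

# Items 0, 2 and 3 share a label and item 1 gets the other one;
# which cluster is called 0 or 1 is arbitrary.
print(labels)
```

For this small matrix both settings lead to the same grouping, but on larger data the two can diverge, so it is worth being explicit about which one you mean.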

The algorithm takes the top k eigenvectors of the input matrix, i.e. those corresponding to the k largest eigenvalues, stacks them as the columns of a new matrix, and then runs the k-means algorithm on the rows of that matrix. Here is a simple piece of code that does this for your matrix:

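    from sklearn.cluster import KMeans

    # np and mat are defined in the first snippet.
    # eigh returns eigenvalues in ascending order, so columns 2 and 3
    # hold the eigenvectors of the two largest eigenvalues.
    eigen_values, eigen_vectors = np.linalg.eigh(mat)
    KMeans(n_clusters=2, init='k-means++').fit_predict(eigen_vectors[:, 2:4])
    # array([0, 1, 0, 0], dtype=int32)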

Note that the implementation of the algorithm in the sklearn library may differ from mine. The example I gave is the simplest way of doing it. There are some good tutorials available online that describe the spectral clustering algorithm in depth.
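
To illustrate how implementations can differ, here is a sketch of one common variant that normalizes the similarity matrix by node degree before taking eigenvectors. It is not a copy of what sklearn does internally; the helper name spectral_labels, the n_init value and the seed are all made up for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_labels(similarity, k, seed=0):
    """Hypothetical helper: degree-normalized spectral clustering sketch."""
    w = np.array(similarity, dtype=float)   # copy, so the input is not modified
    np.fill_diagonal(w, 0.0)                # ignore self-similarity
    degree = w.sum(axis=1)                  # assumes every row sum is positive
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetrically normalized similarity: D^{-1/2} W D^{-1/2}
    normalized = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the last k columns
    # belong to the k largest eigenvalues.
    _, eigen_vectors = np.linalg.eigh(normalized)
    # Rescale rows by D^{-1/2} so the leading eigenvector becomes constant.
    embedding = eigen_vectors[:, -k:] * d_inv_sqrt[:, None]
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)

mat = np.array([[1., .1, .6, .4],
                [.1, 1., .1, .2],
                [.6, .1, 1., .7],
                [.4, .2, .7, 1.]])
labels = spectral_labels(mat, 2)
print(labels)  # item 1 ends up alone; the label numbering is arbitrary
```

The normalization keeps items with many strong similarities from dominating the eigenvectors, which tends to matter more as the matrix grows.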

For cases where you want the algorithm to figure out the number of clusters by itself, you can use a density-based clustering algorithm such as DBSCAN:

    from sklearn.cluster import DBSCAN

    DBSCAN(min_samples=1).fit_predict(mat)
    # array([0, 1, 2, 2])
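
Note that with these settings DBSCAN treats each row of the matrix as a point in feature space and measures Euclidean distance between rows. An alternative, sketched here under the assumption that 1 - similarity is an acceptable distance for your data, is to pass a precomputed distance matrix. The eps value of 0.5 is simply sklearn's default and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

similarity = np.array([[1., .1, .6, .4],
                       [.1, 1., .1, .2],
                       [.6, .1, 1., .7],
                       [.4, .2, .7, 1.]])

# Assumption: convert cosine similarity to a distance as 1 - similarity.
distance = 1.0 - similarity

# metric='precomputed' tells DBSCAN that the input already holds pairwise
# distances, so no Euclidean distances between rows are computed.
labels = DBSCAN(eps=0.5, min_samples=1, metric='precomputed').fit_predict(distance)
print(labels)  # [0 1 0 0]
```

Here items 0, 2 and 3 are chained together (0 and 2 are 0.4 apart, 2 and 3 are 0.3 apart), while item 1 is at least 0.8 away from everything else and forms its own cluster.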