Clustering list for hclust function


I will use the USArrests dataset that ships with R to demonstrate how to cut a tree into a desired number of groups. The result is a named vector giving the group of each observation.

Construct a hclust object.

hc <- hclust(dist(USArrests), "ave")
# plot(hc)

You can now cut the tree into as many groups as you want. For my next trick, I will split the tree into two groups. You set the number of groups with the k parameter. See ?cutree for the h parameter, which cuts the tree at a given height instead and may be more useful to you (compare cutree(hc, k = 2) == cutree(hc, h = 110)).

cutree(hc, k = 2)
       Alabama         Alaska        Arizona       Arkansas     California
             1              1              1              2              1
      Colorado    Connecticut       Delaware        Florida        Georgia
             2              2              1              1              2
        Hawaii          Idaho       Illinois        Indiana           Iowa
             2              2              1              2              2
        Kansas       Kentucky      Louisiana          Maine       Maryland
             2              2              1              2              1
 Massachusetts       Michigan      Minnesota    Mississippi       Missouri
             2              1              2              1              2
       Montana       Nebraska         Nevada  New Hampshire     New Jersey
             2              2              1              2              2
    New Mexico       New York North Carolina   North Dakota           Ohio
             1              1              1              2              2
      Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina
             2              2              2              2              1
  South Dakota      Tennessee          Texas           Utah        Vermont
             2              2              2              2              2
      Virginia     Washington  West Virginia      Wisconsin        Wyoming
             2              2              2              2              2
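As a quick check of the k versus h remark, and to turn the result into an actual list of cluster members, something like the following should work. This is only a minimal sketch; the names by_k, by_h and members are made up for illustration.

```r
# Rebuild the tree so the snippet runs on its own
hc <- hclust(dist(USArrests), "average")

by_k <- cutree(hc, k = 2)     # ask for two groups
by_h <- cutree(hc, h = 110)   # cut at height 110 instead

# TRUE when both cuts assign every state to the same group
all(by_k == by_h)

# Number of states in each group
table(by_k)

# List with one character vector of state names per group
members <- split(names(by_k), by_k)
members[["1"]]
```

cutree also accepts a vector for k, for example cutree(hc, k = 2:4), which returns a matrix with one column per value of k.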


Let's say your data is in x:

y <- dist(x)
clust <- hclust(y)
groups <- cutree(clust, k = 3)
x <- cbind(x, groups)

Now each record has its cluster group in the groups column. You can also subset the dataset by group:

x1 <- subset(x, groups == 1)
x2 <- subset(x, groups == 2)
x3 <- subset(x, groups == 3)
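If you would rather not create one object per group by hand, split() returns the same pieces in a single list. This is a sketch under assumptions: x is taken to be a data frame, USArrests is used only as a stand-in for your own data, and by_group is a name made up for illustration.

```r
x <- USArrests                   # assumption: stand-in for your own data

clust  <- hclust(dist(x))        # hclust defaults to "complete" linkage
groups <- cutree(clust, k = 3)   # cluster number for each row
x      <- cbind(x, groups)       # adds a "groups" column

# One data frame per cluster, named "1", "2", "3"
by_group <- split(x, x$groups)

# Rows per cluster
sapply(by_group, nrow)

# Same rows as subset(x, groups == 1)
head(by_group[["1"]])
```

This scales to any k without writing a new subset() line for each group.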