Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes
Abstract
A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Therefore, researchers need to consider what data analytic characteristics the clusters they are aiming at are supposed to have, among others within-cluster homogeneity, between-clusters separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant in the application at hand. In order to measure the overall quality of a clustering (for comparing clusterings from different methods and/or different numbers of clusters), the index values are calibrated for aggregation. Calibration is relative to a set of random clusterings on the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
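The calibrate-then-aggregate idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the calibration formula, so the code below assumes one plausible reading: each index is standardised by the mean and standard deviation of its values over a set of random clusterings of the same data, and the calibrated values are averaged with equal weights. The two indexes, the uniform random labelling used to generate reference clusterings, and all function names are hypothetical stand-ins chosen for illustration, not the authors' definitions.

```python
import random
import statistics
from itertools import combinations
from math import dist


def homogeneity(points, labels):
    """Negative mean within-cluster pairwise distance (larger is better)."""
    d = [dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)
         if labels[i] == labels[j]]
    return -statistics.mean(d) if d else 0.0


def separation(points, labels):
    """Smallest distance between points in different clusters (larger is better)."""
    d = [dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)
         if labels[i] != labels[j]]
    return min(d) if d else 0.0


def random_clustering(n, k, rng):
    """Random labels with every cluster non-empty (assumed stand-in generator)."""
    labels = list(range(k)) + [rng.randrange(k) for _ in range(n - k)]
    rng.shuffle(labels)
    return labels


def calibrated_aggregate(points, labels, k, indexes, n_random=200, seed=0):
    """Mean of index values standardised against random clusterings (assumption)."""
    rng = random.Random(seed)
    randoms = [random_clustering(len(points), k, rng) for _ in range(n_random)]
    z_scores = []
    for index in indexes:
        ref = [index(points, r) for r in randoms]
        mu, sd = statistics.mean(ref), statistics.stdev(ref)
        z_scores.append((index(points, labels) - mu) / sd if sd > 0 else 0.0)
    return statistics.mean(z_scores)


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
good = [0, 0, 0, 1, 1, 1]   # matches the visible group structure
bad = [0, 1, 0, 1, 0, 1]    # mixes the two groups
score_good = calibrated_aggregate(points, good, 2, [homogeneity, separation])
score_bad = calibrated_aggregate(points, bad, 2, [homogeneity, separation])
```

Because both indexes are put on the same scale before averaging, the clustering that matches the group structure should receive a higher aggregated score than the mixed one. A user who cares more about one characteristic could replace the plain mean with a weighted mean over the chosen indexes.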
Source
Statistics and Computing
Related items
Showing items related by title, author, creator and subject.
- Clustering of football players based on performance data and aggregated clustering validity indexes
  Akhanlı, Serhat Emre; Hennig, Christian (WALTER DE GRUYTER GMBH, 2023) We analyse football (soccer) player performance data with mixed type variables from the 2014-15 season of eight European major leagues. We cluster these data based on a tailor-made dissimilarity measure. In order to decide ...
- A new fuzzy time series model based on robust clustering for forecasting of air pollution
  Dinçer, Nevin Güler; Akkuş, Özge (Elsevier Science Bv, 2018) In this study, a new Fuzzy Time Series (FTS) model based on the Fuzzy K-Medoid (FKM) clustering algorithm is proposed in order to forecast air pollution. FTS models generally have some advantages when compared with other ...
- Classification of Cancer Types by Cluster Analysis Methods
  İncekırık, Aynur; İşçi Güneri, Öznur; Durmuş, Burcu (Bahadır Fatih YILDIRIM, 2021) Cluster analysis can be defined as the group of methods that aim to classify multivariate observations by using similarity/dissimilarity measures between observations. The clusters obtained as a result of the analysis are ...