Clustering of football players based on performance data and aggregated clustering validity indexes
Citation
Akhanli, S. E., & Hennig, C. (2023). Clustering of football players based on performance data and aggregated clustering validity indexes. Journal of Quantitative Analysis in Sports, (0).
Abstract
We analyse football (soccer) player performance data with mixed-type variables from the 2014-15 season of eight major European leagues. We cluster these data based on a tailor-made dissimilarity measure. To decide between the many available clustering methods and to choose an appropriate number of clusters, we use the approach by Akhanli and Hennig (2020. "Comparing Clusterings and Numbers of Clusters by Aggregation of Calibrated Clustering Validity Indexes." Statistics and Computing 30 (5): 1523-44). This approach is based on several validation criteria that refer to different desirable characteristics of a clustering. These characteristics are chosen based on the aim of clustering, which makes it possible to define a suitable validation index as a weighted average of calibrated individual indexes measuring the desirable features. We derive two different clusterings. The first is a partition of the data set into major groups of essentially different players, which can be used for the analysis of a team's composition. The second divides the data set into many small clusters (with 10 players on average), which can be used for finding players with a very similar profile to a given player. We discuss in depth which characteristics are desirable for these clusterings. The weighting of the criteria for the second clustering is informed by a survey of football experts.
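The idea of a validation index built as a weighted average of calibrated individual indexes can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the paper: calibration is reduced here to standardising each index across the candidate clusterings, whereas the cited approach calibrates more elaborately. The index names, values, and weights are hypothetical, and every index is assumed to be oriented so that larger is better.

```python
from statistics import mean, pstdev


def calibrate(values):
    """Standardise raw index values to mean 0 and unit standard deviation.

    Assumption: a simplified stand-in for the calibration step.
    """
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # An index that does not vary cannot distinguish the candidates.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def aggregate(index_values, weights):
    """Return one aggregated score per candidate clustering.

    index_values maps an index name to its raw values, one per candidate.
    weights maps the same names to non-negative user-chosen weights.
    """
    calibrated = {name: calibrate(vals) for name, vals in index_values.items()}
    total = sum(weights.values())
    n = len(next(iter(index_values.values())))
    return [
        sum(weights[name] * calibrated[name][i] for name in weights) / total
        for i in range(n)
    ]


# Hypothetical raw values for three candidate clusterings.
index_values = {
    "separation": [0.2, 0.5, 0.8],
    "homogeneity": [0.9, 0.6, 0.3],
}
# Hypothetical weights expressing that homogeneity matters twice as much.
weights = {"separation": 1.0, "homogeneity": 2.0}

scores = aggregate(index_values, weights)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

Changing the weights changes which candidate wins, which is the point of choosing them according to the aim of the clustering.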
Source
JOURNAL OF QUANTITATIVE ANALYSIS IN SPORTS
Related items
Showing items related by title, author, creator and subject.
- Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes
  Akhanlı, Serhat Emre; Hennig, Christian (Springer, 2020)
  A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, ...
- A new fuzzy time series model based on robust clustering for forecasting of air pollution
  Dinçer, Nevin Güler; Akkuş, Özge (Elsevier Science Bv, 2018)
  In this study, a new Fuzzy Time Series (FTS) model based on the Fuzzy K-Medoid (FKM) clustering algorithm is proposed in order to forecast air pollution. FTS models generally have some advantages when compared with other ...
- Classification of Cancer Types by Cluster Analysis Methods
  İncekırık, Aynur; İşçi Güneri, Öznur; Durmuş, Burcu (Bahadır Fatih YILDIRIM, 2021)
  Cluster analysis can be defined as the group of methods that aim to classify multivariate observations by using similarity/dissimilarity measures between observations. The clusters obtained as a result of the analysis are ...