Advances in Data Analysis and Classification OnlineFirst articles

Open Access 13.04.2024 | Regular Article

Multidimensional scaling for big data

We present a set of algorithms implementing multidimensional scaling (MDS) for large data sets. MDS is a family of dimensionality reduction techniques using an $n \times n$ distance matrix as input, where n is the number of individuals, and …

Written by:
Pedro Delicado, Cristian Pachón-García

13.04.2024 | Regular Article

Comparison of internal evaluation criteria in hierarchical clustering of categorical data

The paper discusses eleven internal evaluation criteria that can be used in the area of hierarchical clustering of categorical data. The criteria are divided into two distinct groups based on how they treat the cluster quality: variability- and …

Written by:
Zdenek Sulc, Jaroslav Hornicek, Hana Rezankova, Jana Cibulkova

Open Access 12.04.2024 | Regular Article

View selection in multi-view stacking: choosing the meta-learner

Multi-view stacking is a framework for combining information from different views (i.e. different feature sets) describing the same set of objects. In this framework, a base-learner algorithm is trained on each view separately, and their …

Written by:
Wouter van Loon, Marjolein Fokkema, Botond Szabo, Mark de Rooij

30.03.2024 | Regular Article

Natural-neighborhood based, label-specific undersampling for imbalanced, multi-label data

This work presents a novel undersampling scheme to tackle the imbalance problem in multi-label datasets. We use the principles of the natural nearest neighborhood and follow a paradigm of label-specific undersampling. Natural-nearest neighborhood …

Written by:
Payel Sadhukhan, Sarbani Palit

Open Access 29.03.2024 | Regular Article

Entropy-based fuzzy clustering of interval-valued time series

This paper proposes a fuzzy C-medoids-based clustering method with entropy regularization to solve the issue of grouping complex data such as interval-valued time series. The dual nature of the data, which are both time-varying and interval-valued …

Written by:
Vincenzina Vitale, Pierpaolo D’Urso, Livia De Giovanni, Raffaele Mattera

27.03.2024 | Regular Article

Clustering ensemble extraction: a knowledge reuse framework

Clustering ensemble combines several fundamental clusterings with a consensus function to produce the final clustering without gaining access to data features. The quality and diversity of a vast library of base clusterings influence the …

Written by:
Mohaddeseh Sedghi, Ebrahim Akbari, Homayun Motameni, Touraj Banirostam

Open Access 16.03.2024 | Regular Article

Mixtures of regressions using matrix-variate heavy-tailed distributions

Finite mixtures of regressions (FMRs) are powerful clustering devices used in many regression-type analyses. Unfortunately, real data often present atypical observations that make the commonly adopted normality assumption of the mixture components …

Written by:
Salvatore D. Tomarchio, Michael P. B. Gallaugher

12.03.2024 | Regular Article

Clustering by deep latent position model with graph convolutional network

With the significant increase of interactions between individuals through numeric means, clustering of nodes in graphs has become a fundamental approach for analyzing large and complex networks. In this work, we propose the deep latent position …

Written by:
Dingge Liang, Marco Corneli, Charles Bouveyron, Pierre Latouche

07.03.2024 | Regular Article

Choosing the number of factors in factor analysis with incomplete data via a novel hierarchical Bayesian information criterion

The Bayesian information criterion (BIC), defined as the observed data log likelihood minus a penalty term based on the sample size N, is a popular model selection criterion for factor analysis with complete data. This definition has also been …

Written by:
Jianhua Zhao, Changchun Shang, Shulan Li, Ling Xin, Philip L. H. Yu

Open Access 06.03.2024 | Regular Article

Estimators of various kappa coefficients based on the unbiased estimator of the expected index of agreements

To measure the degree of agreement between R observers who independently classify n subjects within K categories, various kappa-type coefficients are often used. When R = 2, it is common to use Cohen’s kappa, Scott’s pi, Gwet’s AC1/2, and …

Written by:
A. Martín Andrés, M. Álvarez Hernández

27.02.2024 | Editorial

Special issue on “advances in models and learning for clustering and classification”

Written by:
Luis-Angel García-Escudero, Salvatore Ingrassia, T. Brendan Murphy

22.02.2024 | Regular Article

Spatial quantile clustering of climate data

In the era of climate change, the distribution of climate variables evolves with changes not limited to the mean value. Consequently, clustering algorithms based on central tendency could produce misleading results when used to summarize spatial …

Written by:
Carlo Gaetan, Paolo Girardi, Victor Muthama Musau

Open Access 12.02.2024 | Regular Article

Robust functional logistic regression

Functional logistic regression is a popular model to capture a linear relationship between binary response and functional predictor variables. However, many methods used for parameter estimation in functional logistic regression are sensitive to …

Written by:
Berkay Akturk, Ufuk Beyaztas, Han Lin Shang, Abhijit Mandal

Open Access 07.02.2024 | Regular Article

Neural networks with functional inputs for multi-class supervised classification of replicated point patterns

A spatial point pattern is a collection of points observed in a bounded region of the Euclidean plane or space. With the dynamic development of modern imaging methods, large datasets of point patterns are available representing for example …

Written by:
Kateřina Pawlasová, Iva Karafiátová, Jiří Dvořák

Open Access 31.01.2024 | Regular Article

k-means clustering for persistent homology

Persistent homology is a methodology central to topological data analysis that extracts and summarizes the topological features within a dataset as a persistence diagram. It has recently gained much popularity from its myriad successful …

Written by:
Yueqi Cao, Prudence Leung, Anthea Monod

Open Access 17.01.2024 | Regular Article

RGA: a unified measure of predictive accuracy

A key point to assess statistical forecasts is the evaluation of their predictive accuracy. Recently, a new measure, called Rank Graduation Accuracy (RGA), based on the concordance between the ranks of the predicted values and the ranks of the …

Written by:
Paolo Giudici, Emanuela Raffinetti

18.12.2023 | Regular Article

QDA classification of high-dimensional data with rare and weak signals

This paper addresses the two-class classification problem for data with rare and weak signals, under the modern high-dimension setup $p \gg n$. Considering the two-component mixture of Gaussian features with different random mean vector of …

Written by:
Hanning Chen, Qiang Zhao, Jingjing Wu

Open Access 15.12.2023 | Regular Article

Loss-guided stability selection

In modern data analysis, sparse model selection becomes inevitable once the number of predictor variables is very high. It is well-known that model selection procedures like the Lasso or Boosting tend to overfit on real data. The celebrated …

Written by:
Tino Werner

14.12.2023 | Regular Article

A fresh look at mean-shift based modal clustering

Modal clustering is an unsupervised learning technique where cluster centers are identified as the local maxima of nonparametric probability density estimates. A natural algorithmic engine for the computation of these maxima is the mean shift …

Written by:
Jose Ameijeiras-Alonso, Jochen Einbeck

08.12.2023 | Regular Article

A probabilistic method for reconstructing the Foreign Direct Investments network in search of ultimate host economies

The Ultimate Host Economies (UHEs) of a given country are defined as the ultimate destinations of Foreign Direct Investment (FDI) originating in that country. Bilateral FDI statistics struggle to identify them due to the non-negligible presence of …

Written by:
Nadia Accoto, Valerio Astuti, Costanza Catalano