Data Mining and Knowledge Discovery OnlineFirst articles

Open Access 19-04-2024

Bake off redux: a review and experimental evaluation of recent time series classification algorithms

In 2017, a research paper (Bagnall et al. Data Mining and Knowledge Discovery 31(3):606-660. 2017) compared 18 Time Series Classification (TSC) algorithms on 85 datasets from the University of California, Riverside (UCR) archive. This study …

Authors:
Matthew Middlehurst, Patrick Schäfer, Anthony Bagnall

Open Access 10-04-2024

Lost in the Forest: Encoding categorical variables and the absent levels problem

Levels of a predictor variable that are absent when a classification tree is grown cannot be subject to an explicit splitting rule. This is an issue if these absent levels are present in a new observation for prediction. To date, there remains no …

Authors:
Helen L. Smith, Patrick J. Biggs, Nigel P. French, Adam N. H. Smith, Jonathan C. Marshall

Open Access 01-04-2024

Time series clustering with random convolutional kernels

Time series data, spanning applications ranging from climatology to finance to healthcare, presents significant challenges in data mining due to its size and complexity. One open issue lies in time series clustering, which is crucial for …

Authors:
Marco-Blanco Jorge, Cuevas Rubén

Open Access 29-03-2024

A comparative study of methods for estimating model-agnostic Shapley value explanations

Shapley values originated in cooperative game theory but are extensively used today as a model-agnostic explanation framework to explain predictions made by complex machine learning models in industry and academia. There are several …

Authors:
Lars Henry Berge Olsen, Ingrid Kristine Glad, Martin Jullum, Kjersti Aas

Open Access 25-03-2024

Interpretable linear dimensionality reduction based on bias-variance analysis

One of the central issues of several machine learning applications on real data is the choice of the input features. Ideally, the designer should select a small number of the relevant, nonredundant features to preserve the complete information …

Authors:
Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

22-03-2024

MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data

We introduce MCCE: Monte Carlo sampling of valid and realistic Counterfactual Explanations for tabular …

Authors:
Annabelle Redelmeier, Martin Jullum, Kjersti Aas, Anders Løland

Open Access 18-03-2024

Binary quantification and dataset shift: an experimental investigation

Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data, and is of special interest when the labelled data on which the predictor has been trained and the …

Authors:
Pablo González, Alejandro Moreo, Fabrizio Sebastiani

15-03-2024

Online concept evolution detection based on active learning

Concept evolution detection is an important and difficult problem in streaming data mining. When the labeled samples in streaming data are insufficient to reflect the training data distribution, it will often further restrict the detection …

Authors:
Husheng Guo, Hai Li, Lu Cong, Wenjian Wang

Open Access 27-02-2024

Marginal effects for non-linear prediction functions

Beta coefficients for linear regression models represent the ideal form of an interpretable feature effect. However, for non-linear models such as generalized linear models, the estimated coefficients cannot be interpreted as a direct feature …

Authors:
Christian A. Scholbeck, Giuseppe Casalicchio, Christoph Molnar, Bernd Bischl, Christian Heumann

Open Access 27-02-2024 | Correction

Correction to: Bias characterization, assessment, and mitigation in location-based recommender systems

Authors:
Pablo Sánchez, Alejandro Bellogín, Ludovico Boratto

22-02-2024

Learning a Bayesian network with multiple latent variables for implicit relation representation

Artificial intelligence applications could be more powerful and comprehensive by incorporating the ability of inference, which could be achieved by probabilistic inference over implicit relations. It is significant yet challenging to represent …

Authors:
Xinran Wu, Kun Yue, Liang Duan, Xiaodong Fu

19-02-2024

MMA: metadata supported multi-variate attention for onset detection and prediction

Deep learning has been applied successfully in sequence understanding and translation problems, especially in univariate, unimodal contexts, where a large amount of supervision data is available. The effectiveness of deep learning in more complex …

Authors:
Manjusha Ravindranath, K. Selçuk Candan, Maria Luisa Sapino, Brian Appavu

15-02-2024

Structural learning of simple staged trees

Bayesian networks faithfully represent the symmetric conditional independences existing between the components of a random vector. Staged trees are an extension of Bayesian networks for categorical random vectors whose graph represents …

Authors:
Manuele Leonelli, Gherardo Varando

Open Access 09-02-2024

Universal representation learning for multivariate time series using the instance-level and cluster-level supervised contrastive learning

The multivariate time series classification (MTSC) task aims to predict a class label for a given time series. Recently, modern deep learning-based approaches have achieved promising performance over traditional methods for MTSC tasks. The success …

Authors:
Nazanin Moradinasab, Suchetha Sharma, Ronen Bar-Yoseph, Shlomit Radom-Aizik, Kenneth C. Bilchick, Dan M. Cooper, Arthur Weltman, Donald E. Brown

Open Access 06-02-2024

Revealing the structural behaviour of Brunelleschi’s Dome with machine learning techniques

Brunelleschi’s Dome is one of the most iconic symbols of the Renaissance and is among the largest masonry domes ever constructed. Since the late 17th century, the first masonry cracks have appeared on the Dome, giving rise to a monitoring …

Authors:
Stefano Masini, Silvia Bacci, Fabrizio Cipollini, Bruno Bertaccini

05-02-2024

MASS: distance profile of a query over a time series

Given a long time series, the distance profile of a query time series computes distances between the query and every possible subsequence of a long time series. MASS (Mueen’s Algorithm for Similarity Search) is an algorithm to efficiently compute …

Authors:
Sheng Zhong, Abdullah Mueen
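
To illustrate the distance-profile definition in the abstract above, here is a minimal brute-force sketch. It is not MASS itself, whose purpose is to compute the same kind of profile efficiently. The function name `distance_profile`, the sample data, and the choice of plain (unnormalized) Euclidean distance are assumptions made for illustration only.

```python
import math


def distance_profile(query, series):
    """Naive distance profile (illustrative; not the MASS algorithm).

    Returns the Euclidean distance between `query` and every
    subsequence of `series` that has the same length as `query`.
    """
    m = len(query)
    n = len(series)
    if m == 0 or m > n:
        raise ValueError("query must be non-empty and no longer than the series")

    profile = []
    # Slide a window of length m over the series, one position at a time.
    for start in range(n - m + 1):
        window = series[start:start + m]
        squared = sum((q - w) ** 2 for q, w in zip(query, window))
        profile.append(math.sqrt(squared))
    return profile


# Hypothetical toy data: the query occurs exactly at offset 1.
series = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0]
query = [1.0, 2.0, 1.0]

profile = distance_profile(query, series)
# Index of the closest-matching subsequence.
best_start = min(range(len(profile)), key=profile.__getitem__)
```

This sketch does O(n·m) work for a series of length n and a query of length m, which is the cost an efficient algorithm aims to reduce.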

31-01-2024

Better trees: an empirical study on hyperparameter tuning of classification decision tree induction algorithms

Machine learning algorithms often contain many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these hyperparameter configurations and their …

Authors:
Rafael Gomes Mantovani, Tomáš Horváth, André L. D. Rossi, Ricardo Cerri, Sylvio Barbon Junior, Joaquin Vanschoren, André C. P. L. F. de Carvalho

31-01-2024

Central node identification via weighted kernel density estimation

The detection of central nodes in a network is a fundamental task in network science and graph data analysis. During the past decades, numerous centrality measures have been presented to characterize what is a central node. However, few studies …

Authors:
Yan Liu, Xue Feng, Jun Lou, Lianyu Hu, Zengyou He

Open Access 24-01-2024 | Correction

Correction to: Effective signal reconstruction from multiple ranked lists via convex optimization

Authors:
Michael G. Schimek, Luca Vitale, Bastian Pfeifer, Michele La Rocca

19-01-2024

Fusing structural information with knowledge enhanced text representation for knowledge graph completion

Although knowledge graphs store a large number of facts in the form of triplets, they are still limited by incompleteness. Hence, Knowledge Graph Completion (KGC), defined as inferring missing entities or relations based on observed facts, has …

Authors:
Kang Tang, Shasha Li, Jintao Tang, Dong Li, Pancheng Wang, Ting Wang