
2023 | Book

Machine Learning and Knowledge Discovery in Databases: Research Track

European Conference, ECML PKDD 2023, Turin, Italy, September 18–22, 2023, Proceedings, Part V

Editors: Danai Koutra, Claudia Plant, Manuel Gomez Rodriguez, Elena Baralis, Francesco Bonchi

Publisher: Springer Nature Switzerland

Book Series: Lecture Notes in Computer Science


About this book

The multi-volume set LNAI 14169 through 14175 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2023, which took place in Turin, Italy, in September 2023.

The 196 papers were selected from the 829 submissions for the Research Track, and 58 papers were selected from the 239 submissions for the Applied Data Science Track.

The volumes are organized in topical sections as follows:

Part I: Active Learning; Adversarial Machine Learning; Anomaly Detection; Applications; Bayesian Methods; Causality; Clustering.

Part II: Computer Vision; Deep Learning; Fairness; Federated Learning; Few-shot Learning; Generative Models; Graph Contrastive Learning.

Part III: Graph Neural Networks; Graphs; Interpretability; Knowledge Graphs; Large-scale Learning.

Part IV: Natural Language Processing; Neuro/Symbolic Learning; Optimization; Recommender Systems; Reinforcement Learning; Representation Learning.

Part V: Robustness; Time Series; Transfer and Multitask Learning.

Part VI: Applied Machine Learning; Computational Social Sciences; Finance; Hardware and Systems; Healthcare & Bioinformatics; Human-Computer Interaction; Recommendation and Information Retrieval.

Part VII: Sustainability, Climate, and Environment; Transportation & Urban Planning; Demo.

Table of Contents

Frontmatter

Robustness

Frontmatter
MMA: Multi-Metric-Autoencoder for Analyzing High-Dimensional and Incomplete Data
Abstract
High-dimensional and incomplete (HDI) data usually arise in various complex applications, e.g., bioinformatics and recommender systems, making them commonly heterogeneous and inclusive. Deep neural network (DNN)-based approaches have provided state-of-the-art representation learning performance on HDI data. However, most prior studies adopt fixed and exclusive \(L_2\)-norm-oriented loss and regularization terms. Such a single-metric-oriented model yields limited performance on heterogeneous and inclusive HDI data. Motivated by this, we propose a Multi-Metric-Autoencoder (MMA) whose main ideas are two-fold: 1) employing different \(L_p\)-norms to build four variant Autoencoders, each of which resides in a unique metric representation space with different loss and regularization terms, and 2) aggregating these Autoencoders with a tailored, self-adaptive weighting strategy. Theoretical analysis guarantees that our MMA can attain a better representation from a set of dispersed metric spaces. Extensive experiments on four real-world datasets demonstrate that our MMA significantly outperforms seven state-of-the-art models. Our code is available at https://github.com/wudi1989/MMA/
Cheng Liang, Di Wu, Yi He, Teng Huang, Zhong Chen, Xin Luo
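The MMA abstract combines reconstruction losses built from different \(L_p\)-norms, evaluated on the observed entries of HDI data, with a self-adaptive weighting of the variants. The following is only a minimal sketch under stated assumptions: masking out unobserved entries is standard practice for HDI data, while `adaptive_weights` (a softmax over negative losses) is a hypothetical stand-in, not the weighting strategy from the paper. All function names are illustrative.

```python
import numpy as np

def masked_lp_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray, p: float) -> float:
    """Mean of |pred - target|**p over the observed entries only (mask == True)."""
    diff = np.abs(pred - target)[mask]
    return float(np.mean(diff ** p))

def adaptive_weights(losses, temperature: float = 1.0) -> np.ndarray:
    """Hypothetical weighting rule: softmax of negative losses, so a variant with a
    lower loss receives a larger weight. Weights are non-negative and sum to 1."""
    z = -np.asarray(losses, dtype=float) / temperature
    z -= z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def aggregate(preds, weights) -> np.ndarray:
    """Weighted average of the variant autoencoders' reconstructions."""
    return sum(w * pr for w, pr in zip(weights, preds))

# Toy usage: an L1-based and an L2-based "variant" reconstruct a partially observed matrix.
target = np.arange(12, dtype=float).reshape(3, 4)
mask = np.zeros_like(target, dtype=bool)
mask[0, :] = True  # only the first row is observed
preds = [target + 0.1, target + 1.0]
losses = [masked_lp_loss(pr, target, mask, p) for pr, p in zip(preds, (1, 2))]
weights = adaptive_weights(losses)
blended = aggregate(preds, weights)
```

The `temperature` parameter (also an assumption) controls how sharply the blend favours the lowest-loss variant.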
Exploring and Exploiting Data-Free Model Stealing
Abstract
Deep machine learning models, e.g., image classifiers, are increasingly deployed in the wild to provide services to users. Adversaries have been shown capable of stealing the knowledge of these models by sending inference queries and then training substitute models on the query results. The availability and quality of adversarial query inputs are undoubtedly crucial in the stealing process. Recent prior art demonstrates the feasibility of replacing real data with synthetic adversarial queries, in so-called data-free attacks, under strong adversarial assumptions, i.e., that the deployed classifier returns not only class labels but also class probabilities. In this paper, we consider a general adversarial model and propose an effective data-free stealing algorithm, TandemGAN, which not only explores synthetic queries but also explicitly exploits the high-quality ones. The core of TandemGAN is composed of (i) a substitute model, which imitates the target model through synthetic queries and their inferred labels; and (ii) a tandem generator consisting of two networks, \(\mathcal {G}_x\) and \(\mathcal {G}_e\), which first explores the synthetic data space via \(\mathcal {G}_x\) and then exploits high-quality examples via \(\mathcal {G}_e\) to maximize the knowledge transfer from the target to the substitute model. Our results on four datasets show that the accuracy of our trained substitute model ranges between 67% and 96% of that of the target model and outperforms the existing state-of-the-art data-free model stealing approach by up to 2.5X.
Chi Hong, Jiyue Huang, Robert Birke, Lydia Y. Chen
Exploring the Training Robustness of Distributional Reinforcement Learning Against Noisy State Observations
Abstract
In real scenarios, the state observations an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training. In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return. First, we validate the contraction of distributional Bellman operators in the State-Noisy Markov Decision Process (SN-MDP), a typical tabular case that incorporates both random and adversarial state observation noise. In the noisy setting with function approximation, we then analyze the vulnerability of the least squared loss in expectation-based RL with either linear or nonlinear function approximation. By contrast, we theoretically characterize the bounded gradient norm of the distributional RL loss based on the categorical parameterization equipped with the Kullback-Leibler (KL) divergence. The resulting stable gradients during optimization account for the better training robustness of distributional RL against state observation noise. Finally, extensive experiments on a suite of environments verify that distributional RL is less vulnerable to both random and adversarial noisy state observations than its expectation-based counterpart (Code is available at https://github.com/datake/RobustDistRL. The extended version of the paper is at https://arxiv.org/abs/2109.08776.).
Ke Sun, Yingnan Zhao, Shangling Jui, Linglong Kong
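The contrast drawn in this abstract between bounded and unbounded gradients can be illustrated numerically. The sketch below assumes a categorical return distribution over 51 atoms and a linear least-squares value function; both choices, and all names, are illustrative and not taken from the paper. It shows that the gradient of the KL loss with respect to the logits stays below \(\sqrt{2}\) in norm however noisy the target distribution is, whereas the least-squares gradient grows with the size of a corrupted target.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_grad_wrt_logits(logits, target_probs):
    """Gradient of KL(target || softmax(logits)) w.r.t. the logits.
    The target's entropy does not depend on the logits, so this equals the
    cross-entropy gradient softmax(logits) - target; its L2 norm is at most sqrt(2)."""
    return softmax(logits) - target_probs

def ls_grad_wrt_weights(weights, features, target_value):
    """Gradient of 0.5 * (w . phi - y)**2 w.r.t. w (linear value approximation)."""
    return (weights @ features - target_value) * features

rng = np.random.default_rng(0)
logits = rng.normal(size=51)  # 51 return atoms (illustrative choice)
w = np.full(8, 0.1)           # linear value-function weights
phi = np.ones(8)              # state features
kl_norms, ls_norms = [], []
for scale in (1.0, 10.0, 100.0):
    # A larger scale stands for stronger corruption of the bootstrapped target.
    noisy_dist = softmax(rng.normal(scale=scale, size=51))
    kl_norms.append(np.linalg.norm(kl_grad_wrt_logits(logits, noisy_dist)))
    ls_norms.append(np.linalg.norm(ls_grad_wrt_weights(w, phi, target_value=10.0 * scale)))
```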
Overcoming the Limitations of Localization Uncertainty: Efficient and Exact Non-linear Post-processing and Calibration
Abstract
Robustly and accurately localizing objects in real-world environments can be challenging due to noisy data, hardware limitations, and the inherent randomness of physical systems. To account for these factors, existing works estimate the aleatoric uncertainty of object detectors by modeling their localization output as a Gaussian distribution \(\mathcal{N}(\mu, \sigma^{2})\) and training with loss attenuation. We identify three aspects that are unaddressed in the state of the art but warrant further exploration: (1) the efficient and mathematically sound propagation of \(\mathcal{N}(\mu, \sigma^{2})\) through non-linear post-processing, (2) the calibration of the predicted uncertainty, and (3) its interpretation. We overcome these limitations by: (1) implementing loss attenuation in EfficientDet and proposing two deterministic methods for the exact and fast propagation of the output distribution, (2) demonstrating on the KITTI and BDD100K datasets that the predicted uncertainty is miscalibrated, and adapting two calibration methods to the localization task, and (3) investigating the correlation between aleatoric uncertainty and task-relevant error sources. Our contributions are: (1) up to five times faster propagation while increasing localization performance by up to 1%, (2) up to fifteen times smaller expected calibration error, and (3) the finding that the predicted uncertainty correlates with occlusion, object distance, detection accuracy, and image quality.
Moussa Kassem Sbeyti, Michelle Karg, Christian Wirth, Azarm Nowzad, Sahin Albayrak
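One concrete non-linearity in anchor-based box decoding is the exponential that turns a predicted log-scale offset into a width or height. Assuming the decoder computes `anchor_size * exp(t)` (an assumption about EfficientDet-style post-processing; the two propagation methods in the paper are not reproduced here), a Gaussian \(t \sim \mathcal{N}(\mu, \sigma^{2})\) can be propagated exactly, because the decoded size is log-normal with mean \(a\,e^{\mu + \sigma^{2}/2}\) and variance \(a^{2}(e^{\sigma^{2}} - 1)e^{2\mu + \sigma^{2}}\) for anchor size \(a\). The sketch checks this against Monte Carlo sampling.

```python
import numpy as np

def propagate_exp_decoding(mu, sigma, anchor_size):
    """Exact mean and variance of anchor_size * exp(t) for t ~ N(mu, sigma**2).
    The decoded size is log-normally distributed, so both moments are closed-form."""
    mean = anchor_size * np.exp(mu + 0.5 * sigma**2)
    var = anchor_size**2 * (np.exp(sigma**2) - 1.0) * np.exp(2.0 * mu + sigma**2)
    return mean, var

mu, sigma, anchor = 0.2, 0.3, 32.0  # illustrative values
mean, var = propagate_exp_decoding(mu, sigma, anchor)

# Monte Carlo check: sample the predicted offset and push it through the decoder.
rng = np.random.default_rng(0)
samples = anchor * np.exp(rng.normal(mu, sigma, size=1_000_000))
naive_mean = anchor * np.exp(mu)  # decoding only the mean ignores sigma and is biased low
```

Decoding only the mean underestimates the expected size whenever \(\sigma > 0\), which is why propagating the full distribution matters.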
Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching
Abstract
Quantification learning deals with the task of estimating the target label distribution under label shift. In this paper, we first present a unifying framework, distribution feature matching (DFM), that recovers as particular instances various estimators introduced in previous literature. We derive a general performance bound for DFM procedures, improving in several key aspects upon previous bounds derived in particular cases. We then extend this analysis to study the robustness of DFM procedures in the misspecified setting under departure from the exact label shift hypothesis, in particular in the case of contamination of the target by an unknown distribution. These theoretical findings are confirmed by a detailed numerical study on simulated and real-world datasets. We also introduce an efficient, scalable and robust version of kernel-based DFM using Random Fourier Features.
Bastien Dussap, Gilles Blanchard, Badr-Eddine Chérief-Abdellatif
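A minimal instance of distribution feature matching, assuming a kernel feature map approximated with Random Fourier Features: the target's mean embedding is matched by a mixture of the class-conditional mean embeddings, and the mixture weights estimate the target label proportions. The constrained least-squares solver, the clipping step, and the toy data are illustrative choices, not the estimators analysed in the paper.

```python
import numpy as np

def make_rff(dim, n_features=200, lengthscale=1.0, seed=0):
    """Random Fourier feature map approximating a Gaussian (RBF) kernel.
    The same random draw must be used for source and target data."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def estimate_proportions(class_means, target_mean):
    """Solve min_a ||M a - t||^2 s.t. sum(a) = 1 through its KKT system, then clip
    negative entries and renormalise (a heuristic, not an exact simplex projection)."""
    M = class_means.T  # (d, K): one column per class
    K = M.shape[1]
    kkt = np.zeros((K + 1, K + 1))
    kkt[:K, :K] = 2.0 * M.T @ M
    kkt[:K, K] = 1.0
    kkt[K, :K] = 1.0
    rhs = np.concatenate([2.0 * M.T @ target_mean, [1.0]])
    alpha = np.clip(np.linalg.solve(kkt, rhs)[:K], 0.0, None)
    return alpha / alpha.sum()

# Labelled source data (two classes) and an unlabelled target with a 70/30 mix.
rng = np.random.default_rng(1)
src = [rng.normal(-2.0, 1.0, size=(2000, 2)), rng.normal(2.0, 1.0, size=(2000, 2))]
tgt = np.vstack([rng.normal(-2.0, 1.0, size=(2100, 2)), rng.normal(2.0, 1.0, size=(900, 2))])
phi = make_rff(dim=2)
class_means = np.stack([phi(Xk).mean(axis=0) for Xk in src])
alpha = estimate_proportions(class_means, phi(tgt).mean(axis=0))
```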
Robust Classification of High-Dimensional Data Using Data-Adaptive Energy Distance
Abstract
Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and analysis of some classifiers that are specifically designed for HDLSS data. These classifiers are free of tuning parameters and are robust, in the sense that they are devoid of any moment conditions of the underlying data distributions. It is shown that they yield perfect classification in the HDLSS asymptotic regime, under some fairly general conditions. The comparative performance of the proposed classifiers is also investigated. Our theoretical results are supported by extensive simulation studies and real data analysis, which demonstrate promising advantages of the proposed classification techniques over several widely recognized methods.
Jyotishka Ray Choudhury, Aytijhya Saha, Sarbojit Roy, Subhajit Dutta
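As an illustration of a tuning-free, distance-based classifier in this spirit, the sketch below assigns a point to the class whose empirical distribution is closest, in energy distance, to a point mass at that point. It uses plain Euclidean distance; the data-adaptive energy distance proposed in the paper is not reproduced, and the simulated HDLSS data are invented for the example.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between the rows of A and the rows of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))

def energy_classify(X_test, class_samples):
    """Assign each test point to the class k minimising
    E||x - X_k|| - 0.5 * E||X_k - X_k'||, i.e. half the energy distance between
    a point mass at x and the empirical distribution of class k.
    No tuning parameters are involved."""
    scores = []
    for Xk in class_samples:
        cross = pairwise_dist(X_test, Xk).mean(axis=1)
        within = pairwise_dist(Xk, Xk).mean()
        scores.append(cross - 0.5 * within)
    return np.argmin(np.stack(scores, axis=1), axis=1)

rng = np.random.default_rng(0)
d, n = 500, 10  # HDLSS: dimension far exceeds the per-class sample size
train = [rng.normal(0.0, 1.0, size=(n, d)), rng.normal(0.5, 1.0, size=(n, d))]
X_test = np.vstack([rng.normal(0.0, 1.0, size=(10, d)), rng.normal(0.5, 1.0, size=(10, d))])
y_test = np.repeat([0, 1], 10)
pred = energy_classify(X_test, train)
```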
DualMatch: Robust Semi-supervised Learning with Dual-Level Interaction
Abstract
Semi-supervised learning (SSL) provides an expressive framework for exploiting unlabeled data when labels are insufficient. Previous SSL methods typically match the model predictions of different data-augmented views in a single-level interaction manner, which relies heavily on the quality of pseudo-labels and makes SSL less robust. In this paper, we propose a novel SSL method called DualMatch, in which the class prediction jointly invokes feature embedding in a dual-level interaction manner. DualMatch requires consistent regularizations for data augmentation, specifically, 1) ensuring that different augmented views are regulated with consistent class predictions, and 2) ensuring that different data of one class are regulated with similar feature embeddings. Extensive experiments demonstrate the effectiveness of DualMatch. In the standard SSL setting, the proposal achieves a 9% error reduction compared with SOTA methods; even in a more challenging class-imbalanced setting, it still achieves a 6% error reduction. Code is available at https://github.com/CWangAI/DualMatch.
Cong Wang, Xiaofeng Cao, Lanzhe Guo, Zenglin Shi
Detecting Evasion Attacks in Deployed Tree Ensembles
Abstract
Tree ensembles are powerful models that are widely used. However, they are susceptible to evasion attacks where an adversary purposely constructs an adversarial example in order to elicit a misprediction from the model. This can degrade performance and erode a user’s trust in the model. Typically, approaches try to alleviate this problem by verifying how robust a learned ensemble is or robustifying the learning process. We take an alternative approach and attempt to detect adversarial examples in a post-deployment setting. We present a novel method for this task that works by analyzing an unseen example’s output configuration, which is the set of leaves activated by the example in the ensemble’s constituent trees. Our approach works with any additive tree ensemble and does not require training a separate model. We evaluate our approach on three different tree ensemble learners. We empirically show that our method is currently the best adversarial detection method for tree ensembles.
Laurens Devos, Lorenzo Perini, Wannes Meert, Jesse Davis
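The detection idea rests on output configurations: the tuple of leaves an example reaches, one per tree. A hedged sketch, assuming leaf indices are already available as integer arrays (for a fitted scikit-learn forest, the `apply` method returns such an array), scores a new example by how far its configuration is from the closest configuration seen on training data. The Hamming-distance score and the threshold are hypothetical, not the paper's detector.

```python
import numpy as np

def config_outlier_scores(train_leaves, test_leaves):
    """train_leaves: (n_train, n_trees) leaf ids; test_leaves: (n_test, n_trees).
    Score = smallest fraction of trees whose leaf differs between a test example's
    output configuration and any training configuration (0 = configuration seen before)."""
    mismatch = test_leaves[:, None, :] != train_leaves[None, :, :]  # (n_test, n_train, n_trees)
    return mismatch.mean(axis=2).min(axis=1)

def flag_suspicious(train_leaves, test_leaves, threshold):
    """Flag test examples whose configuration is far from every training configuration."""
    return config_outlier_scores(train_leaves, test_leaves) > threshold

train_leaves = np.array([[1, 4, 7], [1, 5, 7], [2, 4, 8]])
test_leaves = np.array([[1, 4, 7], [1, 5, 8], [3, 6, 9]])
scores = config_outlier_scores(train_leaves, test_leaves)  # [0, 1/3, 1]
flags = flag_suspicious(train_leaves, test_leaves, threshold=0.5)
```

In practice the threshold could be set from a quantile of scores on held-out clean data. The pairwise comparison needs memory proportional to test size × training size × number of trees, so large training sets would call for batching.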

Time Series

Frontmatter
Deep Imbalanced Time-Series Forecasting via Local Discrepancy Density
Abstract
Time-series forecasting models often encounter abrupt changes in a given period of time, which generally occur due to unexpected or unknown events. Despite their scarce occurrences in the training set (i.e., data imbalance), abrupt changes incur losses that contribute significantly to the total loss (i.e., heteroscedasticity). Therefore, they act as noisy training samples and prevent the model from learning generalizable patterns, namely the normal states. To resolve the overfitting problem posed by heteroscedasticity and data imbalance, we propose a reweighting framework that down-weights the losses incurred by abrupt changes and up-weights those incurred by normal states. For the reweighting framework, we first define a measurement termed Local Discrepancy (LD), which measures the degree of abruptness of a change in a given period of time. Since a training set is mostly composed of normal states, we then consider how frequently the temporal changes appear in the training set based on LD (i.e., estimated LD density). Our reweighting framework is applicable to existing time-series forecasting models regardless of their architectures. Through extensive experiments on 12 time-series forecasting models over eight datasets with various input-output sequence lengths, we demonstrate that applying our reweighting framework reduces MSE by 10.1% on average and by up to 18.6% in the state-of-the-art model.
Junwoo Park, Jungsoo Lee, Youngin Cho, Woncheol Shin, Dongmin Kim, Jaegul Choo, Edward Choi
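A sketch of density-based reweighting, under an assumed definition of Local Discrepancy (the absolute gap between the mean of a forecast window and the mean of its input window; the paper's exact definition may differ) and a histogram density estimate. Samples whose LD value is common receive larger weights and rare abrupt changes receive smaller ones, as the abstract describes. Function names and toy data are illustrative.

```python
import numpy as np

def local_discrepancy(inputs, targets):
    """Hypothetical LD: absolute gap between the mean of each forecast window and the
    mean of its input window (larger = more abrupt). inputs: (n, L_in), targets: (n, L_out)."""
    return np.abs(targets.mean(axis=1) - inputs.mean(axis=1))

def density_weights(ld, bins=20):
    """Weight each sample by the histogram density of its LD value, so frequent (normal)
    changes are up-weighted and rare abrupt ones down-weighted; weights have mean 1."""
    hist, edges = np.histogram(ld, bins=bins, density=True)
    idx = np.clip(np.digitize(ld, edges[1:-1]), 0, bins - 1)
    w = hist[idx]
    return w / w.mean()

def weighted_mse(pred, target, weights):
    """Per-sample MSE, reweighted before averaging."""
    return float((weights * ((pred - target) ** 2).mean(axis=1)).mean())

rng = np.random.default_rng(0)
inputs = rng.normal(0.0, 0.1, size=(100, 8))
targets = rng.normal(0.0, 0.1, size=(100, 8))
targets[:5] += 5.0  # five abrupt changes
weights = density_weights(local_discrepancy(inputs, targets))
```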
Online Deep Hybrid Ensemble Learning for Time Series Forecasting
Abstract
The complex and changing nature of time series data renders forecasting one of the most challenging tasks in time series analysis. It is also commonly acknowledged that no single ML model can be perfectly appropriate for all applications. One solution to this issue is to learn a heterogeneous ensemble by combining a diverse set of forecasters. In addition, ML models usually exhibit time-dependent performance. This can be explained by the fact that different models have varying regions of expertise, so-called Regions of Competence (RoCs), over the time series. In this paper, we propose a novel online deep hybrid ensemble architecture for time series forecasting. The architecture is composed of convolutional layers for learning new enriched time series representations, connected to a pool of heterogeneous models composed of classical ML models and neural nets. The models are combined using a weighted average, where the weights are set adaptively over time using their pre-computed RoCs. The RoCs are computed using a gradient-based approach that maps the performance of these models to input regions in the time series and can therefore be exploited to generate saliency maps that provide suitable explanations for a particular ensemble aggregation, i.e., the weight setting at a particular time. The RoCs are updated in an informed manner following drift detection in the time series. An extensive empirical study on various real-world datasets demonstrates that our method achieves excellent or on-par results in comparison to state-of-the-art approaches as well as several baselines.
Amal Saadallah, Matthias Jakobs
Sparse Transformer Hawkes Process for Long Event Sequences
Abstract
Large quantities of asynchronous event sequence data, such as crime records, emergency call logs, and financial transactions, are becoming increasingly available from various fields. These event sequences often exhibit both long-term and short-term temporal dependencies. Variations of neural-network-based temporal point processes have been widely used for modeling such asynchronous event sequences. However, many current architectures, including attention-based point processes, struggle with long event sequences due to computational inefficiency. To tackle this challenge, we propose an efficient sparse transformer Hawkes process (STHP), which has two components. For the first component, a transformer with a novel temporal sparse self-attention mechanism is applied to event sequences with arbitrary intervals, mainly focusing on short-term dependencies. For the second component, a transformer is applied to the time series of aggregated event counts, primarily targeting the extraction of long-term periodic dependencies. Both components complement each other and are fused together to model the conditional intensity function of a point process for future event forecasting. Experiments on real-world datasets show that the proposed STHP outperforms baselines and achieves a significant improvement in computational efficiency without sacrificing prediction performance for long sequences.
Zhuoqun Li, Mingxuan Sun
Adacket: ADAptive Convolutional KErnel Transform for Multivariate Time Series Classification
Abstract
While existing multivariate time series classification (MTSC) methods using massive numbers of convolutional kernels show promise, they are resource-intensive and rely on the trial-and-error design of convolutional kernels, limiting comprehensive design space exploration. This hinders fully exploiting convolutional kernels for feature extraction from multivariate time series (MTS) data. To address this issue, we propose a novel method called Adaptive Convolutional Kernel Transform (Adacket) to automatically design efficient 1D dilated convolutional kernels for various MTSC scenarios. Adacket formulates the design problem as a multi-objective optimization problem that jointly targets performance and resource efficiency. It introduces a reinforcement learning agent to adaptively determine convolutional kernels in a sequential decision-making manner, and creates multi-action spaces to support a comprehensive search in both the channel and time dimensions. By exploring the maximum value of multi-objective rewards within continuous action spaces, Adacket establishes convolutional kernels at a high granularity. Empirical evaluations on the public UEA archive demonstrate that Adacket outperforms other advanced MTSC baselines while providing a deeper understanding of its design selections.
Junru Zhang, Lang Feng, Haowen Zhang, Yuhan Wu, Yabo Dong
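Adacket searches over 1D dilated convolutional kernels per channel. The building block can be sketched as follows, with kernels, channels, and dilations fixed by hand (in Adacket they are chosen by the reinforcement learning agent). Pooling each output into its maximum and its proportion of positive values (PPV) follows the ROCKET family of methods and is an assumption here, not a detail stated in the abstract.

```python
import numpy as np

def dilated_conv1d(x, weights, dilation, bias=0.0):
    """'Valid' 1D convolution (cross-correlation, as in deep-learning libraries)
    with the given dilation."""
    k = len(weights)
    span = (k - 1) * dilation + 1
    n_out = len(x) - span + 1
    out = np.empty(n_out)
    for t in range(n_out):
        # x[t], x[t + d], ..., x[t + (k - 1) d]: exactly k taps
        out[t] = np.dot(x[t : t + span : dilation], weights) + bias
    return out

def kernel_features(series, kernels):
    """series: (n_channels, length). For each (channel, weights, dilation) kernel,
    return ROCKET-style features: max activation and proportion of positive values."""
    feats = []
    for channel, weights, dilation in kernels:
        out = dilated_conv1d(series[channel], np.asarray(weights, dtype=float), dilation)
        feats.extend([out.max(), (out > 0).mean()])
    return np.array(feats)

series = np.vstack([np.arange(10.0), np.ones(10)])
kernels = [(0, [1.0, -1.0], 3), (1, [1.0, 1.0, 1.0], 1)]  # hypothetical hand-picked kernels
features = kernel_features(series, kernels)
```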
Efficient Adaptive Spatial-Temporal Attention Network for Traffic Flow Forecasting
Abstract
Urban traffic flow prediction is a challenging task in the field of intelligent transportation and spatio-temporal data analysis. Accurate prediction of traffic states by leveraging sophisticated spatio-temporal patterns is critical. However, existing methods ignore the local validity of dynamic spatio-temporal auto-correlations, resulting in bottlenecks in the performance and efficiency of the model. In this work, we investigate the effects of dominant as well as invalid spatio-temporal patterns and propose a spatio-temporal forecasting framework. Specifically, we propose a dominant spatial-temporal attention mechanism, which extends the empirical approximation of the Kullback-Leibler divergence to the spatial-temporal domain to optimize the computational efficiency of the attention mechanism, and identifies locally valid associations through dominant query generation. Meanwhile, we theoretically demonstrate the validity of the extension. Furthermore, we design an adaptive spatial-temporal fusion embedding scheme to generate heterogeneous and synchronous traffic states without pre-defined graph structures. We further propose an Efficient Adaptive Spatial-Temporal Attention Network (EASTAN) to capture fine-grained spatio-temporal dependencies based on the above modules and perform sequential forecasting. Extensive experiments (code and appendix available at https://github.com/ecmlpkdd2023/EASTAN) on four real-world datasets show that the proposed framework improves prediction accuracy by 3.31%–48.93% and significantly reduces the training time as well as the number of model parameters compared to state-of-the-art methods.
Hongyang Su, Xiaolong Wang, Qingcai Chen, Yang Qin
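The empirical approximation of the Kullback-Leibler divergence used for attention is assumed here to be the max-minus-mean query sparsity measurement popularised by Informer's ProbSparse attention; the sketch applies it to a single attention axis. Only the top-`u` dominant queries receive full attention, and the remaining queries fall back to the mean of the values. This illustrates the general mechanism only, not EASTAN's spatial-temporal extension; names and toy inputs are hypothetical.

```python
import numpy as np

def sparsity_measure(Q, K):
    """Max-minus-mean approximation of each query's KL divergence from uniform attention."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    return scores.max(axis=1) - scores.mean(axis=1)

def dominant_query_attention(Q, K, V, u):
    """Full softmax attention for the u most dominant queries only; the remaining
    queries output the mean of V (what near-uniform attention would produce)."""
    m = sparsity_measure(Q, K)
    top = np.argsort(m)[-u:]
    out = np.tile(V.mean(axis=0), (Q.shape[0], 1))
    s = Q[top] @ K.T / np.sqrt(Q.shape[1])
    s -= s.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(s)
    a /= a.sum(axis=1, keepdims=True)
    out[top] = a @ V
    return out, top

K = np.eye(6, 8)                       # six keys in an 8-dimensional space
V = np.arange(12.0).reshape(6, 2)
Q = np.zeros((4, 8))
Q[2, 0] = 10.0                         # only query 2 is strongly aligned with a key
out, top = dominant_query_attention(Q, K, V, u=1)
```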
Estimating Dynamic Time Warping Distance Between Time Series with Missing Data
Abstract
Many techniques for analyzing time series rely on some notion of similarity between two time series, such as Dynamic Time Warping (DTW) distance. But DTW cannot handle missing values, and simple fixes (e.g., dropping missing values, or interpolating) fail when entire intervals are missing, as is often the case with, e.g., temporary sensor or communication failures. There is hardly any research on how to address this problem. In this paper, we propose two hyperparameter-free techniques to estimate the DTW distance between time series with missing values. The first technique, DTW-AROW, significantly decreases the impact of missing values on the DTW distance by modifying the optimization problem in the DTW algorithm. The second technique, DTW-CAI, can further improve upon DTW-AROW by exploiting additional contextual information when that is available (more specifically, more time series from the same population). We show that, on multiple datasets, the proposed techniques outperform existing techniques in estimating pairwise DTW distances as well as in classification and clustering tasks based on these distances. The proposed techniques can enable many machine learning algorithms to more accurately handle time series with missing values.
Aras Yurtman, Jonas Soenen, Wannes Meert, Hendrik Blockeel
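For reference, classic DTW is a dynamic program over all alignments of two series. The sketch below makes explicit where missing values break it and shows the naive fix of charging a constant cost for any pair involving a NaN. It is a baseline for illustration only, not DTW-AROW or DTW-CAI, which the abstract describes as modifying the optimization problem itself.

```python
import math

def dtw(x, y, missing_cost=None):
    """Classic O(len(x) * len(y)) DTW distance with squared-difference cost.
    If missing_cost is given, any pair involving a NaN costs that constant
    (a naive fix shown for illustration, not DTW-AROW or DTW-CAI)."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = x[i - 1], y[j - 1]
            if math.isnan(a) or math.isnan(b):
                if missing_cost is None:
                    raise ValueError("NaN encountered: plain DTW cannot handle missing values")
                cost = missing_cost
            else:
                cost = (a - b) ** 2
            # extend the cheapest of the three admissible predecessor alignments
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])

d_warp = dtw([0, 1, 2], [0, 1, 1, 2])                        # 0.0: warping absorbs the repeated 1
d_diff = dtw([0, 1, 2], [0, 2, 2])                           # 1.0
d_gap = dtw([0, math.nan, 2], [0, 1, 2], missing_cost=0.0)   # 0.0
```

A zero or small constant cost lets the warping path route through gaps cheaply, which is one reason simple fixes distort distances when whole intervals are missing.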
Uncovering Multivariate Structural Dependency for Analyzing Irregularly Sampled Time Series
Abstract
Predictive analytics on Irregularly Sampled Multivariate Time Series (IS-MTS) presents a challenging problem in many real-world applications. Previous methods have primarily focused on incorporating temporal information into prediction, while little effort has been made to exploit the intrinsic structural information interchange among different IS-MTS at the same or across different timestamps. Recent developments in graph-based learning have shown promise in modeling spatial and structural dependencies of graph data. However, when applied to IS-MTS, they face significant challenges due to the complex data characteristics: 1) variable time intervals between observations; 2) asynchronous time points across dimensions resulting in missing values; 3) a lack of prior knowledge of connectivity structure for information propagation. To address these challenges, we propose a multivariate temporal graph network that coherently captures structural interactions, learns time-aware dependencies, and handles challenging characteristics of IS-MTS data. Specifically, we first develop a multivariate interaction module that handles the frequent missing values and adaptively extracts graph structural relations using a novel reinforcement learning module. Second, we design a correlation-aware neighborhood aggregation mechanism to capture within and across time dependencies and structural interactions. Third, we construct a novel masked time-aware self-attention to explicitly consider timestamp information and interval irregularity for determining optimal attention weights and distinguishing the influence of observation embeddings. Based on an extensive experimental evaluation, we demonstrate that our method outperforms a variety of competitors for the IS-MTS classification task.
Zhen Wang, Ting Jiang, Zenghui Xu, Jianliang Gao, Ou Wu, Ke Yan, Ji Zhang
Weighted Multivariate Mean Reversion for Online Portfolio Selection
Abstract
Portfolio selection is a fundamental task in finance that seeks the best allocation of wealth among a basket of assets. Online portfolio selection has recently received increasing attention from both the AI and machine learning communities. Mean reversion is an essential property of stock performance; hence, most state-of-the-art online portfolio strategies have been built on it. Though they succeed on specific datasets, most existing mean reversion strategies apply the same weights to samples across multiple periods and consider each asset separately, ignoring data noise from short-lived events, trend changes in the time series, and the dependence among multiple assets. To overcome these limitations, in this paper, we exploit the reversion phenomenon with multivariate robust estimates and propose a novel online portfolio selection strategy named “Weighted Multivariate Mean Reversion” (WMMR) (Code is available at: https://github.com/boqian333/WMMR). Empirical studies on various datasets show that WMMR is able to overcome the limitations of existing mean reversion algorithms and achieve superior results.
Boqian Wu, Benmeng Lyu, Jiawen Gu
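Mean-reversion strategies of this family typically predict the next price relative from a moving average, move the portfolio toward assets expected to rebound, and project back onto the simplex. The sketch below follows the OLMAR-style update, with an exponentially decaying weighted moving average standing in (as an assumption) for weighted estimates; it does not implement WMMR's multivariate robust estimation. `window`, `decay`, and `epsilon` are illustrative parameters.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def predicted_price_relative(prices, window=5, decay=0.8):
    """Hypothetical weighted moving average divided by the latest price.
    prices: (T, n_assets), oldest row first."""
    recent = prices[-window:]
    w = decay ** np.arange(len(recent))[::-1]  # newest row gets weight 1
    wma = (w[:, None] * recent).sum(axis=0) / w.sum()
    return wma / prices[-1]

def olmar_step(b, x_pred, epsilon=10.0):
    """OLMAR-style update: shift b toward assets predicted to rebound, then project."""
    dev = x_pred - x_pred.mean()
    denom = float(dev @ dev)
    lam = 0.0 if denom == 0.0 else max(0.0, (epsilon - float(b @ x_pred)) / denom)
    return project_to_simplex(b + lam * dev)

# Asset 0 halves in the latest period, so mean reversion predicts a rebound.
prices = np.array([[10.0, 10.0]] * 4 + [[5.0, 10.0]])
x_pred = predicted_price_relative(prices)
b_new = olmar_step(np.array([0.5, 0.5]), x_pred)
```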
H\(^2\)-Nets: Hyper-hodge Convolutional Neural Networks for Time-Series Forecasting
Abstract
Hypergraphs have recently emerged as a promising new alternative for describing complex dependencies in spatio-temporal processes, resulting in the newest trend in multivariate time series forecasting: semi-supervised learning of spatio-temporal data with Hypergraph Convolutional Networks. Nevertheless, such recent approaches are often limited in their capability to accurately describe higher-order interactions among spatio-temporal entities and to learn hidden interrelations among network substructures. Motivated by emerging results on simplicial convolution, we introduce the concepts of Hodge theory and Hodge Laplacians, that is, a higher-order generalization of the graph Laplacian, to hypergraph learning. Furthermore, we develop a novel framework for hyper-simplex-graph representation learning which describes complex relationships among both graph and hyper-simplex-graph simplices and, as a result, simultaneously extracts latent higher-order spatio-temporal dependencies. We provide theoretical foundations behind the proposed hyper-simplex-graph representation learning and validate our new Hodge-style Hyper-simplex-graph Neural Networks (H\(^2\)-Nets) on seven real-world spatio-temporal benchmark datasets. Our experimental results indicate that H\(^2\)-Nets outperforms the state-of-the-art methods by a significant margin while demonstrating lower computational costs.
Yuzhou Chen, Tian Jiang, Yulia R. Gel
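The Hodge Laplacian mentioned in the abstract generalizes the graph Laplacian to edges: with boundary matrices \(B_1\) (nodes × edges) and \(B_2\) (edges × triangles), \(L_1 = B_1^{\top} B_1 + B_2 B_2^{\top}\). The sketch below builds \(L_1\) for one triangle, filled and hollow, and applies a minimal edge-level convolution; the convolution is a generic simplicial layer assumed for illustration, not the layer used in the paper.

```python
import numpy as np

def boundary_matrices(n_nodes, edges, triangles):
    """Signed boundary matrices of a simplicial complex.
    edges: (i, j) with i < j; triangles: (i, j, k) with i < j < k."""
    edge_index = {e: c for c, e in enumerate(edges)}
    B1 = np.zeros((n_nodes, len(edges)))
    for c, (i, j) in enumerate(edges):
        B1[i, c], B1[j, c] = -1.0, 1.0
    B2 = np.zeros((len(edges), len(triangles)))
    for t, (i, j, k) in enumerate(triangles):
        # boundary of [i, j, k] = [j, k] - [i, k] + [i, j]
        B2[edge_index[(j, k)], t] = 1.0
        B2[edge_index[(i, k)], t] = -1.0
        B2[edge_index[(i, j)], t] = 1.0
    return B1, B2

def hodge_laplacian_1(B1, B2):
    """Edge-level Hodge Laplacian: lower part B1^T B1 plus upper part B2 B2^T."""
    return B1.T @ B1 + B2 @ B2.T

def edge_convolution(L1, X, W):
    """Minimal one-layer simplicial convolution on edge signals: ReLU(L1 X W)."""
    return np.maximum(L1 @ X @ W, 0.0)

edges = [(0, 1), (0, 2), (1, 2)]
B1, B2_filled = boundary_matrices(3, edges, [(0, 1, 2)])
_, B2_hollow = boundary_matrices(3, edges, [])
L_filled = hodge_laplacian_1(B1, B2_filled)
L_hollow = hodge_laplacian_1(B1, B2_hollow)
H = edge_convolution(L_filled, np.ones((3, 2)), np.ones((2, 4)))
```

For the hollow triangle, \(L_1\) has a one-dimensional kernel corresponding to the single unfilled cycle; filling the triangle removes it.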

Transfer and Multitask Learning

Frontmatter
Overcoming Catastrophic Forgetting for Fine-Tuning Pre-trained GANs
Abstract
The great transferability of DNNs has induced a popular paradigm of “pre-training & fine-tuning”, by which a data-scarce task can be performed much more easily. However, compared to the existing efforts made in the context of supervised transfer learning, fewer explorations have been made on effectively fine-tuning pre-trained Generative Adversarial Networks (GANs). As reported in recent empirical studies, fine-tuning GANs faces a challenge of catastrophic forgetting similar to that in supervised transfer learning. This causes a severe loss of the pre-trained model’s capacity when adapting it to downstream datasets. While most existing approaches suggest directly interfering with parameter updating, this paper introduces novel schemes from another perspective, i.e., inputs and features, and thus essentially focuses on the data aspect. First, we adopt a trust-region method to smooth the adaptation dynamics by progressively adjusting input distributions, aiming to avoid dramatic parameter changes, especially when the pre-trained GAN has no information about the target data. Second, we aim to avoid losing the diversity of the results generated by the fine-tuned GAN. This is achieved by explicitly encouraging generated images to encompass diversified spectral components in their deep features. We theoretically study the rationale of the proposed schemes and conduct extensive experiments on popular transfer learning benchmarks to demonstrate their superiority. The code and corresponding supplemental materials are available at https://github.com/zezeze97/Transfer-Pretrained-Gan.
Zeren Zhang, Xingjian Li, Tao Hong, Tianyang Wang, Jinwen Ma, Haoyi Xiong, Cheng-Zhong Xu
Unsupervised Domain Adaptation via Bidirectional Cross-Attention Transformer
Abstract
Unsupervised Domain Adaptation (UDA) seeks to utilize the knowledge acquired from a source domain, abundant in labeled data, and apply it to a target domain that contains only unlabeled data. The majority of existing UDA research focuses on learning domain-invariant feature representations for both domains by minimizing the domain gap using convolution-based neural networks. Recently, vision transformers have made significant strides in enhancing performance across various visual tasks. In this paper, we introduce a Bidirectional Cross-Attention Transformer (BCAT) for UDA, which is built upon vision transformers with the goal of improving performance. The proposed BCAT employs an attention mechanism to extract implicit source and target mixup feature representations, thereby reducing the domain discrepancy. More specifically, BCAT is designed as a weight-sharing quadruple-branch transformer with a bidirectional cross-attention mechanism, allowing it to learn domain-invariant feature representations. Comprehensive experiments indicate that our proposed BCAT model outperforms existing state-of-the-art UDA methods, both convolution-based and transformer-based, on four benchmark datasets.
Xiyu Wang, Pengxin Guo, Yu Zhang
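A bidirectional cross-attention step can be sketched with shared projection matrices: source tokens form queries over target keys and values, and vice versa, producing mixed source-target representations. Single-head attention, the shapes, and the names are assumptions for illustration; BCAT's weight-sharing quadruple-branch transformer is not reproduced.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(X_q, X_kv, Wq, Wk, Wv):
    """Single-head scaled dot-product attention: queries from X_q, keys/values from X_kv."""
    Q, K, V = X_q @ Wq, X_kv @ Wk, X_kv @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return A @ V

def bidirectional_cross_attention(X_src, X_tgt, Wq, Wk, Wv):
    """Weight-tied projections; source attends to target and target attends to source."""
    src_mix = cross_attention(X_src, X_tgt, Wq, Wk, Wv)
    tgt_mix = cross_attention(X_tgt, X_src, Wq, Wk, Wv)
    return src_mix, tgt_mix

rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
X_src = rng.normal(size=(5, 16))                 # 5 source tokens
X_tgt = np.tile(rng.normal(size=16), (7, 1))     # 7 identical target tokens
src_mix, tgt_mix = bidirectional_cross_attention(X_src, X_tgt, Wq, Wk, Wv)
```

With identical target tokens, every source query attends uniformly and simply receives that token's value projection, which makes the mixing easy to check.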
Multiple-Source Adaptation Using Variational Rényi Bound Optimization
Abstract
Multiple Source Adaptation (MSA) is the problem of identifying a predictor that minimizes the error on the target domain while utilizing the predictors from the source domains. In practice, the source domains typically exhibit varying probability distributions across the input space and are unknown to the learner. Consequently, accurate probability estimates are essential for effectively addressing the MSA problem. To this end, variational inference is an attractive approach for approximating probability densities. Traditionally, this is done by maximizing a lower bound on the likelihood of the observed data (evidence), i.e., maximizing the Evidence Lower BOund (ELBO). Recently, researchers have proposed optimizing the Variational Rényi (VR) bound instead of the ELBO; however, the VR bound can be biased or difficult to approximate due to high variance. To address these issues, we propose a new upper bound called the Variational Rényi Log Upper bound (VRLU). Unlike existing VR bounds, the VRLU bound maintains the upper bound property when using the Monte Carlo (MC) approximation. Additionally, we introduce the Variational Rényi Sandwich (VRS) method, which jointly optimizes an upper and a lower bound, resulting in a more accurate density estimate. Following this, we apply the VRS density estimate to the MSA problem. We show, both theoretically and empirically, that using VRS estimators provides tighter error bounds and improved performance compared to leading MSA methods.
Dana Zalman (Oshri), Shai Fine
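The variational Rényi bound is commonly estimated from \(K\) samples \(z_k \sim q\) as \(\frac{1}{1-\alpha} \log \frac{1}{K} \sum_k w_k^{1-\alpha}\) with \(w_k = p(x, z_k)/q(z_k)\); the limit \(\alpha \to 1\) recovers the ELBO. The sketch below computes this standard estimator (not VRLU or VRS) on a conjugate Gaussian toy model chosen for illustration. With the exact posterior as \(q\), every weight equals \(p(x)\), so the estimate equals \(\log p(x)\) for every \(\alpha\).

```python
import numpy as np

def logmeanexp(a):
    """Numerically stable log(mean(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.mean(np.exp(a - m)))

def vr_bound_mc(log_p_joint, log_q, alpha):
    """Monte Carlo VR bound: (1 / (1 - alpha)) * log mean_k w_k**(1 - alpha).
    alpha -> 1 is handled as the ELBO estimate mean_k log w_k."""
    log_w = np.asarray(log_p_joint) - np.asarray(log_q)
    if np.isclose(alpha, 1.0):
        return float(np.mean(log_w))
    return float(logmeanexp((1.0 - alpha) * log_w) / (1.0 - alpha))

def log_normal(x, mean, var):
    """Log density of N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

# Toy model: z ~ N(0, 1), x | z ~ N(z, 1), so p(x) = N(0, 2) and p(z | x) = N(x / 2, 1 / 2).
rng = np.random.default_rng(0)
x = 1.3
log_px = log_normal(x, 0.0, 2.0)

z = rng.normal(x / 2.0, np.sqrt(0.5), size=1000)   # q = exact posterior
log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
log_q = log_normal(z, x / 2.0, 0.5)

z2 = rng.normal(0.0, 1.0, size=1000)               # q = prior (mismatched)
log_joint2 = log_normal(z2, 0.0, 1.0) + log_normal(x, z2, 1.0)
log_q2 = log_normal(z2, 0.0, 1.0)
estimates = {a: vr_bound_mc(log_joint2, log_q2, a) for a in (-1.0, 0.0, 0.5, 1.0)}
```

For fixed samples the estimate decreases as \(\alpha\) grows. The exact bound upper-bounds \(\log p(x)\) for \(\alpha < 0\), but by Jensen's inequality this Monte Carlo estimate is biased downward and need not stay above it, which is the issue the VRLU bound is described as addressing.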
Match-And-Deform: Time Series Domain Adaptation Through Optimal Transport and Temporal Alignment
Abstract
While large volumes of unlabeled data are usually available, associated labels are often scarce. The unsupervised domain adaptation problem aims at exploiting labels from a source domain to classify data from a related, yet different, target domain. When time series are at stake, new difficulties arise, as temporal shifts may appear in addition to the standard feature distribution shift. In this paper, we introduce the Match-And-Deform (MAD) approach, which aims at finding correspondences between the source and target time series while allowing temporal distortions. The associated optimization problem simultaneously aligns the series thanks to an optimal transport loss and the time stamps through dynamic time warping. When embedded into a deep neural network, MAD helps learn new representations of time series that both align the domains and maximize the discriminative power of the network. Empirical studies on benchmark datasets and remote sensing data demonstrate that MAD makes meaningful sample-to-sample pairings and time shift estimations, reaching similar or better classification performance than state-of-the-art deep time series domain adaptation strategies.
François Painblanc, Laetitia Chapel, Nicolas Courty, Chloé Friguet, Charlotte Pelletier, Romain Tavenard
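MAD combines an optimal transport loss with dynamic time warping (DTW). The sketch below implements only the classic DTW recursion, assuming a squared-difference cost on 1-D series. It is not the MAD objective, which couples DTW with an optimal transport plan between source and target samples. The function name `dtw` is hypothetical.

```python
import numpy as np


def dtw(a, b):
    """Classic dynamic time warping between 1-D series a and b.

    Returns the minimal cumulative squared-difference cost and the optimal
    alignment path as a list of (i, j) index pairs.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # acc[i, j]: best cost aligning a[:i] with b[:j]
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: match (diagonal), step in a only, step in b only.
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end to recover the warping path (ties prefer the diagonal).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): acc[i - 1, j - 1],
                 (i - 1, j): acc[i - 1, j],
                 (i, j - 1): acc[i, j - 1]}
        i, j = min(steps, key=steps.get)
    return acc[n, m], path[::-1]
```

For example, `a = [0, 0, 1, 2, 1, 0]` and `b = [0, 1, 2, 1, 0, 0]` differ only by a one-step time shift. Their squared Euclidean distance is 4, while `dtw(a, b)` returns a cost of 0 because the warping path absorbs the shift.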
Bi-tuning: Efficient Transfer from Pre-trained Models
Abstract
It is a de facto practice in the deep learning community to first pre-train a deep neural network on a large-scale dataset and then fine-tune the pre-trained model on a specific downstream task. Recently, both supervised and unsupervised pre-training approaches to representation learning have achieved remarkable advances; they exploit the discriminative knowledge of labels and the intrinsic structure of data, respectively. Intuitively, both the discriminative knowledge and the intrinsic structure of the downstream task can be useful for fine-tuning. However, existing fine-tuning methods mainly leverage the former and discard the latter. A natural question arises: how can the intrinsic structure of data be fully exploited to boost fine-tuning? In this paper, we propose Bi-tuning, a general learning approach that is capable of fine-tuning both supervised and unsupervised pre-trained representations to downstream tasks. Bi-tuning generalizes vanilla fine-tuning by adding two heads on top of the pre-trained backbone. The first is a classifier head with an improved contrastive cross-entropy loss that better leverages the label information in an instance-contrast way. The second is a projector head with a newly designed categorical contrastive learning loss that fully exploits the intrinsic structure of data in a category-consistent way. Comprehensive experiments confirm that Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins.
Jincheng Zhong, Haoyu Ma, Ximei Wang, Zhi Kou, Mingsheng Long
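The abstract does not spell out Bi-tuning's categorical contrastive loss. As a rough illustration of "category-consistent" contrastive learning, the sketch below implements a generic supervised contrastive loss in NumPy, where samples sharing a label act as positives for each other. The function name and the temperature value are assumptions, and this should not be read as the paper's exact loss.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not Bi-tuning's exact loss).

    For each anchor, every other sample with the same label is a positive and
    all remaining samples act as negatives in a softmax over cosine similarities.
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    labels = np.asarray(labels)
    n = len(z)
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, z @ z.T / temperature)  # drop the anchor itself
    # Row-wise log-softmax over all non-anchor samples.
    row_max = logits.max(axis=1, keepdims=True)
    log_prob = logits - row_max - np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = positives.any(axis=1)  # anchors without positives are skipped
    pos_sum = np.where(positives, log_prob, 0.0).sum(axis=1)[has_pos]
    return -np.mean(pos_sum / positives.sum(axis=1)[has_pos])
```

Embeddings that cluster by label yield a loss near zero. Embeddings where same-label samples are orthogonal but different-label samples coincide yield a large loss, which is the category-consistent structure this kind of objective rewards.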

Open Access

Generality-Training of a Classifier for Improved Calibration in Unseen Contexts
Abstract
Artificial neural networks tend to output miscalibrated class probabilities, i.e., their reported uncertainty is not a reliable indicator of how much we should trust the model. Consequently, methods have been developed to improve the model’s predictive uncertainty, both during training and post hoc. Even if the model is calibrated on the domain used in training, it typically becomes over-confident when applied to slightly different target domains, e.g., due to perturbations or shifts in the data. The model can be recalibrated for a fixed list of target domains, but its performance can still be poor on unseen target domains. To address this issue, we propose a generality-training procedure that learns a modified head for the neural network to achieve better calibration generalization to new domains while retaining calibration performance on the given domains. This generality-head is trained on multiple domains using a new objective function that places more emphasis on the calibration loss relative to cross-entropy. Such training yields a more general model, with not only better calibration but also better accuracy on unseen domains, as we demonstrate experimentally on multiple datasets. The code and supplementary material for the paper are available at https://github.com/bsl-traveller/CaliGen.git.
Bhawani Shankar Leelar, Meelis Kull
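To make "miscalibrated" concrete, a common way to quantify calibration is the expected calibration error (ECE). ECE bins predictions by confidence and averages the gap between accuracy and mean confidence in each bin, weighted by the fraction of samples in that bin. The NumPy sketch below assumes equal-width bins. It is a standard evaluation metric, not the calibration loss used inside the paper's generality-training objective, which the abstract does not specify.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins (lo, hi] over [0, 1].

    probs: (n, k) predicted class probabilities; labels: (n,) true class indices.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)  # confidence = top predicted probability
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # Weight the |accuracy - confidence| gap by the bin's share of samples.
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece
```

A model that predicts class 0 with probability 1.0 on four samples but is right on only two is overconfident and gets an ECE of 0.5. A model that predicts 0.75 and is right on three of four samples is perfectly calibrated, with an ECE of 0.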
Informed Priors for Knowledge Integration in Trajectory Prediction
Abstract
Informed learning approaches explicitly integrate prior knowledge into learning systems, which can reduce data needs and increase robustness. However, existing work typically either aims to integrate formal scientific knowledge by directly pruning the problem space, which is infeasible for more intuitive world and expert knowledge, or requires specific architecture changes and knowledge representations. We propose a probabilistic informed learning approach to integrate prior world and expert knowledge without these requirements. Our approach repurposes continual learning methods to operationalize Bayes' rule for informed learning and to enable probabilistic and multi-modal predictions. We exemplify our proposal by applying it to two state-of-the-art trajectory predictors for autonomous driving. This safety-critical domain is subject to an overwhelming variety of rare scenarios requiring robust and accurate predictions. We evaluate our models on a public benchmark dataset and demonstrate that our approach outperforms non-informed and informed learning baselines. Notably, our approach remains competitive with a conventional baseline even when using only half of the training dataset's observations.
Christian Schlauch, Christian Wirth, Nadja Klein
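Operationalizing Bayes' rule by treating the posterior learned from knowledge as the prior for subsequent training can be illustrated exactly in a conjugate setting. The NumPy sketch below assumes a Gaussian mean with known noise and encodes "prior knowledge" as hypothetical pseudo-observations. It is a toy analogy, not the authors' trajectory-prediction method, where continual learning techniques would approximate this sequential update for neural network weights.

```python
import numpy as np


def gaussian_posterior(prior_mean, prior_var, observations, noise_var):
    """Conjugate update for an unknown mean with known observation noise.

    Prior N(prior_mean, prior_var); likelihood x_i ~ N(mean, noise_var).
    Returns the posterior mean and variance.
    """
    obs = np.asarray(observations, dtype=float)
    post_var = 1.0 / (1.0 / prior_var + len(obs) / noise_var)  # precisions add
    post_mean = post_var * (prior_mean / prior_var + obs.sum() / noise_var)
    return post_mean, post_var


knowledge = [1.0, 1.2, 0.8]   # hypothetical pseudo-observations encoding prior knowledge
data = [2.0, 2.5, 1.5, 2.2]   # hypothetical "real" training observations

# Stage 1: a vague prior is updated with the knowledge.
m1, v1 = gaussian_posterior(0.0, 10.0, knowledge, noise_var=1.0)
# Stage 2: the knowledge posterior becomes the prior for the real data.
m2, v2 = gaussian_posterior(m1, v1, data, noise_var=1.0)
# Sequential updating matches a single update on all observations at once.
m_batch, v_batch = gaussian_posterior(0.0, 10.0, knowledge + data, noise_var=1.0)
```

In this conjugate case, the two-stage update is exact: (m2, v2) equals (m_batch, v_batch). For neural networks, no closed form exists, which is why approximate methods from continual learning are needed to carry the first-stage posterior forward as a prior.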
CAENet: Efficient Multi-task Learning for Joint Semantic Segmentation and Depth Estimation
Abstract
In this paper, we propose an efficient multi-task method, named Context-aware Attentive Enrichment Network (CAENet), to deal with the problem of real-time joint semantic segmentation and depth estimation. Building upon a lightweight encoder backbone, an efficient decoder is devised to fully leverage available information from multi-scale encoder features. In particular, a new Inception Residual Pooling (IRP) module is designed to efficiently extract contextual information from the high-level features with diverse receptive fields to improve semantic understanding ability. Then the context-aware features are enriched adaptively with spatial details from low-level features via a Light-weight Attentive Fusion (LAF) module using a pseudo stereoscopic attention mechanism. These two modules are applied progressively in a recursive manner to generate high-resolution shared features, which are further processed by task-specific heads to produce final outputs. This network design effectively captures beneficial information for both semantic segmentation and depth estimation tasks while largely reducing the computational budget. Extensive experiments across multi-task benchmarks validate that CAENet achieves state-of-the-art performance with inference speed comparable to that of other real-time methods. Code is available at https://github.com/wlx-zju/CAENet.
Luxi Wang, Yingming Li
Click-Aware Structure Transfer with Sample Weight Assignment for Post-Click Conversion Rate Estimation
Abstract
Post-click Conversion Rate (CVR) prediction plays an essential role in industrial applications such as recommendation and advertising. Conventional CVR methods typically suffer from data sparsity because they rely only on samples where the user has clicked. To address this problem, researchers have introduced multi-task learning, which utilizes non-clicked samples and shares feature representations of the Click-Through Rate (CTR) task with the CVR task. However, the CVR and CTR tasks are fundamentally different and may even be contradictory. Therefore, indiscriminately introducing a large amount of CTR information may drown out valuable CVR-related information. We call this phenomenon the curse of knowledge problem. To tackle this issue, we argue that a trade-off should be struck between introducing large amounts of auxiliary information and protecting valuable CVR-related information. Hence, we propose a Click-aware Structure Transfer model with sample Weight Assignment, abbreviated as CSTWA. Instead of directly sharing feature representations, it focuses on latent structure information, which can refine the CVR-related input information. Meanwhile, to capture the representation conflict between CTR and CVR, we calibrate the representation layer and reweight the discriminant layer to extract click bias information from the CTR tower. Moreover, CSTWA incorporates a sample weight assignment algorithm biased towards CVR modeling, so that knowledge from the CTR task does not mislead CVR prediction. Extensive experiments on industrial and public datasets demonstrate that CSTWA significantly outperforms widely used and competitive models.
Kai Ouyang, Wenhao Zheng, Chen Tang, Xuanji Xiao, Hai-Tao Zheng
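The multi-task baseline this abstract critiques is commonly formulated in the ESMM (Entire Space Multi-task Model) style. In that formulation, a CTR tower and a CVR tower are trained over all impressions by supervising the product pCTCVR = pCTR × pCVR. The NumPy sketch below shows only that loss, assuming binary click and conversion labels per impression. It does not implement CSTWA's structure transfer or sample weight assignment, and the function names are hypothetical.

```python
import numpy as np


def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy with probabilities clipped for numerical safety."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def entire_space_loss(p_ctr, p_cvr, click, conversion):
    """ESMM-style multi-task loss over all impressions (illustrative baseline).

    p_ctr: predicted P(click | impression); p_cvr: predicted P(conversion | click).
    The CVR predictions enter only through pCTCVR = pCTR * pCVR, so non-clicked
    impressions also contribute to the CVR-related term of the loss.
    """
    p_ctcvr = np.asarray(p_ctr, dtype=float) * np.asarray(p_cvr, dtype=float)
    ctcvr_label = np.asarray(click) * np.asarray(conversion)  # clicked AND converted
    return bce(p_ctr, click) + bce(p_ctcvr, ctcvr_label)
```

Because the CTR and CVR towers are coupled through the shared product term (and, in practice, through shared feature representations), CTR information flows freely into the CVR side. This coupling is what the abstract argues can drown out CVR-specific signal.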
Constrained-HIDA: Heterogeneous Image Domain Adaptation Guided by Constraints
Abstract
Supervised deep learning relies heavily on the existence of a huge amount of labelled data, which in many cases is difficult to obtain. Domain adaptation deals with this problem by learning on a labelled dataset and applying that knowledge to another, unlabelled or scarcely labelled dataset, with a related but different probability distribution. Heterogeneous domain adaptation is an especially challenging area where domains lie in different input spaces. These methods are very interesting for the field of remote sensing (and indeed computer vision in general), where a variety of sensors are used, capturing images of different modalities, different spatial and spectral resolutions, and where labelling is a very expensive process. With two heterogeneous domains, however, unsupervised domain adaptation is difficult to perform, and class-flipping is frequent. At least a small amount of labelled data is therefore necessary in the target domain in many cases. This work proposes loosening the label requirement by labelling the target domain with must-link and cannot-link constraints instead of class labels. Our method Constrained-HIDA, based on constraints, contrastive loss, and learning domain invariant features, shows that a significant performance improvement can be achieved by using a very small number of constraints. This demonstrates that a reduced amount of information, in the form of constraints, is as effective as giving class labels. Moreover, this paper shows the benefits of interactive supervision—assigning constraints to the samples from classes that are known to be prone to flipping can further reduce the necessary amount of constraints.
Mihailo Obrenović, Thomas Lampert, Miloš Ivanović, Pierre Gançarski
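Must-link and cannot-link constraints can replace class labels in a contrastive objective. The sketch below illustrates this with the classic margin-based pairwise contrastive loss, where must-link pairs are pulled together and cannot-link pairs are pushed at least a margin apart. The function name, the squared hinge, and the margin value are assumptions. Constrained-HIDA's actual loss and its domain-invariance terms are not reproduced here.

```python
import numpy as np


def constraint_contrastive_loss(features, must_link, cannot_link, margin=1.0):
    """Pairwise contrastive loss driven by constraints instead of class labels.

    must_link / cannot_link: lists of (i, j) index pairs into `features`.
    Must-link pairs are penalized by their squared distance; cannot-link pairs
    are penalized by a squared hinge when closer than `margin`.
    """
    f = np.asarray(features, dtype=float)

    def dist(i, j):
        return np.linalg.norm(f[i] - f[j])

    pull = sum(dist(i, j) ** 2 for i, j in must_link)
    push = sum(max(0.0, margin - dist(i, j)) ** 2 for i, j in cannot_link)
    n_pairs = max(len(must_link) + len(cannot_link), 1)  # average over all constraints
    return (pull + push) / n_pairs
```

The loss is zero once every must-link pair coincides and every cannot-link pair is at least `margin` apart. No class identities are ever needed, only pairwise same/different judgments on a few target samples.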
Backmatter
Metadata
Title: Machine Learning and Knowledge Discovery in Databases: Research Track
Editors: Danai Koutra, Claudia Plant, Manuel Gomez Rodriguez, Elena Baralis, Francesco Bonchi
Copyright Year: 2023
Electronic ISBN: 978-3-031-43424-2
Print ISBN: 978-3-031-43423-5
DOI: https://doi.org/10.1007/978-3-031-43424-2
