
2023 | Book

Machine Learning and Knowledge Discovery in Databases

European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part IV

Edited by: Massih-Reza Amini, Stéphane Canu, Asja Fischer, Tias Guns, Petra Kralj Novak, Grigorios Tsoumakas

Publisher: Springer Nature Switzerland

Book series: Lecture Notes in Computer Science


About this book

The multi-volume set LNAI 13713 to 13718 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2022, which took place in Grenoble, France, in September 2022.

The 236 full papers presented in these proceedings were carefully reviewed and selected from a total of 1060 submissions. In addition, the proceedings include 17 Demo Track contributions.

The volumes are organized in topical sections as follows:

Part I: Clustering and dimensionality reduction; anomaly detection; interpretability and explainability; ranking and recommender systems; transfer and multitask learning;

Part II: Networks and graphs; knowledge graphs; social network analysis; graph neural networks; natural language processing and text mining; conversational systems;

Part III: Deep learning; robust and adversarial machine learning; generative models; computer vision; meta-learning, neural architecture search;

Part IV: Reinforcement learning; multi-agent reinforcement learning; bandits and online learning; active and semi-supervised learning; private and federated learning;

Part V: Supervised learning; probabilistic inference; optimal transport; optimization; quantum, hardware; sustainability;

Part VI: Time series; financial machine learning; applications; applications: transportation; demo track.

Table of Contents

Frontmatter

Reinforcement Learning

Frontmatter
Coupling User Preference with External Rewards to Enable Driver-centered and Resource-aware EV Charging Recommendation

Electric Vehicle (EV) charging recommendation that both accommodates user preference and adapts to the ever-changing external environment arises as a cost-effective strategy to alleviate the range anxiety of private EV drivers. Previous studies focus on centralized strategies to achieve optimized resource allocation, particularly useful for privacy-indifferent taxi fleets and fixed-route public transits. However, private EV drivers seek a more personalized and resource-aware charging recommendation that is tailor-made to accommodate user preference (when and where to charge) yet sufficiently adaptive to the spatiotemporal mismatch between charging supply and demand. Here we propose a novel Regularized Actor-Critic (RAC) charging recommendation approach that allows each EV driver to strike an optimal balance between user preference (historical charging pattern) and external reward (driving distance and wait time). Experimental results on two real-world datasets demonstrate the unique features and superior performance of our approach over competing methods.

Chengyin Li, Zheng Dong, Nathan Fisher, Dongxiao Zhu
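
The regularized actor-critic idea above can be illustrated with a toy update that trades an external advantage off against a pull toward the driver's historical charging distribution. This is a minimal sketch under an assumed softmax-policy form; `rac_step`, `pref`, and the feature layout are illustrative assumptions, not the paper's interface.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rac_step(theta, phi, a, advantage, pref, lam=0.5, lr=0.05):
    """One hypothetical regularized actor step: policy gradient on the
    external advantage (e.g. distance, wait time) plus a cross-entropy
    pull toward the user's historical charging distribution `pref`."""
    probs = softmax(theta @ phi)
    grad_logp = -np.outer(probs, phi)        # d log pi(a|s) / d theta
    grad_logp[a] += phi
    grad_reg = np.outer(probs - pref, phi)   # gradient of CE(pref, pi)
    return theta + lr * (advantage * grad_logp - lam * grad_reg)

theta = np.zeros((3, 4))                     # 3 stations, 4 state features
pref = np.array([0.6, 0.3, 0.1])             # historical charging pattern
theta = rac_step(theta, np.ones(4), a=2, advantage=1.0, pref=pref)
print(softmax(theta @ np.ones(4)))
```
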
Multi-Objective Actor-Critics for Real-Time Bidding in Display Advertising

Online Real-Time Bidding (RTB) is a complex auction game in which advertisers compete for ad impressions when a user request occurs. Considering display cost, Return on Investment (ROI), and other influential Key Performance Indicators (KPIs), large ad platforms try to balance the trade-off among various goals in a dynamic environment. To address this challenge, we propose a Multi-ObjecTive Actor-Critics algorithm based on reinforcement learning (RL), named MoTiAC, for the problem of bidding optimization with various goals. In MoTiAC, objective-specific agents update the global network asynchronously with different goals and perspectives, leading to a robust bidding policy. Unlike previous RL models, the proposed MoTiAC can simultaneously fulfill multi-objective tasks in complicated bidding environments. In addition, we mathematically prove that our model will converge to Pareto optimality. Finally, experiments on a large-scale real-world commercial dataset from Tencent verify the effectiveness of MoTiAC versus a set of recent approaches.

Haolin Zhou, Chaoqi Yang, Xiaofeng Gao, Qiong Chen, Gongshen Liu, Guihai Chen
Batch Reinforcement Learning from Crowds

A shortcoming of batch reinforcement learning is its requirement for rewards in the data, making it inapplicable to tasks without reward functions. Existing settings for learning without reward, such as behavioral cloning, rely on optimal demonstrations collected from humans. Unfortunately, extensive expertise is required to ensure optimality, which hinders the acquisition of large-scale data for complex tasks. This paper addresses the lack of reward by learning a reward function from preferences between trajectories. Generating preferences only requires a basic understanding of a task, and it is faster than performing demonstrations. Thus, preferences can be collected at scale from non-expert humans using crowdsourcing. This paper tackles a critical challenge that emerges when collecting data from non-expert humans: the noise in preferences. A novel probabilistic model is proposed for modelling the reliability of labels, which utilizes labels collaboratively. Moreover, the proposed model smooths the estimation with a learned reward function. Evaluation on Atari datasets demonstrates the effectiveness of the proposed model, followed by an ablation study to analyze the relative importance of the proposed ideas.

Guoxi Zhang, Hisashi Kashima
Oracle-SAGE: Planning Ahead in Graph-Based Deep Reinforcement Learning

Deep reinforcement learning (RL) commonly suffers from high sample complexity and poor generalisation, especially with high-dimensional (image-based) input. Where available (such as in some robotic control domains), low-dimensional vector inputs outperform their image-based counterparts, but it is challenging to represent complex dynamic environments in this manner. Relational reinforcement learning instead represents the world as a set of objects and the relations between them, offering a flexible yet expressive view which provides structural inductive biases to aid learning. Recently, relational RL methods have been extended with modern function approximation using graph neural networks (GNNs). However, inherent limitations in the processing model for GNNs result in decreased returns when important information is dispersed widely throughout the graph. We outline a hybrid learning and planning model which uses reinforcement learning to propose and select subgoals for a planning model to achieve. This includes a novel action selection mechanism and loss function to allow training around the non-differentiable planner. We demonstrate our algorithm's effectiveness on a range of domains, including MiniHack and a challenging extension of the classic taxi domain.

Andrew Chester, Michael Dann, Fabio Zambetta, John Thangarajah
Reducing the Planning Horizon Through Reinforcement Learning

Planning is a computationally expensive process, which can limit the reactivity of autonomous agents. Planning problems are usually solved in isolation, independently of similar, previously solved problems. The depth of search that a planner requires to find a solution, known as the planning horizon, is a critical factor when integrating planners into reactive agents. We consider the case of an agent repeatedly carrying out a task from different initial states. We propose a combination of classical planning and model-free reinforcement learning to reduce the planning horizon over time. Control is smoothly transferred from the planner to the model-free policy as the agent compiles the planner’s policy into a value function. Local exploration of the model-free policy allows the agent to adapt to the environment and eventually overcome model inaccuracies. We evaluate the efficacy of our framework on symbolic PDDL domains and a stochastic grid world environment and show that we are able to significantly reduce the planning horizon while improving upon model inaccuracies.

Logan Dunbar, Benjamin Rosman, Anthony G. Cohn, Matteo Leonetti
State Representation Learning for Goal-Conditioned Reinforcement Learning

This paper presents a novel state representation for reward-free Markov decision processes. The idea is to learn, in a self-supervised manner, an embedding space where distances between pairs of embedded states correspond to the minimum number of actions needed to transition between them. Compared to previous methods, our approach does not require any domain knowledge, learning from offline and unlabeled data. We show how this representation can be leveraged to learn goal-conditioned policies, providing a notion of similarity between states and goals and a useful heuristic distance to guide planning and reinforcement learning algorithms. Finally, we empirically validate our method in classic control domains and multi-goal environments, demonstrating that our method can successfully learn representations in large and/or continuous domains.

Lorenzo Steccanella, Anders Jonsson
Bootstrap State Representation Using Style Transfer for Better Generalization in Deep Reinforcement Learning

Deep Reinforcement Learning (RL) agents often overfit the training environment, leading to poor generalization performance. In this paper, we propose Thinker, a bootstrapping method that removes adversarial effects of confounding features from the observation in an unsupervised way, thereby improving RL agents' generalization. Thinker first clusters experience trajectories into several clusters. These trajectories are then bootstrapped by applying a style transfer generator, which translates the trajectories from one cluster's style to another while maintaining the content of the observations. The bootstrapped trajectories are then used for policy learning. Thinker has wide applicability among many RL settings. Experimental results reveal that Thinker leads to better generalization capability in the Procgen benchmark environments compared to base algorithms and several data augmentation techniques.

Md Masudur Rahman, Yexiang Xue
Imitation Learning with Sinkhorn Distances

Imitation learning algorithms have been interpreted as variants of divergence minimization problems. The ability to compare occupancy measures between experts and learners is crucial to their effectiveness in learning from demonstrations. In this paper, we present tractable solutions by formulating imitation learning as minimization of the Sinkhorn distance between occupancy measures. The formulation combines the valuable properties of optimal transport metrics in comparing non-overlapping distributions with a cosine distance cost defined in an adversarially learned feature space. This leads to a highly discriminative critic network and optimal transport plan that subsequently guide imitation learning. We evaluate the proposed approach using both the reward metric and the Sinkhorn distance metric on a number of MuJoCo experiments. For the implementation and for reproducing the results, please refer to the following repository: https://github.com/gpapagiannis/sinkhorn-imitation .

Georgios Papagiannis, Yunpeng Li
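
The Sinkhorn distance at the core of this formulation is computed by alternating matrix scalings. Below is a minimal generic NumPy sketch with a plain Euclidean cost matrix; the paper instead uses a cosine cost in an adversarially learned feature space, which is not modeled here.

```python
import numpy as np

def sinkhorn_distance(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized OT between histograms a, b with cost matrix C."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # scale rows to match marginal a
        v = b / (K.T @ u)                # scale columns to match marginal b
    P = u[:, None] * K * v[None, :]      # optimal transport plan
    return np.sum(P * C)

# toy example: two small empirical "occupancy" samples
rng = np.random.default_rng(0)
x, y = rng.normal(0, 1, (5, 2)), rng.normal(1, 1, (5, 2))
C = np.linalg.norm(x[:, None] - y[None, :], axis=-1)   # pairwise costs
a = b = np.full(5, 0.2)                                 # uniform weights
print(sinkhorn_distance(a, b, C))
```
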
Safe Exploration Method for Reinforcement Learning Under Existence of Disturbance

Recent rapid developments in reinforcement learning algorithms have opened up novel possibilities in many fields. However, due to their exploring property, we have to take risk into consideration when applying these algorithms to safety-critical problems, especially in real environments. In this study, we deal with a safe exploration problem in reinforcement learning under the existence of disturbance. We define safety during learning as satisfaction of constraint conditions explicitly defined in terms of the state, and propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance. The proposed method assures the satisfaction of the explicit state constraints with a pre-specified probability even if the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we introduce sufficient conditions for constructing the conservative, non-exploring inputs used in the proposed method, and prove that safety in the above sense is guaranteed with the proposed method. Furthermore, we illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.

Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami, Toru Namerikawa
Model Selection in Reinforcement Learning with General Function Approximations

We consider model selection for classic Reinforcement Learning (RL) environments – Multi-Armed Bandits (MABs) and Markov Decision Processes (MDPs) – under general function approximations. In the model selection framework, we do not know the function classes, denoted by $\mathcal{F}$ and $\mathcal{M}$, where the true models – the reward generating function for MABs and the transition kernel for MDPs – lie, respectively. Instead, we are given M nested function (hypothesis) classes such that the true models are contained in at least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs that adapt to the smallest function class (among the nested M classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms matches that of an oracle which knows the correct function classes (i.e., $\mathcal{F}$ and $\mathcal{M}$) a priori. Furthermore, for both settings, we show that the cost of model selection is an additive term in the regret having weak (logarithmic) dependence on the learning horizon T.

Avishek Ghosh, Sayak Ray Chowdhury

Multi-agent Reinforcement Learning

Frontmatter
Heterogeneity Breaks the Game: Evaluating Cooperation-Competition with Multisets of Agents

The value of an agent for a team can vary significantly depending on the heterogeneity of the team and the kind of game: cooperative, competitive, or both. Several evaluation approaches have been introduced in some of these scenarios, from homogeneous competitive multi-agent systems, using a simple average or sophisticated ranking protocols, to completely heterogeneous cooperative scenarios, using the Shapley value. However, we lack a general evaluation metric to address situations with both cooperation and (asymmetric) competition, and varying degrees of heterogeneity (from completely homogeneous teams to completely heterogeneous teams with no repeated agents) to better understand whether multi-agent learning agents can adapt to this diversity. In this paper, we extend the Shapley value to incorporate both repeated players and competition. Because of the combinatorial explosion of team multisets and opponents, we analyse several sampling strategies, which we evaluate empirically. We illustrate the new metric in a predator and prey game, where we show that the gain of some multi-agent reinforcement learning agents for homogeneous situations is lost when operating in heterogeneous teams.

Yue Zhao, José Hernández-Orallo
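
For context, the classical Shapley value that this paper extends is commonly estimated by averaging marginal contributions over random permutations of the agents. The sketch below shows only that standard estimator; the multiset and competition extensions from the paper are not modeled, and `team_value` is a hypothetical payoff oracle.

```python
import random

def mc_shapley(agents, team_value, n_samples=2000, seed=0):
    """Monte Carlo Shapley estimate via random agent permutations."""
    rng = random.Random(seed)
    phi = {a: 0.0 for a in agents}
    for _ in range(n_samples):
        order = agents[:]
        rng.shuffle(order)
        coalition, prev = [], team_value(())
        for a in order:
            coalition.append(a)
            cur = team_value(tuple(sorted(coalition)))
            phi[a] += (cur - prev) / n_samples   # marginal contribution
            prev = cur
    return phi

# toy game: value grows with team size, agent "c" adds extra skill
v = lambda S: len(S) ** 1.5 + (2.0 if "c" in S else 0.0)
print(mc_shapley(["a", "b", "c"], v))
```
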
Constrained Multiagent Reinforcement Learning for Large Agent Population

Learning control policies for a large number of agents in a decentralized setting is challenging due to partial observability, uncertainty in the environment, and scalability challenges. While several scalable multiagent RL (MARL) methods have been proposed, relatively few approaches exist for large-scale constrained MARL settings. To address this, we first formulate the constrained MARL problem in a collective multiagent setting where interactions among agents are governed by the aggregate count and types of agents, and do not depend on agents' specific identities. Second, we show that standard Lagrangian relaxation methods, which are popular for single-agent RL, do not perform well in constrained MARL settings due to the problem of credit assignment: how to identify and modify the behavior of the agents that contribute most to constraint violations, while also optimizing the primary objective? We develop a fictitious MARL method that addresses this key challenge. Finally, we evaluate our approach on two large-scale real-world applications: maritime traffic management and vehicular network routing. Empirical results show that our approach is highly scalable, can optimize the cumulative global reward and effectively minimize constraint violations, while also being significantly more sample efficient than previous best methods.

Jiajing Ling, Arambam James Singh, Nguyen Duc Thien, Akshat Kumar
Reinforcement Learning for Multi-Agent Stochastic Resource Collection

Stochastic Resource Collection (SRC) describes tasks where an agent tries to collect a maximal amount of dynamic resources while navigating through a road network. An instance of SRC is the traveling officer problem (TOP), where a parking officer tries to maximize the number of fined parking violations. In contrast to vehicular routing problems, in SRC tasks, resources might appear and disappear by an unknown stochastic process, and thus, the task is inherently more dynamic. In most applications of SRC, such as TOP, covering realistic scenarios requires more than one agent. However, directly applying multi-agent approaches to SRC yields challenges considering temporal abstractions and inter-agent coordination. In this paper, we propose a novel multi-agent reinforcement learning method for the task of Multi-Agent Stochastic Resource Collection (MASRC). To this end, we formalize MASRC as a Semi-Markov Game which allows the use of temporal abstraction and asynchronous actions by various agents. In addition, we propose a novel architecture trained with independent learning, which integrates the information about collaborating agents and allows us to take advantage of temporal abstractions. Our agents are evaluated on the multiple traveling officer problem, an instance of MASRC where multiple officers try to maximize the number of fined parking violations. Our simulation environment is based on real-world sensor data. Results demonstrate that our proposed agent can beat various state-of-the-art approaches.

Niklas Strauss, David Winkel, Max Berrendorf, Matthias Schubert
Team-Imitate-Synchronize for Solving Dec-POMDPs

Multi-agent collaboration under partial observability is a difficult task. Multi-agent reinforcement learning (MARL) algorithms that do not leverage a model of the environment struggle with tasks that require sequences of collaborative actions, while Dec-POMDP algorithms that use such models to compute near-optimal policies, scale poorly. In this paper, we suggest the Team-Imitate-Synchronize (TIS) approach, a heuristic, model-based method for solving such problems. Our approach begins by solving the joint team problem, assuming that observations are shared. Then, for each agent we solve a single agent problem designed to imitate its behavior within the team plan. Finally, we adjust the single agent policies for better synchronization. Our experiments demonstrate that our method provides comparable solutions to Dec-POMDP solvers over small problems, while scaling to much larger problems, and provides collaborative plans that MARL algorithms are unable to identify.

Eliran Abdoo, Ronen I. Brafman, Guy Shani, Nitsan Soffair
DistSPECTRL: Distributing Specifications in Multi-Agent Reinforcement Learning Systems

While notable progress has been made in specifying and learning objectives for general cyber-physical systems, applying these methods to distributed multi-agent systems still poses significant challenges. Among these are the need to (a) craft specification primitives that allow expression and interplay of both local and global objectives, (b) tame the explosion in the state and action spaces to enable effective learning, and (c) minimize coordination frequency and the set of engaged participants for global objectives. To address these challenges, we propose a novel specification framework that allows natural composition of local and global objectives used to guide training of a multi-agent system. Our technique enables learning expressive policies that allow agents to operate in a coordination-free manner for local objectives, while using a decentralized communication protocol for enforcing global ones. Experimental results support our claim that sophisticated multi-agent distributed planning problems can be effectively realized using specification-guided learning. Code is provided at https://github.com/yokian/distspectrl .

Joe Eappen, Suresh Jagannathan
MAVIPER: Learning Decision Tree Policies for Interpretable Multi-agent Reinforcement Learning

Many recent breakthroughs in multi-agent reinforcement learning (MARL) require the use of deep neural networks, which are challenging for human experts to interpret and understand. On the other hand, existing work on interpretable reinforcement learning (RL) has shown promise in extracting more interpretable decision tree-based policies from neural networks, but only in the single-agent setting. To fill this gap, we propose the first set of algorithms that extract interpretable decision-tree policies from neural networks trained with MARL. The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting. We demonstrate that IVIPER learns high-quality decision-tree policies for each agent. To better capture coordination between agents, we propose a novel centralized decision-tree training algorithm, MAVIPER. MAVIPER jointly grows the trees of each agent by predicting the behavior of the other agents using their anticipated trees, and uses resampling to focus on states that are critical for its interactions with other agents. We show that both algorithms generally outperform the baselines and that MAVIPER-trained agents achieve better-coordinated performance than IVIPER-trained agents on three different multi-agent particle-world environments.

Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Zheyuan Ryan Shi, Charles Kamhoua, Evangelos E. Papalexakis, Fei Fang
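
The single-agent starting point, VIPER-style distillation, reduces to labeling states with a neural teacher's actions and fitting a small tree. This much-simplified sketch omits the iterative resampling and the multi-agent coordination that IVIPER/MAVIPER add; `teacher` is a stand-in policy.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def distill_policy(teacher, sample_states, n_states=5000, depth=4, seed=0):
    """Fit an interpretable tree to imitate a (black-box) policy."""
    rng = np.random.default_rng(seed)
    X = sample_states(rng, n_states)
    y = np.array([teacher(x) for x in X])
    return DecisionTreeClassifier(max_depth=depth, random_state=seed).fit(X, y)

teacher = lambda x: int(x[0] + 0.5 * x[1] > 0)      # stand-in neural policy
tree = distill_policy(teacher, lambda rng, n: rng.normal(size=(n, 2)))
X_test = np.random.default_rng(1).normal(size=(500, 2))
print(tree.score(X_test, [teacher(x) for x in X_test]))
```
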

Bandits and Online Learning

Frontmatter
Hierarchical Unimodal Bandits

We study a multi-armed bandit problem with clustered arms and a unimodal reward structure, which has applications in millimeter wave (mmWave) communication, road navigation, etc. More specifically, a set of N arms is grouped into C clusters, and the expected rewards of arms belonging to the same cluster form a unimodal function (a function is unimodal if it has only one peak, e.g. a parabola). First, in the setting when $C=1$, we propose an algorithm, SGSD (Stochastic Golden Search for Discrete Arm), that has better guarantees than the prior unimodal bandit algorithm [Yu and Mannor 2011]. Second, in the setting when $C \ge 2$, we develop HUUCB (Hierarchical Unimodal Upper Confidence Bound (UCB) algorithm), an algorithm that utilizes the clustering structure of the arms and the unimodal structure of the rewards. We show that the regret upper bound of our algorithm grows as $O(\sqrt{CT\log(T)})$, which can be significantly smaller than UCB's $O(\sqrt{NT\log(T)})$ regret guarantee. We perform a multi-channel mmWave communication simulation to evaluate our algorithm. Our simulation results confirm the advantage of our algorithm over the UCB algorithm [Auer et al. 2002] and a two-level policy (TLP) proposed in prior works [Pandey et al. 2007].

Tianchi Zhao, Chicheng Zhang, Ming Li
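
As a reference point for the regret comparison above, the UCB baseline [Auer et al. 2002] plays the arm with the largest optimism index; HUUCB layers cluster selection and unimodal search on top of this type of policy. A minimal UCB1 sketch on a toy unimodal instance:

```python
import numpy as np

def ucb1(pull, n_arms, horizon, seed=0):
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(horizon):
        if t < n_arms:
            arm = t                                    # pull each arm once
        else:
            bonus = np.sqrt(2 * np.log(t + 1) / counts)
            arm = int(np.argmax(means + bonus))        # optimism index
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # running mean
    return means, counts

# toy unimodal bandit: expected rewards form a single peak over the arms
truth = np.array([0.1, 0.4, 0.7, 0.5, 0.2])
rng = np.random.default_rng(1)
means, counts = ucb1(lambda a: rng.normal(truth[a], 0.1), 5, 2000)
print(counts)   # pulls should concentrate on arm 2
```
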
Hypothesis Transfer in Bandits by Weighted Models

We consider the problem of contextual multi-armed bandits in the setting of hypothesis transfer learning. That is, we assume having access to a previously learned model on an unobserved set of contexts, and we leverage it in order to accelerate exploration on a new bandit problem. Our transfer strategy is based on a re-weighting scheme for which we show a reduction in the regret over the classic Linear UCB when transfer is desired, while recovering the classic regret rate when the two tasks are unrelated. We further extend this method to an arbitrary number of source models, where the algorithm decides which model is preferred at each time step. Additionally, we discuss an approach where a dynamic convex combination of source models is given in terms of a biased regularization term in the classic LinUCB algorithm. The algorithms and the theoretical analysis of our proposed methods are substantiated by empirical evaluations on simulated and real-world data.

Steven Bilaj, Sofien Dhouib, Setareh Maghsudi
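
The LinUCB estimator being re-weighted here is ridge regression plus an exploration bonus from the inverse Gram matrix. The sketch below is plain LinUCB on synthetic contexts, without the paper's source-model weighting (the `contexts` and `reward` callables are assumed toy functions):

```python
import numpy as np

def linucb(contexts, reward, d, horizon, alpha=1.0, lam=1.0, seed=0):
    A = lam * np.eye(d)                          # regularized Gram matrix
    b = np.zeros(d)
    rng = np.random.default_rng(seed)
    for _ in range(horizon):
        X = contexts(rng)                        # (n_arms, d) candidate arms
        theta = np.linalg.solve(A, b)            # ridge estimate
        A_inv = np.linalg.inv(A)
        width = np.sqrt(np.einsum('ij,jk,ik->i', X, A_inv, X))
        a = int(np.argmax(X @ theta + alpha * width))
        r = reward(X[a], rng)
        A += np.outer(X[a], X[a])
        b += r * X[a]
    return np.linalg.solve(A, b)

theta_star = np.array([1.0, -0.5, 0.3])
est = linucb(lambda rng: rng.normal(size=(10, 3)),
             lambda x, rng: x @ theta_star + 0.1 * rng.normal(), 3, 500)
print(est)   # should approach theta_star
```
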
Multi-agent Heterogeneous Stochastic Linear Bandits

It has been empirically observed in several recommendation systems that their performance improves as more people join the system, by learning across heterogeneous users. In this paper, we seek to theoretically understand this phenomenon by studying the problem of minimizing regret in an N-user heterogeneous stochastic linear bandits framework. We study this problem under two models of heterogeneity: (i) a personalization framework where no two users are necessarily identical, but all are similar, and (ii) a clustering framework where users are partitioned into groups, with users in the same group being identical but different across groups. In the personalization framework, we introduce a natural algorithm where the personal bandit instances are initialized with the estimates of the global average model, and show that any agent i whose parameter deviates from the population average by $\epsilon_i$ attains a regret scaling of $\widetilde{O}(\epsilon_i\sqrt{T})$. In the clustered users' setup, we propose a successive refinement algorithm, which for any agent achieves regret scaling as $\mathcal{O}(\sqrt{T/N})$ if the agent is in a 'well separated' cluster, or scaling as $\mathcal{O}(T^{\frac{1}{2}+\varepsilon}/N^{\frac{1}{2}-\varepsilon})$ if its cluster is not well separated, where $\varepsilon$ is positive and arbitrarily close to 0. Our algorithms enjoy the attractive features of being problem-complexity adaptive and parameter free: if there is structure, such as well separated clusters or all users being similar to each other, then the regret of every agent goes down with N (collaborative gain). On the other hand, in the worst case, the regret of any user is no worse than that of running an individual algorithm per user, which does not leverage collaboration.

Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran
On the Complexity of All ε-Best Arms Identification

We consider the question introduced by [16] of identifying all the $\varepsilon$-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. We give two lower bounds on the sample complexity of any algorithm solving the problem with a confidence at least $1-\delta$. The first, unimprovable in the asymptotic regime, motivates the design of a Track-and-Stop strategy whose average sample complexity is asymptotically optimal when the risk $\delta$ goes to zero. Notably, we provide an efficient numerical method to solve the convex max-min program that appears in the lower bound. Our method is based on a complete characterization of the alternative bandit instances that the optimal sampling strategy needs to rule out, thus making our bound tighter than the one provided by [16]. The second lower bound deals with the regime of high and moderate values of the risk $\delta$, and characterizes the behavior of any algorithm in the initial phase. It emphasizes the linear dependency of the sample complexity on the number of arms. Finally, we report on numerical simulations demonstrating our algorithm's advantage over state-of-the-art methods, even for moderate risks.

Aymen al Marjani, Tomas Kocak, Aurélien Garivier
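
To make the task concrete, here is a naive uniform-sampling baseline that certifies each arm as ε-good or not with Hoeffding intervals, assuming rewards bounded in [0, 1] rather than Gaussian. The paper's Track-and-Stop allocates samples far more efficiently; this sketch only illustrates the stopping logic.

```python
import numpy as np

def all_eps_good_arms(pull, n_arms, eps=0.2, delta=0.05, batch=200,
                      max_rounds=500):
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    for _ in range(max_rounds):
        for a in range(n_arms):
            sums[a] += sum(pull(a) for _ in range(batch))
        counts += batch
        mu = sums / counts
        rad = np.sqrt(np.log(4 * n_arms * counts[0] / delta) / (2 * counts))
        sure_good = mu - rad >= np.max(mu + rad) - eps   # provably eps-good
        sure_bad = mu + rad < np.max(mu - rad) - eps     # provably not
        if np.all(sure_good | sure_bad):
            break
    return np.flatnonzero(sure_good), counts

rng = np.random.default_rng(0)
truth = np.array([0.9, 0.8, 0.5, 0.3])
good, counts = all_eps_good_arms(lambda a: rng.binomial(1, truth[a]), 4)
print(good)   # arms 0 and 1 lie within 0.2 of the best
```
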
Improved Regret Bounds for Online Kernel Selection Under Bandit Feedback

In this paper, we improve the regret bound for online kernel selection under bandit feedback. The previous algorithm enjoys an $O((\Vert f\Vert^2_{\mathcal{H}_i}+1)K^{\frac{1}{3}}T^{\frac{2}{3}})$ expected bound for Lipschitz loss functions. We prove two types of regret bounds improving on it. For smooth loss functions, we propose an algorithm with an $O(U^{\frac{2}{3}}K^{-\frac{1}{3}}(\sum^K_{i=1}L_T(f^*_i))^{\frac{2}{3}})$ expected bound, where $L_T(f^*_i)$ is the cumulative loss of the optimal hypothesis in $\mathbb{H}_i=\{f\in\mathcal{H}_i:\Vert f\Vert_{\mathcal{H}_i}\le U\}$. The data-dependent bound keeps the previous worst-case bound and is smaller if most of the candidate kernels match well with the data. For Lipschitz loss functions, we propose an algorithm with an $O(U\sqrt{KT}\ln^{\frac{2}{3}}{T})$ expected bound, asymptotically improving the previous bound. We apply the two algorithms to online kernel selection with a time constraint and prove new regret bounds matching or improving the previous $O(\sqrt{T\ln{K}}+\Vert f\Vert^2_{\mathcal{H}_i}\max\{\sqrt{T},\frac{T}{\sqrt{\mathcal{R}}}\})$ expected bound, where $\mathcal{R}$ is the time budget. Finally, we empirically verify our algorithms on online regression and classification tasks.

Junfan Li, Shizhong Liao
Online Learning of Convex Sets on Graphs

We study online learning of general convex sets and halfspaces on graphs. While online learning of halfspaces in Euclidean space is a classical learning problem, the corresponding problem on graphs is understudied. In this context, a set of vertices is convex if it contains all connecting shortest paths, and a halfspace is a convex set whose complement is also convex. We discuss mistake bounds based on the Halving algorithm and shortest path covers. Halving achieves near-optimal bounds but is inefficient in general. The shortest path cover based algorithm is efficient but provides optimal bounds only for restricted graph families such as trees. To mitigate the weaknesses of both approaches, we propose a novel polynomial time algorithm which achieves near-optimal bounds on graphs that are $K_{2,k}$ minor-free for some constant $k\in\mathbb{N}$. In contrast to previous mistake bounds on graphs, which typically depend on the induced cut of the labelling, our bounds only depend on the graph itself. Finally, we discuss the agnostic version of this problem and introduce an adaptive variant of Halving for k-intersections of halfspaces.

Maximilian Thiessen, Thomas Gärtner
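
The Halving algorithm mentioned above is simple to state: predict with the majority vote of the remaining version space and discard every hypothesis that errs, which yields at most log2 |H| mistakes when some hypothesis is consistent. A generic sketch on threshold hypotheses (the graph-convexity structure from the paper is omitted):

```python
def halving_online(hypotheses, stream):
    """Halving: majority-vote prediction over the current version space."""
    H = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = sum(h(x) for h in H)
        pred = 1 if 2 * votes >= len(H) else 0   # majority prediction
        if pred != y:
            mistakes += 1
        H = [h for h in H if h(x) == y]          # keep consistent hypotheses
    return mistakes

# toy: threshold hypotheses on a line, target concept is x >= 5
hyps = [lambda x, t=t: int(x >= t) for t in range(10)]
data = [(x, int(x >= 5)) for x in [7, 2, 4, 9, 5, 3]]
print(halving_online(hyps, data))   # at most log2(10) ~ 3 mistakes
```
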

Active and Semi-supervised Learning

Frontmatter
Exploring Latent Sparse Graph for Large-Scale Semi-supervised Learning

We focus on developing a novel scalable graph-based semi-supervised learning (SSL) method for input data consisting of a small amount of labeled data and a large amount of unlabeled data. Due to the lack of labeled data and the availability of large-scale unlabeled data, existing SSL methods usually either encounter suboptimal performance because of an improper graph constructed from input data or are impractical due to the high-computational complexity of solving large-scale optimization problems. In this paper, we propose to address both problems by constructing a novel graph of input data for graph-based SSL methods. A density-based approach is proposed to learn a latent graph from input data. Based on the latent graph, a novel graph construction approach is proposed to construct the graph of input data by an efficient formula. With this formula, two transductive graph-based SSL methods are devised with the computational complexity linear in the number of input data points. Extensive experiments on synthetic data and real datasets demonstrate that the proposed methods not only are scalable for large-scale data, but also achieve good classification performance, especially for an extremely small number of labeled data.

Zitong Wang, Li Wang, Raymond Chan, Tieyong Zeng
Near Out-of-Distribution Detection for Low-Resolution Radar Micro-doppler Signatures

Near out-of-distribution detection (OODD) aims at discriminating semantically similar data points without the supervision required for classification. This paper puts forward an OODD use case for radar targets detection extensible to other kinds of sensors and detection scenarios. We emphasize the relevance of OODD and its specific supervision requirements for the detection of a multimodal, diverse targets class among other similar radar targets and clutter in real-life critical systems. We propose a comparison of deep and non-deep OODD methods on simulated low-resolution pulse radar micro-doppler signatures, considering both a spectral and a covariance matrix input representation. The covariance representation aims at estimating whether dedicated second-order processing is appropriate to discriminate signatures. The potential contributions of labeled anomalies in training, self-supervised learning, contrastive learning insights and innovative training losses are discussed, and the impact of training set contamination caused by mislabelling is investigated.

Martin Bauw, Santiago Velasco-Forero, Jesus Angulo, Claude Adnet, Olivier Airiau
SemiITE: Semi-supervised Individual Treatment Effect Estimation via Disagreement-Based Co-training

Recent years have witnessed a surge of interests in Individual Treatment Effect (ITE) estimation, which aims to estimate the causal effect of a treatment (e.g., job training) on an outcome (e.g., employment status) for each individual (e.g., an employee). Various machine learning based methods have been proposed recently and have achieved satisfactory performance of ITE estimation from observational data. However, most of these methods overwhelmingly rely on a large amount of data with labeled treatment assignments and corresponding outcomes. Unfortunately, a significant amount of labeled observational data can be difficult to collect in real-world applications due to time and expense constraints. In this paper, we propose a Semi-supervised Individual Treatment Effect estimation (SemiITE) framework with a disagreement-based co-training style, which aims to utilize massive unlabeled data to better infer the factual and counterfactual outcomes of each instance with limited labeled data. Extensive experiments on two widely used real-world datasets validate the superiority of our SemiITE over the state-of-the-art ITE estimation models.

Qiang Huang, Jing Ma, Jundong Li, Huiyan Sun, Yi Chang
Multi-task Adversarial Learning for Semi-supervised Trajectory-User Linking

Trajectory-User Linking (TUL), which aims to link trajectories to the users who have generated them, is critically important to many real applications. Existing approaches generally consider TUL as a supervised learning problem, which requires a large number of labeled trajectory-user pairs. However, in real scenarios users may not be willing to make their identities publicly available due to data privacy concerns, leading to a scarcity of labeled trajectory-user pairs. In addition, the trajectory data are usually sparse, as users will not always check in when they go to POIs. To address these issues, in this paper we propose a multi-task adversarial learning model named TULMAL for semi-supervised TUL with sparse trajectory data. Specifically, TULMAL first conducts sparse trajectory completion through a proposed seq2seq model. A Kalman filter is also coupled into the decoder of the seq2seq model to calibrate the generated new locations. The completed trajectories are next input into a generative adversarial learning model for semi-supervised TUL. The insight is that we consider all the users and their trajectories as a whole and perform TUL at the data distribution level. We first project users and trajectories into a common latent feature space by learning a projection function (generator) to minimize the distance between the user distribution and the trajectory distribution. Then each unlabeled trajectory is linked to the user who is closest to it in the latent feature space, without much guidance from labels. The two tasks are jointly conducted and optimized under a multi-task learning framework. Extensive experimental results on two real-world trajectory datasets demonstrate the superiority of our proposal in comparison with existing approaches.

Sen Zhang, Senzhang Wang, Xiang Wang, Shigeng Zhang, Hao Miao, Junxing Zhu
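
The Kalman filter used to calibrate the generated locations reduces, per coordinate, to a predict-then-correct recursion. Below is a minimal scalar version under a random-walk state model; the noise parameters are illustrative, not the paper's.

```python
import random

def kalman_1d(zs, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter: q = process noise, r = measurement noise."""
    x, p, out = x0, p0, []
    for z in zs:
        p = p + q                # predict: uncertainty grows
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # correct toward the measurement
        p = (1 - k) * p
        out.append(x)
    return out

# toy: smooth a noisy trace of positions around the true value 1.0
random.seed(0)
noisy = [1.0 + 0.5 * random.gauss(0, 1) for _ in range(20)]
print(kalman_1d(noisy)[-1])
```
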
Consistent and Tractable Algorithm for Markov Network Learning

Markov network (MN) structured output classifiers provide a transparent and powerful way to model dependencies between output labels. The MN classifiers can be learned using the M3N algorithm, which, however, is not statistically consistent and requires expensive fully annotated examples. We propose an algorithm to learn MN classifiers that is based on Fisher-consistent adversarial loss minimization. Learning is transformed into a tractable convex optimization that is amenable to standard gradient methods. We also extend the algorithm to learn from examples with missing labels. We show that the extended algorithm remains convex, tractable, and statistically consistent.

Vojtech Franc, Daniel Prusa, Andrii Yermakov
Deep Active Learning for Detection of Mercury’s Bow Shock and Magnetopause Crossings

Accurate and timely detection of bow shock and magnetopause crossings is essential for understanding the dynamics of a planet's magnetosphere. However, for Mercury, due to the variable nature of its magnetosphere, this remains a challenging task. Existing approaches based on geometric equations only provide average boundary shapes and can be hard to generalise to environments with variable conditions. On the other hand, data-driven methods require large amounts of annotated data to account for variations, which can scale up the costs quickly. We propose to solve this problem with machine learning. To this end, we introduce a suitable dataset, prepared by processing raw measurements from NASA's MESSENGER (MErcury Surface, Space Environment, GEochemistry, and Ranging) mission, and design a five-class supervised learning problem. We perform an architectural search to find a suitable model and report that our best model, a Convolutional Recurrent Neural Network (CRNN), achieves a macro F1 score of 0.82, with accuracies of approximately 80% and 88% on the bow shock and magnetopause crossings, respectively. Further, we introduce an approach based on active learning that includes only the most informative orbits from the MESSENGER dataset, measured by Shannon entropy. We observe that by employing this technique, the model is able to obtain near-maximal information gain by training on just two Mercury years' worth of data, which is about 10% of the entire dataset. This has the potential to significantly reduce the need for manual labeling. This work sets the ground for future machine learning endeavors in this direction and may be highly relevant to future missions such as BepiColombo, which is expected to enter orbit around Mercury in December 2025.

Sahib Julka, Nikolas Kirschstein, Michael Granitzer, Alexander Lavrukhin, Ute Amerstorfer

Open Access

A Stopping Criterion for Transductive Active Learning

In transductive active learning, the goal is to determine the correct labels for an unlabeled, known dataset. Therefore, we can either ask an oracle to provide the right label at some cost or use the prediction of a classifier which we train on the labels acquired so far. In contrast, the commonly used (inductive) active learning aims to select instances for labeling out of the unlabeled set to create a generalized classifier, which will be deployed on unknown data. This article formally defines the transductive setting and shows that it requires new solutions. Additionally, we formalize the theoretically cost-optimal stopping point for the transductive scenario. Building upon the probabilistic active learning framework, we propose a new transductive selection strategy that includes a stopping criterion and show its superiority.

Daniel Kottke, Christoph Sandrock, Georg Krempl, Bernhard Sick
Multi-domain Active Learning for Semi-supervised Anomaly Detection

Active learning aims to ease the burden of collecting large amounts of annotated data by intelligently acquiring labels during the learning process that will be most helpful to the learner. Current active learning approaches focus on learning from a single dataset. However, a common setting in practice requires simultaneously learning models from multiple datasets, where each dataset requires a separate learned model. This paper tackles the less-explored multi-domain active learning setting. We approach this from the perspective of multi-armed bandits and propose the active learning bandits (Alba) method, which uses bandit methods to both explore and exploit the usefulness of querying a label from different datasets in subsequent query rounds. We evaluate our approach on a benchmark of 7 datasets collected from a retail environment, in the context of a real-world use case of detecting anomalous resource usage. Alba outperforms existing active learning strategies, providing evidence that standard active learning approaches are less suitable for the multi-domain setting.

Vincent Vercruyssen, Lorenzo Perini, Wannes Meert, Jesse Davis
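
The bandit view of multi-domain querying can be caricatured with an epsilon-greedy allocator: treat each dataset as an arm and the observed model improvement as the reward. Alba uses a proper bandit strategy; `usefulness` below is a hypothetical oracle standing in for the accuracy gain from one more label.

```python
import random

def multi_domain_query(datasets, usefulness, rounds=100, eps=0.2, seed=0):
    rng = random.Random(seed)
    n = len(datasets)
    counts, means = [0] * n, [0.0] * n
    for _ in range(rounds):
        if 0 in counts:
            d = counts.index(0)                         # try each domain once
        elif rng.random() < eps:
            d = rng.randrange(n)                        # explore
        else:
            d = max(range(n), key=lambda i: means[i])   # exploit best domain
        r = usefulness(datasets[d], rng)                # e.g. accuracy gain
        counts[d] += 1
        means[d] += (r - means[d]) / counts[d]
    return counts

gains = [0.02, 0.10, 0.05]    # hypothetical per-domain label usefulness
print(multi_domain_query([0, 1, 2],
                         lambda d, rng: rng.gauss(gains[d], 0.01)))
```
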
CMG: A Class-Mixed Generation Approach to Out-of-Distribution Detection

Recently, contrastive learning with data and class augmentations has been shown to produce markedly better results for out-of-distribution (OOD) detection than previous approaches. However, a major shortcoming of this approach is that it is extremely slow due to the significant increase in data size and in the number of classes, and the quadratic pairwise similarity computation. This paper shows that this heavy machinery is unnecessary. A novel approach, called CMG (Class-Mixed Generation), is proposed, which generates pseudo-OOD data by feeding mixed class embeddings as abnormal conditions to a CVAE (conditional variational auto-encoder) and then uses the data to fine-tune a classifier built using the given in-distribution (IND) data. To our surprise, the obvious approach of using the IND data and the pseudo-OOD data to directly train an OOD model is a very poor choice. The fine-tuning based approach turns out to be markedly better. Empirical evaluation shows that CMG not only produces new state-of-the-art results but also is much more efficient than contrastive learning, at least 10 times faster (code is available at: https://github.com/shaoyijia/CMG ).

Mengyu Wang, Yijia Shao, Haowei Lin, Wenpeng Hu, Bing Liu
GraphMixup: Improving Class-Imbalanced Node Classification by Reinforcement Mixup and Self-supervised Context Prediction

Data imbalance, i.e., some classes may have much fewer samples than others, is a serious problem that can lead to unfavorable node classification. However, most existing GNNs are based on the assumption that node samples for different classes are balanced. In this case, directly training a GNN classifier with raw data would under-represent samples from those minority classes and result in sub-optimal performance. This paper proposes GraphMixup, a novel mixup-based framework for improving class-imbalanced node classification on graphs. However, directly performing mixup in the input space or embedding space may produce out-of-domain samples due to the extreme sparsity of minority classes; hence we construct semantic relation spaces that allow Feature Mixup to be performed at the semantic level. Moreover, we apply two context-based self-supervised techniques to capture both local and global information in the graph structure and specifically propose Edge Mixup to handle graph data. Finally, we develop a Reinforcement Mixup mechanism to adaptively determine how many samples are to be generated by mixup for those minority classes. Extensive experiments on three real-world datasets have shown that GraphMixup yields truly encouraging results for the task of class-imbalanced node classification. Codes are available at: https://github.com/LirongWu/GraphMixup .

Lirong Wu, Jun Xia, Zhangyang Gao, Haitao Lin, Cheng Tan, Stan Z. Li
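
The underlying Feature Mixup operation is the usual convex combination of two samples and their soft labels. GraphMixup performs it in learned semantic relation spaces and adds an edge-level analogue; the sketch below shows only the generic operation.

```python
import numpy as np

def feature_mixup(x_i, x_j, y_i, y_j, alpha=1.0, rng=None):
    """Vanilla mixup: interpolate two samples and their one-hot labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient
    return lam * x_i + (1 - lam) * x_j, lam * y_i + (1 - lam) * y_j

rng = np.random.default_rng(0)
x, y = feature_mixup(np.ones(4), np.zeros(4),
                     np.array([1.0, 0.0]), np.array([0.0, 1.0]), rng=rng)
print(x, y)   # a synthetic sample with a soft label
```
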

Private and Federated Learning

Frontmatter
Non-IID Distributed Learning with Optimal Mixture Weights

Distributed learning can well solve the problem of training a model with large-scale data, which has attracted much attention in recent years. However, most existing distributed learning algorithms set uniform mixture weights across clients when aggregating the global model, which impairs accuracy under the Non-IID (Not Independently or Identically Distributed) setting. In this paper, we present a general framework to optimize the mixture weights and show theoretically that our framework has lower expected loss than the uniform mixture weights framework. Moreover, we provide a strong generalization guarantee for our framework, where the excess risk bound converges at a rate of $\mathcal{O}(1/n)$, which is as fast as centralized training. Motivated by the theoretical findings, we propose a novel algorithm to improve the performance of distributed learning under the Non-IID setting. Through extensive experiments, we show that our algorithm outperforms other mainstream methods, which coincides with our theory.

Jian Li, Bojian Wei, Yong Liu, Weiping Wang
Marginal Release Under Multi-party Personalized Differential Privacy

Given a set of local datasets held by multiple parties, we study the problem of learning marginals over the integrated dataset while satisfying differential privacy for each local dataset. Different from existing works in the multi-party setting, our work allows the parties to have different privacy preferences for their data, which is referred to as the multi-party personalized differential privacy (PDP) problem. The existing solutions to PDP problems in the centralized setting mostly adopt sampling-based approaches. However, extending similar ideas to the multi-party setting cannot satisfactorily solve our problem. On the one hand, the data owned by multiple parties are usually not identically distributed. Sampling-based approaches will incur a serious distortion in the results. On the other hand, when the parties hold different attributes of the same set of individuals, sampling at the tuple level cannot meet parties’ personalized privacy requirements for different attributes. To address the above problems, we first present a mixture-of-multinomials-based marginal calculation approach, where the global marginals over the stretched datasets are formalized as a multinomial mixture model. As such, the global marginals over the original datasets can be reconstructed based on the calculated model parameters with high accuracy. We then propose a privacy budget segmentation method, which introduces a privacy division composition strategy from the view of attributes to make full use of each party’s privacy budget while meeting personalized privacy requirements for different attributes. Extensive experiments on real datasets demonstrate that our solution offers desirable data utility.

Peng Tang, Rui Chen, Chongshi Jin, Gaoyuan Liu, Shanqing Guo
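
In the single-party centralized case, releasing a one-way marginal under ε-differential privacy is a direct application of the Laplace mechanism, sketched below. The paper's multi-party personalized setting, with its mixture-of-multinomials reconstruction and budget segmentation, is substantially more involved; this only shows the primitive, and sensitivity 1 assumes add/remove-one neighboring datasets.

```python
import numpy as np

def private_marginal(values, domain_size, epsilon, rng=None):
    """epsilon-DP one-way marginal via the Laplace mechanism."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(values, minlength=domain_size).astype(float)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=domain_size)
    noisy = np.clip(noisy, 0, None)   # post-processing preserves DP
    return noisy / noisy.sum()        # normalized marginal

rng = np.random.default_rng(0)
data = rng.integers(0, 4, size=10_000)   # one attribute with 4 values
print(private_marginal(data, 4, epsilon=0.5, rng=rng))
```
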
Beyond Random Selection: A Perspective from Model Inversion in Personalized Federated Learning

With increasing concern for privacy issues in data, federated learning has emerged as one of the most prevalent approaches to collaboratively train statistical models without disclosing raw data. However, heterogeneity among clients in federated learning hinders optimization convergence and generalization performance. For example, clients usually differ in data distributions, network conditions, input/output dimensions, and model architectures, leading to misalignment of clients' participation in training and degrading model performance. In this work, we propose PFedRe, a personalized approach that introduces individual relevance, measured by Wasserstein distances among dummy datasets, into client selection in federated learning. The server generates dummy datasets from the inversion of local model updates, identifies clients with large distribution divergences, and aggregates updates from highly relevant clients. Theoretically, we perform a convergence analysis of PFedRe and quantify how selection affects the convergence rate. We empirically demonstrate the efficacy of our framework on a variety of non-IID datasets. The results show that PFedRe outperforms other client selection baselines in heterogeneous settings.

Zichen Ma, Yu Lu, Wenye Li, Shuguang Cui
Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles

Differentially private decision tree algorithms have been popular since the introduction of differential privacy. While many private tree-based algorithms have been proposed for supervised learning tasks, such as classification, very few extend naturally to the semi-supervised setting. In this paper, we present a framework that takes advantage of unlabelled data to reduce the noise requirement in differentially private decision forests and improves their predictive performance. The main ingredients in our approach consist of a median splitting criterion that creates balanced leaves, a geometric privacy budget allocation technique, and a random sampling technique to compute the private splitting-point accurately. While similar ideas existed in isolation, their combination is new, and has several advantages: (1) The semi-supervised mode of operation comes for free. (2) Our framework is applicable in two different privacy settings: when label-privacy is required, and when privacy of the features is also required. (3) Empirical evidence on 18 UCI data sets and 3 synthetic data sets demonstrate that our algorithm achieves high utility performance compared to the current state of the art in both supervised and semi-supervised classification problems.

Zhanliang Huang, Yunwen Lei, Ata Kabán
Differentially Private Bayesian Neural Networks on Accuracy, Privacy and Reliability

Bayesian neural networks (BNNs) allow for uncertainty quantification in prediction, offering an advantage over regular neural networks that has not been explored in the differential privacy (DP) framework. We fill this important gap by leveraging recent developments in Bayesian deep learning and privacy accounting to offer a more precise analysis of the trade-off between privacy and accuracy in BNNs. We propose three DP-BNNs that characterize the weight uncertainty for the same network architecture in distinct ways, namely DP-SGLD (via the noisy gradient method), DP-BBP (via changing the parameters of interest) and DP-MC Dropout (via the model architecture). Interestingly, we show a new equivalence between DP-SGD and DP-SGLD, implying that some non-Bayesian DP training naturally allows for uncertainty quantification. However, hyperparameters such as learning rate and batch size can have different or even opposite effects in DP-SGD and DP-SGLD. Extensive experiments are conducted to compare the DP-BNNs in terms of privacy guarantee, prediction accuracy, uncertainty quantification, calibration, computation speed, and generalizability to network architecture. As a result, we observe a new tradeoff between privacy and reliability. Compared to non-DP and non-Bayesian approaches, DP-SGLD is remarkably accurate under strong privacy guarantees, demonstrating the great potential of DP-BNNs in real-world tasks.

Qiyiwen Zhang, Zhiqi Bu, Kan Chen, Qi Long
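
For reference, one step of DP-SGD, which the paper relates to DP-SGLD, clips each per-example gradient and adds Gaussian noise calibrated to the clipping norm. A minimal NumPy sketch with stand-in gradients; the privacy accounting that turns sigma into an (ε, δ) guarantee is omitted.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """Clip per-example gradients to L2 norm `clip`, then add noise."""
    rng = rng or np.random.default_rng()
    n = len(per_example_grads)
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noise = rng.normal(0.0, sigma * clip, size=w.shape)
    g_hat = (np.sum(clipped, axis=0) + noise) / n
    return w - lr * g_hat

rng = np.random.default_rng(0)
w = np.zeros(3)
grads = [rng.normal(size=3) for _ in range(32)]   # stand-in gradients
print(dp_sgd_step(w, grads, rng=rng))
```
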
Differentially Private Federated Combinatorial Bandits with Constraints

There is a rapid increase in the cooperative learning paradigm in online learning settings, i.e., federated learning (FL). Unlike most FL settings, there are many situations where the agents are competitive. Each agent would like to learn from others, but the part of the information it shares for others to learn from could be sensitive; thus, it desires privacy. This work investigates a group of agents working concurrently to solve similar combinatorial bandit problems while maintaining quality constraints. Can these agents collectively learn while keeping their sensitive information confidential by employing differential privacy? We observe that communicating can reduce the regret. However, differential privacy techniques for protecting sensitive information make the data noisy, and may hurt rather than help regret. Hence, we note that it is essential to decide when to communicate and what shared data to learn from, to strike a functional balance between regret and privacy. For such a federated combinatorial MAB setting, we propose a Privacy-preserving Federated Combinatorial Bandit algorithm, P-FCB. We illustrate the efficacy of P-FCB through simulations. We further show that our algorithm provides an improvement in terms of regret while upholding the quality threshold and meaningful privacy guarantees.

Sambhav Solanki, Samhita Kanaparthy, Sankarshan Damle, Sujit Gujar
Backmatter
Metadata
Title
Machine Learning and Knowledge Discovery in Databases
Edited by
Massih-Reza Amini
Stéphane Canu
Asja Fischer
Tias Guns
Petra Kralj Novak
Grigorios Tsoumakas
Copyright year
2023
Electronic ISBN
978-3-031-26412-2
Print ISBN
978-3-031-26411-5
DOI
https://doi.org/10.1007/978-3-031-26412-2
