Skip to main content
Top

2021 | Book

Machine Learning and Knowledge Discovery in Databases. Research Track

European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part II

Editors: Nuria Oliver, Fernando Pérez-Cruz, Stefan Kramer, Jesse Read, Jose A. Lozano

Publisher: Springer International Publishing

Book Series : Lecture Notes in Computer Science

insite
SEARCH

About this book

The multi-volume set LNAI 12975 until 12979 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2021, which was held during September 13-17, 2021. The conference was originally planned to take place in Bilbao, Spain, but changed to an online event due to the COVID-19 pandemic.

The 210 full papers presented in these proceedings were carefully reviewed and selected from a total of 869 submissions.

The volumes are organized in topical sections as follows:

Research Track:

Part I: Online learning; reinforcement learning; time series, streams, and sequence models; transfer and multi-task learning; semi-supervised and few-shot learning; learning algorithms and applications.

Part II: Generative models; algorithms and learning theory; graphs and networks; interpretation, explainability, transparency, safety.

Part III: Generative models; search and optimization; supervised learning; text mining and natural language processing; image processing, computer vision and visual analytics.

Applied Data Science Track:

Part IV: Anomaly detection and malware; spatio-temporal data; e-commerce and finance; healthcare and medical applications (including Covid); mobility and transportation.

Part V: Automating machine learning, optimization, and feature engineering; machine learning based simulations and knowledge discovery; recommender systems and behavior modeling; natural language processing; remote sensing, image and video processing; social media.

Table of Contents

Frontmatter

Generative Models

Frontmatter
Non-exhaustive Learning Using Gaussian Mixture Generative Adversarial Networks

Supervised learning, while deployed in real-life scenarios, often encounters instances of unknown classes. Conventional algorithms for training a supervised learning model do not provide an option to detect such instances, so they miss-classify such instances with 100% probability. Open Set Recognition (OSR) and Non-Exhaustive Learning (NEL) are potential solutions to overcome this problem. Most existing methods of OSR first classify members of existing classes and then identify instances of new classes. However, many of the existing methods of OSR only makes a binary decision, i.e., they only identify the existence of the unknown class. Hence, such methods cannot distinguish test instances belonging to incremental unseen classes. On the other hand, the majority of NEL methods often make a parametric assumption over the data distribution, which either fail to return good results, due to the reason that real-life complex datasets may not follow a well-known data distribution. In this paper, we propose a new online non-exhaustive learning model, namely, Non-Exhaustive Gaussian Mixture Generative Adversarial Networks (NE-GM-GAN) to address these issues. Our proposed model synthesizes Gaussian mixture based latent representation over a deep generative model, such as GAN, for incremental detection of instances of emerging classes in the test data. Extensive experimental results on several benchmark datasets show that NE-GM-GAN significantly outperforms the state-of-the-art methods in detecting instances of novel classes in streaming data.

Jun Zhuang, Mohammad Al Hasan
Unsupervised Learning of Joint Embeddings for Node Representation and Community Detection

In graph analysis community detection and node representation learning are two highly correlated tasks. In this work, we propose an efficient generative model called J-ENC for learning Joint Embedding for Node representation and Community detection. J-ENC learns a community-aware node representation, i.e., learning of the node embeddings are constrained in such a way that connected nodes are not only “closer” to each other but also share similar community assignments. This joint learning framework leverages community-aware node embeddings for better performance on these tasks: node classification, overlapping community detection and non-overlapping community detection. We demonstrate on several graph datasets that J-ENC effectively outperforms many competitive baselines on these tasks. Furthermore, we show that J-ENC not only has quite robust performance with varying hyperparameters but also is computationally efficient than its competitors.

Rayyan Ahmad Khan, Muhammad Umer Anwaar, Omran Kaddah, Zhiwei Han, Martin Kleinsteuber
GraphAnoGAN: Detecting Anomalous Snapshots from Attributed Graphs

Finding anomalous snapshots from a graph has garnered huge attention recently. Existing studies address the problem using shallow learning mechanisms such as subspace selection, ego-network, or community analysis. These models do not take into account the multifaceted interactions between the structure and attributes in the network. In this paper, we propose GraphAnoGAN, an anomalous snapshot ranking framework, which consists of two core components – generative and discriminative models. Specifically, the generative model learns to approximate the distribution of anomalous samples from the candidate set of graph snapshots, and the discriminative model detects whether the sampled snapshot is from the ground-truth or not. Experiments on 4 real-world networks show that GraphAnoGAN outperforms 6 baselines with a significant margin ( $$28.29\%$$ 28.29 % and $$22.01\%$$ 22.01 % higher precision and recall, respectively compared to the best baseline, averaged across all datasets).

Siddharth Bhatia, Yiwei Wang, Bryan Hooi, Tanmoy Chakraborty
The Bures Metric for Generative Adversarial Networks

Generative Adversarial Networks (GANs) are performant generative methods yielding high-quality samples. However, under certain circumstances, the training of GANs can lead to mode collapse or mode dropping. To address this problem, we use the last layer of the discriminator as a feature map to study the distribution of the real and the fake data. During training, we propose to match the real batch diversity to the fake batch diversity by using the Bures distance between covariance matrices in this feature space. The computation of the Bures distance can be conveniently done in either feature space or kernel space in terms of the covariance and kernel matrix respectively. We observe that diversity matching reduces mode collapse substantially and has a positive effect on sample quality. On the practical side, a very simple training procedure is proposed and assessed on several data sets.

Hannes De Meulemeester, Joachim Schreurs, Michaël Fanuel, Bart De Moor, Johan A. K. Suykens
Generative Max-Mahalanobis Classifiers for Image Classification, Generation and More

Joint Energy-based Model (JEM) of [11] shows that a standard softmax classifier can be reinterpreted as an energy-based model (EBM) for the joint distribution $$p(\boldsymbol{x}, y)$$ p ( x , y ) ; the resulting model can be optimized to improve calibration, robustness and out-of-distribution detection, while generating samples rivaling the quality of recent GAN-based approaches. However, the softmax classifier that JEM exploits is inherently discriminative and its latent feature space is not well formulated as probabilistic distributions, which may hinder its potential for image generation and incur training instability. We hypothesize that generative classifiers, such as Linear Discriminant Analysis (LDA), might be more suitable for image generation since generative classifiers model the data generation process explicitly. This paper therefore investigates an LDA classifier for image classification and generation. In particular, the Max-Mahalanobis Classifier (MMC) [30], a special case of LDA, fits our goal very well. We show that our Generative MMC (GMMC) can be trained discriminatively, generatively or jointly for image classification and generation. Extensive experiments on multiple datasets show that GMMC achieves state-of-the-art discriminative and generative performances, while outperforming JEM in calibration, adversarial robustness and out-of-distribution detection by a significant margin. Our source code is available at https://github.com/sndnyang/GMMC .

Xiulong Yang, Hui Ye, Yang Ye, Xiang Li, Shihao Ji
Gaussian Process Encoders: VAEs with Reliable Latent-Space Uncertainty

Variational autoencoders are a versatile class of deep latent variable models. They learn expressive latent representations of high dimensional data. However, the latent variance is not a reliable estimate of how uncertain the model is about a given input point. We address this issue by introducing a sparse Gaussian process encoder. The Gaussian process leads to more reliable uncertainty estimates in the latent space. We investigate the implications of replacing the neural network encoder with a Gaussian process in light of recent research. We then demonstrate how the Gaussian Process encoder generates reliable uncertainty estimates while maintaining good likelihood estimates on a range of anomaly detection problems. Finally, we investigate the sensitivity to noise in the training data and show how an appropriate choice of Gaussian process kernel can lead to automatic relevance determination.

Judith Bütepage, Lucas Maystre, Mounia Lalmas
Variational Hyper-encoding Networks

We propose a framework called HyperVAE for encoding distributions of distributions. When a target distribution is modeled by a VAE, its neural network parameters are sampled from a distribution in the model space modeled by a hyper-level VAE. We propose a variational inference framework to implicitly encode the parameter distributions into a low dimensional Gaussian distribution. Given a target distribution, we predict the posterior distribution of the latent code, then use a matrix-network decoder to generate a posterior distribution for the parameters. HyperVAE can encode the target parameters in full in contrast to common hyper-networks practices, which generate only the scale and bias vectors to modify the target-network parameters. Thus HyperVAE preserves information about the model for each task in the latent space. We derive the training objective for HyperVAE using the minimum description length (MDL) principle to reduce the complexity of HyperVAE. We evaluate HyperVAE in density estimation tasks, outlier detection and discovery of novel design classes, demonstrating its efficacy.

Phuoc Nguyen, Truyen Tran, Sunil Gupta, Santu Rana, Hieu-Chi Dam, Svetha Venkatesh
Principled Interpolation in Normalizing Flows

Generative models based on normalizing flows are very successful in modeling complex data distributions using simpler ones. However, straightforward linear interpolations show unexpected side effects, as interpolation paths lie outside the area where samples are observed. This is caused by the standard choice of Gaussian base distributions and can be seen in the norms of the interpolated samples as they are outside the data manifold. This observation suggests that changing the way of interpolating should generally result in better interpolations, but it is not clear how to do that in an unambiguous way. In this paper, we solve this issue by enforcing a specific manifold and, hence, change the base distribution, to allow for a principled way of interpolation. Specifically, we use the Dirichlet and von Mises-Fisher base distributions on the probability simplex and the hypersphere, respectively. Our experimental results show superior performance in terms of bits per dimension, Fréchet Inception Distance (FID), and Kernel Inception Distance (KID) scores for interpolation, while maintaining the generative performance.

Samuel G. Fadel, Sebastian Mair, Ricardo da S. Torres, Ulf Brefeld
CycleGAN Through the Lens of (Dynamical) Optimal Transport

Unsupervised Domain Translation (UDT) is the problem of finding a meaningful correspondence between two given domains, without explicit pairings between elements of the domains. Following the seminal CycleGAN model, variants and extensions have been used successfully for a wide range of applications. However, although there have been some attempts, they remain poorly understood, and lack theoretical guarantees. In this work, we explore the implicit biases present in current approaches and demonstrate where and why they fail. By expliciting these biases, we show that UDT can be reframed as an Optimal Transport (OT) problem. Using the dynamical formulation of Optimal Transport, this allows us to improve the CycleGAN model into a simple and practical formulation which comes with theoretical guarantees and added robustness. Finally, we show how our improved model behaves on the CelebA dataset in a standard then in a more challenging setting, thus paving the way for new applications of UDT. Supplementary material is available at https://arxiv.org/pdf/1906.01292 .

Emmanuel de Bézenac, Ibrahim Ayed, Patrick Gallinari
Decoupling Sparsity and Smoothness in Dirichlet Belief Networks

The Dirichlet Belief Network (DirBN) has been proposed as a promising deep generative model that uses Dirichlet distributions to form layer-wise connections and thereby construct a multi-stochastic layered deep architecture. However, the DirBN cannot simultaneously achieve both sparsity, whereby the generated latent distributions place weights on a subset of components, and smoothness, which requires that the posterior distribution should not be dominated by the data. To address this limitation we introduce the sparse and smooth Dirichlet Belief Network (ssDirBN) which can achieve both sparsity and smoothness simultaneously, thereby increasing modelling flexibility over the DirBN. This gain is achieved by introducing binary variables to indicate whether each entity’s latent distribution at each layer uses a particular component. As a result, each latent distribution may use only a subset of components in each layer, and smoothness is enforced on this subset. Extra efforts on modifying the models are also made to fix the issues which is caused by introducing these binary variables. Extensive experimental results on real-world data show significant performance improvements of ssDirBN over state-of-the-art models in terms of both enhanced model predictions and reduced model complexity.

Yaqiong Li, Xuhui Fan, Ling Chen, Bin Li, Scott A. Sisson

Algorithms and Learning Theory

Frontmatter
Self-bounding Majority Vote Learning Algorithms by the Direct Minimization of a Tight PAC-Bayesian C-Bound

In the PAC-Bayesian literature, the C-Bound refers to an insightful relation between the risk of a majority vote classifier (under the zero-one loss) and the first two moments of its margin (i.e., the expected margin and the voters’ diversity). Until now, learning algorithms developed in this framework minimize the empirical version of the C-Bound, instead of explicit PAC-Bayesian generalization bounds. In this paper, by directly optimizing PAC-Bayesian guarantees on the C-Bound, we derive self-bounding majority vote learning algorithms. Moreover, our algorithms based on gradient descent are scalable and lead to accurate predictors paired with non-vacuous guarantees.

Paul Viallard, Pascal Germain, Amaury Habrard, Emilie Morvant
Midpoint Regularization: From High Uncertainty Training Labels to Conservative Classification Decisions

Label Smoothing (LS) improves model generalization through penalizing models from generating overconfident output distributions. For each training sample the LS strategy smooths the one-hot encoded training signal by distributing its distribution mass over the non-ground truth classes. We extend this technique by considering example pairs, coined PLS. PLS first creates midpoint samples by averaging random sample pairs and then learns a smoothing distribution during training for each of these midpoint samples, resulting in midpoints with high uncertainty labels for training. We empirically show that PLS significantly outperforms LS, achieving up to 30% of relative classification error reduction. We also visualize that PLS produces very low winning softmax scores for both in and out of distribution samples.

Hongyu Guo
Learning Weakly Convex Sets in Metric Spaces

We introduce the notion of weak convexity in metric spaces, a generalization of ordinary convexity commonly used in machine learning. It is shown that weakly convex sets can be characterized by a closure operator and have a unique decomposition into a set of pairwise disjoint connected blocks. We give two generic efficient algorithms, an extensional and an intensional one for learning weakly convex concepts and study their formal properties. Our experimental results concerning vertex classification clearly demonstrate the excellent predictive performance of the extensional algorithm. Two non-trivial applications of the intensional algorithm to polynomial PAC-learnability are presented. The first one deals with learning k-convex Boolean functions, which are already known to be efficiently PAC-learnable. It is shown how to derive this positive result in a fairly easy way by the generic intensional algorithm. The second one is concerned with the Euclidean space equipped with the Manhattan distance. For this metric space, weakly convex sets form a union of pairwise disjoint axis-aligned hyperrectangles. We show that a weakly convex set that is consistent with a set of examples and contains a minimum number of hyperrectangles can be found in polynomial time. In contrast, this problem is known to be NP-complete if the hyperrectangles may be overlapping.

Eike Stadtländer, Tamás Horváth, Stefan Wrobel
Disparity Between Batches as a Signal for Early Stopping

We propose a metric for evaluating the generalization ability of deep neural networks trained with mini-batch gradient descent. Our metric, called gradient disparity, is the $$\ell _2$$ ℓ 2 norm distance between the gradient vectors of two mini-batches drawn from the training set. It is derived from a probabilistic upper bound on the difference between the classification errors over a given mini-batch, when the network is trained on this mini-batch and when the network is trained on another mini-batch of points sampled from the same dataset. We empirically show that gradient disparity is a very promising early-stopping criterion (i) when data is limited, as it uses all the samples for training and (ii) when available data has noisy labels, as it signals overfitting better than the validation data. Furthermore, we show in a wide range of experimental settings that gradient disparity is strongly related to the generalization error between the training and test sets, and that it is also very informative about the level of label noise.

Mahsa Forouzesh, Patrick Thiran
Learning from Noisy Similar and Dissimilar Data

With the widespread use of machine learning for classification, it becomes increasingly important to be able to use weaker kinds of supervision for tasks in which it is hard to obtain standard labeled data. One such kind of supervision is provided pairwise in the form of Similar (S) pairs (if two examples belong to the same class) and Dissimilar (D) pairs (if two examples belong to different classes). This kind of supervision is realistic in privacy-sensitive domains. Although the basic version of this problem has been studied recently, it is still unclear how to learn from such supervision under label noise, which is very common when the supervision is, for instance, crowd-sourced. In this paper, we close this gap and demonstrate how to learn a classifier from noisy S and D labeled pairs. We perform a detailed investigation of this problem under two realistic noise models and propose two algorithms to learn from noisy SD data. We also show important connections between learning from such pairwise supervision data and learning from ordinary class-labeled data. Finally, we perform experiments on synthetic and real-world datasets and show our noise-informed algorithms outperform existing baselines in learning from noisy pairwise data.

Soham Dan, Han Bao, Masashi Sugiyama
Knowledge Distillation with Distribution Mismatch

Knowledge distillation (KD) is one of the most efficient methods to compress a large deep neural network (called teacher) to a smaller network (called student). Current state-of-the-art KD methods assume that the distributions of training data of teacher and student are identical to maintain the student’s accuracy close to the teacher’s accuracy. However, this strong assumption is not met in many real-world applications where the distribution mismatch happens between teacher’s training data and student’s training data. As a result, existing KD methods often fail in this case. To overcome this problem, we propose a novel method for KD process, which is still effective when the distribution mismatch happens. We first learn a distribution based on student’s training data, from which we can sample images well-classified by the teacher. By doing this, we can discover the data space where the teacher has good knowledge to transfer to the student. We then propose a new loss function to train the student network, which achieves better accuracy than the standard KD loss function. We conduct extensive experiments to demonstrate that our method works well for KD tasks with or without distribution mismatch. To the best of our knowledge, our method is the first method addressing the challenge of distribution mismatch when performing KD process.

Dang Nguyen, Sunil Gupta, Trong Nguyen, Santu Rana, Phuoc Nguyen, Truyen Tran, Ky Le, Shannon Ryan, Svetha Venkatesh
Certification of Model Robustness in Active Class Selection

Active class selection provides machine learning practitioners with the freedom to actively choose the class proportions of their training data. While this freedom can improve the model performance and decrease the data acquisition cost, it also puts the practical value of the trained model into question: is this model really appropriate for the class proportions that are handled during deployment? What if the deployment class proportions are uncertain or change over time? We address these questions by certifying supervised models that are trained through active class selection. Specifically, our certificate declares a set of class proportions for which the certified model induces a training-to-deployment gap that is small with a high probability. This declaration is theoretically justified by PAC bounds. We apply our proposed certification method in astro-particle physics, where a simulation generates telescope recordings from actively chosen particle classes.

Mirko Bunse, Katharina Morik

Graphs and Networks

Frontmatter
Inter-domain Multi-relational Link Prediction

Multi-relational graph is a ubiquitous and important data structure, allowing flexible representation of multiple types of interactions and relations between entities. Similar to other graph-structured data, link prediction is one of the most important tasks on multi-relational graphs and is often used for knowledge completion. When related graphs coexist, it is of great benefit to build a larger graph via integrating the smaller ones. The integration requires predicting hidden relational connections between entities belonged to different graphs (inter-domain link prediction). However, this poses a real challenge to existing methods that are exclusively designed for link prediction between entities of the same graph only (intra-domain link prediction). In this study, we propose a new approach to tackle the inter-domain link prediction problem by softly aligning the entity distributions between different domains with optimal transport and maximum mean discrepancy regularizers. Experiments on real-world datasets show that optimal transport regularizer is beneficial and considerably improves the performance of baseline methods.

Luu Huu Phuc, Koh Takeuchi, Seiji Okajima, Arseny Tolmachev, Tomoyoshi Takebayashi, Koji Maruhashi, Hisashi Kashima
GraphSVX: Shapley Value Explanations for Graph Neural Networks

Graph Neural Networks (GNNs) achieve significant performance for various learning tasks on geometric data due to the incorporation of graph structure into the learning of node representations, which renders their comprehension challenging. In this paper, we first propose a unified framework satisfied by most existing GNN explainers. Then, we introduce GraphSVX, a post hoc local model-agnostic explanation method specifically designed for GNNs. GraphSVX is a decomposition technique that captures the “fair” contribution of each feature and node towards the explained prediction by constructing a surrogate model on a perturbed dataset. It extends to graphs and ultimately provides as explanation the Shapley Values from game theory. Experiments on real-world and synthetic datasets demonstrate that GraphSVX achieves state-of-the-art performance compared to baseline models while presenting core theoretical and human-centric properties.

Alexandre Duval, Fragkiskos D. Malliaros
Multi-view Self-supervised Heterogeneous Graph Embedding

Graph mining tasks often suffer from the lack of supervision from labeled information due to the intrinsic sparseness of graphs and the high cost of manual annotation. To alleviate this issue, inspired by recent advances of self-supervised learning (SSL) on computer vision and natural language processing, graph self-supervised learning methods have been proposed and achieved remarkable performance by utilizing unlabeled information. However, most existing graph SSL methods focus on homogeneous graphs, ignoring the ubiquitous heterogeneity of real-world graphs where nodes and edges are of multiple types. Therefore, directly applying existing graph SSL methods to heterogeneous graphs can not fully capture the rich semantics and their correlations in heterogeneous graphs. In light of this, we investigate self-supervised learning on heterogeneous graphs and propose a novel model named Multi-View Self-supervised heterogeneous graph Embedding (MVSE). By encoding information from different views defined by meta-paths and optimizing both intra-view and inter-view contrastive learning tasks, MVSE comprehensively utilizes unlabeled information and learns node embeddings. Extensive experiments are conducted on various tasks to show the effectiveness of the proposed framework.

Jianan Zhao, Qianlong Wen, Shiyu Sun, Yanfang Ye, Chuxu Zhang
Semantic-Specific Hierarchical Alignment Network for Heterogeneous Graph Adaptation

Node classification has been substantially improved with the advent of Heterogeneous Graph Neural Networks (HGNNs). However, collecting numerous labeled data is expensive and time-consuming in many applications. Domain Adaptation (DA) tackles this problem by transferring knowledge from a label-rich domain to a label-scarce one. However the heterogeneity and rich semantic information bring great challenges for adapting HGNN for DA. In this paper, we propose a novel semantic-specific hierarchical alignment network for heterogeneous graph adaptation, called HGA. HGA designs a sharing-parameters HGNN aggregating path-based neighbors and hierarchical domain alignment strategies with the MMD and $$L_1$$ L 1 normalization term. Extensive experiments on four datasets demonstrate that the proposed model can achieve remarkable results on node classification.

YuanXin Zhuang, Chuan Shi, Cheng Yang, Fuzhen Zhuang, Yangqiu Song
The KL-Divergence Between a Graph Model and its Fair I-Projection as a Fairness Regularizer

Learning and reasoning over graphs is increasingly done by means of probabilistic models, e.g. exponential random graph models, graph embedding models, and graph neural networks. When graphs are modeling relations between people, however, they will inevitably reflect biases, prejudices, and other forms of inequity and inequality. An important challenge is thus to design accurate graph modeling approaches while guaranteeing fairness according to the specific notion of fairness that the problem requires. Yet, past work on the topic remains scarce, is limited to debiasing specific graph modeling methods, and often aims to ensure fairness in an indirect manner.We propose a generic approach applicable to most probabilistic graph modeling approaches. Specifically, we first define the class of fair graph models corresponding to a chosen set of fairness criteria. Given this, we propose a fairness regularizer defined as the KL-divergence between the graph model and its I-projection onto the set of fair models. We demonstrate that using this fairness regularizer in combination with existing graph modeling approaches efficiently trades-off fairness with accuracy, whereas the state-of-the-art models can only make this trade-off for the fairness criterion that they were specifically designed for.

Maarten Buyl, Tijl De Bie
On Generalization of Graph Autoencoders with Adversarial Training

Adversarial training is an approach for increasing model’s resilience against adversarial perturbations. Such approaches have been demonstrated to result in models with feature representations that generalize better. However, limited works have been done on adversarial training of models on graph data. In this paper, we raise such a question – does adversarial training improve the generalization of graph representations. We formulate $$L_{2}$$ L 2 and $$L_{\infty }$$ L ∞ versions of adversarial training in two powerful node embedding methods: graph autoencoder (GAE) and variational graph autoencoder (VGAE). We conduct extensive experiments on three main applications, i.e. link prediction, node clustering, graph anomaly detection of GAE and VGAE, and demonstrate that both $$L_{2}$$ L 2 and $$L_{\infty }$$ L ∞ adversarial training boost the generalization of GAE and VGAE.

Tianjin Huang, Yulong Pei, Vlado Menkovski, Mykola Pechenizkiy
Inductive Link Prediction with Interactive Structure Learning on Attributed Graph

Link prediction is one of the most important tasks in graph machine learning, which aims at predicting whether two nodes in a network have an edge. Real-world graphs typically contain abundant node and edge attributes, thus how to perform link prediction by simultaneously learning structure and attribute information from both interactions/paths between two associated nodes and local neighborhood among node’s ego subgraph is intractable.To address this issue, we develop a novel Path-aware Graph Neural Network (PaGNN) method for link prediction, which incorporates interaction and neighborhood information into graph neural networks via broadcasting and aggregating operations. And a cache strategy is developed to accelerate the inference process. Extensive experiments show a superior performance of our proposal over state-of-the-art methods on real-world link prediction tasks.

Shuo Yang, Binbin Hu, Zhiqiang Zhang, Wang Sun, Yang Wang, Jun Zhou, Hongyu Shan, Yuetian Cao, Borui Ye, Yanming Fang, Quan Yu
Representation Learning on Multi-layered Heterogeneous Network

Network data can often be represented in a multi-layered structure with rich semantics. One example is e-commerce data, containing user-user social network layer and item-item context layer, with cross-layer user-item interactions. Given the dual characters of homogeneity within each layer and heterogeneity across layers, we seek to learn node representations from such a multi-layered heterogeneous network while jointly preserving structural information and network semantics. In contrast, previous works on network embedding mainly focus on single-layered or homogeneous networks with one type of nodes and links. In this paper we propose intra- and cross-layer proximity concepts. Intra-layer proximity simulates propagation along homogeneous nodes to explore latent structural similarities. Cross-layer proximity captures network semantics by extending heterogeneous neighborhood across layers. Through extensive experiments on four datasets, we demonstrate that our model achieves substantial gains in different real-world domains over state-of-the-art baselines.

Delvin Ce Zhang, Hady W. Lauw
Adaptive Node Embedding Propagation for Semi-supervised Classification

Graph Convolutional Networks (GCNs) are state-of-the-art approaches for semi-supervised node classification task. By increasing the number of layers, GCNs utilize high-order relations between nodes that are more than two hops away from each other. However, GCNs with many layers face three drawbacks: (1) over-fitting due to the increasing number of parameters, (2) over-smoothing in which embeddings converge to similar values, and (3) the difficulty in selecting the appropriate number of propagation hops. In this paper, we propose ANEPN that effectively utilizes high-order relations between nodes by overcoming the above drawbacks of GCNs. First, we introduce Embedding Propagation Loss which increases the number of propagation hops while keeping the number of parameters constant for mitigating over-fitting. Second, we propose Anti-Smoothness Loss (ASL) that prevents embeddings from converging to similar values for avoiding over-smoothing. Third, we introduce a metric for predicted class labels for adaptively controlling the number of propagation hops. We show that ANEPN outperforms ten state-of-the-art approaches on three standard datasets.

Yuya Ogawa, Seiji Maekawa, Yuya Sasaki, Yasuhiro Fujiwara, Makoto Onizuka
Probing Negative Sampling for Contrastive Learning to Learn Graph Representations

Graph representation learning has long been an important yet challenging task for various real-world applications. However, its downstream tasks are mainly performed in the settings of supervised or semi-supervised learning. Inspired by recent advances in unsupervised contrastive learning, this paper is thus motivated to investigate how the node-wise contrastive learning could be performed. Particularly, we respectively resolve the class collision issue and the imbalanced negative data distribution issue. Extensive experiments are performed on three real-world datasets and the proposed approach achieves the SOTA model performance.

Shiyi Chen, Ziao Wang, Xinni Zhang, Xiaofeng Zhang, Dan Peng
Beyond Low-Pass Filters: Adaptive Feature Propagation on Graphs

Graph neural networks (GNNs) have been extensively studied for prediction tasks on graphs. As pointed out by recent studies, most GNNs assume local homophily, i.e., strong similarities in local neighborhoods. This assumption however limits the generalizability power of GNNs. To address this limitation, we propose a flexible GNN model, which is capable of handling any graphs without being restricted by their underlying homophily. At its core, this model adopts a node attention mechanism based on multiple learnable spectral filters; therefore, the aggregation scheme is learned adaptively for each graph in the spectral domain. We evaluated the proposed model on node classification tasks over eight benchmark datasets. The proposed model is shown to generalize well to both homophilic and heterophilic graphs. Further, it outperforms all state-of-the-art baselines on heterophilic graphs and performs comparably with them on homophilic graphs.

Shouheng Li, Dongwoo Kim, Qing Wang
Zero-Shot Scene Graph Relation Prediction Through Commonsense Knowledge Integration

Relation prediction among entities in images is an important step in scene graph generation (SGG), which further impacts various visual understanding and reasoning tasks. Existing SGG frameworks, however, require heavy training yet are incapable of modeling unseen (i.e., zero-shot) triplets. In this work, we stress that such incapability is due to the lack of commonsense reasoning, i.e., the ability to associate similar entities and infer similar relations based on general understanding of the world. To fill this gap, we propose CommOnsense-integrAted sCene grapH rElation pRediction (COACHER), a framework to integrate commonsense knowledge for SGG, especially for zero-shot relation prediction. Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph, and integrate them on top of state-of-the-art SGG frameworks. Extensive quantitative evaluations and qualitative case studies on both original and manipulated datasets from Visual Genome demonstrate the effectiveness of our proposed approach. The code is available at https://github.com/Wayfear/Coacher .

Xuan Kan, Hejie Cui, Carl Yang
Graph Fraud Detection Based on Accessibility Score Distributions

Graph fraud detection approaches traditionally present frauds as subgraphs and focus on characteristics of the fraudulent subgraphs: unexpectedly high densities or sparse connections with the rest of the graph. However, frauds can easily circumvent such approaches by manipulating their subgraph density or making connections to honest user groups. We focus on a trait that is hard for fraudsters to manipulate: the unidirectionality of communication between honest users and fraudsters. We define an accessibility score to quantify the unidirectionality, then prove the unidirectionality induces skewed accessibility score distributions for fraudsters. We propose SkewA, a novel fraud detection method that measures the skewness in accessibility score distributions and uses it as an honesty metric. SkewA is (a) robust to frauds with low density and various types of camouflages, (b) theoretically sound: we analyze how the unidirectionality brings skewed accessibility score distributions, and (c) effective: showing up to 95.6% accuracy in real-world data where all competitors fail to detect any fraud.

Minji Yoon
Correlation Clustering with Global Weight Bounds

Given a set of objects and nonnegative real weights expressing “positive” and “negative” feeling of clustering any two objects together, min-disagreement correlation clustering partitions the input object set so as to minimize the sum of the intra-cluster negative-type weights plus the sum of the inter-cluster positive-type weights. Min-disagreement correlation clustering is $$\mathbf {APX}$$ APX -hard, but efficient constant-factor approximation algorithms exist if the weights are bounded in some way. The weight bounds so far studied in the related literature are mostly local, as they are required to hold for every object-pair. In this paper, we introduce the problem of min-disagreement correlation clustering with global weight bounds, i.e., constraints to be satisfied by the input weights altogether. Our main result is a sufficient condition that establishes when any algorithm achieving a certain approximation under the probability constraint keeps the same guarantee on an input that violates the constraint. This extends the range of applicability of the most prominent existing correlation-clustering algorithms, including the popular Pivot, thus providing benefits, both theoretical and practical. Experiments demonstrate the usefulness of our approach, in terms of both worthiness of employing existing efficient algorithms, and guidance on the definition of weights from feature vectors in a task of fair clustering.

Domenico Mandaglio, Andrea Tagarelli, Francesco Gullo
Modeling Multi-factor and Multi-faceted Preferences over Sequential Networks for Next Item Recommendation

Attributes of items carry useful information for accurate recommendations. Existing methods which tried to use items’ attributes relied on either 1) feature-level compression which may introduce much noise information of irrelevant attributes, or 2) item- and attribute- level transition modeling which ignored the mutual effects of multi-factor for users’ behaviors. In addition, these methods failed to capture multi-faceted preferences of users, therefore, the prediction for the next behavior may be affected or misled by the irrelevant facets of preferences. To address these problems, we propose a Sequential Network based Recommendation model, named SNR, to extract and utilize users’ multi-factor and multi-faceted preferences for next item recommendation. To model users’ multi-factor preferences, we organize the item- and attribute- level sequences of users’ behaviors as unified sequential networks, and propose an attentional gated Graph Convolutional Network model to explore the mutual effects of the preference factors contained in sequential networks. To capture users’ multi-faceted preferences, we propose a multi-faceted preference learning model to simulate the decision-making process of users with the Gumbel sotfmax trick. Finally, we fuse the multi-factor and multi-faceted preferences in a unified latent space for next item recommendation. Extensive experiments on four real-world data sets show that the proposed model SNR consistently outperforms several state-of-the-art methods.

Yingpeng Du, Hongzhi Liu, Zhonghai Wu
PATHATTACK: Attacking Shortest Paths in Complex Networks

Shortest paths in complex networks play key roles in many applications. Examples include routing packets in a computer network, routing traffic on a transportation network, and inferring semantic distances between concepts on the World Wide Web. An adversary with the capability to perturb the graph might make the shortest path between two nodes route traffic through advantageous portions of the graph (e.g., a toll road he owns). In this paper, we introduce the Force Path Cut problem, in which there is a specific route the adversary wants to promote by removing a low-cost set of edges in the graph. We show that Force Path Cut is NP-complete. It can be recast as an instance of the Weighted Set Cover problem, enabling the use of approximation algorithms. The size of the universe for the set cover problem is potentially factorial in the number of nodes. To overcome this hurdle, we propose the PATHATTACK algorithm, which via constraint generation considers only a small subset of paths—at most 5% of the number of edges in 99% of our experiments. Across a diverse set of synthetic and real networks, the linear programming formulation of Weighted Set Cover yields the optimal solution in over 98% of cases. We also demonstrate running time vs. cost tradeoff using two approximation algorithms and greedy baseline methods. This work expands the area of adversarial graph mining beyond recent work on node classification and embedding.

Benjamin A. Miller, Zohair Shafi, Wheeler Ruml, Yevgeniy Vorobeychik, Tina Eliassi-Rad, Scott Alfeld
Embedding Knowledge Graphs Attentive to Positional and Centrality Qualities

Knowledge graphs embeddings (KGE) are lately at the center of many artificial intelligence studies due to their applicability for solving downstream tasks, including link prediction and node classification. However, most Knowledge Graph embedding models encode, into the vector space, only the local graph structure of an entity, i.e., information of the 1-hop neighborhood. Capturing not only local graph structure but global features of entities are crucial for prediction tasks on Knowledge Graphs. This work proposes a novel KGE method named Graph Feature Attentive Neural Network (GFA-NN) that computes graphical features of entities. As a consequence, the resulting embeddings are attentive to two types of global network features. First, nodes’ relative centrality is based on the observation that some of the entities are more “prominent” than the others. Second, the relative position of entities in the graph. GFA-NN computes several centrality values per entity, generates a random set of reference nodes’ entities, and computes a given entity’s shortest path to each entity in the reference set. It then learns this information through optimization of objectives specified on each of these features. We investigate GFA-NN on several link prediction benchmarks in the inductive and transductive setting and show that GFA-NN achieves on-par or better results than state-of-the-art KGE solutions.

Afshin Sadeghi, Diego Collarana, Damien Graux, Jens Lehmann

Interpretation, Explainability, Transparency, Safety

Frontmatter
Reconnaissance for Reinforcement Learning with Safety Constraints

As RL algorithms have grown more powerful and sophisticated, they show promise for several practical applications in the real world. However, safety is a necessary prerequisite to deploying RL systems in real world domains such as autonomous vehicles or cooperative robotics. Safe RL problems are often formulated as constrained Markov decision processes (CMDPs). In particular, solving CMDPs becomes challenging when safety must be ensured in rare, dangerous situations in stochastic environments. In this paper, we propose an approach for CMDPs where we have access to a generative model (e.g. a simulator) that can preferentially sample rare, dangerous events. In particular, our approach, termed the RP algorithm decomposes the CMDP into a pair of MDPs which we term a reconnaissance MDP (R-MDP) and a planning MDP (P-MDP). In the R-MDP, we leverage the generative model to preferentially sample rare, dangerous events and train a threat function, the Q-function analog of danger that can determine the safety level of a given state-action pair. In the P-MDP, we train a reward-seeking policy while using the trained threat function to ensure that the agent considers only safe actions. We show that our approach, termed the RP algorithm enjoys several useful theoretical properties. Moreover, we present an approximate version of the RP algorithm that can significantly reduce the difficulty of solving the R-MDP. We demonstrate the efficacy of our method over classical approaches in multiple tasks, including a collision-free navigation task with dynamic obstacles.

Shin-ichi Maeda, Hayato Watahiki, Yi Ouyang, Shintarou Okada, Masanori Koyama, Prabhat Nagarajan
VeriDL: Integrity Verification of Outsourced Deep Learning Services

Deep neural networks (DNNs) are prominent due to their superior performance in many fields. The deep-learning-as-a-service (DLaaS) paradigm enables individuals and organizations (clients) to outsource their DNN learning tasks to the cloud-based platforms. However, the DLaaS server may return incorrect DNN models due to various reasons (e.g., Byzantine failures). This raises the serious concern of how to verify if the DNN models trained by potentially untrusted DLaaS servers are indeed correct. To address this concern, in this paper, we design VeriDL, a framework that supports efficient correctness verification of DNN models in the DLaaS paradigm. The key idea of VeriDL is the design of a small-size cryptographic proof of the training process of the DNN model, which is associated with the model and returned to the client. Through the proof, VeriDL can verify the correctness of the DNN model returned by the DLaaS server with a deterministic guarantee and cheap overhead. Our experiments on four real-world datasets demonstrate the efficiency and effectiveness of VeriDL.

Boxiang Dong, Bo Zhang, Hui (Wendy) Wang
A Unified Batch Selection Policy for Active Metric Learning

Active metric learning is the problem of incrementally selecting high-utility batches of training data (typically, ordered triplets) to annotate, in order to progressively improve a learned model of a metric over some input domain as rapidly as possible. Standard approaches, which independently assess the informativeness of each triplet in a batch, are susceptible to highly correlated batches with many redundant triplets and hence low overall utility. While a recent work [20] proposes batch-decorrelation strategies for metric learning, they rely on ad hoc heuristics to estimate the correlation between two triplets at a time. We present a novel batch active metric learning method that leverages the Maximum Entropy Principle to learn the least biased estimate of triplet distribution for a given set of prior constraints. To avoid redundancy between triplets, our method collectively selects batches with maximum joint entropy, which simultaneously captures both informativeness and diversity. We take advantage of the submodularity of the joint entropy function to construct a tractable solution using an efficient greedy algorithm based on Gram-Schmidt orthogonalization that is provably $$\left( 1 - \frac{1}{e} \right) $$ 1 - 1 e -optimal. Our approach is the first batch active metric learning method to define a unified score that balances informativeness and diversity for an entire batch of triplets. Experiments with several real-world datasets demonstrate that our algorithm is robust, generalizes well to different applications and input modalities, and consistently outperforms the state-of-the-art.

K. Priyadarshini, Siddhartha Chaudhuri, Vivek Borkar, Subhasis Chaudhuri
Off-Policy Differentiable Logic Reinforcement Learning

In this paper, we proposed an Off-Policy Differentiable Logic Reinforcement Learning (OPDLRL) framework to inherit the benefits of interpretability and generalization ability in Differentiable Inductive Logic Programming (DILP) and also resolves its weakness of execution efficiency, stability, and scalability. The key contributions include the use of approximate inference to significantly reduce the number of logic rules in the deduction process, an off-policy training method to enable approximate inference, and a distributed and hierarchical training framework. Extensive experiments, specifically playing real-time video games in Rabbids against human players, show that OPDLRL has better or similar performance as other DILP-based methods but far more practical in terms of sample efficiency and execution efficiency, making it applicable to complex and (near) real-time domains.

Li Zhang, Xin Li, Mingzhong Wang, Andong Tian
Causal Explanation of Convolutional Neural Networks

In this paper we introduce an explanation technique for Convolutional Neural Networks (CNNs) based on the theory of causality by Halpern and Pearl [12]. The causal explanation technique (CexCNN) is based on measuring the filter importance to a CNN decision, which is measured through counterfactual reasoning. In addition, we employ extended definitions of causality, which are responsibility and blame to weight the importance of such filters and project their contribution on input images. Since CNNs form a hierarchical structure, and since causal models can be hierarchically abstracted, we employ this similarity to perform the most important contribution of this paper, which is localizing the important features in the input image that contributed the most to a CNN’s decision. In addition to its ability in localization, we will show that CexCNN can be useful as well for model compression through pruning the less important filters. We tested CexCNN on several CNNs architectures and datasets. (The code is available on https://github.com/HichemDebbi/CexCNN )

Hichem Debbi
Interpretable Counterfactual Explanations Guided by Prototypes

We propose a fast, model agnostic method for finding interpretable counterfactual explanations of classifier predictions by using class prototypes. We show that class prototypes, obtained using either an encoder or through class specific k-d trees, significantly speed up the search for counterfactual instances and result in more interpretable explanations. We quantitatively evaluate interpretability of the generated counterfactuals to illustrate the effectiveness of our method on an image and tabular dataset, respectively MNIST and Breast Cancer Wisconsin (Diagnostic). Additionally, we propose a principled approach to handle categorical variables and illustrate our method on the Adult (Census) dataset. Our method also eliminates the computational bottleneck that arises because of numerical gradient evaluation for black box models.

Arnaud Van Looveren, Janis Klaise
Finding High-Value Training Data Subset Through Differentiable Convex Programming

Finding valuable training data points for deep neural networks has been a core research challenge with many applications. In recent years, various techniques for calculating the “value” of individual training datapoints have been proposed for explaining trained models. However, the value of a training datapoint also depends on other selected training datapoints - a notion which is not explicitly captured by existing methods. In this paper, we study the problem of selecting high-value subsets of training data. The key idea is to design a learnable framework for online subset selection, which can be learned using mini-batches of training data, thus making our method scalable. This results in a parameterised convex subset selection problem that is amenable to a differentiable convex programming paradigm, thus allowing us to learn the parameters of the selection model in an end-to-end training. Using this framework, we design an online alternating minimization based algorithm for jointly learning the parameters of the selection model and ML model. Extensive evaluation on a synthetic dataset, and three standard datasets, show that our algorithm finds consistently higher value subsets of training data, compared to the recent state of the art methods, sometimes $$\sim 20\%$$ ∼ 20 % higher value than existing methods. The subsets are also useful in finding mislabelled training data. Our algorithm takes running time comparable to the existing valuation functions.

Soumi Das, Arshdeep Singh, Saptarshi Chatterjee, Suparna Bhattacharya, Sourangshu Bhattacharya
Consequence-Aware Sequential Counterfactual Generation

Counterfactuals have become a popular technique nowadays for interacting with black-box machine learning models and understanding how to change a particular instance to obtain a desired outcome from the model. However, most existing approaches assume instant materialization of these changes, ignoring that they may require effort and a specific order of application. Recently, methods have been proposed that also consider the order in which actions are applied, leading to the so-called sequential counterfactual generation problem.In this work, we propose a model-agnostic method for sequential counterfactual generation. We formulate the task as a multi-objective optimization problem and present a genetic algorithm approach to find optimal sequences of actions leading to the counterfactuals. Our cost model considers not only the direct effect of an action, but also its consequences. Experimental results show that compared to state-of-the-art, our approach generates less costly solutions, is more efficient and provides the user with a diverse set of solutions to choose from.

Philip Naumann, Eirini Ntoutsi
Studying and Exploiting the Relationship Between Model Accuracy and Explanation Quality

Many explanation methods have been proposed to reveal insights about the internal procedures of black-box models like deep neural networks. Although these methods are able to generate explanations for individual predictions, little research has been conducted to investigate the relationship of model accuracy and explanation quality, or how to use explanations to improve model performance. In this paper, we evaluate explanations using a metric based on area under the ROC curve (AUC), treating expert-provided image annotations as ground-truth explanations, and quantify the correlation between model accuracy and explanation quality when performing image classifications with deep neural networks. The experiments are conducted using two image datasets: the CUB-200-2011 dataset and a Kahikatea dataset that we publish with this paper. For each dataset, we compare and evaluate seven different neural networks with four different explainers in terms of both accuracy and explanation quality. We also investigate how explanation quality evolves as loss metrics change through the training iterations of each model. The experiments suggest a strong correlation between model accuracy and explanation quality. Based on this observation, we demonstrate how explanations can be exploited to benefit the model selection process—even if simply maximising accuracy on test data is the primary goal.

Yunzhe Jia, Eibe Frank, Bernhard Pfahringer, Albert Bifet, Nick Lim
Explainable Multiple Instance Learning with Instance Selection Randomized Trees

Multiple Instance Learning (MIL) aims at extracting patterns from a collection of samples, where individual samples (called bags) are represented by a group of multiple feature vectors (called instances) instead of a single feature vector. Grouping instances into bags not only helps to formulate some learning problems more naturally, it also significantly reduces label acquisition costs as only the labels for bags are needed, not for the inner instances. However, in application domains where inference transparency is demanded, such as in network security, the sample attribution requirements are often asymmetric with respect to the training/application phase. While in the training phase it is very convenient to supply labels only for bags, in the application phase it is generally not enough to just provide decisions on the bag-level because the inferred verdicts need to be explained on the level of individual instances. Unfortunately, the majority of recent MIL classifiers does not focus on this real-world need. In this paper, we address this problem and propose a new tree-based MIL classifier able to identify instances responsible for positive bag predictions. Results from an empirical evaluation on a large-scale network security dataset also show that the classifier achieves superior performance when compared with prior art methods.

Tomáš Komárek, Jan Brabec, Petr Somol
Adversarial Representation Learning with Closed-Form Solvers

Adversarial representation learning aims to learn data representations for a target task while removing unwanted sensitive information at the same time. Existing methods learn model parameters iteratively through stochastic gradient descent-ascent, which is often unstable and unreliable in practice. To overcome this challenge, we adopt closed-form solvers for the adversary and target task. We model them as kernel ridge regressors and analytically determine an upper-bound on the optimal dimensionality of representation. Our solution, dubbed OptNet-ARL, reduces to a stable one one-shot optimization problem that can be solved reliably and efficiently. OptNet-ARL can be easily generalized to the case of multiple target tasks and sensitive attributes. Numerical experiments, on both small and large scale datasets, show that, from an optimization perspective, OptNet-ARL is stable and exhibits three to five times faster convergence. Performance wise, when the target and sensitive attributes are dependent, OptNet-ARL learns representations that offer a better trade-off front between (a) utility and bias for fair classification and (b) utility and privacy by mitigating leakage of private information than existing solutions.Code is available at https://github.com/human-analysis .

Bashir Sadeghi, Lan Wang, Vishnu Naresh Boddeti
Learning Unbiased Representations via Rényi Minimization

In recent years, significant work has been done to include fairness constraints in the training objective of machine learning algorithms. Differently from classical prediction retreatment algorithms, we focus on learning fair representations of the inputs. The challenge is to learn representations that capture most relevant information to predict the targeted output Y, while not containing any information about a sensitive attribute S. We leverage recent work which has been done to estimate the Hirschfeld-Gebelein-Renyi (HGR) maximal correlation coefficient by learning deep neural network transformations and use it as a min-max game to penalize the intrinsic bias in a multi dimensional latent representation. Compared to other dependence measures, the HGR coefficient captures more information about the non-linear dependencies, making the algorithm more efficient in mitigating bias. After providing a theoretical analysis of the consistency of the estimator and its desirable properties for bias mitigation, we empirically study its impact at various levels of neural architectures. We show that acting at intermediate levels of neural architectures provides best expressiveness/generalization abilities for bias mitigation, and that using an HGR based loss is more efficient than more classical adversarial approaches from the literature.

Vincent Grari, Oualid El Hajouji, Sylvain Lamprier, Marcin Detyniecki
Diversity-Aware k-median: Clustering with Fair Center Representation

We introduce a novel problem for diversity-aware clustering. We assume that the potential cluster centers belong to a set of groups defined by protected attributes, such as ethnicity, gender, etc. We then ask to find a minimum-cost clustering of the data into k clusters so that a specified minimum number of cluster centers are chosen from each group. We thus require that all groups are represented in the clustering solution as cluster centers, according to specified requirements. More precisely, we are given a set of clients C, a set of facilities , a collection $$\mathscr {F}=\{F_1,\dots ,F_t\}$$ F = { F 1 , ⋯ , F t } of facility groups , a budget k, and a set of lower-bound thresholds $$R=\{r_1,\dots ,r_t\}$$ R = { r 1 , ⋯ , r t } , one for each group in $$\mathscr {F}$$ F . The diversity-aware k-median problem asks to find a set S of k facilities in such that $$|S \cap F_i| \ge r_i$$ | S ∩ F i | ≥ r i , that is, at least $$r_i$$ r i centers in S are from group $$F_i$$ F i , and the k-median cost $$\sum _{c \in C} \min _{s \in S} d(c,s)$$ ∑ c ∈ C min s ∈ S d ( c , s ) is minimized. We show that in the general case where the facility groups may overlap, the diversity-aware k-median problem is $$\mathbf {NP}$$ NP -hard, fixed-parameter intractable with respect to parameter k, and inapproximable to any multiplicative factor. On the other hand, when the facility groups are disjoint, approximation algorithms can be obtained by reduction to the matroid median and red-blue median problems. Experimentally, we evaluate our approximation methods for the tractable cases, and present a relaxation-based heuristic for the theoretically intractable case, which can provide high-quality and efficient solutions for real-world datasets.

Suhas Thejaswi, Bruno Ordozgoiti, Aristides Gionis
Sibling Regression for Generalized Linear Models

Field observations form the basis of many scientific studies, especially in ecological and social sciences. Despite efforts to conduct such surveys in a standardized way, observations can be prone to systematic measurement errors. The removal of systematic variability introduced by the observation process, if possible, can greatly increase the value of this data. Existing non-parametric techniques for correcting such errors assume linear additive noise models. This leads to biased estimates when applied to generalized linear models (GLM). We present an approach based on residual functions to address this limitation. We then demonstrate its effectiveness on synthetic data and show it reduces systematic detection variability in moth surveys.

Shiv Shankar, Daniel Sheldon
Privacy Amplification via Iteration for Shuffled and Online PNSGD

In this paper, we consider the framework of privacy amplification via iteration, which is originally proposed by Feldman et al. and subsequently simplified by Asoodeh et al. in their analysis via the contraction coefficient. This line of work focuses on the study of the privacy guarantees obtained by the projected noisy stochastic gradient descent (PNSGD) algorithm with hidden intermediate updates. A limitation in the existing literature is that only the early stopped PNSGD has been studied, while no result has been proved on the more widely-used PNSGD applied on a shuffled dataset. Moreover, no scheme has been yet proposed regarding how to decrease the injected noise when new data are received in an online fashion. In this work, we first prove a privacy guarantee for shuffled PNSGD, which is investigated asymptotically when the noise is fixed for each sample size n but reduced at a predetermined rate when n increases, in order to achieve the convergence of privacy loss. We then analyze the online setting and provide a faster decaying scheme for the magnitude of the injected noise that also guarantees the convergence of privacy loss.

Matteo Sordello, Zhiqi Bu, Jinshuo Dong
Backmatter
Metadata
Title
Machine Learning and Knowledge Discovery in Databases. Research Track
Editors
Nuria Oliver
Fernando Pérez-Cruz
Stefan Kramer
Jesse Read
Jose A. Lozano
Copyright Year
2021
Electronic ISBN
978-3-030-86520-7
Print ISBN
978-3-030-86519-1
DOI
https://doi.org/10.1007/978-3-030-86520-7

Premium Partner