2023 | Book

Machine Learning and Knowledge Discovery in Databases

European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part I

Editors: Massih-Reza Amini, Stéphane Canu, Asja Fischer, Tias Guns, Petra Kralj Novak, Grigorios Tsoumakas

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

The multi-volume set LNAI 13713–13718 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2022, which took place in Grenoble, France, in September 2022.

The 236 full papers presented in these proceedings were carefully reviewed and selected from a total of 1060 submissions. In addition, the proceedings include 17 Demo Track contributions.

The volumes are organized in topical sections as follows:

Part I: Clustering and dimensionality reduction; anomaly detection; interpretability and explainability; ranking and recommender systems; transfer and multitask learning;

Part II: Networks and graphs; knowledge graphs; social network analysis; graph neural networks; natural language processing and text mining; conversational systems;

Part III: Deep learning; robust and adversarial machine learning; generative models; computer vision; meta-learning, neural architecture search;

Part IV: Reinforcement learning; multi-agent reinforcement learning; bandits and online learning; active and semi-supervised learning; private and federated learning;

Part V: Supervised learning; probabilistic inference; optimal transport; optimization; quantum, hardware; sustainability;

Part VI: Time series; financial machine learning; applications; applications: transportation; demo track.

Table of Contents

Frontmatter

Clustering and Dimensionality Reduction

Frontmatter
Pass-Efficient Randomized SVD with Boosted Accuracy

Singular value decomposition (SVD) is a widely used tool in data analysis and numerical linear algebra. Computing the truncated SVD of a very large matrix is difficult due to excessive time and memory cost. In this work, we aim to tackle this difficulty and enable accurate SVD computation for large data which cannot be loaded into memory. We first propose a randomized SVD algorithm with fewer passes over the matrix. It halves the number of passes of the basic randomized SVD with almost no loss of accuracy. Then, a shifted power iteration technique is proposed to improve the accuracy of the result, including a dynamic scheme for updating the shift value in each power iteration. Finally, combining the proposed techniques with several acceleration tricks, we develop a Pass-efficient randomized SVD (PerSVD) algorithm for efficient and accurate treatment of large data stored on hard disk. Experiments on synthetic and real-world data validate that the proposed techniques largely improve the accuracy of randomized SVD with the same number of passes over the matrix. With 3 or 4 passes over the data, PerSVD is able to reduce the error of the SVD result by three or four orders of magnitude compared with the basic randomized SVD and single-pass SVD algorithms, with similar or lower runtime and less memory usage.

Xu Feng, Wenjian Yu, Yuyang Xie
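
For orientation, the sketch below implements the basic randomized SVD with power iteration that pass-efficient variants such as PerSVD build on and improve; it is a minimal NumPy illustration, not the PerSVD algorithm itself, and the function name and parameters are ours.

```python
import numpy as np

def basic_randomized_svd(A, rank, n_oversample=10, n_power_iter=2, seed=0):
    """Basic randomized truncated SVD (Halko et al. style), the baseline
    that pass-efficient variants such as PerSVD improve upon."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = rank + n_oversample
    # Sketch the range of A with a Gaussian test matrix (one pass over A).
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))
    # Power iterations sharpen the subspace (two extra passes each).
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Project onto the small subspace and take an exact SVD there.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

A = np.random.default_rng(1).standard_normal((2000, 500))
U, s, Vt = basic_randomized_svd(A, rank=20)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```
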
CDPS: Constrained DTW-Preserving Shapelets

The analysis of time series for clustering and classification is becoming ever more popular because of the increasingly ubiquitous nature of IoT, satellite constellations, and handheld and smart-wearable devices. The presence of phase shift, differences in sample duration, and/or compression and dilation of a signal means that Euclidean distance is unsuitable in many cases. As such, several similarity measures specific to time series have been proposed, Dynamic Time Warping (DTW) being the most popular. Nevertheless, DTW does not respect the axioms of a metric, and therefore Learning DTW-Preserving Shapelets (LDPS) has been developed to regain these properties by using the concept of the shapelet transform. LDPS learns an unsupervised representation that models DTW distances using Euclidean distance in shapelet space. This article proposes constrained DTW-preserving shapelets (CDPS), in which a limited amount of user knowledge is available in the form of must-link and cannot-link constraints, to guide the representation such that it better captures the user’s interpretation of the data rather than the algorithm’s bias. Subsequently, any unconstrained algorithm can be applied, e.g. K-means clustering, k-NN classification, etc., to obtain a result that fulfils the constraints (without explicit knowledge of them). Furthermore, this representation is generalisable to out-of-sample data, overcoming the limitations of standard transductive constrained-clustering algorithms. CDPS is shown to outperform the state-of-the-art constrained-clustering algorithms on multiple time-series datasets. An open-source implementation based on PyTorch, which takes full advantage of GPU acceleration, is available at https://git.unistra.fr/helamouri/constrained-dtw-preserving-shapelets.

Hussein El Amouri, Thomas Lampert, Pierre Gançarski, Clément Mallet
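
Since CDPS is built around preserving DTW distances, the following minimal sketch recalls the classic dynamic-programming DTW computation between two 1-D series; it is only the distance CDPS approximates, not the shapelet-learning procedure.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(len(x)*len(y)) dynamic-programming DTW between two 1-D series."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance(np.sin(np.linspace(0, 6, 80)), np.sin(np.linspace(0.5, 6.5, 100))))
```
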
Structured Nonlinear Discriminant Analysis

Many traditional machine learning and pattern recognition algorithms, such as linear discriminant analysis (LDA) or principal component analysis (PCA), optimize data representation with respect to an information theoretic criterion. For time series analysis these traditional techniques are typically insufficient. In this work we propose an extension to linear discriminant analysis that allows learning a data representation based on an algebraic structure that is tailored for time series. Specifically, we propose a generalization of LDA towards shift-invariance that is based on cyclic structures. We expand this framework towards more general structures that allow incorporating prior knowledge about the data at hand within the representation learning step. The effectiveness of the proposed approach is demonstrated on synthetic and real-world data sets. Finally, we show the interrelation of our approach to common machine learning and signal processing techniques.

Christopher Bonenberger, Wolfgang Ertel, Markus Schneider, Friedhelm Schwenker
LSCALE: Latent Space Clustering-Based Active Learning for Node Classification

Node classification on graphs is an important task in many practical domains. It usually requires labels for training, which can be difficult or expensive to obtain in practice. Given a labelling budget, active learning aims to improve performance by carefully choosing which nodes to label. Previous graph active learning methods learn representations using labelled nodes and select some unlabelled nodes for label acquisition. However, they do not fully utilize the representation power present in unlabelled nodes. We argue that the representation power of unlabelled nodes can be exploited to further improve active learning for node classification. In this paper, we propose a latent space clustering-based active learning framework for node classification (LSCALE), where we fully utilize the representation power in both labelled and unlabelled nodes. Specifically, to select nodes for labelling, our framework uses the K-Medoids clustering algorithm on a latent space based on a dynamic combination of both unsupervised and supervised features. In addition, we design an incremental clustering module to avoid redundancy between nodes selected at different steps. Extensive experiments on five datasets show that our proposed framework LSCALE consistently and significantly outperforms the state-of-the-art approaches by a large margin.

Juncheng Liu, Yiwei Wang, Bryan Hooi, Renchi Yang, Xiaokui Xiao
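
As a rough illustration of the selection step, the sketch below runs a plain alternating K-Medoids on a stand-in latent feature matrix and returns the medoid indices as the nodes one would query; the feature matrix and function are hypothetical and do not reproduce LSCALE's dynamic feature combination or incremental clustering.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def k_medoids(X, k, n_iter=20, seed=0):
    """Plain alternating K-Medoids; the returned medoid indices would be the
    nodes proposed for labelling in an LSCALE-style active learner."""
    rng = np.random.default_rng(seed)
    D = pairwise_distances(X)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members):
                # Pick the member minimizing total distance to the other members.
                new_medoids[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

Z = np.random.default_rng(0).standard_normal((300, 16))  # stand-in latent features
query_nodes, _ = k_medoids(Z, k=10)
print(sorted(query_nodes))
```
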

Open Access

Powershap: A Power-Full Shapley Feature Selection Method

Feature selection is a crucial step in developing robust and powerful machine learning models. Feature selection techniques can be divided into two categories: filter and wrapper methods. While wrapper methods commonly result in strong predictive performance, they suffer from a large computational complexity and therefore take a significant amount of time to complete, especially when dealing with high-dimensional feature sets. Alternatively, filter methods are considerably faster but suffer from several other disadvantages, such as (i) requiring a threshold value, (ii) many filter methods not taking intercorrelation between features into account, and (iii) ignoring feature interactions with the model. To this end, we present powershap, a novel wrapper feature selection method which leverages statistical hypothesis testing and power calculations in combination with Shapley values for quick and intuitive feature selection. Powershap is built on the core assumption that an informative feature will have a larger impact on the prediction than a known random feature. Benchmarks and simulations show that powershap outperforms other filter methods, with predictive performance on par with wrapper methods while being significantly faster, often reaching half or a third of the execution time. As such, powershap provides a competitive and quick algorithm that can be used by various models in different domains. Furthermore, powershap is implemented as a plug-and-play, open-source sklearn component, enabling easy integration into conventional data science pipelines. User experience is further enhanced by an automatic mode that tunes the hyper-parameters of the powershap algorithm, allowing the algorithm to be used without any configuration.

Jarne Verhaeghe, Jeroen Van Der Donckt, Femke Ongenae, Sofie Van Hoecke
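
The core idea can be sketched as follows: inject a random feature, refit the model several times, and keep only the features whose importance is significantly larger than that of the random one. The snippet below uses impurity-based importances as a stand-in for the Shapley values that powershap actually uses, so it is an assumption-laden approximation rather than the released sklearn component.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def powershap_like_selection(X, y, n_rounds=20, alpha=0.01):
    """Powershap-style idea: a feature is kept if, across repeated fits, its
    importance is significantly larger than that of an injected random feature.
    (Impurity importances stand in here for the Shapley values used by powershap.)"""
    rng = np.random.default_rng(0)
    n_features = X.shape[1]
    real_imp = np.zeros((n_rounds, n_features))
    rand_imp = np.zeros(n_rounds)
    for r in range(n_rounds):
        X_aug = np.column_stack([X, rng.standard_normal(len(X))])  # add a random feature
        model = RandomForestClassifier(n_estimators=50, random_state=r).fit(X_aug, y)
        real_imp[r] = model.feature_importances_[:-1]
        rand_imp[r] = model.feature_importances_[-1]
    # One-sided test: is each feature's importance larger than the random one's?
    pvals = np.array([stats.ttest_ind(real_imp[:, j], rand_imp, alternative="greater").pvalue
                      for j in range(n_features)])
    return np.where(pvals < alpha)[0]

X, y = make_classification(n_samples=500, n_features=15, n_informative=5, random_state=0)
print(powershap_like_selection(X, y))
```
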
Automated Cancer Subtyping via Vector Quantization Mutual Information Maximization

Cancer subtyping is crucial for understanding the nature of tumors and providing suitable therapy. However, existing labelling methods are medically controversial, and have driven the process of subtyping away from teaching signals. Moreover, cancer genetic expression profiles are high-dimensional, scarce, and have complicated dependencies, thereby posing a serious challenge for existing subtyping models to output sensible clusterings. In this study, we propose a novel clustering method for exploiting genetic expression profiles and distinguishing subtypes in an unsupervised manner. The proposed method adaptively learns categorical correspondence from latent representations of expression profiles to the subtypes output by the model. By maximizing the problem-agnostic mutual information between input expression profiles and output subtypes, our method can automatically decide a suitable number of subtypes. Through experiments, we demonstrate that our proposed method can refine existing controversial labels, and, by further medical analysis, this refinement is proven to have a high correlation with cancer survival rates.

Zheng Chen, Lingwei Zhu, Ziwei Yang, Takashi Matsubara

Open Access

Wasserstein t-SNE

Scientific datasets often have hierarchical structure: for example, in surveys, individual participants (samples) might be grouped at a higher level (units) such as their geographical region. In these settings, the interest is often in exploring the structure on the unit level rather than on the sample level. Units can be compared based on the distance between their means, however this ignores the within-unit distribution of samples. Here we develop an approach for exploratory analysis of hierarchical datasets using the Wasserstein distance metric that takes into account the shapes of within-unit distributions. We use t-SNE to construct 2D embeddings of the units, based on the matrix of pairwise Wasserstein distances between them. The distance matrix can be efficiently computed by approximating each unit with a Gaussian distribution, but we also provide a scalable method to compute exact Wasserstein distances. We use synthetic data to demonstrate the effectiveness of our Wasserstein t-SNE, and apply it to data from the 2017 German parliamentary election, considering polling stations as samples and voting districts as units. The resulting embedding uncovers meaningful structure in the data.

Fynn Bachmann, Philipp Hennig, Dmitry Kobak
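
Under the Gaussian approximation mentioned in the abstract, the 2-Wasserstein distance between units has a closed form that can be fed directly to t-SNE with a precomputed metric. The sketch below does exactly that on synthetic units; the data and hyper-parameters are placeholders, not the election data used in the paper.

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.manifold import TSNE

def gaussian_w2(m1, C1, m2, C2):
    """Closed-form 2-Wasserstein distance between two Gaussian distributions."""
    Csqrt = sqrtm(C2).real
    cross = sqrtm(Csqrt @ C1 @ Csqrt).real
    val = np.sum((m1 - m2) ** 2) + np.trace(C1 + C2 - 2 * cross)
    return np.sqrt(max(float(val), 0.0))

rng = np.random.default_rng(0)
# 30 synthetic "units", each a cloud of samples in 3-D.
units = [rng.standard_normal((100, 3)) * rng.uniform(0.5, 2) + rng.normal(0, 3, 3)
         for _ in range(30)]
means = [u.mean(axis=0) for u in units]
covs = [np.cov(u, rowvar=False) for u in units]
D = np.array([[gaussian_w2(means[i], covs[i], means[j], covs[j])
               for j in range(30)] for i in range(30)])
D = (D + D.T) / 2                                     # enforce exact symmetry
emb = TSNE(metric="precomputed", init="random", perplexity=10).fit_transform(D)
print(emb.shape)
```
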
Nonparametric Bayesian Deep Visualization

Visualization methods such as t-SNE [1] have helped in knowledge discovery from high-dimensional data; however, their performance may degrade when the intrinsic structure of observations is in low-dimensional space, and they cannot estimate clusters that are often useful to understand the internal structure of a dataset. A solution is to visualize the latent coordinates and clusters estimated using a neural clustering model. However, they require a long computational time since they have numerous weights to train and must tune the layer width, the number of latent dimensions and clusters to appropriately model the latent space. Additionally, the estimated coordinates may not be suitable for visualization since such a model and visualization method are applied independently. We utilize neural network Gaussian processes (NNGP) [2] equivalent to a neural network whose weights are marginalized to eliminate the necessity to optimize weights and layer widths. Additionally, to determine latent dimensions and the number of clusters without tuning, we propose a latent variable model that combines NNGP with automatic relevance determination [3] to extract necessary dimensions of latent space and infinite Gaussian mixture model [4] to infer the number of clusters. We integrate this model and visualization method into nonparametric Bayesian deep visualization (NPDV) that learns latent and visual coordinates jointly to render latent coordinates optimal for visualization. Experimental results on images and document datasets show that NPDV shows superior accuracy to existing methods, and it requires less training time than the neural clustering model because of its lower tuning cost. Furthermore, NPDV can reveal plausible latent clusters without labels.

Haruya Ishizuka, Daichi Mochihashi
FastDEC: Clustering by Fast Dominance Estimation

The k-Nearest Neighbors (k-NN) graph is essential for various graph mining tasks. In this work, we study density-based clustering on the k-NN graph and propose FastDEC, a clustering framework based on fast dominance estimation. The nearest density-higher (NDH) relation and the dominance component (DC), and more specifically their integration with the k-NN graph, are formally defined and theoretically analyzed. FastDEC includes two variants to satisfy different clustering scenarios: FastDEC_D for partitioning data into clusters with arbitrary shapes, and FastDEC_K for K-way partitioning. First, a set of DCs is detected by FastDEC_D by segmenting the given k-NN graph. Then, the K-way partition is generated by selecting the top-K DCs in terms of inter-dominance (ID) as seeds and assigning the remaining DCs to their nearest dominators. FastDEC can be viewed as a much faster, more robust, k-NN based variant of the classical density-based clustering algorithm Density Peak Clustering (DPC). DPC estimates the significance of data points from density and geometric distance factors, while FastDEC innovatively uses the global rank of the dominator as an additional factor in the significance estimation. FastDEC naturally holds several critical characteristics: (1) excellent clustering performance; (2) ease of interpretation and implementation; (3) efficiency and robustness. Experiments on both artificial and real datasets demonstrate that FastDEC outperforms state-of-the-art density-based methods, including DPC.

Geping Yang, Hongzhang Lv, Yiyang Yang, Zhiguo Gong, Xiang Chen, Zhifeng Hao
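
For intuition, the following sketch clusters points by linking each one to its nearest denser neighbour on a k-NN graph, in the spirit of DPC-style dominance; it is a simplified illustration and not FastDEC's dominance-component segmentation or inter-dominance ranking.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

def knn_density_clustering(X, k=15):
    """Density-peak-style clustering on a k-NN graph: each point links to its
    nearest k-NN neighbour of higher density; points with no denser neighbour
    become cluster seeds."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)                     # idx[:, 0] is the point itself
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)
    parent = np.arange(len(X))
    for i in range(len(X)):
        denser = [j for j in idx[i, 1:] if density[j] > density[i]]
        if denser:                                   # nearest denser neighbour (idx is distance-sorted)
            parent[i] = denser[0]

    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    roots = {r: c for c, r in enumerate(sorted({root(i) for i in range(len(X))}))}
    return np.array([roots[root(i)] for i in range(len(X))])

X, _ = make_blobs(n_samples=600, centers=4, random_state=0)
print(np.unique(knn_density_clustering(X)))
```
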
SECLEDS: Sequence Clustering in Evolving Data Streams via Multiple Medoids and Medoid Voting

Sequence clustering in a streaming environment is challenging because it is computationally expensive, and the sequences may evolve over time. K-medoids or Partitioning Around Medoids (PAM) is commonly used to cluster sequences since it supports alignment-based distances, and the k-centers being actual data items helps with cluster interpretability. However, offline k-medoids has no support for concept drift, while also being prohibitively expensive for clustering data streams. We therefore propose SECLEDS, a streaming variant of the k-medoids algorithm with constant memory footprint. SECLEDS has two unique properties: i) it uses multiple medoids per cluster, producing stable high-quality clusters, and ii) it handles concept drift using an intuitive Medoid Voting scheme for approximating cluster distances. Unlike existing adaptive algorithms that create new clusters for new concepts, SECLEDS follows a fundamentally different approach, where the clusters themselves evolve with an evolving stream. Using real and synthetic datasets, we empirically demonstrate that SECLEDS produces high-quality clusters regardless of drift, stream size, data dimensionality, and number of clusters. We compare against three popular stream and batch clustering algorithms. The state-of-the-art BanditPAM is used as an offline benchmark. SECLEDS achieves comparable F1 score to BanditPAM while reducing the number of required distance computations by 83.7%. Importantly, SECLEDS outperforms all baselines by 138.7% when the stream contains drift. We also cluster real network traffic, and provide evidence that SECLEDS can support network bandwidths of up to 1.08 Gbps while using the (expensive) dynamic time warping distance.

Azqa Nadeem, Sicco Verwer
Knowledge Integration in Deep Clustering

Constrained clustering, which integrates knowledge in the form of constraints into a clustering process, has been studied for more than two decades. Popular clustering algorithms such as K-means, spectral clustering and recent deep clustering already have constrained versions, but they usually lack expressiveness in the constraints they support. In this paper we consider prior knowledge expressing relations between some data points and their assignments to clusters in propositional logic, and we show how a deep clustering framework can be extended to integrate this knowledge. To achieve this, we define an expert loss based on the weighted models of the logical formulas; the weights depend on the soft assignment of points to clusters dynamically computed by the deep learner. This loss is integrated into the deep clustering method. We show how it can be computed efficiently using Weighted Model Counting and decomposition techniques. This method has the advantage of both integrating general knowledge and being independent of the neural architecture. Indeed, we have integrated the expert loss into two well-known deep clustering algorithms (IDEC and SCAN). Experiments have been conducted to compare our systems, IDEC-LK and SCAN-LK, to state-of-the-art methods for pairwise and triplet constraints in terms of computational cost, clustering quality and constraint satisfaction. We show that IDEC-LK can achieve comparable results with these systems, which are tailored for these specific constraints. To show the flexibility of our approach in learning from high-level domain constraints, we have integrated implication constraints and a new constraint, called the span-limited constraint, that limits the number of clusters a set of points can belong to. Some experiments are also performed showing that constraints on some points can be extrapolated to other similar points.

Nguyen-Viet-Dung Nghiem, Christel Vrain, Thi-Bich-Hanh Dao

Anomaly Detection

Frontmatter
ARES: Locally Adaptive Reconstruction-Based Anomaly Scoring

How can we detect anomalies: that is, samples that significantly differ from a given set of high-dimensional data, such as images or sensor data? This is a practical problem with numerous applications and is also relevant to the goal of making learning algorithms more robust to unexpected inputs. Autoencoders are a popular approach, partly due to their simplicity and their ability to perform dimension reduction. However, the anomaly scoring function is not adaptive to the natural variation in reconstruction error across the range of normal samples, which hinders their ability to detect real anomalies. In this paper, we empirically demonstrate the importance of local adaptivity for anomaly scoring in experiments with real data. We then propose our novel Adaptive Reconstruction Error-based Scoring approach, which adapts its scoring based on the local behaviour of reconstruction error over the latent space. We show that this improves anomaly detection performance over relevant baselines in a wide variety of benchmark datasets.

Adam Goodge, Bryan Hooi, See Kiong Ng, Wee Siong Ng
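
A minimal version of the idea, assuming latent codes and reconstruction errors are already available from some autoencoder, is to z-score each sample's error against the errors of its latent-space neighbours; the sketch below does this with placeholder data and is not the paper's exact scoring function.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def locally_adaptive_scores(latent, recon_err, k=20):
    """Score each sample by how unusual its reconstruction error is relative to
    its k nearest neighbours in latent space (a z-score style normalisation)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(latent)
    _, idx = nn.kneighbors(latent)
    neigh_err = recon_err[idx[:, 1:]]                # errors of the k neighbours
    mu, sigma = neigh_err.mean(axis=1), neigh_err.std(axis=1) + 1e-12
    return (recon_err - mu) / sigma                  # high score => locally anomalous

rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 8))              # stand-in latent codes
recon_err = rng.gamma(2.0, 1.0, size=1000)           # stand-in reconstruction errors
recon_err[:5] += 15.0                                # inject a few anomalies
print(np.argsort(-locally_adaptive_scores(latent, recon_err))[:5])
```
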
R2-AD2: Detecting Anomalies by Analysing the Raw Gradient

Neural networks follow a gradient-based learning scheme, adapting their mapping parameters by back-propagating the output loss. Samples unlike the ones seen during training cause a different gradient distribution. Based on this intuition, we design a novel semi-supervised anomaly detection method called R2-AD2. By analysing the temporal distribution of the gradient over multiple training steps, we reliably detect point anomalies in strict semi-supervised settings. Instead of domain dependent features, we input the raw gradient caused by the sample under test to an end-to-end recurrent neural network architecture. R2-AD2 works in a purely data-driven way, thus is readily applicable in a variety of important use cases of anomaly detection.

Jan-Philipp Schulze, Philip Sperl, Ana Răduțoiu, Carla Sagebiel, Konstantin Böttinger
Hop-Count Based Self-supervised Anomaly Detection on Attributed Networks

A number of approaches for anomaly detection on attributed networks have been proposed. However, most of them suffer from two major limitations: (1) they rely on unsupervised approaches which are intrinsically less effective due to the lack of supervisory signals of what information is relevant for capturing anomalies, and (2) they rely only on using local, e.g., one- or two-hop away node neighbourhood information, but ignore the more global context. Since anomalous nodes differ from normal nodes in structures and attributes, it is intuitive that the distance between anomalous nodes and their neighbors should be larger than that between normal nodes and their (also normal) neighbors if we remove the edges connecting anomalous and normal nodes. Thus, estimating hop counts based on both global and local contextual information can help us to construct an anomaly indicator. Following this intuition, we propose a hop-count based model (HCM) that achieves that. Our approach includes two important learning components: (1) Self-supervised learning task of predicting the shortest path length between a pair of nodes, and (2) Bayesian learning to train HCM for capturing uncertainty in learned parameters and avoiding overfitting. Extensive experiments on real-world attributed networks demonstrate that HCM consistently outperforms state-of-the-art approaches.

Tianjin Huang, Yulong Pei, Vlado Menkovski, Mykola Pechenizkiy
Deep Learning Based Urban Anomaly Prediction from Spatiotemporal Data

Urban anomalies are unusual occurrences like congestion, crowd gathering, road accidents, natural disasters, crime, etc., that cause disturbance in society and, in worst cases, may cause loss to property or life. Prediction of these anomalies at the early stages may prevent significant loss and help the government to maintain urban sustainability. However, predicting different kinds of urban anomaly is difficult because of its dynamic nature (e.g., holiday versus weekday, office versus shopping mall) and presence in various forms (e.g., road congestion may be caused by blocked driveway or accident). This work proposes a novel integrated framework UrbanAnom that utilizes a data fusion approach to predict urban anomaly data using gated graph convolution and recurrent unit. To evaluate our urban anomaly prediction framework, we utilize multi-stream datasets of New York City’s urban anomalies, points of interest (POI), roads, calendar, and weather that were collected via smart devices in the city. The extensive experiments show that our proposed framework outperforms baseline and state-of-the-art models.

Bhumika, Debasis Das
Detecting Anomalies with Autoencoders on Data Streams

Autoencoders have achieved impressive results in anomaly detection tasks by identifying anomalous data as instances that do not match their learned representation of normality. To this end, autoencoders are typically trained on large amounts of previously collected data before being deployed. However, in an online learning scenario, where a predictor has to operate on an evolving data stream and therefore continuously adapt to new instances, this approach is inadequate. Despite their success in offline anomaly detection, there has been little research leveraging autoencoders as anomaly detectors in such a setting. Therefore, in this work, we propose an approach for online anomaly detection with autoencoders and demonstrate its competitiveness against established online anomaly detection algorithms on multiple real-world datasets. We further address the issue of autoencoders gradually adapting to anomalies and thereby reducing their sensitivity to such data by introducing a simple modification to the models’ training approach. Our experimental results indicate that our solution achieves a larger gap between the losses on anomalous and normal instances than a conventional training procedure.

Lucas Cazzonelli, Cedric Kulbach
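
A minimal online loop in this spirit is sketched below: each arriving instance is scored by reconstruction error before the update (prequential evaluation), and updates on suspiciously high-loss instances are skipped so the autoencoder keeps modelling normality. The thresholding rule and the synthetic stream are our own simplifications, not the authors' modification.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 16
ae = nn.Sequential(nn.Linear(d, 8), nn.ReLU(), nn.Linear(8, 4),
                   nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, d))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
threshold, scores = 1.0, []

def stream():
    for t in range(5000):
        x = torch.randn(d)
        if t % 500 == 499:                  # occasional injected anomaly
            x = x + 6.0
        yield x

for x in stream():
    recon = ae(x)
    loss = ((recon - x) ** 2).mean()
    scores.append(loss.item())              # score before updating (prequential)
    # Skip updates on suspicious instances so the AE does not adapt to anomalies.
    weight = 0.0 if loss.item() > threshold * 5 else 1.0
    opt.zero_grad()
    (weight * loss).backward()
    opt.step()
    threshold = 0.99 * threshold + 0.01 * loss.item()   # running estimate of normal loss

print("mean score over last 100 instances:", sum(scores[-100:]) / 100)
```
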
Anomaly Detection via Few-Shot Learning on Normality

One of the basic ideas for anomaly detection is to describe an enclosing boundary of normal data in order to identify cases outside as anomalies. In practice, however, normal data can consist of multiple classes, in which case the anomalies may appear not only outside such an enclosure but also in-between ‘normal’ classes. This paper addresses deep anomaly detection aimed at embedding ‘normal’ classes to individually close but mutually distant proximities. We introduce a problem setting where a limited number of labeled examples from each ‘normal’ class is available for training. Preparing such examples is much more feasible in practice than collecting examples of anomalies or labeling large-scale, normal data. We utilize the labeled examples in a margin-based loss reflecting the inter-class and the intra-class distances among the embedded labeled data. The two terms and their relations are derived from an information-theoretic principle. In an empirical study using image benchmark datasets, we show the advantage of the proposed method over existing deep anomaly detection models. We also show case studies using low-dimensional mappings to analyze the behavior of the proposed method.

Shin Ando, Ayaka Yamamoto

Interpretability and Explainability

Frontmatter
Interpretations of Predictive Models for Lifestyle-related Diseases at Multiple Time Intervals

Health screening is practiced in many countries to find asymptomatic patients. Applying machine learning to health screening datasets may enable predicting future medical conditions. We extend this approach by introducing interpretable machine learning and determining health screening items (attributes) that contribute to detecting lifestyle-related diseases in their early stages. Furthermore, we determine how contributing attributes change over time intervals of one to four years. We target diabetes and chronic kidney disease (CKD), which are among the most common lifestyle-related diseases. We trained predictive models using XGBoost and estimated each attribute’s contribution levels using SHapley Additive exPlanations (SHAP). The results indicated that numerous attributes drastically change their levels of contribution over time. Many of the results matched our medical knowledge, but we also obtained unexpected outcomes. For example, we found that for predicting HbA1c and creatinine, which are indicators of diabetes and CKD, respectively, the contribution from alanine transaminase increases as the time interval lengthens. Such findings can provide insights into the underlying mechanisms of how lifestyle-related diseases progress.

Yuki Oba, Taro Tezuka, Masaru Sanuki, Yukiko Wagatsuma
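
The analysis pipeline (a gradient-boosted model plus SHAP attributions, then ranking attributes by mean absolute SHAP value) can be sketched as follows on synthetic data, assuming the xgboost and shap packages are installed; the feature names and data are placeholders, not the health-screening attributes used in the paper.

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_regression

# Synthetic stand-in for a health-screening table; columns = screening items.
X, y = make_regression(n_samples=2000, n_features=10, n_informative=4, random_state=0)
feature_names = [f"item_{j}" for j in range(X.shape[1])]

model = xgb.XGBRegressor(n_estimators=200, max_depth=4).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)      # shape (n_samples, n_features)

# Rank screening items by mean absolute SHAP contribution, as in the paper's analysis.
ranking = np.argsort(-np.abs(shap_values).mean(axis=0))
for j in ranking[:5]:
    print(feature_names[j], np.abs(shap_values[:, j]).mean())
```
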
Fair and Efficient Alternatives to Shapley-based Attribution Methods

Interpretability of predictive machine learning models is critical for numerous application contexts that require decisions to be understood by end-users. It can be studied through the lens of local explainability and attribution methods that focus on explaining a specific decision made by a model for a given input, by evaluating the contribution of input features to the results, e.g. probability assigned to a class. Many attribution methods rely on a game-theoretic formulation of the attribution problem based on an approximation of the popular Shapley value, even if the underlying rationale motivating the use of this specific value is today questioned. In this paper we introduce the FESP - Fair-Efficient-Symmetric-Perturbation - attribution method as an alternative approach sharing relevant axiomatic properties with the Shapley value, and the Equal Surplus value (ES) commonly applied in cooperative games. Our results show that FESP and ES produce better attribution maps compared to state-of-the-art approaches in image and text classification settings.

Charles Condevaux, Sébastien Harispe, Stéphane Mussard
SMACE: A New Method for the Interpretability of Composite Decision Systems

Interpretability is a pressing issue for decision systems. Many post hoc methods have been proposed to explain the predictions of a single machine learning model. However, business processes and decision systems are rarely centered around a unique model. These systems combine multiple models that produce key predictions, and then apply rules to generate the final decision. To explain such decisions, we propose the Semi-Model-Agnostic Contextual Explainer (SMACE), a new interpretability method that combines a geometric approach for decision rules with existing interpretability methods for machine learning models to generate an intuitive feature ranking tailored to the end user. We show that established model-agnostic approaches produce poor results on tabular data in this setting, in particular giving the same importance to several features, whereas SMACE can rank them in a meaningful way.

Gianluigi Lopardo, Damien Garreau, Frédéric Precioso, Greger Ottosson
Calibrate to Interpret

Trustworthy machine learning (ML) is driving a large number of ML community works aimed at improving ML acceptance and adoption. The main aspects of trustworthy ML are the following: fairness, uncertainty, robustness, explainability and formal guarantees. Each of these individual domains attracts the ML community's interest, as visible from the number of related publications. However, few works tackle the interconnection between these fields. In this paper we show a first link between uncertainty and explainability by studying the relation between calibration and interpretation. As the calibration of a given model changes the way it scores samples, and interpretation approaches often rely on these scores, it seems safe to assume that the confidence calibration of a model interacts with our ability to interpret such a model. In this paper, we show, in the context of networks trained on image classification tasks, to what extent interpretations are sensitive to confidence calibration. This leads us to suggest a simple practice to improve the interpretation outcomes: Calibrate to Interpret.

Gregory Scafarto, Nicolas Posocco, Antoine Bonnefoy
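
For reference, one standard way to confidence-calibrate a trained classifier before interpreting it is post-hoc temperature scaling on held-out logits; the sketch below shows it with stand-in validation data and is only one of several calibration methods that could precede interpretation.

```python
import torch

def fit_temperature(logits, labels, n_steps=200, lr=0.01):
    """Post-hoc temperature scaling: learn a single scalar T > 0 that minimises
    the NLL of softmax(logits / T) on a held-out validation set."""
    log_t = torch.zeros(1, requires_grad=True)          # T = exp(log_t) keeps T positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Stand-in validation logits/labels from some (overconfident) image classifier.
torch.manual_seed(0)
true = torch.randint(0, 10, (512,))
logits = torch.nn.functional.one_hot(true, 10).float() * 8.0 + torch.randn(512, 10) * 3.0
print("fitted temperature:", fit_temperature(logits, true))
```
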
Knowledge-Driven Interpretation of Convolutional Neural Networks

Since the widespread adoption of deep learning solutions in critical environments, the interpretation of artificial neural networks has become a significant issue. To this end, numerous approaches currently try to align human-level concepts with the activation patterns of artificial neurons. Nonetheless, they often understate two related aspects: the distributed nature of neural representations and the semantic relations between concepts. We explicitly tackled this interrelatedness by defining a novel semantic alignment framework to align distributed activation patterns and structured knowledge. In particular, we detailed a solution to assign to both neurons and their linear combinations one or more concepts from the WordNet semantic network. Acknowledging semantic links also enabled the clustering of neurons into semantically rich and meaningful neural circuits. Our empirical analysis of popular convolutional networks for image classification found evidence of the emergence of such neural circuits. Finally, we discovered neurons in neural circuits to be pivotal for the network to perform effectively on semantically related tasks. We also contribute by releasing the code that implements our alignment framework.

Riccardo Massidda, Davide Bacciu
Neural Networks with Feature Attribution and Contrastive Explanations

Interpretability is becoming an expected and even essential characteristic in GDPR Europe. In the majority of existing work on natural language processing (NLP), interpretability has focused on the problem of explanatory responses to questions like “Why p?” (identifying the causal attributes that support the prediction of “p”). This type of local explainability focuses on explaining a single prediction made by a model for a single input, by quantifying the contribution of each feature to the predicted output class. Most of these methods are based on post-hoc approaches. In this paper, we propose a technique to learn centroid vectors concurrently while building the black-box model, in order to support answers to “Why p?” and “Why p and not q?”, where “q” is another class that is contrastive to “p”. Across multiple datasets, our approach achieves better results than traditional post-hoc methods.

Housam K. B. Babiker, Mi-Young Kim, Randy Goebel
Explaining Predictions by Characteristic Rules

Characteristic rules have been advocated for their ability to improve interpretability over discriminative rules within the area of rule learning. However, the former type of rule has not yet been used by techniques for explaining predictions. A novel explanation technique, called CEGA (Characteristic Explanatory General Association rules), is proposed, which employs association rule mining to aggregate multiple explanations generated by any standard local explanation technique into a set of characteristic rules. An empirical investigation is presented, in which CEGA is compared to two state-of-the-art methods, Anchors and GLocalX, for producing local and aggregated explanations in the form of discriminative rules. The results suggest that the proposed approach provides a better trade-off between fidelity and complexity compared to the two state-of-the-art approaches; CEGA and Anchors significantly outperform GLocalX with respect to fidelity, while CEGA and GLocalX significantly outperform Anchors with respect to the number of generated rules. The effect of changing the format of the explanations of CEGA to discriminative rules and using LIME and SHAP as local explanation techniques instead of Anchors are also investigated. The results show that the characteristic explanatory rules still compete favorably with rules in the standard discriminative format. The results also indicate that using CEGA in combination with either SHAP or Anchors consistently leads to a higher fidelity compared to using LIME as the local explanation technique.

Amr Alkhatib, Henrik Boström, Michalis Vazirgiannis
Session-Based Recommendation Along with the Session Style of Explanation

Explainability of recommendation algorithms is becoming an important characteristic in GDPR Europe. There are algorithms that try to provide explanations over graphs along with recommendations, but without focusing on user session information. In this paper, we study the problem of news recommendation using a heterogeneous graph and try to infer similarities between entities (i.e., sessions, articles, etc.) for predicting the next user click inside a user session. Moreover, we exploit meta paths to reveal semantic context about the session-article interactions and provide more accurate article recommendations along with robust explanations. We have experimentally compared our method against state-of-the-art algorithms on three real-life datasets. Our method outperforms its competitors in both accuracy and explainability. Finally, we have run a user study to measure users’ satisfaction with different explanation styles and to find which explanations really help users to make more accurate decisions.

Panagiotis Symeonidis, Lidija Kirjackaja, Markus Zanker

Open Access

ProtoMIL: Multiple Instance Learning with Prototypical Parts for Whole-Slide Image Classification

The rapid development of histopathology scanners has enabled the digital transformation of pathology. Current devices quickly and accurately digitize histology slides at many magnifications, resulting in whole slide images (WSI). However, the direct application of supervised deep learning methods to WSIs at the highest magnification is impossible due to hardware limitations. That is why WSI classification is usually performed with standard Multiple Instance Learning (MIL) approaches, which do not explain their predictions, although explainability is crucial for medical applications. In this work, we fill this gap by introducing ProtoMIL, a novel self-explainable MIL method inspired by the case-based reasoning process, that operates on visual prototypes. Thanks to incorporating prototypical features into the object description, ProtoMIL joins model accuracy and fine-grained interpretability in an unprecedented way, as confirmed by experiments conducted on five recognized whole-slide image datasets.

Dawid Rymarczyk, Adam Pardyl, Jarosław Kraus, Aneta Kaczyńska, Marek Skomorowski, Bartosz Zieliński
VCNet: A Self-explaining Model for Realistic Counterfactual Generation

Counterfactual explanation is a common class of methods for making local explanations of machine learning decisions. For a given instance, these methods aim to find the smallest modification of feature values that changes the predicted decision made by a machine learning model. One of the challenges of counterfactual explanation is the efficient generation of realistic counterfactuals. To address this challenge, we propose VCNet – Variational Counter Net – a model architecture that combines a predictor and a counterfactual generator that are jointly trained, for regression or classification tasks. VCNet is able both to generate predictions and to generate counterfactual explanations without having to solve another minimisation problem. Our contribution is the generation of counterfactuals that are close to the distribution of the predicted class. This is done by learning a variational autoencoder conditioned on the output of the predictor, in a joint-training fashion. We present an empirical evaluation on tabular datasets and across several interpretability metrics. The results are competitive with the state-of-the-art methods.

Victor Guyomard, Françoise Fessant, Thomas Guyet, Tassadit Bouadi, Alexandre Termier
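
As a point of comparison, the post-hoc optimisation that VCNet avoids looks roughly like the Wachter-style search below: gradient descent on the input to flip a fixed classifier while staying close to the original instance. The classifier, loss weights and step counts here are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# A stand-in binary classifier over 5 tabular features (pretend it is trained).
clf = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))

def counterfactual(x, target_class, lam=0.5, steps=300, lr=0.05):
    """Wachter-style counterfactual: the smallest change to x that flips the
    classifier to target_class (the post-hoc optimisation VCNet avoids)."""
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred_loss = nn.functional.cross_entropy(clf(x_cf).unsqueeze(0),
                                                torch.tensor([target_class]))
        loss = pred_loss + lam * torch.norm(x_cf - x, p=1)   # stay close to the original
        loss.backward()
        opt.step()
    return x_cf.detach()

x = torch.randn(5)
x_cf = counterfactual(x, target_class=1)
print("original class:", clf(x).argmax().item(),
      "counterfactual class:", clf(x_cf).argmax().item())
```
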

Ranking and Recommender Systems

Frontmatter
A Recommendation System for CAD Assembly Modeling Based on Graph Neural Networks

In computer-aided design (CAD), software tools support design engineers during the modeling of assemblies, i.e., products that consist of multiple components. Selecting the right components is a cumbersome task for design engineers as they have to pick from a large number of possibilities. Therefore, we propose to analyze a data set of past assemblies composed of components from the same component catalog, represented as connected, undirected graphs of components, in order to suggest the next needed component. In terms of graph machine learning, we formulate this as a graph classification problem where each class corresponds to a component ID from a catalog and the models are trained to predict the next required component. In addition to pretraining of component embeddings, we recursively decompose the graphs to obtain data instances in a self-supervised fashion without imposing any node insertion order. Our results indicate that models based on graph convolution networks and graph attention networks achieve high predictive performance, reducing the cognitive load of choosing among 2,000 or 3,000 components by recommending the ten most likely components with 82–92% accuracy, depending on the chosen catalog.

Carola Gajek, Alexander Schiendorfer, Wolfgang Reif
AD-AUG: Adversarial Data Augmentation for Counterfactual Recommendation

Collaborative filtering (CF) has become one of the most popular and widely used methods in recommender systems, but its performance degrades sharply in practice due to the sparsity and bias of the real-world user feedback data. In this paper, we propose a novel counterfactual data augmentation framework AD-AUG to mitigate the impact of the imperfect training data and empower CF models. The key idea of AD-AUG is to answer the counterfactual question: “what would be a user’s feedback if his previous purchase history had been different?”. Our framework is composed of an augmenter model and a recommender model. The augmenter model aims to generate counterfactual user feedback based on the observed ones, while the recommender leverages the original and counterfactual user feedback data to provide the final recommendation. In particular, we design two adversarial learning-based methods from both “bottom-up” data-oriented and “top-down” model-oriented perspectives for counterfactual learning. Extensive experiments on three real-world datasets show that the AD-AUG can greatly enhance a wide range of CF models, demonstrating our framework’s effectiveness and generality.

Yifan Wang, Yifang Qin, Yu Han, Mingyang Yin, Jingren Zhou, Hongxia Yang, Ming Zhang
Bi-directional Contrastive Distillation for Multi-behavior Recommendation

Multi-behavior recommendation leverages auxiliary behaviors (e.g., view, add-to-cart) to improve the prediction for target behaviors (e.g., buy). Most existing works are built upon the assumption that all the auxiliary behaviors are positively correlated with target behaviors. However, we empirically find that such an assumption may not hold in real-world datasets. In fact, some auxiliary feedback is too noisy to be helpful, and it is necessary to restrict its influence for better performance. To this end, in this paper we propose a Bi-directional Contrastive Distillation (BCD) model for multi-behavior recommendation, aiming to distill valuable knowledge (about user preference) from the interplay of multiple user behaviors. Specifically, we design a forward distillation to distill the knowledge from auxiliary behaviors to help model target behaviors, and then a backward distillation to distill the knowledge from target behaviors to enhance the modelling of auxiliary behaviors. Through this circular learning, we can better extract the common knowledge from multiple user behaviors, where noisy auxiliary behaviors will not be involved. The experimental results on two real-world datasets show that our approach outperforms other counterparts in accuracy.

Yabo Chu, Enneng Yang, Qiang Liu, Yuting Liu, Linying Jiang, Guibing Guo
Improving Micro-video Recommendation by Controlling Position Bias

As micro-video apps become popular, the numbers of micro-videos and users increase rapidly, which highlights the importance of micro-video recommendation. Although micro-video recommendation can be naturally treated as sequential recommendation, previous sequential recommendation models do not fully consider the characteristics of micro-video apps, and in their inductive biases the role of positions is not in accord with the reality of the micro-video scenario. Therefore, in this paper, we present a model named PDMRec (Position Decoupled Micro-video Recommendation). PDMRec applies separate self-attention modules to model micro-video information and positional information and then aggregates them together, avoiding noisy correlations between micro-video semantics and positional information being encoded into the sequence embeddings. Moreover, PDMRec proposes contrastive learning strategies which closely match the characteristics of the micro-video scenario, thus reducing the interference from micro-video positions in sequences. We conduct extensive experiments on two real-world datasets. The experimental results show that PDMRec outperforms multiple existing state-of-the-art models and achieves significant performance improvements.

Yisong Yu, Beihong Jin, Jiageng Song, Beibei Li, Yiyuan Zheng, Wei Zhuo
Mitigating Confounding Bias for Recommendation via Counterfactual Inference

Recommender systems usually face the bias problem, which creates a gap between recommendation results and the actual user preference. Existing works tackle this problem by assuming a specific bias and then developing a method to mitigate it, an approach that lacks universality. In this paper, we attribute the root cause of the bias problem to a causality concept: confounders, which are the variables that influence both which items the user will interact with and how they rate them. Meanwhile, causality theory says that some confounders may remain unobserved and are hard to measure. Accordingly, we propose a novel Counterfactual Inference for Deconfounded Recommendation (CIDR) framework that enables the analysis of the causes of biases from a causal perspective. We first analyze the causal effect of confounders, and then utilize the biased observational data to capture a substitute of the confounders on both the user side and the item side. Finally, we apply counterfactual inference to eliminate the causal effect of such confounders in order to achieve satisfactory recommendation with the help of user and item side information (e.g., user post-click feedback data, item multi-modal data). For evaluation, we compare our method with several state-of-the-art debiasing methods on three real-world datasets, in addition to new causal-based approaches. Extensive experiments demonstrate the effectiveness of our proposed method.

Ming He, Xinlei Hu, Changshu Li, Xin Chen, Jiwen Wang
Recommending Related Products Using Graph Neural Networks in Directed Graphs

Related product recommendation (RPR) is pivotal to the success of any e-commerce service. In this paper, we deal with the problem of recommending related products, i.e., given a query product, we would like to suggest the top-k products that have a high likelihood of being bought together with it. Our problem implicitly assumes asymmetry, i.e., for a phone, we would like to recommend a suitable phone case, but for a phone case, it may not be apt to recommend a phone because customers typically would purchase a phone case only while already owning a phone. We also do not limit ourselves to complementary or substitute product recommendation. For example, for a specific nightwear t-shirt, we can suggest similar t-shirts as well as track pants. So, the notion of relatedness is subjective to the query product and dependent on customer preferences. Further, various factors such as product price and availability lead to the presence of selection bias in the historical purchase data, which needs to be controlled for while training a related product recommendation model. These challenges are orthogonal to each other, making our problem non-trivial. To address these, we propose DAEMON, a novel Graph Neural Network (GNN) based framework for related product recommendation, wherein the problem is formulated as a node recommendation task on a directed product graph. In order to capture product asymmetry, we employ an asymmetric loss function and learn dual embeddings for each product by appropriately aggregating features from its neighborhood. DAEMON leverages multi-modal data sources such as catalog metadata and browse behavioral logs to mitigate selection bias and generate recommendations for cold-start products. Extensive offline experiments show that DAEMON outperforms state-of-the-art baselines by 30–160% in terms of HitRate and MRR for the node recommendation task. In the case of the link prediction task, DAEMON presents 4–16% AUC gains over state-of-the-art baselines. DAEMON delivers significant improvement in revenue and sales as measured through an A/B experiment.

Srinivas Virinchi, Anoop Saladi, Abhirup Mondal
A U-Shaped Hierarchical Recommender by Multi-resolution Collaborative Signal Modeling

Items (users) in a recommender system inherently exhibit hierarchical structures with respect to interactions. Although explicit hierarchical structures are often missing in real-world recommendation scenarios, recent research shows that exploring implicit hierarchical structures for items (users) would largely benefit recommender systems. In this paper, we model user (item) implicit hierarchical structures to capture user-item relationships at various resolution scales, resulting in better preference customization. Specifically, we propose a U-shaped Graph Convolutional Network-based recommender system, namely UGCN, that adopts a hierarchical encoding-decoding process with a message-passing mechanism to construct user (item) implicit hierarchical structures and capture multi-resolution relationships simultaneously. To verify the effectiveness of the UGCN recommender, we conduct experiments on three public datasets. Results have confirmed that the UGCN recommender achieves overall prediction improvements over state-of-the-art models, simultaneously demonstrating a higher recommendation coverage ratio and better-personalized results.

Peng Yi, Xiongcai Cai, Ziteng Li
Basket Booster for Prototype-based Contrastive Learning in Next Basket Recommendation

Next basket recommendation seeks to model the correlation of items and mine users’ interests hidden in basket sequences, and tries to infer the set of items that will likely be adopted in the next session using the mined information. However, the feedback provided by users often involves only a small fraction of millions to billions of items. Sparse data makes it hard for the model to infer high-quality representations for basket sequences, which further leads to poor recommendation. Inspired by the recent success of representation learning in some fields, e.g., computer vision and clustering, we propose a basket booster for prototype-based contrastive learning (BPCL) in next basket recommendation. A correlative basket booster is designed to mine self-supervised signals just from raw data and make augmentations for baskets. To the best of our knowledge, this is the first work to promote learning of prototype representations through basket augmentation, which helps overcome the difficulties caused by data sparsity and leads to better next basket recommendation performance. Extensive experiments on three public real-world datasets demonstrate that the proposed BPCL method achieves better performance than the existing state-of-the-art methods.

Ting-Ting Su, Zhen-Yu He, Man-Sheng Chen, Chang-Dong Wang
Graph Contrastive Learning with Adaptive Augmentation for Recommendation

Graph Convolutional Network (GCN) has been one of the most popular technologies in recommender systems, as it can effectively model high-order relationships. However, these methods usually suffer from two problems: sparse supervision signal and noisy interactions. To address these problems, graph contrastive learning is applied for GCN-based recommendation. The general framework of graph contrastive learning is first to perform data augmentation on the input graph to get two graph views and then maximize the agreement of representations in these views. Despite the effectiveness, existing methods ignore the differences in the impact of nodes and edges when performing data augmentation, which will degrade the quality of the learned representations. Meanwhile, they usually adopt manual data augmentation schemes, limiting the generalization of models. We argue that the data augmentation scheme should be learnable and adaptive to the inherent patterns in the graph structure. Thus, the model can learn representations that remain invariant to perturbations of unimportant structures while demanding fewer resources. In this work, we propose a novel Graph Contrastive learning framework with Adaptive data augmentation for Recommendation (GCARec). Specifically, for adaptive augmentation, we first calculate the retaining probability of each edge based on the attention mechanism and then sample edges according to the probability with a Gumbel Softmax. In addition, the adaptive data augmentation scheme is based on the neural network and requires no domain knowledge, making it learnable and generalizable. Extensive experiments on three real-world datasets show that GCARec outperforms state-of-the-art baselines.

Mengyuan Jing, Yanmin Zhu, Tianzi Zang, Jiadi Yu, Feilong Tang
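
The adaptive augmentation hinges on sampling a differentiable keep/drop mask per edge from learned retain probabilities. The toy snippet below shows the Gumbel-Softmax trick on random edge scores; in the paper the scores would come from an attention mechanism, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_edges = 12
edge_scores = torch.randn(n_edges, requires_grad=True)   # e.g. attention-based edge logits

# Two-class logits per edge: [keep, drop]; hard Gumbel-Softmax yields a {0,1} mask
# that still passes gradients to edge_scores (straight-through estimator).
logits = torch.stack([edge_scores, -edge_scores], dim=-1)
keep_mask = F.gumbel_softmax(logits, tau=0.5, hard=True)[:, 0]
print(keep_mask)                      # 1 = edge kept in the augmented view, 0 = dropped
keep_mask.sum().backward()            # gradients flow back to the learnable edge scores
print(edge_scores.grad is not None)
```
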
Multi-interest Extraction Joint with Contrastive Learning for News Recommendation

News recommendation techniques aim to recommend candidate news that a target user may be interested in, according to the news they have previously browsed. At present, existing works usually tend to represent user reading interest using a single vector during the modeling procedure. In reality, however, users usually have multiple and diverse interests, such as sports, entertainment and so on. It is therefore unreasonable to represent a user’s sophisticated semantic interests with a single vector, which may conceal fine-grained information. In this work, we propose a novel method combining multi-interest extraction with contrastive learning, named MIECL, to tackle the above problem. Specifically, we first construct several interest prototypes and design a multi-interest user encoder to learn multiple user representations under different interest conditions simultaneously. Then we adopt a graph-enhanced user encoder to enrich the user’s semantic representation under each interest by aggregating relevant information from neighbors. Finally, we contrast user multi-interest representations and interest prototype vectors to optimize the user representations themselves, in order to push dissimilar semantic interests away from each other. We conduct experiments on two real-world news recommendation datasets, MIND-Large and MIND-Small, and empirical results demonstrate the effectiveness of our approach from multiple perspectives.

Shicheng Wang, Shu Guo, Lihong Wang, Tingwen Liu, Hongbo Xu

Transfer and Multitask Learning

Frontmatter
On the Relationship Between Disentanglement and Multi-task Learning

One of the main arguments behind studying disentangled representations is the assumption that they can be easily reused in different tasks. At the same time finding a joint, adaptable representation of data is one of the key challenges in the multi-task learning setting. In this paper, we take a closer look at the relationship between disentanglement and multi-task learning based on hard parameter sharing. We perform a thorough empirical study of the representations obtained by neural networks trained on automatically generated supervised tasks. Using a set of standard metrics we show that disentanglement appears naturally during the process of multi-task neural network training.

Łukasz Maziarka, Aleksandra Nowak, Maciej Wołczyk, Andrzej Bedychaj
InCo: Intermediate Prototype Contrast for Unsupervised Domain Adaptation

Unsupervised domain adaptation aims to transfer knowledge from the labeled source domain to the unlabeled target domain. Recently, self-supervised learning (e.g. contrastive learning) has been extended to cross-domain scenarios for reducing domain discrepancy in either instance-to-instance or instance-to-prototype manner. Although achieving remarkable progress, when the domain discrepancy is large, these methods would not perform well as a large shift leads to incorrect initial pseudo labels. To mitigate the performance degradation caused by large domain shifts, we propose to construct multiple intermediate prototypes for each class and perform cross-domain instance-to-prototype based contrastive learning with these constructed intermediate prototypes. Compared with direct cross-domain self-supervised learning, the intermediate prototypes could contain more accurate label information and achieve better performance. Besides, to learn discriminative features and perform domain-level distribution alignment, we perform intra-domain contrastive learning and domain adversarial training. Thus, the model could learn both discriminative and invariant features. Extensive experiments are conducted on three public benchmarks (ImageCLEF, Office-31, and Office-Home), and the results show that the proposed method outperforms baseline methods.

Yuntao Du, Hongtao Luo, Haiyang Yang, Juan Jiang, Chongjun Wang
Fast and Accurate Importance Weighting for Correcting Sample Bias

Bias in datasets can be very detrimental to appropriate statistical estimation. In response to this problem, importance weighting methods have been developed to match any biased distribution to its corresponding target unbiased distribution. The seminal Kernel Mean Matching (KMM) method is, nowadays, still considered state of the art in this research field. However, one of the main drawbacks of this method is its computational burden for large datasets. Building on previous works by Huang et al. (2007) and de Mathelin et al. (2021), we derive a novel importance weighting algorithm which scales to large datasets by using a neural network to predict the instance weights. We show, on multiple public datasets, under various sample biases, that our proposed approach drastically reduces the computational time on large datasets while maintaining similar sample bias correction performance compared to other importance weighting methods. The proposed approach appears to be the only one able to give relevant reweighting in a reasonable time for large datasets with up to two million data points.

Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis
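
The paper's weighting network is not reproduced here, but the related discriminative density-ratio trick is easy to sketch: train a classifier to separate source from target samples and convert its probabilities into importance weights, as below with synthetic Gaussian data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_source, X_target):
    """Density-ratio estimation via probabilistic classification:
    w(x) = p(target|x) / p(source|x), rescaled to mean 1 on the source set."""
    X = np.vstack([X_source, X_target])
    d = np.r_[np.zeros(len(X_source)), np.ones(len(X_target))]   # 0 = source, 1 = target
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_source)[:, 1]
    w = (p / (1 - p)) * (len(X_source) / len(X_target))
    return w / w.mean()

rng = np.random.default_rng(0)
X_target = rng.normal(0.0, 1.0, size=(2000, 3))       # unbiased target distribution
X_source = rng.normal(0.8, 1.0, size=(2000, 3))       # biased sample to be reweighted
w = importance_weights(X_source, X_target)
print(w.min(), w.mean(), w.max())
```
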
Overcoming Catastrophic Forgetting via Direction-Constrained Optimization

This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Based on this observation we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-art regularization-based continual learning methods. The codes are publicly available at https://github.com/yunfei-teng/DCO .

Yunfei Teng, Anna Choromanska, Murray Campbell, Songtao Lu, Parikshit Ram, Lior Horesh
Newer is Not Always Better: Rethinking Transferability Metrics, Their Peculiarities, Stability and Performance

Fine-tuning large pre-trained image and language models on small customized datasets has become increasingly popular for improved prediction and efficient use of limited resources. Fine-tuning requires identifying the best models to transfer-learn from, and quantifying transferability prevents expensive re-training on all candidate model/task pairs. In this paper, we show that statistical problems with covariance estimation drive the poor performance of H-score, a common baseline for newer metrics, and propose a shrinkage-based estimator. This results in up to 80% absolute gain in H-score correlation performance, making it competitive with the state-of-the-art LogME measure. Our shrinkage-based H-score is 3–10 times faster to compute than LogME. Additionally, we look into the less common setting of target (as opposed to source) task selection. We demonstrate previously overlooked problems in such settings with different numbers of labels, class-imbalance ratios, etc. for some recent metrics, e.g., NCE and LEEP, that resulted in them being misrepresented as leading measures. We propose a correction and recommend measuring correlation performance against relative accuracy in such settings. We support our findings with approximately 164,000 experiments (fine-tuning trials) on both vision models and graph neural networks.

Shibal Ibrahim, Natalia Ponomareva, Rahul Mazumder
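
The H-score itself is tr(cov(f)^-1 cov(E[f|y])), and the paper's remedy is to replace the plain covariance estimate with a shrinkage estimator such as Ledoit-Wolf. The sketch below computes both variants on random stand-in features; it follows the published definition but is our own minimal implementation, not the authors' code.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def h_score(features, labels, shrinkage=True):
    """H-score transferability: tr(cov(f)^-1 cov(E[f|y])). The shrinkage flag swaps
    the plain covariance estimate for Ledoit-Wolf, as advocated in the paper."""
    f = features - features.mean(axis=0)
    cov_f = LedoitWolf().fit(f).covariance_ if shrinkage else np.cov(f, rowvar=False)
    classes, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    class_means = np.stack([f[labels == c].mean(axis=0) for c in classes])
    cov_g = (class_means * probs[:, None]).T @ class_means   # weighted between-class covariance
    return float(np.trace(np.linalg.pinv(cov_f) @ cov_g))

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=2000)
feats = rng.standard_normal((2000, 64)) + labels[:, None] * 0.3   # features mildly predictive of y
print(h_score(feats, labels, shrinkage=True), h_score(feats, labels, shrinkage=False))
```
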
Learning to Teach Fairness-Aware Deep Multi-task Learning

Fairness-aware learning mainly focuses on single task learning (STL). The fairness implications of multi-task learning (MTL) have only recently been considered and a seminal approach has been proposed that considers the fairness-accuracy trade-off for each task and the performance trade-off among different tasks. Instead of a rigid fairness-accuracy trade-off formulation, we propose a flexible approach that learns how to be fair in a MTL setting by selecting which objective (accuracy or fairness) to optimize at each step. We introduce the L2T-FMT algorithm that is a teacher-student network trained collaboratively; the student learns to solve the fair MTL problem while the teacher instructs the student to learn from either accuracy or fairness, depending on what is harder to learn for each task. Moreover, this dynamic selection of which objective to use at each step for each task reduces the number of trade-off weights from 2T to T, where T is the number of tasks. Our experiments on three real datasets show that L2T-FMT improves on both fairness (12–19%) and accuracy (up to 2%) over state-of-the-art approaches.

Arjun Roy, Eirini Ntoutsi
Backmatter
Metadata
Title
Machine Learning and Knowledge Discovery in Databases
Editors
Massih-Reza Amini
Stéphane Canu
Asja Fischer
Tias Guns
Petra Kralj Novak
Grigorios Tsoumakas
Copyright Year
2023
Electronic ISBN
978-3-031-26387-3
Print ISBN
978-3-031-26386-6
DOI
https://doi.org/10.1007/978-3-031-26387-3
