
2021 | Book

Advances in Knowledge Discovery and Data Mining

25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, May 11–14, 2021, Proceedings, Part III

Edited by: Prof. Kamal Karlapalem, Hong Cheng, Naren Ramakrishnan, R. K. Agrawal, P. Krishna Reddy, Dr. Jaideep Srivastava, Assist. Prof. Tanmoy Chakraborty

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science

About this book

The 3-volume set LNAI 12712-12714 constitutes the proceedings of the 25th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2021, which was held during May 11-14, 2021.

The 157 papers included in the proceedings were carefully reviewed and selected from a total of 628 submissions. They were organized in topical sections as follows:

Part I: Applications of knowledge discovery and data mining of specialized data;

Part II: Classical data mining; data mining theory and principles; recommender systems; and text analytics;

Part III: Representation learning and embedding, and learning from data.

Table of Contents

Frontmatter
Correction to: Rule Injection-Based Generative Adversarial Imitation Learning for Knowledge Graph Reasoning

In the originally published version of chapter 27, the name of the author Xiaoying Chen was spelled incorrectly. This has been corrected.

Sheng Wang, Xiaoying Chen, Shengwu Xiong

Representation Learning and Embedding

Frontmatter
Episode Adaptive Embedding Networks for Few-Shot Learning

Few-shot learning aims to learn a classifier using a few labelled instances for each class. Metric-learning approaches for few-shot learning embed instances into a high-dimensional space and conduct classification based on distances among instance embeddings. However, such instance embeddings are usually shared across all episodes and thus lack the discriminative power to generalize classifiers according to episode-specific features. In this paper, we propose a novel approach, namely Episode Adaptive Embedding Network (EAEN), to learn episode-specific embeddings of instances. By leveraging the probability distributions of all instances in an episode at each channel-pixel embedding dimension, EAEN can not only alleviate the overfitting issue encountered in few-shot learning tasks, but also capture discriminative features specific to an episode. To empirically verify the effectiveness and robustness of EAEN, we have conducted extensive experiments on three widely used benchmark datasets, under various combinations of different generic embedding backbones and different classifiers. The results show that EAEN significantly improves classification accuracy by about 10–20% over the state-of-the-art methods in different settings.

Fangbing Liu, Qing Wang
Universal Representation for Code

Learning from source code usually requires a large amount of labeled data. In addition to the possible scarcity of labeled data, the trained model is typically highly task-specific and lacks transferability to different tasks. In this work, we present effective pre-training strategies, on top of a novel graph-based code representation, to produce universal representations for code. Specifically, our graph-based representation captures important semantics between code elements (e.g., control flow and data flow). We pre-train graph neural networks on the representation to extract universal code properties. The pre-trained model then enables fine-tuning to support various downstream applications. We evaluate our model on two real-world datasets – spanning over 30M Java methods and 770K Python methods. Through visualization, we reveal discriminative properties in our universal code representation. Through comparison on multiple benchmarks, we demonstrate that the proposed framework achieves state-of-the-art results on method name prediction and code graph link prediction.

Linfeng Liu, Hoan Nguyen, George Karypis, Srinivasan Sengamedu
Self-supervised Adaptive Aggregator Learning on Graph

Neighborhood aggregation is a key operation in most graph neural network-based embedding solutions. Each type of aggregator typically has an application domain in which it performs best, so the single aggregator type adopted by most existing embedding solutions may inevitably result in information loss. To keep the diversity of information during aggregation, it is necessary to use the most appropriate aggregators for specific graphs or subgraphs. However, when and which aggregators to use remains largely unsolved. To tackle this problem, we introduce a general contrastive learning framework called Cooker, which supports self-supervised adaptive aggregator learning. Specifically, we design three pretext tasks for self-supervised learning and apply multiple aggregators in our model. By doing so, our algorithm can keep the peculiar features of different aggregators in node embeddings and minimize the information loss. Experimental results on node classification and link prediction tasks show that Cooker outperforms state-of-the-art baselines on all three compared datasets. A set of ablation experiments also demonstrates that integrating more types of aggregators generally improves the algorithm's performance and stability.

Bei Lin, Binli Luo, Jiaojiao He, Ning Gui
A Fast Algorithm for Simultaneous Sparse Approximation

Simultaneous sparse approximation problems arise in several domains, such as signal processing and machine learning. Given a dictionary matrix X of size $$m \times n$$ and a target matrix Y of size $$m \times N$$, we consider the classical problem of selecting k columns from X that can be used to linearly approximate the entire matrix Y. The previous fastest nontrivial algorithms for this problem have a running time of O(mnN). We describe a significantly faster algorithm with a running time of $$O(km(n+N))$$ with accuracy that compares favorably with the slower algorithms. We also derive bounds on the accuracy of the selections computed by our algorithm. These bounds show that our results are typically within a few percentage points of the optimal solution.
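
For concreteness, the selection problem described in this abstract can be written as a column subset selection objective. The formulation below is a standard way to state it, consistent with the abstract's setup but not necessarily the paper's exact notation:

$$\min_{S \subseteq \{1,\dots,n\},\ |S| = k}\ \ \min_{A \in \mathbb{R}^{k \times N}} \left\Vert Y - X_S A \right\Vert_F^2$$

where $$X_S$$ is the $$m \times k$$ submatrix of X formed by the selected columns and the inner minimization is an ordinary least-squares problem, solved by $$A = X_S^{+} Y$$.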

Guihong Wan, Haim Schweitzer
STEPs-RL: Speech-Text Entanglement for Phonetically Sound Representation Learning

In this paper, we present a novel multi-modal deep neural network architecture that uses speech and text entanglement for learning phonetically sound spoken-word representations. STEPs-RL is trained in a supervised manner to predict the phonetic sequence of a target spoken word using the speech and text of its contextual spoken words, so that the model encodes meaningful latent representations. Unlike existing work, we use text along with speech for auditory representation learning to capture semantic and syntactic information along with the acoustic and temporal information. The latent representations produced by our model were not only able to predict the target phonetic sequences with an accuracy of 89.47% but also achieved results competitive with textual word representation models, Word2Vec & FastText (trained on textual transcripts), when evaluated on four widely used word similarity benchmark datasets. In addition, investigation of the generated vector space also demonstrated the capability of the proposed model to capture the phonetic structure of the spoken words. To the best of our knowledge, none of the existing works use speech and text entanglement for learning spoken-word representations, which makes this work the first of its kind.

Prakamya Mishra
RW-GCN: Training Graph Convolution Networks with Biased Random Walk for Semi-supervised Classification

Graph convolution networks (GCNs) have recently been among the most powerful methods for tasks such as node classification and graph clustering. In the present study, we propose RW-GCN, which utilizes biased random walks to assist feature aggregation and the GCN training process. RW-GCN employs biased random walks to generate node pairs. These pairs are used to build a symmetric matrix that replaces the adjacency matrix for GCN training. With the generated pairs, we also train latent representation vectors by skip-gram. In this way, both homophily and structural equivalence can be considered. Experiments on three datasets demonstrate that, compared to GCN, our model produces better results on node classification tasks.

Yinzhe Li, Zhijie Ban
Loss-Aware Pattern Inference: A Correction on the Wrongly Claimed Limitations of Embedding Models

Knowledge graph embedding models (KGEs) are actively utilized in many AI-based tasks, especially link prediction. Beyond achieving high performance, one of the crucial aspects of KGEs is their capability of inferring relational patterns, such as symmetry, antisymmetry, inversion, and composition. Among the many factors involved, the inference capability of embedding models is highly affected by the loss function used. However, most existing works fail to consider this aspect when analyzing the inference capabilities of models. In this paper, we show that disregarding loss functions results in inaccurate or even wrong interpretations of the capabilities of the models. We provide deep theoretical investigations of existing KGE models, using the TransE model as an example. To the best of our knowledge, this has not been comprehensively investigated so far. We show that by a proper selection of the loss function for training a KGE model, e.g., TransE, the main inference limitations are mitigated. The provided theories together with the experimental results confirm the importance of loss functions for training KGE models and improving their performance.

Mojtaba Nayyeri, Chengjin Xu, Yadollah Yaghoobzadeh, Sahar Vahdati, Mirza Mohtashim Alam, Hamed Shariat Yazdi, Jens Lehmann
SST-GNN: Simplified Spatio-Temporal Traffic Forecasting Model Using Graph Neural Network

To capture spatial relationships and temporal dynamics in traffic data, spatio-temporal models for traffic forecasting have drawn significant attention in recent years. Most recent works employ graph neural networks (GNNs) with multiple layers to capture the spatial dependency. However, road junctions at different hop-distances can carry distinct traffic information that should be exploited separately, yet existing multi-layer GNNs are unable to discriminate between their impacts. Moreover, to capture the temporal interrelationship, state-of-the-art approaches commonly use recurrent neural networks, which often fail to capture long-range dependencies. Furthermore, traffic data show repeated patterns over daily or weekly periods, which should be addressed explicitly. To address these limitations, we have designed a Simplified Spatio-temporal Traffic forecasting GNN (SST-GNN) that effectively encodes the spatial dependency by separately aggregating different neighborhood representations rather than using multiple layers, and captures the temporal dependency with a simple yet effective weighted spatio-temporal aggregation mechanism. We capture the periodic traffic patterns by using a novel position encoding scheme with historical and current data in two different models. With extensive experimental analysis, we show that our model (code is available at github.com/AmitRoy7781/SST-GNN ) significantly outperforms state-of-the-art models on three real-world traffic datasets from the Performance Measurement System (PeMS).

Amit Roy, Kashob Kumar Roy, Amin Ahsan Ali, M. Ashraful Amin, A. K. M. Mahbubur Rahman
VIKING: Adversarial Attack on Network Embeddings via Supervised Network Poisoning

Learning low-level node embeddings using techniques from network representation learning is useful for solving downstream tasks such as node classification and link prediction. An important consideration in such applications is the robustness of the embedding algorithms against adversarial attacks, which can be examined by performing perturbation on the original network. An efficient perturbation technique can degrade the performance of network embeddings on downstream tasks. In this paper, we study network embedding algorithms from an adversarial point of view and observe the effect of poisoning the network on downstream tasks. We propose VIKING, a supervised network poisoning strategy that outperforms the state-of-the-art poisoning methods by up to $$18\%$$ on the original network structure. We also extend VIKING to a semi-supervised attack setting and show that it is comparable to its supervised counterpart.

Viresh Gupta, Tanmoy Chakraborty
Self-supervised Graph Representation Learning with Variational Inference

Graph representation learning aims to convert graph-structured data into a low-dimensional space in which the graph structural information and graph properties are maximally preserved. Graph Neural Network (GNN)-based methods have been shown to be effective for the graph representation learning task. However, most GNN-based methods rely on supervised learning, which depends heavily on data labels that are difficult to access in real-world scenarios. In addition, the inherent incompleteness of data further degrades the performance of GNN-based models. In this paper, we propose a novel self-supervised graph representation learning model with variational inference. First, we strengthen the semantic relation between the node and graph levels in a self-supervised manner to alleviate the issue of over-dependence on data labels. Second, we utilize the variational inference technique to capture the general pattern underlying the data, thus guaranteeing the model's robustness under circumstances where some data is missing. Extensive experiments on three widely used citation network datasets show that our proposed method achieves or matches state-of-the-art results on link prediction and node classification tasks.
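
As background, variational inference maximizes the standard evidence lower bound (ELBO) on the data log-likelihood. The abstract does not state the model's exact objective, so the bound below is only the generic form that such methods optimize:

$$\log p(x) \;\ge\; \mathbb{E}_{q(z \mid x)}\!\left[\log p(x \mid z)\right] \;-\; \mathrm{KL}\!\left(q(z \mid x)\,\Vert\,p(z)\right)$$

where $$q(z \mid x)$$ is the approximate posterior over latent variables and $$p(z)$$ is the prior.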

Zihan Liao, Wenxin Liang, Han Liu, Jie Mu, Xianchao Zhang
Manifold Approximation and Projection by Maximizing Graph Information

Graph representation learning is an effective method to represent graph data in a low-dimensional space, which facilitates graph analytic tasks. Existing graph representation learning algorithms suffer from certain constraints: random walk based methods and graph convolutional neural networks tend to capture local graph information and fail to preserve the global structural properties of graphs. We present MAPPING (Manifold APproximation and Projection by maximizINg Graph information), an efficient unsupervised deep method for learning node representations, which is capable of synchronously capturing both local and global structural information of graphs. After applying graph convolutional networks to construct initial representations, the proposed approach employs an information maximization process to obtain representations that capture global graph structures. Furthermore, in order to preserve local graph information, we extend a novel manifold learning technique to the field of graph learning. The output of MAPPING can be easily exploited by downstream machine learning models on graphs. We demonstrate competitive performance on three citation benchmarks, where our approach outperforms the baseline methods significantly.
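
The information-maximization step is described only at a high level here; a common instantiation of such an objective (as in Deep Graph Infomax) contrasts node embeddings $$h_i$$ against a graph summary vector $$s$$, using embeddings $$\tilde{h}_i$$ from a corrupted graph as negatives. The loss below is that generic form, offered as an illustration rather than the paper's exact objective:

$$\mathcal{L} = -\frac{1}{2N}\sum_{i=1}^{N}\Big[\log \mathcal{D}(h_i, s) + \log\big(1 - \mathcal{D}(\tilde{h}_i, s)\big)\Big]$$

where $$\mathcal{D}$$ is a discriminator scoring whether a node embedding belongs to the summarized graph.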

Bahareh Fatemi, Soheila Molaei, Hadi Zare, Shirui Pan
Learning Attention-Based Translational Knowledge Graph Embedding via Nonlinear Dynamic Mapping

Knowledge graph embedding has become a promising method for knowledge graph completion. It aims to learn low-dimensional embeddings in a continuous vector space for each entity and relation. It remains challenging to learn accurate embeddings for complex multi-relational facts. In this paper, we propose a new translation-based embedding method named ATransD-NL to address the following two observations. First, most existing translational methods do not consider contextual information, which has been proven useful for improving the performance of link prediction. Our method learns attention-based embeddings for each triplet, taking into account the influence of one-hop or potentially multi-hop neighbourhood entities. Second, we apply a nonlinear dynamic projection of head and tail entities to the relational space, to capture nonlinear correlations among entities and relations arising from complex multi-relational facts. As an extension of TransD, our model introduces only one extra parameter, giving a good trade-off between model complexity and state-of-the-art predictive accuracy. Experimental results show that, compared with state-of-the-art translation-based methods and neural-network-based methods, our method delivers substantial improvements over baselines on the MeanRank metric of link prediction, e.g., an improvement of 35.6% over the attention-based graph embedding method KBGAT and an improvement of 64% over the translational method TransMS on the WN18 dataset, with comparable performance on the Hits@10 metric.

Zhihao Wang, Honggang Xu, Xin Li, Yuxin Deng
Multi-Grained Dependency Graph Neural Network for Chinese Open Information Extraction

Recent neural Open Information Extraction (OpenIE) models have improved significantly over traditional rule-based systems for Chinese OpenIE tasks. However, these neural models are mainly word-based and suffer from word segmentation errors in Chinese. They also utilize dependency information in a shallow way, making multi-hop dependencies hard to capture. This paper proposes a Multi-Grained Dependency Graph Neural Network (MGD-GNN) model to address these problems. MGD-GNN constructs a multi-grained dependency (MGD) graph with dependency edges between words and soft-segment edges between words and characters. Our model makes predictions based on character features while still having word boundary knowledge through word-character soft-segment edges. MGD-GNN updates node representations using a deep graph neural network to fully exploit the topology of the MGD graph and capture multi-hop dependencies. Experiments on a large-scale Chinese OpenIE dataset, SpanSAOKE, show that our model can alleviate the propagation of word segmentation errors and use dependency information more effectively, giving significant improvements over previous neural OpenIE models.

Zhiheng Lyu, Kaijie Shi, Xin Li, Lei Hou, Juanzi Li, Binheng Song
Human-Understandable Decision Making for Visual Recognition

The widespread use of deep neural networks has achieved substantial success in many tasks. However, there still exists a huge gap between the operating mechanism of deep learning models and human-understandable decision making, so humans cannot fully trust the predictions made by these models. To date, little work has been done on how to align the behaviors of deep learning models with human perception in order to train a human-understandable model. To fill this gap, we propose a new framework to train a deep neural network by incorporating the prior of human perception into the model learning process. Our proposed model mimics the process of perceiving conceptual parts from images and assessing their relative contributions towards the final recognition. The effectiveness of our proposed model is evaluated on two classical visual recognition tasks. The experimental results and analysis confirm that our model is not only able to provide interpretable explanations for its predictions, but also maintains competitive recognition accuracy.

Xiaowei Zhou, Jie Yin, Ivor Tsang, Chen Wang
LightCAKE: A Lightweight Framework for Context-Aware Knowledge Graph Embedding

Knowledge graph embedding (KGE) models learn to project symbolic entities and relations into a continuous vector space based on the observed triplets. However, existing KGE models cannot make a proper trade-off between the graph context and the model complexity, which makes them still far from satisfactory. In this paper, we propose a lightweight framework named LightCAKE for context-aware KGE. LightCAKE explicitly models the graph context without introducing redundant trainable parameters, and uses an iterative aggregation strategy to integrate the context information into the entity/relation embeddings. As a generic framework, it can be used with many simple KGE models to achieve excellent results. Finally, extensive experiments on public benchmarks demonstrate the efficiency and effectiveness of our framework.

Zhiyuan Ning, Ziyue Qiao, Hao Dong, Yi Du, Yuanchun Zhou
Transferring Domain Knowledge with an Adviser in Continuous Tasks

Recent advances in Reinforcement Learning (RL) have surpassed human-level performance in many simulated environments. However, existing reinforcement learning techniques are incapable of explicitly incorporating already known domain-specific knowledge into the learning process. Therefore, the agents have to explore and learn the domain knowledge independently through a trial-and-error approach, which consumes both time and resources before valid responses can be made. Hence, we adapt the Deep Deterministic Policy Gradient (DDPG) algorithm to incorporate an adviser, which allows integrating domain knowledge in the form of pre-learned policies or pre-defined relationships to enhance the agent's learning process. Our experiments on OpenAI Gym benchmark tasks show that integrating domain knowledge through advisers expedites learning and improves the policy towards better optima.
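
For reference, DDPG updates its deterministic actor $$\mu_\theta$$ with the deterministic policy gradient below; how the adviser's pre-learned policy or pre-defined relationships are blended into this update is the paper's contribution and is not reproduced here:

$$\nabla_\theta J(\theta) \;\approx\; \mathbb{E}_{s \sim \rho}\!\left[\nabla_a Q(s, a)\big|_{a = \mu_\theta(s)}\; \nabla_\theta \mu_\theta(s)\right]$$

where $$Q$$ is the learned critic and $$\rho$$ is the state distribution induced by the behavior policy.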

Rukshan Wijesinghe, Kasun Vithanage, Dumindu Tissera, Alex Xavier, Subha Fernando, Jayathu Samarawickrama
Inferring Hierarchical Mixture Structures: A Bayesian Nonparametric Approach

We present a Bayesian nonparametric model for Hierarchical Clustering (HC). Such a model has two main components. The first component is the random walk process from parent to child in the hierarchy, for which we apply the nested Chinese Restaurant Process (nCRP). The second part is the diffusion process from parent to child, for which we employ the Hierarchical Dirichlet Process Mixture Model (HDPMM). This differs from the common choice, which is Gaussian-to-Gaussian diffusion. We demonstrate the properties of the model and propose a Markov Chain Monte Carlo procedure with analytical updating steps for inferring the model variables. Experiments on real-world datasets show that our method obtains reasonable hierarchies and remarkable empirical results according to several well-known metrics.
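
At each level of the nested CRP, a data point descending from its parent node chooses an existing child or creates a new one with the standard Chinese Restaurant Process probabilities, written here in generic form (the paper's concentration parameter and notation may differ):

$$P(\text{existing child } c) = \frac{n_c}{n + \gamma}, \qquad P(\text{new child}) = \frac{\gamma}{n + \gamma}$$

where $$n_c$$ is the number of points already assigned to child $$c$$, $$n = \sum_c n_c$$, and $$\gamma$$ is the concentration parameter.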

Weipeng Huang, Nishma Laitonjam, Guangyuan Piao, Neil J. Hurley
Quality Control for Hierarchical Classification with Incomplete Annotations

Hierarchical classification requires annotations with hierarchical class structures. Although crowdsourcing services are an inexpensive way to collect annotations for hierarchical classification, the results are often incomplete because workers' limited abilities leave them unable to label all classes, and because crowdsourcing platforms allow workers to suspend labeling partway through. Unfortunately, existing quality control approaches for refining low-quality annotations discard those incomplete annotations, and this limits the quality improvement of the results. We propose a quality control method for hierarchical classification that leverages incomplete annotations and the similarity between classes in the hierarchy for estimating the true leaf classes. Our method probabilistically models the labeling process and estimates the true leaf classes by considering the class-likelihood of samples and workers' class-dependent expertise. Our method embeds the class hierarchy into a latent space and represents samples, as well as the workers' prototypical samples for classes (prototypes), as vectors in this space. The similarities between the vectors in the latent space are used to estimate the true leaf classes. The experimental results on both real-world and synthetic datasets demonstrate the effectiveness of our method and its superiority over the baseline methods.

Masafumi Enomoto, Kunihiro Takeoka, Yuyang Dong, Masafumi Oyamada, Takeshi Okadome

Learning from Data

Frontmatter
Learning Discriminative Features Using Multi-label Dual Space

Multi-label learning handles instances associated with multiple class labels. The original label space is a logical matrix with entries from the Boolean domain $$\{0,1\}$$. Logical labels are not able to show the relative importance of each semantic label to the instances. The vast majority of existing methods map the input features to the label space using linear projections, taking label dependencies into consideration via the logical label matrix. However, the discriminative features are learned using a one-way projection from the feature representation of an instance into a logical label space, and there is no manifold in the learning space of logical labels, which limits the potential of the learned models. In this work, inspired by a real-world example in image annotation, where an image is reconstructed from label importance and feature weights, we propose a novel multi-label learning method that learns the projection matrix from the feature space to the semantic label space and projects it back to the original feature space using an encoder-decoder deep learning architecture. The key intuition guiding our method is that discriminative features are identified by mapping the features back and forth using two linear projections. To the best of our knowledge, this is one of the first attempts to study the ability to reconstruct the original features from the label manifold in multi-label learning. We show that the learned projection matrix identifies a subset of discriminative features across multiple semantic labels. Extensive experiments on real-world datasets show the superiority of the proposed method.
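
One way to make the back-and-forth projection concrete is a linear encoder-decoder objective of the following form, where X denotes the input features, L a numeric label-space representation, W the encoding projection, and V the decoding projection; this is an illustrative sketch of the idea rather than the paper's exact formulation:

$$\min_{W, V}\ \left\Vert XW - L \right\Vert_F^2 + \left\Vert L V - X \right\Vert_F^2 + \lambda\big(\Vert W\Vert_F^2 + \Vert V\Vert_F^2\big)$$

The first term maps features to the label space; the second maps labels back to the feature space, which is the reconstruction step that exposes discriminative features.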

Ali Braytee, Wei Liu
AutoCluster: Meta-learning Based Ensemble Method for Automated Unsupervised Clustering

Automated clustering automatically builds appropriate clustering models. Existing automated clustering methods are widely based on meta-learning. However, they still face specific challenges: the lack of comprehensive meta-features for meta-learning and of a general clustering validation index (CVI) as the objective function. Therefore, we propose a novel automated clustering method named AutoCluster to address these problems, which is mainly composed of Clustering-oriented Meta-feature Extraction (CME) and Multi-CVIs Clustering Ensemble Construction (MC$$^2$$EC). CME captures meta-features from spatial randomness and different learning properties of clustering algorithms to enhance meta-learning. MC$$^2$$EC develops a collaborative mechanism based on clustering ensembles to balance the measuring criteria of different CVIs and construct a more appropriate clustering model for given datasets. Extensive experiments are conducted on 150 datasets from OpenML to create meta-data and on 33 test datasets from three clustering benchmarks to validate the superiority of AutoCluster. The results show the superiority of AutoCluster for building an appropriate clustering model compared with classical clustering algorithms and the CASH method.

Yue Liu, Shuang Li, Wenjie Tian
BanditRank: Learning to Rank Using Contextual Bandits

We propose an extensible deep learning method that uses reinforcement learning to train neural networks for offline ranking in information retrieval (IR). We call our method BanditRank as it treats ranking as a contextual bandit problem. In the domain of learning to rank for IR, current deep learning models are trained on objective functions different from the measures they are evaluated on. Since most evaluation measures are discrete quantities, they cannot be used by gradient descent algorithms without approximation. BanditRank bridges this gap by directly optimizing a task-specific measure, such as mean average precision (MAP). Specifically, a contextual bandit whose action is to rank input documents is trained using a policy gradient algorithm to directly maximize a reward. The reward can be a single measure, such as MAP, or a combination of several measures. The notion of ranking is also inherent in BanditRank, similar to current listwise approaches. To evaluate the effectiveness of BanditRank by answering five research questions, we conducted a series of experiments on datasets related to three different tasks, i.e., non-factoid question answering, factoid question answering, and web search. We found that BanditRank performed better than strong baseline methods on the respective tasks.
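
The policy-gradient training described above follows the standard REINFORCE form: with a ranking action a drawn from the bandit's policy $$\pi_\theta$$ given the query-document context x, and R(a) the chosen IR measure (e.g., MAP), the gradient of the expected reward is

$$\nabla_\theta J(\theta) = \mathbb{E}_{a \sim \pi_\theta(\cdot \mid x)}\!\left[R(a)\, \nabla_\theta \log \pi_\theta(a \mid x)\right],$$

which is estimated from sampled rankings; the paper's exact estimator and any variance-reduction baselines are not reproduced here.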

Phanideep Gampa, Sumio Fujita
A Compressed and Accelerated SegNet for Plant Leaf Disease Segmentation: A Differential Evolution Based Approach

SegNet is a Convolutional Neural Network (CNN) architecture consisting of an encoder and a decoder for pixel-wise classification of input images. It was found to give better results than state-of-the-art methods for pixel-wise segmentation of images. In the proposed work, a compressed version of SegNet has been developed using Differential Evolution for segmenting the diseased regions in leaf images. The compressed model has been evaluated on publicly available street scene images and on potato late blight leaf images from the PlantVillage dataset. Using the proposed method, a 25x compression of the original SegNet is achieved and inference time is reduced by a factor of 1.675 without loss in mean IoU accuracy.
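
For readers unfamiliar with Differential Evolution, the sketch below shows one generation of the classic DE/rand/1/bin scheme over a generic real-valued fitness function. It is a minimal illustration of the search operator only, not the authors' SegNet compression pipeline; the function name de_generation, the fitness_fn argument, and how a candidate encodes pruned filters or layers are assumptions left abstract here.

    import numpy as np

    def de_generation(pop, fitness_fn, F=0.5, CR=0.9, rng=None):
        """One generation of DE/rand/1/bin over a (pop_size, dim) real-valued population."""
        rng = np.random.default_rng() if rng is None else rng
        pop_size, dim = pop.shape
        fitness = np.array([fitness_fn(ind) for ind in pop])
        new_pop = pop.copy()
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.choice([j for j in range(pop_size) if j != i], size=3, replace=False)
            mutant = pop[a] + F * (pop[b] - pop[c])       # differential mutation
            cross = rng.random(dim) < CR                  # binomial crossover mask
            cross[rng.integers(dim)] = True               # guarantee at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            if fitness_fn(trial) <= fitness[i]:           # greedy selection (minimization)
                new_pop[i] = trial
        return new_pop

In a compression setting, the fitness would typically trade off model size against segmentation accuracy; the specific encoding and fitness used in the paper are not shown in the abstract.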

Mohit Agarwal, Suneet Kr. Gupta, K. K. Biswas
Meta-context Transformers for Domain-Specific Response Generation

Transformer-based models, such as GPT-2, have revolutionized the landscape of dialogue generation by capturing long-range structures through language modeling. Though these models have exhibited excellent language coherence, they often lack relevance and domain-specific terms when used for domain-specific response generation. In this paper, we present DSRNet (Domain Specific Response Network), a transformer-based model for dialogue response generation by reinforcing domain-specific attributes. In particular, we extract meta attributes from the context and jointly model them with the dialogue context utterances for better attention over domain-specific key terms and relevance. We study the use of DSRNet in a multi-turn multi-interlocutor environment for domain-specific response generation. In our experiments, we evaluate DSRNet on Ubuntu dialogue datasets, which are mainly composed of various technical-domain dialogues for IT issue resolution, and on the CamRest676 dataset, which contains restaurant-domain conversations. We observe that the responses produced by our model carry higher relevance due to the presence of domain-specific key attributes that exhibit better overlap with the attributes of the context. Our analysis shows that the performance improvement is mostly due to the infusion of key terms along with dialogues, which results in better attention over domain-relevant terms.

Debanjana Kar, Suranjana Samanta, Amar Prakash Azad
A Multi-task Kernel Learning Algorithm for Survival Analysis

Survival analysis aims to predict the occurrence times of certain events of interest. Most existing methods for survival analysis either assume specific forms for the underlying stochastic processes or rely on linear hypotheses. To cope with non-linearity in data, we propose a unified framework that combines multi-task and kernel learning for survival analysis. We also develop optimization methods based on the Pegasos (Primal estimated sub-gradient solver for SVM) algorithm for learning. Experimental results demonstrate the effectiveness of the proposed method for survival analysis on synthetic and real-world data sets.
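
As a reminder of the optimizer the authors build on, the basic Pegasos step for a hinge-loss SVM with regularization parameter $$\lambda$$ samples an example $$(x_{i_t}, y_{i_t})$$ at iteration t and updates the weights as below; the paper adapts this solver to its multi-task kernelized objective, which is not reproduced here:

$$\eta_t = \frac{1}{\lambda t}, \qquad w_{t+1} = (1 - \eta_t \lambda)\, w_t + \eta_t\, y_{i_t} x_{i_t}\,\mathbb{1}\big[\,y_{i_t}\, w_t^{\top} x_{i_t} < 1\,\big]$$

i.e., every step shrinks the weights, and a sub-gradient step on the hinge loss is added only when the sampled example violates the margin.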

Zizhuo Meng, Jie Xu, Zhidong Li, Yang Wang, Fang Chen, Zhiyong Wang
Meta-data Augmentation Based Search Strategy Through Generative Adversarial Network for AutoML Model Selection

Automated machine learning (AutoML) attempts to automatically build an appropriate learning model for a given dataset. Despite the recent progress of meta-learning in finding good instantiations for the AutoML framework, it is still difficult and time-consuming to collect sufficient meta-data of high quality. Therefore, we propose a novel method named Meta-data Augmentation based Search Strategy (MDASS) for AutoML model selection, which is mainly composed of a Meta-GAN Surrogate model (MetaGAN) and a Self-Adaptive Meta-model (SAM). MetaGAN employs a Generative Adversarial Network as a surrogate model to collect effective meta-data based on the limited available meta-data, which can alleviate the dilemma of meta-overfitting in meta-learning. Based on the augmented meta-data, SAM self-adaptively builds a multi-objective meta-model, which can select algorithms with a proper trade-off between learning performance and computational budget. Furthermore, for new datasets, MDASS combines promising algorithms and hyperparameter optimization to perform automated model selection under a time constraint. Finally, experiments are conducted on various classification datasets from OpenML with algorithms from scikit-learn. The results show that GANs are promising to incorporate into AutoML and that MDASS can perform better than competing approaches within the time budget.

Yue Liu, Wenjie Tian, Shuang Li
Tree-Capsule: Tree-Structured Capsule Network for Improving Relation Extraction

Relation extraction benefits a variety of applications requiring relational understanding of unstructured texts, such as question answering. Recently, capsule network-based models have been proposed for improving relation extraction with better capability of modeling complex entity relations. However, they fail to capture the syntactic structure information of a sentence, which has proven to be useful for relation extraction. In this paper, we propose a Tree-structured Capsule network based model for improving sentence-level Relation Extraction (TCRE), which seamlessly incorporates syntax tree information (syntax trees generally include constituent trees and dependency trees; constituent trees are used in this work). In particular, we design a novel tree-structured capsule network (Tree-Capsule network) to encode the constituent tree. Additionally, an entity-aware routing algorithm for the Tree-Capsule network is proposed to pay attention to the critical relevant information, further improving the relation extraction of the target entities. Experimental results on standard datasets demonstrate that our TCRE significantly improves the performance of relation extraction by incorporating the syntactic structure information.

Tianchi Yang, Linmei Hu, Luhao Zhang, Chuan Shi, Cheng Yang, Nan Duan, Ming Zhou
Rule Injection-Based Generative Adversarial Imitation Learning for Knowledge Graph Reasoning

Knowledge graph reasoning is a crucial part of knowledge discovery and knowledge graph completion tasks. Solutions based on generative adversarial imitation learning (GAIL) have made great progress in recent research and solve the problem of relying heavily on the design of the reward function in reinforcement learning-based reasoning methods. However, existing GAIL-based methods consider only semantic features, which is not enough to assess the quality of reasoning paths, whereas logical rules contain rich factual logic that can be used for reasoning. Thus, we introduce first-order predicate logic rules into our model, called Rule Injection-based Generative Adversarial Path Reasoning. The key idea is to train the generator to learn reasoning strategies by imitating the demonstration at both the semantic and rule levels. In particular, we design a path discriminator and a logic rule discriminator to distinguish paths at these two levels, respectively. Furthermore, both discriminators feed back a self-adaptive reward to the generator by assessing the quality of the generated reasoning paths. Extensive experiments on two benchmarks show that our method improves performance over the state-of-the-art baseline, and case studies also confirm the explainability of our model.

Sheng Wang, Xiaoying Chen, Shengwu Xiong
Hierarchical Self Attention Based Autoencoder for Open-Set Human Activity Recognition

Wearable sensor based human activity recognition is a challenging problem due to the difficulty of modeling the spatial and temporal dependencies of sensor signals. Recognition models under the closed-set assumption are forced to yield members of known activity classes as predictions. However, activity recognition models can encounter an unseen activity due to body-worn sensor malfunction or disability of the subject performing the activities. This problem can be addressed through modeling solutions under the open-set recognition assumption. Hence, the proposed self-attention based approach combines data hierarchically from different sensor placements across time to classify closed-set activities, and it obtains notable performance improvements over state-of-the-art models on five publicly available datasets. The decoder in this autoencoder architecture incorporates self-attention based feature representations from the encoder to detect unseen activity classes in the open-set recognition setting. Furthermore, attention maps generated by the hierarchical model demonstrate explainable selection of features in activity recognition. We conduct extensive leave-one-subject-out validation experiments that indicate significantly improved robustness to noise and subject-specific variability in body-worn sensor signals. The source code is available at: github.com/saif-mahmud/hierarchical-attention-HAR .

M. Tanjid Hasan Tonmoy, Saif Mahmud, A. K. M. Mahbubur Rahman, M. Ashraful Amin, Amin Ahsan Ali
Reinforced Natural Language Inference for Distantly Supervised Relation Classification

Distant supervision (DS) has the advantage of automatically annotating large amounts of data and has been widely used for relation classification. Despite its efficiency, it often suffers from the label noise problem, which can impair the performance of relation classification. Recently, two main ways of addressing the label noise problem have emerged. The first is to use multi-instance learning to account for the noise of instances, but such methods do not perform well for sentence-level prediction. The second is to use reinforcement learning or adversarial learning to directly find noisy-label instances, but with high computational overhead and poor performance. In this paper, we propose to use a natural language inference (NLI) model to evaluate the quality of instances directly, and to select the high-quality instances as refined training data for sentence-level relation classification. Due to the lack of high-quality supervised data, we use reinforcement learning to train the NLI model. Experimental results on two human re-annotated NYT datasets show the effectiveness and efficiency of our method at sentence-level relation classification. The source code of this paper can be found at https://github.com/xubodhu/RLRC .

Bo Xu, Xiangsan Zhao, Chaofeng Sha, Minjun Zhang, Hui Song
SaGCN: Structure-Aware Graph Convolution Network for Document-Level Relation Extraction

Document-level Relation Extraction (DocRE) aims at extracting semantic relations among entities in documents. However, current models lack long-range dependency information and the reasoning ability to extract essential structure information from the text. In this paper, we propose SaGCN, a Structure-aware Graph Convolution Network, which extracts relations using both explicit and implicit dependency structures. Specifically, we generate the implicit graph by sampling from a discrete and continuous distribution, and then dynamically fuse the implicit soft structure with the hard dependency structure. Experimental results show that SaGCN outperforms various current state-of-the-art baseline models on the DocRED dataset.

Shuangji Yang, Taolin Zhang, Danning Su, Nan Hu, Wei Nong, Xiaofeng He
Addressing the Class Imbalance Problem in Medical Image Segmentation via Accelerated Tversky Loss Function

Image segmentation in the medical domain has gained a lot of research interest in recent years with the advancements in deep learning algorithms and related technologies. Medical image datasets are often imbalanced, and to handle the imbalance problem, deep learning models are equipped with modified loss functions to effectively penalize the training weights for false predictions and conduct unbiased learning. Recent works have introduced various loss functions suitable for certain scenarios of segmentation. In this paper, we explore the existing loss functions that are widely used for medical image segmentation, following which an accelerated Tversky loss (ATL) function is proposed that uses the log-cosh function to better optimize the gradients. The no-new U-Net (nn-Unet) model is adopted as the base model to validate the behaviour of the loss functions using standard benchmark segmentation performance metrics. To establish the robustness and effectiveness of the loss functions, multiple datasets are adopted, on which the ATL function demonstrated better performance with faster convergence and better mask generation.
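
The Tversky loss referenced above is built on the Tversky index, which generalizes the Dice coefficient by weighting false negatives and false positives separately; wrapping it in log-cosh, as the abstract describes, gives a smooth, bounded variant. The formulation below is a plausible reading of ATL under those standard definitions, not necessarily the paper's exact equation:

$$\mathrm{TI} = \frac{TP}{TP + \alpha\,FN + \beta\,FP}, \qquad \mathrm{TL} = 1 - \mathrm{TI}, \qquad \mathrm{ATL} = \log\big(\cosh(\mathrm{TL})\big)$$

where $$\alpha$$ and $$\beta$$ control the relative penalties on false negatives and false positives for the minority class.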

Nikhil Nasalwai, Narinder Singh Punn, Sanjay Kumar Sonbhadra, Sonali Agarwal
Incorporating Relational Knowledge in Explainable Fake News Detection

The general public has become aware of the rising prevalence of untrustworthy information in online media. Extensive adaptive detection methods have been proposed for mitigating the adverse effects of fake news. Computational methods for detecting fake news based on the news content have several limitations, such as: 1) encoding semantics from the original texts is limited to the structure of the language in the text, making both bag-of-words and embedding-based features deceptive in the representation of fake news, and 2) explainable methods often neglect relational contexts in fake news detection. In this paper, we design a knowledge graph enhanced framework for effectively detecting fake news while providing relational explanations. We first build a credential-based multi-relation knowledge graph by extracting entity relation tuples from our training data and then apply a compositional graph convolutional network to learn the node and relation embeddings accordingly. The pre-trained graph embeddings are then incorporated into a graph convolutional network for fake news detection. Through extensive experiments on three real-world datasets, we demonstrate that the proposed knowledge graph enhanced framework yields significant improvements in terms of fake news detection as well as structured explainability.

Kun Wu, Xu Yuan, Yue Ning
Incorporating Syntactic Information into Relation Representations for Enhanced Relation Extraction

Relation Extraction (RE) is a premier task of information extraction (IE) and crucial to many applications, including knowledge graph completion (KGC). In recent years, some RE models have employed the topic knowledge of relations through topic words to enrich relation representations, demonstrating better performance than traditional distantly supervised paradigms. However, these models have not taken the different syntactic information of relations into account, which has been proven significant in many NLP tasks. In this paper, we propose a novel RE pipeline which incorporates syntactic information into relation representations to enhance RE performance. Sentence and relation representations in our pipeline are each generated by a modified multi-head self-attention structure, where the sentence is represented based on its words and the relation is represented based on the relation-specific embeddings of its topic words. Furthermore, all sentences labeled with the input relation are used to construct an entire weighted directed graph based on their dependency trees. Then, the relation-specific embeddings of words (nodes) in the graph are learned by a GCN-based model. Our extensive experiments show that our pipeline significantly outperforms other RE models thanks to the incorporation of syntactic information.

Li Cui, Deqing Yang, Jiayang Cheng, Yanghua Xiao
Backmatter
Metadata
Title
Advances in Knowledge Discovery and Data Mining
Edited by
Prof. Kamal Karlapalem
Hong Cheng
Naren Ramakrishnan
R. K. Agrawal
P. Krishna Reddy
Dr. Jaideep Srivastava
Assist. Prof. Tanmoy Chakraborty
Copyright Year
2021
Electronic ISBN
978-3-030-75768-7
Print ISBN
978-3-030-75767-0
DOI
https://doi.org/10.1007/978-3-030-75768-7