
2021 | Book

Advances in Knowledge Discovery and Data Mining

25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, May 11–14, 2021, Proceedings, Part II

Editors: Kamal Karlapalem, Hong Cheng, Naren Ramakrishnan, R. K. Agrawal, P. Krishna Reddy, Jaideep Srivastava, Tanmoy Chakraborty

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

The 3-volume set LNAI 12712–12714 constitutes the proceedings of the 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2021, which was held virtually during May 11–14, 2021.

The 157 papers included in the proceedings were carefully reviewed and selected from a total of 628 submissions. They were organized in topical sections as follows:

Part I: Applications of knowledge discovery and data mining of specialized data;

Part II: Classical data mining; data mining theory and principles; recommender systems; and text analytics;

Part III: Representation learning and embedding, and learning from data.

Table of Contents

Frontmatter

Classical Data Mining

Frontmatter
Mining Frequent Patterns from Hypergraph Databases

A hypergraph is a complex data structure capable of expressing associations among any number of data entities. By overcoming the limitations of traditional graphs, hypergraphs are useful for modeling real-life problems. Frequent pattern mining is one of the most popular problems in data mining, with many applications. To the best of our knowledge, there exists no flexible frequent pattern mining framework for hypergraph databases that does not decompose associations among data entities. In this work, we propose a flexible and complete framework for mining frequent patterns from a collection of hypergraphs. We also develop an algorithm for mining frequent subhypergraphs by introducing a canonical labeling technique for isomorphic subhypergraphs. Experiments conducted on real-life hypergraph databases demonstrate both the efficiency of the algorithm and the effectiveness of the proposed framework.

Md. Tanvir Alam, Chowdhury Farhan Ahmed, Md. Samiullah, Carson K. Leung
Discriminating Frequent Pattern Based Supervised Graph Embedding for Classification

Graphs are used to represent various complex relationships among objects and data entities. One emerging and important problem is graph classification, which has a tremendous impact on various real-life applications. A good number of approaches have been proposed for graph classification using various techniques, graph embedding being one of them. Here we propose an approach for classifying graphs by mining discriminating frequent patterns from graphs to learn vector representations of the graphs. The proposed supervised embedding technique produces high-quality whole-graph embeddings for classification by utilizing the knowledge from the available labeled examples. Experimental analyses conducted on various real-life benchmark datasets show that the proposed approach is significantly more accurate than state-of-the-art techniques.

Md. Tanvir Alam, Chowdhury Farhan Ahmed, Md. Samiullah, Carson K. Leung
Mining Sequential Patterns in Uncertain Databases Using Hierarchical Index Structure

In this uncertain world, data uncertainty is inherent in many applications, and its importance is growing drastically due to the rapid development of modern technologies. Nowadays, researchers pay more attention to mining patterns in uncertain databases. A few recent works attempt to mine frequent uncertain sequential patterns. Despite their success, they fail to reduce the number of false-positive patterns generated in the mining process and to maintain the patterns efficiently. In this paper, we propose multiple theoretically tightened pruning upper bounds that remarkably reduce the mining space. A novel hierarchical structure is introduced to maintain the patterns in a space-efficient way. Afterward, we develop a versatile framework for mining uncertain sequential patterns that can effectively handle weight constraints as well. Besides, with the advent of incremental uncertain databases, existing works are not scalable. Several incremental sequential pattern mining algorithms exist, but they are limited to mining precise databases. Therefore, we propose a new technique to adapt our framework to mine patterns when the database grows incrementally. Finally, we conduct extensive experiments on several real-life datasets and show the efficacy of our framework in different applications.

Kashob Kumar Roy, Md Hasibul Haque Moon, Md Mahmudur Rahman, Chowdhury Farhan Ahmed, Carson K. Leung
Similarity Forests Revisited: A Swiss Army Knife for Machine Learning

Random Forests are one of the most reliable and robust general-purpose machine learning algorithms. They provide very competitive baselines for more complex algorithms. Recently, a new algorithm has been introduced into the family of decision tree learners – Similarity Forests, aiming at mitigating some of the well-known deficiencies of Random Forests. In this paper we extend the originally proposed Similarity Forests algorithm to one-class classification, multi-class classification, regression and metric learning tasks. We also introduce two new criteria for split evaluation in regression learning. The results of conducted experiments show that Similarity Forests can be a competitive alternative to Random Forests, in particular, when high quality data representation is difficult to obtain.

Stanisław Czekalski, Mikołaj Morzy
Discriminative Representation Learning for Cross-Domain Sentiment Classification

Cross-domain sentiment classification aims to solve the lack of labeled data in the target domain by using knowledge from the source domain. Most existing approaches mainly focus on learning transferable feature representations for knowledge transfer across domains. Few of them pay attention to feature discriminability, which helps distinguish different sentiment polarities and improves classification accuracy. In this work, we propose discriminative representation learning, which extracts transferable and discriminative features. Specifically, we use spectral clustering to reduce the negative effect of low prediction accuracy on the target domain. Centroid alignment enforces a smaller distance between samples of the same polarity in the feature space and enlarges the difference between samples of different polarities. Intra-class compactness then benefits the true centroid by reducing the number of samples distributed at the edges of the clusters. Experiments on multiple public datasets demonstrate that discriminative representation learning outperforms state-of-the-art methods.

Shaokang Zhang, Lei Jiang, Huailiang Peng, Qiong Dai, Jianlong Tan
SAGCN: Towards Structure-Aware Deep Graph Convolutional Networks on Node Classification

Graph Convolutional Networks (GCNs) have recently achieved impressive performance in different classification tasks. However, over-smoothing remains a fundamental burden to building deep GCNs for node classification. This paper proposes Structure-Aware Deep Graph Convolutional Networks (SAGCN), a novel model to overcome this burden. At its core, SAGCN separates the initial node features from propagation and directly maps them to the output at each layer. Furthermore, SAGCN selectively aggregates the information from different propagation layers to generate structure-aware node representations, where an attention mechanism is exploited to adaptively balance the information from local and global neighborhoods for each node. Our experiments verify that the SAGCN model achieves state-of-the-art performance in various semi-supervised and fully supervised node classification tasks. More importantly, it outperforms many other backbone models while using half as many layers, or even fewer.
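As background for the decoupling of initial features from propagation, here is a generic sketch of initial-residual propagation over a normalized adjacency (in the spirit of methods such as APPNP/GCNII, not SAGCN's exact architecture); the toy graph, alpha, and sizes are illustrative, and SAGCN's attention over the per-layer outputs is only hinted at by the returned list.

```python
import torch

def residual_propagation(adj_norm, X, num_layers, alpha=0.1):
    # H^{(l+1)} = (1 - alpha) * A_hat @ H^{(l)} + alpha * X keeps the initial
    # features X in the mix at every step, which counteracts over-smoothing.
    H = X
    layer_outputs = []
    for _ in range(num_layers):
        H = (1 - alpha) * adj_norm @ H + alpha * X
        layer_outputs.append(H)
    return layer_outputs  # a SAGCN-style model would attend over these

# Toy symmetric graph with self-loops and its symmetric normalization.
A = (torch.rand(5, 5) > 0.5).float()
A = torch.maximum(A, A.T)
A.fill_diagonal_(1.0)
deg = A.sum(dim=1)
A_hat = A / torch.sqrt(deg[:, None] * deg[None, :])
outs = residual_propagation(A_hat, torch.randn(5, 8), num_layers=4)
print(len(outs), outs[-1].shape)
```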

Ming He, Tianyu Ding, Tianshuo Han
Hierarchical Learning of Dependent Concepts for Human Activity Recognition

In multi-class classification tasks, like human activity recognition, it is often assumed that classes are separable. In real applications, this assumption becomes strong and generates inconsistencies. Besides, the most commonly used approach is to learn classes one-by-one against the others. This computational simplification principle introduces strong inductive biases on the learned theories. In fact, the natural connections among some classes, and not others, deserve to be taken into account. In this paper, we show that organizing overlapping classes (multiple inheritances) into hierarchies considerably improves classification performance. This is particularly true for the activity recognition tasks featured in the SHL dataset. After theoretically showing the exponential complexity of possible class hierarchies, we propose an approach based on transfer affinity among the classes to determine an optimal hierarchy for the learning process. Extensive experiments show improved performance and a reduction in the number of examples needed to learn.

Aomar Osmani, Massinissa Hamidi, Pegah Alizadeh
Improving Short Text Classification Using Context-Sensitive Representations and Content-Aware Extended Topic Knowledge

Most existing short text classification models suffer from poor performance because of the information sparsity of short texts and polysemous class-bearing words. To alleviate these issues, we propose a context-sensitive topic memory network (cs-TMN) that learns context-sensitive text representations and content-aware extended topic knowledge. Different from TMN, which utilizes context-independent word embeddings and extended topic knowledge, we further employ context-sensitive word embeddings, composed of a local context representation and a global context representation, to alleviate the polysemy issue. Besides, the extended topic knowledge matched by context-sensitive word embeddings is shown to be content-aware in comparison with previous works. Empirical results demonstrate the effectiveness of cs-TMN, which outperforms state-of-the-art models on short text classification on four public datasets.

Zhihao Ye, Rui Wen, Xi Chen, Ye Liu, Ziheng Zhang, Zhiyong Li, Ke Nai, Yefeng Zheng
A Novel Method for Offline Handwritten Chinese Character Recognition Under the Guidance of Print

In this paper, we present a new method that views offline handwritten Chinese character recognition (HCCR) as a re-identification (ReID) task. We introduce a print dataset as the target that needs to be retrieved, and treat the test set of offline HCCR as the object of interest. Following the ReID setting, the goal is to find the most similar print sample as the prediction result for each object of interest. We also employ a triplet loss for metric learning and train the model jointly with a cross-entropy loss, which is effective in improving performance. Compared with a classification model, the experimental results show that our method achieves much better results in few-shot learning, where the training data is randomly selected from the overall dataset. When the training set is 5% of HWDB1.1, the gap between them even reaches 9.8%. At the same time, our method also obtains an accuracy of 97.69% on the ICDAR-2013 offline HCCR competition dataset.

Keping Yan, Jun Guo, Weiqing Zhou
Upgraded Attention-Based Local Feature Learning Block for Speech Emotion Recognition

Speech emotion recognition (SER) plays a vital role in natural interaction between humans and machines. However, due to the complexity of human emotions, the features learned in existing research contain a large amount of redundant information that has nothing to do with emotions, which reduces the performance of SER. To alleviate this problem, in this paper we propose a novel model named Upgraded Attention-based Local Feature Learning Block (UA-LFLB). Concretely, the LFLB is used to extract deep local sequence features, which are fed into the UA mechanism to capture salient discourse-level features with contextual information. In doing so, more accurate and discriminative features can be learned, which greatly reduces the redundant information in the features. To evaluate the feasibility of the proposed model, we conduct experiments on a widely used emotional database. Experimental results show that the proposed model outperforms state-of-the-art methods on the IEMOCAP database, achieving a 9% improvement in terms of average accuracy.

Huan Zhao, Yingxue Gao, Yufeng Xiao
Memorization in Deep Neural Networks: Does the Loss Function Matter?

Deep Neural Networks, often owing to the overparameterization, are shown to be capable of exactly memorizing even randomly labelled data. Empirical studies have also shown that none of the standard regularization techniques mitigate such overfitting. We investigate whether choice of loss function can affect this memorization. We empirically show, with benchmark data sets MNIST and CIFAR-10, that a symmetric loss function as opposed to either cross entropy or squared error loss results in significant improvement in the ability of the network to resist such overfitting. We then provide a formal definition for robustness to memorization and provide theoretical explanation as to why the symmetric losses provide this robustness. Our results clearly bring out the role loss functions alone can play in this phenomenon of memorization.
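As a concrete illustration of the kind of loss the abstract contrasts with cross entropy, below is a minimal sketch of mean absolute error over softmax outputs, a well-known symmetric loss; it is shown only as a representative example and is not necessarily the specific loss studied in the paper.

```python
import torch
import torch.nn.functional as F

def mae_loss(logits, targets, num_classes):
    # MAE between softmax probabilities and one-hot targets is a symmetric
    # loss: the per-sample loss is bounded, so a confidently wrong prediction
    # on a randomly labelled sample cannot dominate training the way an
    # unbounded cross-entropy term can.
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes).float()
    return (probs - one_hot).abs().sum(dim=1).mean()

# Hypothetical usage with random logits and labels.
logits, labels = torch.randn(8, 10), torch.randint(0, 10, (8,))
print(mae_loss(logits, labels, 10).item(), F.cross_entropy(logits, labels).item())
```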

Deep Patel, P. S. Sastry
Gaussian Soft Decision Trees for Interpretable Feature-Based Classification

How can we accurately classify feature-based data such that the learned model and results are more interpretable? Interpretability is beneficial from various perspectives, such as checking for compliance with existing knowledge and gaining insights from decision processes. To gain in both accuracy and interpretability, we propose a novel tree-structured classifier called Gaussian Soft Decision Trees (GSDT). GSDT is characterized by multi-branched structures, Gaussian mixture-based decisions, and a hinge loss with path regularization. These three key features let it learn short trees in which the weight vector of each node is a prototype for the data mapped to that node. We show that GSDT achieves the best average accuracy compared to eight baselines. We also perform an ablation study of the various covariance matrix structures in the Gaussian mixture nodes of GSDT and demonstrate the interpretability of GSDT in a case study on breast cancer classification.

Jaemin Yoo, Lee Sael
Efficient Nodes Representation Learning with Residual Feature Propagation

Graph Convolutional Networks (GCN) and their variants have achieved brilliant results in graph representation learning. However, most existing methods cannot be used in deep architectures and can only capture low-order proximity in networks. In this paper, we propose a Residual Simple Graph Convolutional Network (RSGCN), which can aggregate information from distant neighbors' node features without over-smoothing or vanishing gradients. Given that node features of the same class share a certain similarity, a weighted feature propagation is used to ensure effective information aggregation by giving higher weights to similar neighbor nodes. Experimental results on several node classification datasets demonstrate that the proposed method outperforms state-of-the-art methods in terms of both effectiveness and efficiency.

Fan Wu, Duantengchuan Li, Ke Lin, Huawei Zhang
Progressive AutoSpeech: An Efficient and General Framework for Automatic Speech Classification

Speech classification has been widely used in many speech-related applications. However, the complexity of speech classification tasks often exceeds the abilities of non-experts, so off-the-shelf speech classification methods are urgently needed. Recently, automatic speech classification (AutoSpeech) without any human intervention has attracted more and more attention. A practical AutoSpeech solution should be general and automatically handle classification tasks from different domains. Moreover, AutoSpeech should improve not only the final performance but also the any-time performance, especially when the time budget is limited. To address these issues, we propose a three-stage any-time learning framework called Progressive AutoSpeech for automatic speech classification under a given time budget. Progressive AutoSpeech consists of a fast stage, an enhancement stage, and an exploration stage. Each stage uses different models and features to ensure generalization. Additionally, we automatically construct ensembles of the top-k prediction results to improve robustness. The experimental results reveal that Progressive AutoSpeech is effective and efficient for a wide range of speech classification tasks and achieves the best ALC score.

Guanghui Zhu, Feng Cheng, Mengchuan Qiu, Zhuoer Xu, Wenjie Wang, Chunfeng Yuan, Yihua Huang
CrowdTeacher: Robust Co-teaching with Noisy Answers and Sample-Specific Perturbations for Tabular Data

Samples with ground truth labels may not always be available in numerous domains. While learning from crowdsourced labels has been explored, existing models can still fail in the presence of sparse, unreliable, or differing annotations. Co-teaching methods have shown promising improvements for computer vision problems with noisy labels by employing two classifiers trained on each other's confident samples in each batch. Inspired by the idea of separating confident and uncertain samples during the training process, we extend it to the crowdsourcing problem. Our model, CrowdTeacher, uses the idea that perturbation in the input space can improve the robustness of the classifier to noisy labels. Treating crowdsourced annotations as a source of noisy labeling, we perturb samples based on the certainty of the aggregated annotations. The perturbed samples are fed to a co-teaching algorithm tuned to also accommodate smaller tabular data. We showcase the boost in predictive power attained using CrowdTeacher for both synthetic and real datasets across various label density settings. Our experiments reveal that our proposed approach beats baselines that model individual annotations and then combine them, methods that simultaneously learn a classifier and infer true labels, and the co-teaching algorithm with labels aggregated through common truth inference methods.
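For readers unfamiliar with the co-teaching step the abstract builds on, here is a minimal sketch of the small-loss sample exchange between two peer networks; the batch size, keep ratio, and loss tensors are illustrative, and CrowdTeacher's certainty-based input perturbation happens before this step.

```python
import torch

def coteach_select(loss_a, loss_b, keep_ratio):
    # Each network keeps the samples its peer currently finds easiest
    # (smallest per-sample loss); the returned index sets are used to form
    # the gradient updates of networks A and B respectively.
    k = max(1, int(keep_ratio * loss_a.numel()))
    idx_for_b = torch.topk(loss_a, k, largest=False).indices  # A's picks train B
    idx_for_a = torch.topk(loss_b, k, largest=False).indices  # B's picks train A
    return idx_for_a, idx_for_b

per_sample_loss_a, per_sample_loss_b = torch.rand(16), torch.rand(16)
print(coteach_select(per_sample_loss_a, per_sample_loss_b, keep_ratio=0.5))
```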

Mani Sotoodeh, Li Xiong, Joyce Ho
Effective and Adaptive Refined Multi-metric Similarity Graph Fusion for Multi-view Clustering

Multi-view graph-based clustering aims to partition samples by fusing similarity graphs from different views into a unified graph. The clustering performance relies on the accuracy of the similarity measurement. However, most existing methods utilize a single metric whose similarity measurement can easily be corrupted by noise, thus lacking accuracy and generalization capability. We propose an effective multi-metric similarity graph refinement and fusion method for multi-view clustering. We construct multiple similarity graphs for each view using different metrics, exploit a novel refined similarity based on symmetric conditional probability to preserve the important similarity information, and finally adaptively fuse the multiple refined similarity graphs into an informative unified one. Extensive experiments on eight benchmark datasets validate the effectiveness and superiority of the proposed method compared to thirteen state-of-the-art methods.

Wentao Rong, Enhong Zhuo, Guihua Tao, Hongmin Cai
aHCQ: Adaptive Hierarchical Clustering Based Quantization Framework for Deep Neural Networks

For deep neural networks (DNNs), high model accuracy is usually the main focus. However, millions of model parameters commonly lead to high space overheads, especially parameter redundancy. By representing network weights with fewer bits, network quantization has been used to compress DNNs for lower space costs. However, existing quantization methods cannot optimally balance model size and accuracy, and thus suffer from accuracy loss to varying degrees. Besides, although a few existing quantization techniques can adaptively determine per-layer quantization bit-widths, they either give little consideration to the relations between different DNN layers or are designed for specialized hardware environments that are not universally available. To overcome these issues, we propose an adaptive Hierarchical Clustering based Quantization (aHCQ) framework. aHCQ finds a heavily compressed model by quantizing each layer while incurring only a small loss in model accuracy. Experiments show that aHCQ can achieve 11.4x and 8.2x model compression rates with only around a 0.5% drop in model accuracy.
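To make the clustering-based quantization idea concrete, the sketch below clusters a layer's weights hierarchically and replaces each weight with its cluster centroid; it is a generic illustration with a fixed bit-width, whereas aHCQ chooses bit-widths per layer adaptively and accounts for inter-layer relations.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_quantize(weights, bits):
    # Hierarchically cluster the weight values and store only 2**bits
    # centroids plus a small index per weight, replacing each weight with
    # the centroid of its cluster.
    flat = weights.reshape(-1, 1)
    Z = linkage(flat, method="ward")
    labels = fcluster(Z, t=2 ** bits, criterion="maxclust")
    centroids = np.array([flat[labels == c].mean() for c in np.unique(labels)])
    return centroids[labels - 1].reshape(weights.shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))        # a small toy layer
Wq = cluster_quantize(W, bits=3)     # at most 8 distinct weight values
print(np.unique(Wq).size, np.abs(W - Wq).mean())
```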

Jiaxin Hu, Weixiong Rao, Qinpei Zhao
Maintaining Consistency with Constraints: A Constrained Deep Clustering Method

Constrained clustering has been intensively explored in data mining. Popular clustering algorithms such as k-means and spectral clustering are combined with prior knowledge to guide the clustering process. Recently, constrained clustering with deep neural networks has gained superior performance by jointly learning cluster-oriented feature representations and cluster assignments. However, these methods share a common issue: they perform poorly when only minimal constraints are available, because they mine constraint information in a single way. In this paper, we propose an end-to-end clustering method that learns unsupervised information and constraint information in two consecutive modules: an unsupervised clustering module that obtains feature representations and cluster assignments, followed by a constrained clustering module that tunes them. The constrained clustering module is composed of a Siamese or triplet network to maintain consistency with the constraints. To capture more information from minimal constraints, consistency is maintained from two perspectives simultaneously: embedding space distance and cluster assignments. Extensive experiments on both pairwise and triplet constrained clustering validate the effectiveness of the proposed algorithm.

Yi Cui, Xianchao Zhang, Linlin Zong, Jie Mu

Data Mining Theory and Principles

Frontmatter
Towards Multi-label Feature Selection by Instance and Label Selections

In multi-label learning, feature and instance selection represent two effective dimensionality reduction techniques, which remove noisy, irrelevant, and redundant entries from the original data to ease later analysis such as clustering and classification. Label selection also plays a fundamental role in the pre-processing step, since label noise can negatively affect the performance of the underlying learning algorithms. The literature has mainly been limited to feature and/or instance selection, but has somewhat overlooked label selection. In this paper, we introduce, for the first time, a combination of the three selection techniques (feature, instance, and label) for multi-label learning. We propose an efficient convex optimization based algorithm that evaluates the usefulness of features, instances, and labels in order to select the most relevant ones simultaneously. Experimental results on several benchmark datasets demonstrate the performance of the proposed method.

Dou El Kefel Mansouri, Khalid Benabdeslem
FARF: A Fair and Adaptive Random Forests Classifier

As Artificial Intelligence (AI) is used in more applications, the need to consider and mitigate biases in the learned models has followed. Most work on developing fair learning algorithms focuses on the offline setting. However, in many real-world applications data arrives in an online fashion and needs to be processed on the fly. Moreover, in practical applications there is a trade-off between accuracy and fairness that needs to be accounted for, but current methods often have multiple hyper-parameters with non-trivial interactions for achieving fairness. In this paper, we propose a flexible ensemble algorithm for fair decision-making in the more challenging context of evolving online settings. This algorithm, called FARF (Fair and Adaptive Random Forests), is based on online component classifiers that are updated according to the current distribution, accounts for fairness, and uses a single hyper-parameter that alters the fairness-accuracy balance. Experiments on real-world discriminated data streams demonstrate the utility of FARF.

Wenbin Zhang, Albert Bifet, Xiangliang Zhang, Jeremy C. Weiss, Wolfgang Nejdl
Sparse Spectrum Gaussian Process for Bayesian Optimization

We propose a novel sparse spectrum approximation of Gaussian process (GP) tailored for Bayesian optimization (BO). Whilst the current sparse spectrum methods provide desired approximations for regression problems, it is observed that this particular form of sparse approximations generates an overconfident GP, i.e., it produces less epistemic uncertainty than the original GP. Since the balance between the predictive mean and variance is the key determinant to the success of BO, the current methods are less suitable for BO. We derive a new regularized marginal likelihood for finding the optimal frequencies to fix this overconfidence issue, particularly for BO. The regularizer trades off the accuracy in the model fitting with targeted increase in the predictive variance of the resultant GP. Specifically, we use the entropy of the global maximum distribution (GMD) from the posterior GP as the regularizer that needs to be maximized. Since the GMD cannot be calculated analytically, we first propose a Thompson sampling based approach and then a more efficient sequential Monte Carlo based approach to estimate it. Later, we also show that the Expected Improvement acquisition function can be used as a proxy for it, thus making the process further efficient.
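For context, the sparse spectrum approximation represents a stationary GP kernel with a finite set of spectral frequencies; the sketch below shows the standard trigonometric feature map with frequencies drawn from the RBF kernel's spectral density (i.e., random Fourier features). In the sparse spectrum GP these frequencies are instead optimized by marginal likelihood, which is exactly where the overconfidence the paper regularizes against arises; the lengthscale and sizes here are illustrative.

```python
import numpy as np

def ss_features(X, W):
    # Trigonometric features phi(x) = [cos(Wx), sin(Wx)] / sqrt(m); the dot
    # product phi(x).phi(y) is a Monte Carlo estimate of the stationary kernel.
    proj = X @ W.T
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(W.shape[0])

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
lengthscale = 1.0
W = rng.normal(scale=1.0 / lengthscale, size=(200, 3))   # RBF spectral density
K_approx = ss_features(X, W) @ ss_features(X, W).T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq_dists / lengthscale ** 2)
print(np.abs(K_approx - K_exact).max())   # approximation error of the feature map
```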

Ang Yang, Cheng Li, Santu Rana, Sunil Gupta, Svetha Venkatesh
Densely Connected Graph Attention Network Based on Iterative Path Reasoning for Document-Level Relation Extraction

Document-level relation extraction is a challenging task in Natural Language Processing, which extracts relations expressed in one or multiple sentences. It plays an important role in data mining and information retrieval. The key challenge comes from indirect relations expressed across sentences. Graph-based neural networks have proved effective for modeling structural information within a document. Existing methods enhance the graph models by using either an attention mechanism or iterative path reasoning, which is not enough to capture all the effective structural information. In this paper, we propose a densely connected graph attention network based on iterative path reasoning (IPR-DCGAT) for document-level relation extraction. Our approach uses a densely connected graph attention network to model the local and global information in a document. In addition, we propose to learn dynamic path weights for reasoning about relations across sentences. Extensive experiments on three datasets demonstrate the effectiveness of our approach. Our model achieves an 84% F1 score on CDR, which is about 16.3%-22.5% higher than previous models, a significant margin. Meanwhile, the results of our approach are comparable or superior to the state-of-the-art results on the GDA and DocRED datasets.

Hongya Zhang, Zhen Huang, Zhenzhen Li, Dongsheng Li, Feng Liu
Causal Inference Using Global Forecasting Models for Counterfactual Prediction

This research proposes a global forecasting and inference method based on recurrent neural networks (RNN) to predict policy interventions' causal effects on an outcome over time through the counterfactual approach. Traditional univariate methods that operate within the well-established synthetic control method make strong linearity assumptions on the covariates. This has recently been addressed by successfully using univariate RNNs for this task. We use an RNN trained not univariately per series but globally across all time series, which allows us to model treated and control time series simultaneously over the pre-treatment period. Therewith, we do not need to make equivalence assumptions between the distributions of the control and treated outcomes in the pre-treatment period. This allows us to achieve better accuracy and precisely isolate the effect of an intervention. We compare our novel approach with local univariate approaches on two real-world datasets concerning 1) how policy changes in alcohol outlet licensing affect emergency service calls, and 2) how COVID-19 lockdown measures affect emergency service use. Our results show that our novel method can outperform the accuracy of state-of-the-art predictions, thereby estimating the size of a causal effect more accurately. The experimental results are statistically significant, indicating that our framework generates better counterfactual predictions.

Priscila Grecov, Kasun Bandara, Christoph Bergmeir, Klaus Ackermann, Sam Campbell, Deborah Scott, Dan Lubman
CED-BGFN: Chinese Event Detection via Bidirectional Glyph-Aware Dynamic Fusion Network

Event detection is an essential task in information extraction. However, most existing studies on event detection are designed for English text; efficient algorithms for Chinese event detection are still lacking and leave much room for improvement. Recent work has shown that enhanced text representations, such as those introducing glyph information, can significantly improve downstream tasks in natural language processing. In this paper, we propose a novel method for Chinese Event Detection via a Bidirectional Glyph-aware Dynamic Fusion Network, called CED-BGFN. We use two representations: glyph-aware information and a pre-trained language model. To integrate the heterogeneous representation modules, we propose a novel fusion network, the Bidirectional Glyph-aware Fusion Network (BGFN). Considering the dynamic interaction between the two representations, BGFN adaptively learns the fusion weights for the downstream event detection task. We conduct extensive experiments to investigate the validity of the proposed method on the ACE 2005 Chinese corpus. Results demonstrate that, compared with previous state-of-the-art methods, our approach obtains superior performance on both the event trigger identification and classification tasks, with increases of 5.48 (7.46%) and 5.03 (7.1%) in F1-score, respectively.

Qi Zhai, Zhigang Kan, Sen Yang, Linbo Qiao, Feng Liu, Dongsheng Li
Learning Finite Automata with Shuffle

Learning finite automata has been a popular topic, and the shuffle operation has been applied in information systems. Since introducing shuffle into finite automata makes the membership problem NP-hard, and there are so far no learning algorithms for finite automata supporting shuffle, devising effective and precise algorithms for learning such automata is essential. In this paper, finite automata are learned from sets of positive samples. First, we define finite automata with shuffle (FA(&)s), for which both the uniform and the non-uniform membership problems are decidable in polynomial time. Then, we learn an FA(&) from a given finite sample step by step. Our algorithm ensures that the learned FA(&) is a precise representation of the given finite sample. Experimental results demonstrate that FA(&) is more efficient in membership checking and that our algorithm obtains a more concise automaton.

Xiaofan Wang
Active Learning Based Similarity Filtering for Efficient and Effective Record Linkage

The limited analytical value of using individual databases on their own increasingly requires the integration of large and complex databases for advanced data analytics. Linking personal medical records with travel and immigration data, for example, will allow the effective management of pandemics such as the current COVID-19 outbreak by tracking potentially infected individuals and their contacts. One major challenge for accurate linkage of large databases is the quadratic or even higher computational complexity of many advanced linkage algorithms. In this paper we present a novel approach that, based on the expected number of true matches between two databases, applies active learning to remove compared record pairs that are likely non-matches before a computationally expensive classification or clustering algorithm is employed to classify record pairs. Unlike blocking and indexing techniques that are used to reduce the number of record pairs to be compared, our approach uses recursive binning on a data dimension such as time or space and removes likely non-matching record pairs in each bin after their comparison. Experiments on two real-world databases show that similarity filtering can substantially reduce run time and improve precision, at the cost of a small reduction in recall, in the final linkage results.

Charini Nanayakkara, Peter Christen, Thilina Ranbaduge
Stratified Sampling for Extreme Multi-label Data

Extreme multi-label classification (XML) is becoming increasingly relevant in the era of big data. Yet, there is no method for effectively generating stratified partitions of XML datasets. Instead, researchers typically rely on provided test-train splits that, 1) aren’t always representative of the entire dataset, and 2) are missing many of the labels. This can lead to poor generalization ability and unreliable performance estimates, as has been established in the binary and multi-class settings. As such, this paper presents a new and simple algorithm that can efficiently generate stratified partitions of XML datasets with millions of unique labels. We also examine the label distributions of prevailing benchmark splits, and investigate the issues that arise from using unrepresentative subsets of data for model development. The results highlight the difficulty of stratifying XML data, and demonstrate the importance of using stratified partitions for training and evaluation.

Maximillian Merrillees, Lan Du
Vertical Federated Learning for Higher-Order Factorization Machines

In the real world, multiple parties sometimes hold different data about common instances; e.g., a customer of a supermarket can also be a patient of a hospital. In other words, datasets are sometimes vertically partitioned across multiple parties. In such a situation, it is natural for those parties to collaborate to obtain more accurate prediction models; however, sharing their raw data should be prohibited from the point of view of privacy preservation. Federated learning has recently attracted the attention of machine learning researchers as a framework for efficient collaborative learning of predictive models among multiple parties with privacy preservation. In this paper, we propose a lossless vertical federated learning (VFL) method for higher-order factorization machines (HOFMs). HOFMs take feature combinations into account efficiently and effectively and have succeeded in many tasks, especially recommender systems, link prediction, and natural language processing. Although it is intuitively difficult to evaluate and learn HOFMs without sharing raw feature vectors, our generalized recursion of ANOVA kernels enables us to do so. We also propose a more efficient and robust VFL method for HOFMs based on anonymization by clustering. Experimental results on three real-world datasets show the effectiveness of the proposed method.
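The ANOVA kernel recursion the abstract refers to is standard background for HOFMs; the sketch below shows the usual dynamic program for a single factor row, purely as context. The paper's contribution is a generalized form of this recursion that works when features are split across parties, which is not shown here.

```python
import numpy as np

def anova_kernel(p, x, degree):
    # a[j, t] = sum over all t-subsets of the first j features of
    # prod_k p[j_k] * x[j_k]; recursion:
    #   a[j, t] = a[j-1, t] + p[j-1] * x[j-1] * a[j-1, t-1]
    d = len(x)
    a = np.zeros((d + 1, degree + 1))
    a[:, 0] = 1.0
    for j in range(1, d + 1):
        for t in range(1, degree + 1):
            a[j, t] = a[j - 1, t] + p[j - 1] * x[j - 1] * a[j - 1, t - 1]
    return a[d, degree]

rng = np.random.default_rng(0)
x, p = rng.normal(size=6), rng.normal(size=6)   # one factor row p
# Degree 2 recovers the familiar FM pairwise term for this factor row.
brute = sum(p[i] * x[i] * p[j] * x[j] for i in range(6) for j in range(i + 1, 6))
print(anova_kernel(p, x, 2), brute)
```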

Kyohei Atarashi, Masakazu Ishihata
dK-Projection: Publishing Graph Joint Degree Distribution with Node Differential Privacy

Network data has great significance for commercial and research purposes. However, most networks contain sensitive information about individuals, thereby requiring privacy-preserving mechanisms to publish network data while preserving data utility. In this paper, we study the problem of publishing higher-order network statistics, i.e., joint degree distribution, under strong mathematical guarantees of node differential privacy. This problem is known to be challenging, since even simple network statistics (e.g., edge count) can be highly sensitive to adding or removing a single node in a network. To address this challenge, we propose a general framework of publishing dK-distributions under node differential privacy, and develop a novel graph projection algorithm to transform graphs to $\theta$-bounded graphs for controlled sensitivity. We have conducted experiments to verify the utility enhancement and privacy guarantee of our proposed framework on four real-world networks. To the best of our knowledge, this is the first study to publish higher-order network statistics under node differential privacy, while enhancing network data utility.

Masooma Iftikhar, Qing Wang

Recommender Systems

Frontmatter
Improving Sequential Recommendation with Attribute-Augmented Graph Neural Networks

Many practical recommender systems provide item recommendations for different users only by mining user-item interactions, totally ignoring the rich attribute information of the items that users interact with. In this paper, we propose an attribute-augmented graph neural network model named Murzim. Murzim takes as input graphs constructed from user-item interaction sequences and the corresponding item attribute sequences. By combining GNNs with node aggregation and an attention network, Murzim can capture user preference patterns, generate embeddings for user-item interaction sequences, and then generate recommendations through next-item prediction. We conduct extensive experiments on multiple datasets. Experimental results show that Murzim outperforms several state-of-the-art methods in terms of recall and MRR, which illustrates that Murzim can make use of item attribute information to produce better recommendations. At present, Murzim has been deployed in MX Player, one of India's largest streaming platforms, and is recommending videos for tens of thousands of users.

Xinzhou Dong, Beihong Jin, Wei Zhuo, Beibei Li, Taofeng Xue
Exploring Implicit Relationships in Social Network for Recommendation Systems

Online social platforms have provided a large amount of available information to recommendation systems. With this intuition, social recommendation systems emerged and have attracted increasing attention over the past years. Most existing social recommendation methods only use explicit social relationships among users. However, implicit social relationships can effectively improve the quality of recommendations when users have only a few social relationships. To this end, the discovery of implicit relations among users plays a central role in advancing social recommendation. In this paper, we propose a novel approach that fuses direct and indirect friends toward a more accurate social recommendation method. We learn users' preferences by carefully integrating users' direct and indirect friends. In particular, we construct item rankings based on the feedback from users' direct and indirect friends on the item. Furthermore, to distinguish the impact of users' direct friends and indirect friends, we also extend the ranking assumption from the item domain to the user domain, so that information from user rankings can be leveraged to further improve the recommendation performance. Extensive experiments on two real-world datasets demonstrate the effectiveness of the proposed method.

Yunhe Wei, Huifang Ma, Ruoyi Zhang, Zhixin Li, Liang Chang
Transferable Contextual Bandits with Prior Observations

Cross-domain recommendation has long been studied in traditional recommender systems, especially to solve the cold-start problem. Although recent approaches to dynamic personalized recommendation have leveraged the power of contextual bandits to benefit from the exploitation-exploration paradigm, very little work has been conducted on cross-domain recommendation in this setting. We propose a novel approach to solve the cold-start problem under the contextual bandit setting through a cross-domain approach. Our algorithm, T-LinUCB, takes advantage of prior recommendation observations from multiple domains to initialize the new arms' parameters so as to circumvent the lack of data arising from the cold-start problem. Our bandits therefore possess knowledge at the start, which yields better recommendations and faster convergence. We provide both a regret analysis and an experimental evaluation. Our approach outperforms the baseline, LinUCB, and the experimental results demonstrate the benefits of our model.
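To illustrate the warm-start idea, here is a minimal LinUCB arm whose ridge statistics are seeded from prior-domain observations before online updates begin; the class and its warm_start method are schematic and not the exact T-LinUCB initialization or its regret-optimal parameterization.

```python
import numpy as np

class LinUCBArm:
    # Standard LinUCB arm: ridge statistics A, b give theta = A^{-1} b and the
    # upper-confidence score theta.x + alpha * sqrt(x^T A^{-1} x).
    def __init__(self, d, alpha=1.0):
        self.A, self.b, self.alpha = np.eye(d), np.zeros(d), alpha

    def warm_start(self, X_prior, r_prior):
        # Seed the statistics with prior-domain (context, reward) pairs so the
        # arm does not start from scratch in the cold-start regime.
        self.A += X_prior.T @ X_prior
        self.b += X_prior.T @ r_prior

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

rng = np.random.default_rng(0)
arm = LinUCBArm(d=5)
arm.warm_start(rng.normal(size=(100, 5)), rng.normal(size=100))
print(arm.ucb(rng.normal(size=5)))
```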

Kevin Labille, Wen Huang, Xintao Wu
Modeling Hierarchical Intents and Selective Current Interest for Session-Based Recommendation

Session-based recommendation is a challenging problem due to limited session data. In real scenarios, sessions exhibit two properties: (1) hierarchical intents: an implicit hierarchy in user preference is a common phenomenon, since users usually click a specific item with a general intent in mind; (2) the influence of the current interest: the items that users click in order have sequential dependencies, and the next item is affected by the current operation. However, recent approaches are all inherently flat and neglect the hierarchical intents. Besides, they neglect the truly related subsequence when modeling the current interest. This can lead to inaccurate user intents and fails when the user's next click falls under a more general intent. In this paper, we propose a method that models both Hierarchical Intents and Selective Sequential Interests (HISSI). Methodologically, we design a general intent abstractor to extract common features and transmit general intents through the hierarchy to form fine-to-coarse grained intents. In addition, a selector-GRU is proposed to model the user's subsequence behavior related to the last click without noise. Extensive experiments on three real-world datasets verify our model's effectiveness.

Mengfei Zhang, Cheng Guo, Jiaqi Jin, Mao Pan, Jinyun Fang
A Finetuned Language Model for Recommending cQA-QAs for Enriching Textbooks

Textbooks play a vital role in any educational system; despite their clarity and information, students tend to use community question answering (cQA) forums to acquire more knowledge. Due to the high data volume, the quality of question-answer (QA) pairs in cQA forums can differ greatly, so it takes additional effort to go through all possible QA pairs for better insight. This paper proposes a "sentence-level text enrichment system" in which a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) summarizer understands the given text, picks out the important sentences, and then rearranges them to give an overall summary of the text document. For each important sentence, we recommend the relevant QA pairs from cQA to make learning more effective. In this work, we fine-tune the pre-trained BERT model to extract the QA sets that are most relevant for enriching important sentences of the textbook. We notice that fine-tuning the BERT model significantly improves the performance of QA selection and find that it outperforms existing RNN-based models for such tasks. We also investigate the effectiveness of our fine-tuned BERT-Large model on three cQA datasets for the QA selection task and observe a maximum improvement of 19.72% compared to previous models. Experiments have been carried out on NCERT (Grade IX and X) textbooks from India and the "Pattern Recognition and Machine Learning" textbook. Extensive evaluation demonstrates that the proposed model offers more precise and relevant recommendations than state-of-the-art methods.

Shobhan Kumar, Arun Chauhan
XCrossNet: Feature Structure-Oriented Learning for Click-Through Rate Prediction

Click-Through Rate (CTR) prediction is a core task in today's commercial recommender systems. Feature crossing, as a mainline of research on CTR prediction, has shown a promising way to enhance predictive performance. Even though various models are able to learn feature interactions without manual feature engineering, they rarely attempt to individually learn representations for different feature structures. In particular, they mainly focus on modeling cross sparse features but neglect to specifically represent cross dense features. Motivated by this, we propose a novel Extreme Cross Network, abbreviated XCrossNet, which aims at learning dense and sparse feature interactions in an explicit manner. XCrossNet, as a feature structure-oriented model, leads to a more expressive representation and more precise CTR prediction, and is not only explicit and interpretable but also time-efficient and easy to implement. Experimental studies on the Criteo Kaggle dataset show significant improvements of XCrossNet over state-of-the-art models in both effectiveness and efficiency.
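For readers unfamiliar with explicit feature crossing, the sketch below shows one cross layer in the style popularized by Deep & Cross Network; it only illustrates the general crossing operation and is not XCrossNet's specific operators for dense versus sparse feature structures.

```python
import torch

class CrossLayer(torch.nn.Module):
    # One explicit crossing step: x_{l+1} = x_0 * (x_l . w) + b + x_l, so each
    # layer adds one more degree of multiplicative interaction with x_0.
    def __init__(self, dim):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(dim) * 0.01)
        self.b = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x0, xl):
        return x0 * (xl @ self.w).unsqueeze(1) + self.b + xl

x0 = torch.randn(32, 16)       # toy batch of 16-dimensional feature embeddings
layer = CrossLayer(16)
x1 = layer(x0, x0)             # first crossing
x2 = layer(x0, x1)             # deeper layers keep re-crossing with x0
print(x2.shape)
```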

Runlong Yu, Yuyang Ye, Qi Liu, Zihan Wang, Chunfeng Yang, Yucheng Hu, Enhong Chen
Learning Multiclass Classifier Under Noisy Bandit Feedback

This paper addresses the problem of multiclass classification with corrupted or noisy bandit feedback. In this setting, the learner may not receive true feedback. Instead, it receives feedback that has been flipped with some non-zero probability. We propose a novel approach to deal with noisy bandit feedback based on the unbiased estimator technique. We further offer a method that can efficiently estimate the noise rates, thus providing an end-to-end framework. The proposed algorithm enjoys a mistake bound of the order of $O(\sqrt{T})$ in the high noise case and of the order of $O(T^{2/3})$ in the worst case. We show our approach's effectiveness using extensive experiments on several benchmark datasets.

Mudit Agarwal, Naresh Manwani
Diversify or Not: Dynamic Diversification for Personalized Recommendation

Diversity is believed to be an essential factor in improving user satisfaction in recommender systems, while how to take advantage of it has long been a problem worth exploring. Existing work either ignores the influence of diversity or overlooks users’ different diversity demands in recommendations. In this study, we analyze users’ behaviors on a real-world dataset collected from an e-commerce website and find that the demand for diversity changes among different users, even the same user’s demand varies among different shopping scenarios. There is also evidence that users’ behaviors are affected by the diversity of impressions, which has been often ignored by traditional session-based recommendation models. Then, we propose a Dynamic Diversification Recommendation Model (DDRM) with the integration of both click and impression diversities to better make use of diversity for recommendations. Extensive experimental results demonstrate that DDRM outperforms all baseline methods significantly. Further studies show our model can provide more precise and reasonable recommendations.

Bin Hao, Min Zhang, Cheng Guo, Weizhi Ma, Yiqun Liu, Shaoping Ma
Multi-criteria and Review-Based Overall Rating Prediction

An overall rating cannot reveal the details of a user's preferences toward each feature of a product. One widespread practice of e-commerce websites is to provide ratings on predefined aspects of the product along with user-generated reviews. Most recent multi-criteria works employ users' aspect preferences or user reviews to understand users' opinions and behavior. However, these works fail to learn how users correlate these information sources when expressing their opinion about an item. In this work, we present Multi-task & Multi-Criteria Review-based Rating (MMCRR), a framework to predict the overall ratings of items by learning how users represent their preferences when using multi-criteria ratings and text reviews. We conduct extensive experiments with three real-life datasets and six baseline models. The results show that MMCRR can reduce prediction errors while learning features better from the data.

Edgar Ceh-Varela, Huiping Cao, Tuan Le
W2FM: The Doubly-Warped Factorization Machine

Factorization Machines (FMs) enhance an underlying linear regression or classification model by capturing feature interactions. Intuitively, FMs warp the feature space to help capture the underlying non-linear structure of the machine learning task. In this paper, we propose novel Doubly-Warped Factorization Machines (W2FMs) that leverage multiple complementary space warping strategies to improve the representational ability of FMs. Our approach abstracts the feature interaction in FMs as additional affine transformations (thus warping the space), which can be learned efficiently without introducing large numbers of model parameters. We also explore alternative W2FM based approaches and conduct extensive experiments on real-world data sets. These experiments show that W2FM achieves better performance in the collaborative filtering task, not only relative to vanilla FMs but also against other state-of-the-art competitors such as Attention FM (AFM), Holographic FM (HFM), and Neural FM (NFM).
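As background for the warping discussion, here is the vanilla second-order FM prediction that such models build on, computed with the usual O(nk) identity; W2FM's additional affine warping transformations are not shown, and the sizes below are illustrative.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    # Second-order FM: w0 + w.x + sum_{i<j} <v_i, v_j> x_i x_j, where the
    # pairwise term equals 0.5 * sum_f [(V[:, f].x)^2 - (V[:, f]^2).(x^2)].
    s = V.T @ x
    s2 = (V ** 2).T @ (x ** 2)
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)

rng = np.random.default_rng(0)
n, k = 8, 4                                   # toy feature and factor sizes
x = rng.normal(size=n)
print(fm_predict(x, 0.1, rng.normal(size=n), rng.normal(size=(n, k))))
```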

Mao-Lin Li, K. Selçuk Candan
Causal Combinatorial Factorization Machines for Set-Wise Recommendation

With set-wise (exact-k, slate, combinatorial) recommendation, we aim to optimize the whole set of items to recommend while taking the dependency among items into consideration. This enables us to model, for example, the substitution relationship of items, i.e., a customer tends to purchase only one item in the same category, in contrast to the top-k recommendation in which the independency of items is assumed. Recent efforts in this context have focused on the computational aspects of optimizing the set of items to recommend. However, they have not taken into account sample selection bias in datasets. Real-world datasets for recommendation have missing entries not completely at random due to biased exposure or user preferences. Addressing the selection bias is important for the set-wise recommendation since methods with larger hypothesis spaces are more likely to overfit biased training data. In light of recent top-k recommendation research that has addressed this issue by using causal inference techniques, we therefore propose a set-wise recommendation model with debiased training methods based on recent causal inference techniques. We demonstrate the advantage of our method using real-world recommendation datasets consisting of biased training sets and randomized test sets.

Akira Tanimoto, Tomoya Sakai, Takashi Takenouchi, Hisashi Kashima
Transformer-Based Multi-task Learning for Queuing Time Aware Next POI Recommendation

Next point-of-interest (POI) recommendation is an important and challenging problem due to varied contextual information and the wide variety of human mobility patterns. Most prior studies incorporated users' spatiotemporal and sequential travel patterns to recommend next POIs. However, few of these previous approaches considered the queuing time at POIs and its influence on user mobility. Queuing time plays a significant role in affecting user mobility behaviour; e.g., having to queue for a long time to enter a POI might reduce a visitor's enjoyment. Recently, attention-based recurrent neural network approaches have shown promising performance in next POI recommendation, but they are limited to single-head attention, which can have difficulty finding the appropriate complex connections between users, previous travel history, and POI information. In this research, we present the problem of queuing time aware next POI recommendation and demonstrate that it is non-trivial to both recommend a next POI and simultaneously predict its queuing time. To solve this problem, we propose a multi-task, multi-head attention transformer model called TLR-M. The model recommends next POIs to the target users and simultaneously predicts the queuing time to access the POIs. By utilizing multi-head attention, the TLR-M model can efficiently integrate long-range dependencies between any two POI visits and evaluate their contribution to selecting next POIs and predicting queuing time. Extensive experiments on eight real datasets show that the proposed model outperforms state-of-the-art baseline approaches in terms of precision, recall, and F1 score. The model also predicts and minimizes the queuing time effectively.

Sajal Halder, Kwan Hui Lim, Jeffrey Chan, Xiuzhen Zhang
Joint Modeling Dynamic Preferences of Users and Items Using Reviews for Sequential Recommendation

The emergence of sequential recommenders (SR) has attracted increasing attention in recent years, focusing on understanding and modeling the temporal dynamics of user behaviors hidden in sequences of user-item interactions. However, with the tremendous increase in users and items, SR still faces several challenges: (1) the hardness of modeling user interests from sparse explicit feedback; (2) the time and semantic irregularities hidden in a user's successive actions. In this study, we present a neural network-based sequential recommender model that learns temporally aware user preferences and item popularity jointly from reviews. The proposed model consists of a semantic extracting layer and a dynamic feature learning layer, besides the embedding layer and the output layer. To alleviate the data sparsity issue, the semantic extracting layer focuses on exploiting the rich semantic information hidden in reviews. To address the time and semantic irregularities hidden in user behaviors, the dynamic feature learning layer leverages convolutional filters of varying sizes, integrated with a time-aware controller, to capture the temporal dynamics of user and item features across multiple temporal dimensions. The experimental results demonstrate that our proposed model consistently outperforms several state-of-the-art methods.

Tianqi Shang, Xinxin Li, Xiaoyu Shi, Qingxian Wang
Box4Rec: Box Embedding for Sequential Recommendation

Sequential recommendation aims to predict a user's next behavior in the near future by using the user's most recent behaviors. Most existing methods embed a user or an item as a point in a vector space, and then model the user's recent behaviors as a sequence with a strict order to generate recommendations. However, both the point representation and the strict-order rule limit the capacity of sequential recommendation models, given the diversity and uncertainty of a user's interests. In this paper, by relaxing the condition that a sequence must follow a strict order, we introduce box embeddings into sequential recommendation and present a novel model called Box4Rec. Box4Rec embeds a user and the user's historical items as boxes instead of points to model the user's general preference and short-term preference, and then integrates conjunction and disjunction operations on items to generate flexible recommendation strategies. Experiments on five real-world datasets show that the proposed Box4Rec model consistently outperforms state-of-the-art methods.

Kai Deng, Jiajin Huang, Jin Qin
UKIRF: An Item Rejection Framework for Improving Negative Items Sampling in One-Class Collaborative Filtering

Collaborative Filtering (CF) is one of the most successful techniques in recommender systems. Most CF scenarios depict positive-only implicit feedback, which means that negative feedback is unavailable. Therefore, One-Class Collaborative Filtering (OCCF) techniques have been tailored to tackle these scenarios. Nonetheless, several OCCF models still require negative observations during training, and thus a popular approach is to treat randomly selected unknown relationships as negative. This work brings forward a novel, non-random approach for selecting negative items called the Unknown Item Rejection Framework (UKIRF). More specifically, we instantiate UKIRF using similarity approaches, i.e., TF-IDF and cosine similarity, to reject items similar to those a user interacted with. We apply UKIRF to different OCCF models on different datasets and show that it improves recall rates by up to 24% compared to random sampling.
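A minimal sketch of the rejection idea follows: unknown items that are too similar (by cosine) to anything the user consumed are not used as negatives. The item vectors, threshold, and sampling loop are illustrative; UKIRF's concrete instantiations use TF-IDF/cosine over item content and their own rejection criteria.

```python
import numpy as np

def sample_negatives(user_items, item_vecs, n_neg, sim_threshold, rng):
    # Draw unknown items as negatives, rejecting any candidate whose cosine
    # similarity to an item the user interacted with exceeds the threshold.
    norms = np.linalg.norm(item_vecs, axis=1, keepdims=True)
    unit = item_vecs / np.clip(norms, 1e-12, None)
    profile = unit[list(user_items)]
    negatives = []
    while len(negatives) < n_neg:
        cand = int(rng.integers(item_vecs.shape[0]))
        if cand in user_items:
            continue
        if (profile @ unit[cand]).max() >= sim_threshold:
            continue  # too similar to the user's known positives: reject
        negatives.append(cand)
    return negatives

rng = np.random.default_rng(0)
item_vecs = rng.normal(size=(1000, 32))          # toy item representations
print(sample_negatives({1, 2, 3}, item_vecs, n_neg=5, sim_threshold=0.5, rng=rng))
```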

Antônio David Viniski, Jean Paul Barddal, Alceu de Souza Britto Jr.
IACN: Influence-Aware and Attention-Based Co-evolutionary Network for Recommendation

Recommending relevant items to users is a crucial task in online communities such as Reddit and Twitter. For recommendation systems, representation learning is a powerful technique that learns embeddings to represent user behaviors and capture item properties. However, learning embeddings for online communities is a challenging task because user interests keep evolving. This evolution can be captured from 1) interactions between users and items, and 2) influence from other users in the community. Existing dynamic embedding models only consider one of these factors when updating user embeddings. However, at a given time, user interest evolves due to a combination of the two factors. To this end, we propose the Influence-aware and Attention-based Co-evolutionary Network (IACN). Essentially, IACN consists of two key components: an interaction modeling layer and an influence modeling layer. The interaction modeling layer is responsible for updating the embeddings of a user and an item when the user interacts with the item. The influence modeling layer captures the temporal excitation caused by interactions of other users. To integrate the signals obtained from the two layers, we design a novel fusion layer that effectively combines interaction-based and influence-based embeddings to predict the final user embedding. Our model outperforms existing state-of-the-art models on data from various domains.

Shalini Pandey, George Karypis, Jaideep Srivastava
Nonlinear Matrix Factorization via Neighbor Embedding

Matrix factorization plays a fundamental role in collaborative filtering. Collaborative filtering approaches fall into two basic disciplines: neighborhood-based methods and latent factor models. Based on neighbor-entity spatial relationships, neighborhood-based methods capture the local structure of the user-item rating matrix, while latent factor models capture its global structure; neither family captures both. The basic matrix factorization model and its extensions are among the most successful latent factor models, and the recently developed capsule network can capture part-whole spatial relationships in images. Motivated by the need to capture both the local and the global structure of the matrix, and inspired by the capsule network, we propose a new matrix factorization model called capsule matrix factorization, which attempts to capture both structures by propagating the neighbor-entity spatial relationships in the rating matrix into the latent factor vectors. Experimental results on real datasets demonstrate that capsule matrix factorization greatly improves the recommendation accuracy of the basic matrix factorization model.
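
For reference, here is a small sketch of the basic matrix factorization baseline the abstract builds on: predict r_ui ≈ p_u · q_i and fit the factors by SGD with L2 regularization on the observed ratings. The toy data, learning rate, and rank are arbitrary, and the capsule extension itself is not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 40, 8
P = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
Q = 0.1 * rng.standard_normal((n_items, k))   # item latent factors
ratings = [(0, 1, 4.0), (0, 3, 2.0), (2, 1, 5.0)]   # toy (user, item, rating) triples

lr, reg = 0.05, 0.02
for _ in range(200):
    for u, i, r in ratings:
        pu = P[u].copy()
        err = r - pu @ Q[i]                   # prediction error on this rating
        P[u] += lr * (err * Q[i] - reg * pu)  # gradient step on the user factor
        Q[i] += lr * (err * pu - reg * Q[i])  # gradient step on the item factor

print(round(P[0] @ Q[1], 2))  # should end up close to 4.0 (minus a little shrinkage)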

Xuan Li, Yunfeng Wu, Li Zhang
Deconfounding Representation Learning Based on User Interactions in Recommendation Systems

Representation learning provides an attractive way to capture users’ real intents by modeling user interactions in recommendation systems. However, influencing factors called confounders exist in the process of user interaction. Most traditional methods ignore these confounders and consequently learn inaccurate user intents. To address this issue, we take a new perspective and develop a deconfounding representation learning model named DRL. Concretely, we infer the unobserved confounders in the user-item interactions with an inference network, and then leverage a generative network to generate users’ personalized intents that contain no unobserved confounders. To learn comprehensive user intents, we model the user-user interactions by adopting a state-of-the-art GNN with a new aggregation strategy. Thus, the real user intents we learn not only carry each user’s personalized information but also reflect the influence of their friends. The results of experiments on two real-world datasets demonstrate that our model can learn accurate and comprehensive representations.
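
The inference-network / generative-network pattern the abstract describes resembles a variational autoencoder, so the following is only a generic VAE-style sketch in PyTorch of that pattern; the layer sizes, the reconstruction loss, and the interpretation of the latent code as a deconfounded intent are illustrative assumptions, not the DRL model itself.

import torch
import torch.nn as nn

class InferenceNet(nn.Module):
    """Infers a latent representation from observed interaction features."""
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return z, mu, logvar

class GenerativeNet(nn.Module):
    """Reconstructs interaction signals from the latent representation."""
    def __init__(self, z_dim, out_dim):
        super().__init__()
        self.decode = nn.Linear(z_dim, out_dim)

    def forward(self, z):
        return self.decode(z)

x = torch.randn(16, 32)                    # toy interaction features
z, mu, logvar = InferenceNet(32, 8)(x)
recon = GenerativeNet(8, 32)(z)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
loss = nn.functional.mse_loss(recon, x) + kl
print(float(loss))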

Junruo Gao, Mengyue Yang, Yuyang Liu, Jun Li
Personalized Regularization Learning for Fairer Matrix Factorization

Matrix factorization is a canonical method for modeling user preferences for items. Regularization of matrix factorization models often uses a single hyperparameter tuned globally based on metrics evaluated on all data. However, due to the differences in the structure of per-user data, a globally optimal value may not be locally optimal for each individual user, leading to an unfair disparity in performance. Therefore, we propose to tune individual regularization parameters for each user. Our approach, personalized regularization learning (PRL), solves a secondary learning problem of finding the per-user regularization parameters by back-propagating through alternating least squares. Experiments on a benchmark dataset with different user group splits show that PRL outperforms existing methods in improving model performance for disadvantaged groups. We also analyze the learned parameters, finding insights into the effect of regularization on subpopulations with varying properties.
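
To make the per-user idea concrete, here is a small worked sketch of how a user-specific regularization weight enters an ALS-style closed-form update; in PRL these values are learned by back-propagating through the solve, whereas here they are simply set by hand for illustration.

import numpy as np

rng = np.random.default_rng(1)
n_items, k = 30, 5
Q = rng.standard_normal((n_items, k))    # fixed item factors
r_u = rng.standard_normal(n_items)       # one user's (dense, toy) ratings

def user_factor(Q, r_u, lam_u):
    """Solve (Q^T Q + lam_u I) p_u = Q^T r_u for this user's factor vector."""
    A = Q.T @ Q + lam_u * np.eye(Q.shape[1])
    return np.linalg.solve(A, Q.T @ r_u)

# A globally tuned lambda vs. a user-specific, stronger one (e.g., a sparse-history user):
p_global = user_factor(Q, r_u, lam_u=0.1)
p_personal = user_factor(Q, r_u, lam_u=2.0)
print(np.linalg.norm(p_global), np.linalg.norm(p_personal))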

Sirui Yao, Bert Huang
Instance Selection for Online Updating in Dynamic Recommender Environments

Online recommender systems continuously learn from user interactions that occur in a streaming manner. A fundamental challenge of online recommendation is to select important instances (i.e., user interactions) for model updates to achieve higher prediction accuracy while omitting noisy instances. In this paper, we study (1) how to select the best instances and (2) how to effectively utilize the selected instances in dynamic recommender environments. We present two instance selection strategies based on Self-Paced Learning and rating profiles, and we integrate them with Factorization Machines to perform online updates. Moreover, we study the impact of contextual information on online updating. We conducted experiments on a real-world check-in dataset, which contains temporal contextual features. Empirical results demonstrate that our instance selection strategies effectively balance the trade-off between prediction accuracy and efficiency.
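
A hedged sketch of self-paced instance selection for online updates: keep only the streaming interactions whose current prediction loss falls below a threshold that relaxes over time, and update the model on those. The model interface (predict/update) and the threshold schedule are placeholders, not the paper's exact strategy.

def select_and_update(model, stream, init_threshold=0.5, growth=1.05):
    """model is any online learner exposing predict(x) and update(x, y) (placeholder API)."""
    threshold = init_threshold
    for x, y in stream:                       # streaming (features, rating) pairs
        loss = (model.predict(x) - y) ** 2    # squared error as the pacing score
        if loss <= threshold:                 # easy / reliable instance: learn from it
            model.update(x, y)                # noisy or hard instances are skipped for now
        threshold *= growth                   # the pace gradually admits harder instances
    return model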

Thilina Thanthriwatta, David S. Rosenblum

Text Analytics

Frontmatter
Fusing Essential Knowledge for Text-Based Open-Domain Question Answering

Question answering (QA) systems can be classified as either text-based QA systems or knowledge-base QA (KBQA) systems, depending on the knowledge source used. KBQA systems are generally domain-specific and cannot deal with the variety of questions in the open-domain QA setting, while text-based systems can; however, their performance is far from satisfactory. This paper focuses on the text-based open-domain QA setting. We argue that the poor performance of text-based approaches is largely caused by the lack, in plain text, of knowledge that is often essential for answering the question and can easily be found in a knowledge base (KB). We therefore propose a new text-based open-domain QA system called KF (Knowledge Fusion)-QA, which uses a KB as a second knowledge source to incorporate essential knowledge into the text to help answer the question. Our system has a Knowledge-Aware Encoder that extracts essential knowledge from the KB and performs knowledge fusion to output knowledge-aware (KA) text representations. With these KA representations, the system first re-ranks the retrieved documents and then reads the re-ranked top-N documents to produce the answer. Our system significantly outperforms existing text-based QA systems on multiple open-domain QA datasets, demonstrating the effectiveness of fusing essential knowledge.

Xiao Su, Ying Li, Zhonghai Wu
TSSE-DMM: Topic Modeling for Short Texts Based on Topic Subdivision and Semantic Enhancement

Short texts have been prevalent on Web sites and emerging social media for several years, which makes identifying intelligible topics from online data sources a critical task. However, existing topic models for short texts cannot analyze the internal components of the learned topics, which is significant for improving the coherence and interpretability of topics. In this paper, we propose a novel topic model for short texts, named TSSE-DMM, which improves the coherence and interpretability of topics through topic subdivision and alleviates text sparsity through a semantic enhancement strategy. First, we subdivide each topic into four detailed aspects, namely the location aspect, the people & organization aspect, the core word aspect, and the background word aspect, to obtain distinct and interpretable topic components. Then, we combine the Generalized Polya Urn model and joint word embeddings to address data sparsity. Extensive experiments on three real-world text collections in two languages show that our model achieves better topic representations than the baseline methods. Moreover, our method has been adopted by the public service hotline platform of Jiangsu province in China.

Chengcheng Mai, Xueming Qiu, Kaiwen Luo, Min Chen, Bo Zhao, Yihua Huang
SILVER: Generating Persuasive Chinese Product Pitch

Building a silver-tongued salesbot is attractive and profitable. The first and pivotal step is to generate a product pitch, a short piece of persuasive text that both conveys product information and delivers persuasive explanations related to customer demand. Recent advances in deep neural networks have empowered text generation systems to produce natural language descriptions of products. However, to produce persuasive product pitches, deep neural networks need to be fed with massive amounts of persuasive samples, which are unavailable due to the huge labelling cost. This paper proposes SILVER, a persuasive Chinese product pitch generator, which addresses the issue of insufficient labeled data with data-level, model-level, and knowledge-level solutions. At the data level, SILVER employs statistical analysis to automatically derive weak supervision rules that correlate with persuasive texts. At the model level, SILVER applies the weak supervision rules to re-rank outputs from an ensemble of models to enhance pitch generation performance. Finally, at the knowledge level, SILVER incorporates an attribute hierarchy to embed product information in the pitch. Both automatic and human evaluations on real data demonstrate that SILVER produces more fluent, catchy, and informative snippets than state-of-the-art text generation approaches.
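
The model-level step can be pictured as plain score-and-sort re-ranking: each weak supervision rule assigns a score to a candidate pitch, and the ensemble's candidates are re-ordered by the aggregate score. The toy rules below are invented placeholders (SILVER derives its rules from statistical analysis of real persuasive texts).

def rerank(candidates, rules):
    """candidates: list of generated pitches; rules: list of text -> float scorers."""
    scored = [(sum(rule(text) for rule in rules), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]

# Toy rules: reward a benefit word, penalize overly long outputs.
rules = [
    lambda t: 1.0 if "comfortable" in t else 0.0,
    lambda t: -0.01 * max(0, len(t) - 120),
]
print(rerank(["a plain shirt", "a comfortable breathable shirt"], rules))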

Yunsen Hong, Hui Li, Yanghua Xiao, Ryan McBride, Chen Lin
Capturing SQL Query Overlapping via Subtree Copy for Cross-Domain Context-Dependent SQL Generation

The key challenge of cross-domain context-dependent text-to-SQL generation lies in capturing the relation between natural language utterances and SQL queries across different turns. A line of work attempts to meet this challenge by capturing the overlaps among consecutively generated SQL queries. Existing models sequentially generate the SQL query for a single turn and model the SQL overlaps by copying tokens or segments generated in previous turns. However, they are not flexible enough to capture various overlapping granularities, e.g., columns, filters, or even the whole query, because they neglect the intrinsic structure of SQL queries. In this paper, we employ a tree-structured intermediate representation of SQL queries, i.e., SemQL, for SQL generation and propose a novel subtree-copy mechanism to characterize the SQL overlaps. At each turn, we encode the interaction questions and previously generated trees as context and decode the SemQL tree in a top-down fashion. Each node is either generated according to the SemQL grammar or copied from previously generated SemQL subtrees. Our model can capture various overlapping granularities by copying nodes at different levels of SemQL trees. We evaluate our approach on the SParC dataset, and the experimental results show the superior performance of our model compared with state-of-the-art baselines.

Ruizhuo Zhao, Jinhua Gao, Huawei Shen, Xueqi Cheng
HScodeNet: Combining Hierarchical Sequential and Global Spatial Information of Text for Commodity HS Code Classification

Commodity Harmonized System (HS) code classification is an important customs procedure in cross-border trade. HS code classification identifies the category (i.e., HS code) of a commodity according to its description, making it essentially a text classification task. However, compared with general text classification, the challenge of this task is that commodity description texts are organized in special hierarchical structures and contain multiple independent semantic segments. Moreover, the label space (i.e., the HS code system) has hierarchical correlations. In this paper, we propose an HS code classification neural network (HScodeNet) that incorporates the hierarchical sequential and global spatial information of texts, in which a hierarchical sequence learning module captures the sequential information and a text graph learning module captures the spatial information of commodity description texts. In addition, a label correlation loss function is presented to train the model. Extensive experiments on several real-world customs commodity datasets show the superiority of our HScodeNet model.

Shaohua Du, Zhihao Wu, Huaiyu Wan, YouFang Lin
PLVCG: A Pretraining Based Model for Live Video Comment Generation

The live video comment generation task aims to automatically generate real-time viewer comments on videos, as real viewers do. Much as search engines provide search suggestions, this task can help viewers find comments they want to post by providing generated comments. Previous works ignore the interactivity and diversity of comments and can only generate general and popular comments. In this paper, we incorporate the post time of comments to model real-time comment interactions, and we take video type labels into consideration to handle the diversity of comments and generate more related and informative comments. To this end, we propose a pre-training-based encoder-decoder joint model called PLVCG. The model is composed of a bidirectional encoder that jointly encodes context comments and visual frames, and an auto-regressive decoder that generates real-time comments and classifies the type of the video. We evaluate our model on a large-scale real-world live comment dataset. The experimental results show that our model significantly outperforms the state of the art on live video comment ranking and generation.

Zehua Zeng, Neng Gao, Cong Xue, Chenyang Tu
Inducing Rich Interaction Structures Between Words for Document-Level Event Argument Extraction

Event Argument Extraction (EAE) is the task of identifying the roles of entity mentions/arguments in events evoked by trigger words. Most existing works have focused on sentence-level EAE, leaving document-level EAE (i.e., where event triggers and arguments belong to different sentences in a document) an under-studied problem in the literature. This paper introduces a new deep learning model for document-level EAE in which document structures/graphs are used to represent input documents and aid representation learning. Our model employs different types of interactions between important context words in documents (i.e., syntactic, semantic, and discourse) to enhance document representations. Extensive experiments demonstrate the effectiveness of the proposed model, leading to state-of-the-art performance for document-level EAE.

Amir Pouran Ben Veyseh, Franck Dernoncourt, Quan Tran, Varun Manjunatha, Lidan Wang, Rajiv Jain, Doo Soon Kim, Walter Chang, Thien Huu Nguyen
Exploiting Relevant Hyperlinks in Knowledge Base for Entity Linking

In this study, we propose a new model that enhances the quality of entity linking by exploiting highly relevant hyperlinks in the knowledge base for entity disambiguation. We find that most existing studies do not filter the hyperlinks associated with each entity, and using the irrelevant ones may introduce noise and lower linking accuracy. To address this issue, we design and combine a hyperlink extraction stage and a hyperlink attention stage to learn more suitable hyperlinks for document-level disambiguation. In addition, we enhance context-level disambiguation by utilizing additional entity descriptions, and we retrieve a high-quality candidate set for entities at the beginning of our model. Experimental results show that our proposed model outperforms the state of the art on various benchmark datasets and is even competitive with models that rely on additional information.

Szu-Yuan Cheng, Yi-Ling Chen, Mi-Yen Yeh, Bo-Tao Lin
TANTP: Conversational Emotion Recognition Using Tree-Based Attention Networks with Transformer Pre-training

Conversational emotion recognition has recently gained significant attention in data mining and text mining. Most existing methods treat the utterances in a conversation only as a temporal sequence and ignore the fine-grained emotional clues in their compositional structure, which implies non-negligible semantic transitions and tone enhancement. Consequently, such models hardly capture accurate semantic features of the utterances, resulting in the accumulation of incorrect emotional features in the memory bank. To address this problem, we propose a novel framework, Tree-based Attention Networks with Transformer Pre-training (TANTP), which incorporates contextual representations and a recursive constituency tree structure into the model architecture. Rather than merely modeling utterances in temporal order, TANTP can effectively capture the compositional emotion semantics of utterance features for the memory bank, where complex semantic transitions and emotional progression are difficult to reveal with conventional sequential methods. Experimental results on two public benchmark datasets demonstrate that TANTP achieves superior performance compared with other state-of-the-art models.

Haozhe Liu, Hongzhan Lin, Guang Chen
Semantic-Syntax Cascade Injection Model for Aspect Sentiment Triple Extraction

Aspect sentiment triple extraction aims to extract all aspects, opinions, and sentiments in a sentence and pair them into triples. The main challenge lies in mining the dependency between an aspect and its corresponding opinion with the specific sentiment. Existing methods capture this dependency via either a pipeline framework or a collapsed sequence labeling model. However, the pipeline framework may suffer from error propagation, while collapsed tags cannot deal with complex pairing situations where overlaps or long dependencies exist. In this paper, we propose a novel semantic-syntax cascade injection model (SSCIM) to address these issues. SSCIM adopts a cascade framework with a joint training scheme, where its lower layer extracts the aspects and injects them into the upper layer to extract the opinion and sentiment simultaneously. This design is inspired by the fact that sentiment is often conveyed in opinions, and the joint training scheme effectively alleviates error propagation. Moreover, a novel semantic-syntax information injection gate (IIG) is designed to bridge the upper and lower layers of our model, enabling SSCIM to better capture the dependency between aspects and opinions. Experimental results on four benchmark datasets demonstrate the superior performance of the proposed model over state-of-the-art baselines.

Wenjun Ke, Jinhua Gao, Huawei Shen, Xueqi Cheng
Modeling Inter-aspect Relationship with Conjunction for Aspect-Based Sentiment Analysis

Aspect-based sentiment analysis, whose goal is to identify the sentiment polarities of specific aspect terms, is currently a main focus within the domain of sentiment analysis. Ongoing research mainly models each aspect term and its context independently and neglects the inter-aspect relationship. To address this problem, we propose a model that integrates conjunction information and the sentiment of the preceding aspect term, so that the inter-aspect relation between adjacent aspect terms can be precisely modeled and applied to sentiment classification. Experimental results on SemEval 2014 and MAMS show that our model outperforms the baseline methods, especially when dealing with multiple aspect terms, which provides strong evidence of the effectiveness of the proposed method.

Haoliang Zhao, Yun Xue, Donghong Gu, Jianying Chen, Luwei Xiao
Backmatter
Metadata
Title
Advances in Knowledge Discovery and Data Mining
Editors
Prof. Kamal Karlapalem
Hong Cheng
Naren Ramakrishnan
R. K. Agrawal
P. Krishna Reddy
Dr. Jaideep Srivastava
Assist. Prof. Tanmoy Chakraborty
Copyright Year
2021
Electronic ISBN
978-3-030-75765-6
Print ISBN
978-3-030-75764-9
DOI
https://doi.org/10.1007/978-3-030-75765-6
