
## About This Book

The three-volume set LNAI 11439, 11440, and 11441 constitutes the thoroughly refereed proceedings of the 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2019, held in Macau, China, in April 2019.

The 137 full papers presented were carefully reviewed and selected from 542 submissions. The papers present new ideas, original research results, and practical development experiences from all KDD related areas, including data mining, data warehousing, machine learning, artificial intelligence, databases, statistics, knowledge engineering, visualization, decision-making systems, and emerging applications. They are organized in the following topical sections: classification and supervised learning; text and opinion mining; spatio-temporal and stream data mining; factor and tensor analysis; healthcare, bioinformatics and related topics; clustering and anomaly detection; deep learning models and applications; sequential pattern mining; weakly supervised learning; recommender system; social network and graph mining; data pre-processing and feature selection; representation learning and embedding; mining unstructured and semi-structured data; behavioral data mining; visual data mining; and knowledge graph and interpretable data mining.

## Table of Contents

### AAANE: Attention-Based Adversarial Autoencoder for Multi-scale Network Embedding

Network embedding represents nodes in a continuous vector space and preserves structure information from a network. Existing methods usually adopt a “one-size-fits-all” approach to multi-scale structure information, such as first- and second-order proximity of nodes, ignoring the fact that different scales play different roles in embedding learning. In this paper, we propose an Attention-based Adversarial Autoencoder Network Embedding (AAANE) framework, which promotes the collaboration of different scales and lets them vote for robust representations. The proposed AAANE consists of two components: (1) an attention-based autoencoder that effectively captures the highly non-linear network structure and can de-emphasize irrelevant scales during training, and (2) an adversarial regularization that guides the autoencoder in learning robust representations by matching the posterior distribution of the latent embeddings to a given prior distribution. Experimental results on real-world networks show that the proposed approach outperforms strong baselines.

Lei Sang, Min Xu, Shengsheng Qian, Xindong Wu

### NEAR: Normalized Network Embedding with Autoencoder for Top-K Item Recommendation

The recommendation system is an important tool for both business and individual users, aiming to generate a personalized recommendation list for each user. Many studies have been devoted to improving the accuracy of recommendation while ignoring the diversity of the results. We find that the key to addressing this problem is to fully exploit the hidden features of the heterogeneous user-item network and to consider the impact of hot items. Accordingly, we propose a personalized top-k item recommendation method that jointly considers accuracy and diversity, called Normalized Network Embedding with Autoencoder for Personalized Top-K Item Recommendation (NEAR). Our model fully exploits the hidden features of the heterogeneous user-item network data and generates more general low-dimensional embeddings, resulting in more accurate and diverse recommendation sequences. We compare NEAR with some state-of-the-art algorithms on the DBLP and MovieLens1M datasets, and the experimental results show that our method is able to balance the accuracy and diversity scores.

Dedong Li, Aimin Zhou, Chuan Shi

### Ranking Network Embedding via Adversarial Learning

Network embedding has become an effective and widely used method for extracting graph features automatically in recent years. To handle the large-scale networks that are now common, most of the existing scalable methods, e.g., DeepWalk, LINE and node2vec, resort to the negative sampling objective to alleviate the expensive computation. Though largely effective, this strategy can easily generate false, and thus low-quality, negative samples because of the trivial noise generation process, which is usually a simple variant of the unigram distribution. In this paper, we propose a Ranking Network Embedding (RNE) framework that leverages a ranking strategy to achieve scalability and quality simultaneously. RNE can explicitly encode node similarity ranking information into the embedding vectors, for which we provide two ranking strategies, vanilla and adversarial. The vanilla strategy modifies the uniform negative sampling method by taking edge existence into account. The adversarial strategy unifies the triplet sampling phase and the learning phase of the model within the framework of Generative Adversarial Networks. Through adversarial training, the triplet sampling quality can be improved thanks to a softmax generator which constructs hard negatives for a given target. The effectiveness of our RNE framework is empirically evaluated on a variety of real-world networks with multiple network analysis tasks.

Quanyu Dai, Qiang Li, Liang Zhang, Dan Wang
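
As a rough illustration of the vanilla strategy mentioned in the abstract above (uniform negative sampling that takes edge existence into account), the following minimal Python sketch rejects candidates that are already linked to the target. The function name `sample_negative`, the toy graph, and the retry budget `max_tries` are assumptions made for this example only; the abstract does not specify how the authors implement the check.

```python
import random


def sample_negative(target, nodes, edges, rng, max_tries=100):
    """Uniformly sample a node that is neither `target` nor linked to it.

    `edges` is a set of (u, v) tuples for an undirected toy graph
    (hypothetical data structure chosen for this sketch).
    """
    for _ in range(max_tries):
        candidate = rng.choice(nodes)
        if candidate == target:
            continue
        if (target, candidate) in edges or (candidate, target) in edges:
            continue  # an existing edge would make this a false negative
        return candidate
    return None  # no valid negative found within the retry budget


# Toy graph: node 0 is linked to 1 and 2, so only 3 and 4 are valid negatives.
nodes = [0, 1, 2, 3, 4]
edges = {(0, 1), (0, 2), (3, 4)}
rng = random.Random(0)
negatives = [sample_negative(0, nodes, edges, rng) for _ in range(20)]
```

Plain uniform sampling would occasionally return nodes 1 or 2 as "negatives" for node 0; the existence check is what filters those out.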

### Selective Training: A Strategy for Fast Backpropagation on Sentence Embeddings

Representation or embedding based machine learning models, such as language models or convolutional neural networks have shown great potential for improved performance. However, for complex models on large datasets training time can be extensive, approaching weeks, which is often infeasible in practice. In this work, we present a method to reduce training time substantially by selecting training instances that provide relevant information for training. Selection is based on the similarity of the learned representations over input instances, thus allowing for learning a non-trivial weighting scheme from multi-dimensional representations. We demonstrate the efficiency and effectivity of our approach in several text classification tasks using recursive neural networks. Our experiments show that by removing approximately one fifth of the training data the objective function converges up to six times faster without sacrificing accuracy.

Jan Neerbek, Peter Dolog, Ira Assent

### Extracting Keyphrases from Research Papers Using Word Embeddings

Unsupervised random-walk keyphrase extraction models mainly rely on global structural information of the word graph, with nodes representing candidate words and edges capturing the co-occurrence information between candidate words. However, integrating different types of useful information into the representation learning process to help better extract keyphrases is relatively unexplored. In this paper, we propose a random-walk method to extract keyphrases using word embeddings. Specifically, we first design a new word embedding learning model to integrate local context information of the word graph (i.e., the local word collocation patterns) with some crucial features of candidate words and edges. Then, a novel random-walk ranking model is designed to extract keyphrases by leveraging such word embeddings. Experimental results show that our approach outperforms 8 state-of-the-art unsupervised methods on two real datasets consistently for keyphrase extraction.

Wei Fan, Huan Liu, Suge Wang, Yuxiang Zhang, Yaocheng Chang

### Sequential Embedding Induced Text Clustering, a Non-parametric Bayesian Approach

Current state-of-the-art nonparametric Bayesian text clustering methods model documents through multinomial distribution on bags of words. Although these methods can effectively utilize the word burstiness representation of documents and achieve decent performance, they do not explore the sequential information of text and relationships among synonyms. In this paper, the documents are modeled as the joint of bags of words, sequential features and word embeddings. We proposed Sequential Embedding induced Dirichlet Process Mixture Model (SiDPMM) to effectively exploit this joint document representation in text clustering. The sequential features are extracted by the encoder-decoder component. Word embeddings produced by the continuous-bag-of-words (CBOW) model are introduced to handle synonyms. Experimental results demonstrate the benefits of our model in two major aspects: (1) improved performance across multiple diverse text datasets in terms of the normalized mutual information (NMI); (2) more accurate inference of ground truth cluster numbers with regularization effect on tiny outlier clusters.

Tiehang Duan, Qi Lou, Sargur N. Srihari, Xiaohui Xie

### SSNE: Status Signed Network Embedding

This work studies the problem of signed network embedding, which aims to obtain low-dimensional vectors for nodes in signed networks. Existing works mostly focus on learning representations via characterizing the social structural balance theory in signed networks. However, structural balance theory could not well satisfy some of the fundamental phenomena in real-world signed networks such as the direction of links. As a result, in this paper we integrate another theory Status Theory into signed network embedding since status theory can better explain the social mechanisms of signed networks. To be specific, we characterize the status of nodes in the semantic vector space and well design different ranking objectives for positive and negative links respectively. Besides, we utilize graph attention to assemble the information of neighborhoods. We conduct extensive experiments on three real-world datasets and the results show that our model can achieve a significant improvement compared with baselines.

Chunyu Lu, Pengfei Jiao, Hongtao Liu, Yaping Wang, Hongyan Xu, Wenjun Wang

### On the Network Embedding in Sparse Signed Networks

Network embedding, that learns low-dimensional node representations in a graph such that the network structure is preserved, has gained significant attention in recent years. Most state-of-the-art embedding methods have mainly designed algorithms for representing nodes in unsigned social networks. Moreover, recent embedding approaches designed for the sparse real-world signed networks have several limitations, especially in the presence of a vast majority of disconnected node pairs with opposite polarities towards their common neighbors. In this paper, we propose sign2vec, a deep learning based embedding model designed to represent nodes in a sparse signed network. sign2vec leverages on signed random walks to capture the higher-order neighborhood relationships between node pairs, irrespective of their connectivity. We design a suitable objective function to optimize the learned node embeddings such that the link forming behavior of individual nodes is captured. Experiments on empirical signed network datasets demonstrate the effectiveness of embeddings learned by sign2vec for several downstream applications while outperforming state-of-the-art baseline algorithms.

Ayan Kumar Bhowmick, Koushik Meneni, Bivas Mitra

### MSNE: A Novel Markov Chain Sampling Strategy for Network Embedding

Network embedding methods have made great progress on many tasks, such as node classification and link prediction. The sampling strategy is very important in network embedding, yet sampling in a network with a complicated topology is still a challenge. In this paper, we propose a high-order Markov chain Sampling strategy for Network Embedding (MSNE). MSNE selects the next sampled node based on a distance metric between nodes. Due to high-order sampling, it can exploit the whole sampled path to capture network properties and generate expressive node sequences which are beneficial for downstream tasks. We conduct experiments on several benchmark datasets. The results show that our model can achieve substantial improvements in the two tasks of node classification and link prediction. (Datasets and code are available at https://github.com/SongY123/MSNE.)

Ran Wang, Yang Song, Xin-yu Dai

### Auto-encoder Based Co-training Multi-view Representation Learning

Multi-view learning is a learning problem that utilizes the various representations of an object to mine valuable knowledge and improve the performance of a learning algorithm, and one of the significant directions of multi-view learning is sub-space learning. The auto-encoder is a well-known deep learning method that can learn the latent features of raw data by reconstructing the input. Based on this, we propose a novel algorithm called Auto-encoder based Co-training Multi-View Learning (ACMVL), which utilizes both complementarity and consistency and finds a joint latent feature representation of multiple views. The algorithm has two stages: the first trains an auto-encoder for each view, and the second trains a supervised network. Interestingly, the two stages partly share weights and assist each other through a co-training process. According to the experimental results, we can learn a well-performing latent feature representation, and the auto-encoder of each view has more powerful reconstruction ability than a traditional auto-encoder.

Run-kun Lu, Jian-wei Liu, Yuan-fang Wang, Hao-jie Xie, Xin Zuo

### Robust Semi-supervised Representation Learning for Graph-Structured Data

The success of machine learning algorithms generally depends on data representation, and recently many representation learning methods have been proposed. However, learning a good representation may not always benefit the classification task. It may sometimes even hurt performance, as the learned representation may not be related to the ultimate task, especially when there are too few labeled examples to afford reliable model selection. In this paper, we propose a novel robust semi-supervised graph representation learning method based on graph convolutional networks. To make the learned representation more related to the ultimate classification task, we propose to extend label information based on the smoothness assumption and obtain pseudo-labels for unlabeled nodes. Moreover, to make the model robust to noise in the pseudo-labels, we propose to apply a large-margin classifier to the learned representation. Influenced by the pseudo-labels and the large-margin principle, the learned representation can not only sufficiently exploit the label information encoded in the graph structure but also produce a more rigorous decision boundary. Experiments demonstrate the superior performance of the proposal over many related methods.

Lan-Zhe Guo, Tao Han, Yu-Feng Li

### Characterizing the SOM Feature Detectors Under Various Input Conditions

A classifier with self-organizing maps (SOM) as feature detectors resembles the biological visual system learning mechanism. Each SOM feature detector is defined over a limited domain of viewing condition, such that its nodes instantiate the presence of an object’s part in the corresponding domain. The weights of the SOM nodes are trained via competition, similar to the development of the visual system. We argue that to approach human pattern recognition performance, we must look for a more accurate model of the visual system, not only in terms of the architecture, but also on how the node connections are developed, such as that of the SOM’s feature detectors. This work characterizes SOM as feature detectors to test the similarity of its response vis-à-vis the response of the biological visual system, and to benchmark its performance vis-à-vis the performance of the traditional feature detector convolution filter. We use various input environments, i.e., inputs with limited patterns, inputs with various input perturbation and inputs with complex objects, as test cases for evaluation.

Macario O. Cordel, Arnulfo P. Azcarraga

### PCANE: Preserving Context Attributes for Network Embedding

Through mapping network nodes into low-dimensional vectors, network embedding methods have shown promising results for many downstream tasks, such as link prediction and node classification. Recently, attributed network embedding obtained progress on the network associated with node attributes. However, it is insufficient to ignore the attributes of the context nodes, which are also helpful for node proximity. In this paper, we propose a new attributed network embedding method named PCANE (Preserving Context Attributes for Network Embedding). PCANE preserves both network structure and the context attributes by optimizing new object functions, and further produces more informative node representations. PCANE++ is also proposed to represent the isolated nodes, and is better to represent high degree nodes. Experiments on 3 real-world attributed networks show that our methods outperform the other network embedding methods on link prediction and node classification tasks.

Danhao Zhu, Xin-yu Dai, Kaijia Yang, Jiajun Chen, Yong He

### A Novel Framework for Node/Edge Attributed Graph Embedding

Graph embedding has attracted increasing attention due to its critical application in social network analysis. Most existing algorithms for graph embedding utilize only the topology information, while recently several methods are proposed to consider node content information. However, the copious information on edges has not been explored. In this paper, we study the problem of representation learning in node/edge attributed graph, which differs from normal attributed graph in that edges can also be contented with attributes. We propose GERI, which learns graph embedding with rich information in node/edge attributed graph through constructing a heterogeneous graph. GERI includes three steps: construct a heterogeneous graph, take a novel and biased random walk to explore the constructed heterogeneous graph and finally use modified heterogeneous skip-gram to learn embedding. Furthermore, we upgrade GERI to semi-supervised GERI (named SGERI) by incorporating label information on nodes. The effectiveness of our methods is demonstrated by extensive comparison experiments with strong baselines on various datasets.

Guolei Sun, Xiangliang Zhang

### Context-Aware Dual-Attention Network for Natural Language Inference

Natural Language Inference (NLI) is a fundamental task in natural language understanding. In spite of the importance of existing research on NLI, the problem of how to exploit the contexts of sentences for more precisely capturing the inference relations (i.e. by addressing the issues such as polysemy and ambiguity) is still much open. In this paper, we introduce the corresponding image into inference process. Along this line, we design a novel Context-Aware Dual-Attention Network (CADAN) for tackling NLI task. To be specific, we first utilize the corresponding images as the Image Attention to construct an enriched representation for sentences. Then, we use the enriched representation as the Sentence Attention to analyze the inference relations from detailed perspectives. Finally, a sentence matching method is designed to determine the inference relation in sentence pairs. Experimental results on large-scale NLI corpora and real-world NLI alike corpus demonstrate the superior effectiveness of our CADAN model.

Kun Zhang, Guangyi Lv, Enhong Chen, Le Wu, Qi Liu, C. L. Philip Chen

### Best from Top k Versus Top 1: Improving Distant Supervision Relation Extraction with Deep Reinforcement Learning

Distant supervision relation extraction is a promising approach to find new relation instances from large text corpora. Most previous works employ the top 1 strategy, i.e., predicting the relation of a sentence with the highest confidence score, which is not always the optimal solution. To improve distant supervision relation extraction, this work applies the best from top k strategy to explore the possibility of relations with lower confidence scores. We approach the best from top k strategy using a deep reinforcement learning framework, where the model learns to select the optimal relation among the top k candidates for better predictions. Specifically, we employ a deep Q-network, trained to optimize a reward function that reflects the extraction performance under distant supervision. The experiments on three public datasets (news articles, Wikipedia and biomedical papers) demonstrate that the proposed strategy improves the performance of traditional state-of-the-art relation extractors significantly. We achieve an improvement of 5.13% in average $$F_1$$-score over four competitive baselines.

Yaocheng Gui, Qian Liu, Tingming Lu, Zhiqiang Gao
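
To make the difference between the two strategies in the abstract above concrete, here is a minimal Python sketch that builds the top 1 prediction and the top k candidate set from one confidence distribution. The relation labels, scores, and the helper `top_k_relations` are invented for illustration; the learned selection among the candidates (the deep Q-network in the abstract) is not modeled here.

```python
def top_k_relations(scores, k):
    """Return the k relation labels with the highest confidence, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical confidence scores from some base relation extractor.
scores = {"born_in": 0.41, "lives_in": 0.38, "works_for": 0.15, "NA": 0.06}

top1 = top_k_relations(scores, 1)[0]     # what the top 1 strategy predicts
candidates = top_k_relations(scores, 3)  # what a top k selector chooses from

# If the (hypothetical) correct relation is "lives_in", the top 1 strategy
# misses it, while it is still available among the top 3 candidates.
gold = "lives_in"
top1_correct = top1 == gold
recoverable = gold in candidates
```

The sketch only shows why a lower-confidence relation can still be worth considering; how to pick it reliably is the part the paper addresses with reinforcement learning.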

### Towards One Reusable Model for Various Software Defect Mining Tasks

Software defect mining is playing an important role in software quality assurance. Many deep neural network based models have been proposed for software defect mining tasks, and have pushed forward the state-of-the-art mining performance. These deep models usually require a huge amount of task-specific source code for training to capture the code functionality to mine the defects. But such requirement is often hard to be satisfied in practice. On the other hand, lots of free source code and corresponding textual explanations are publicly available in the open source software repositories, which is potentially useful in modeling code functionality. However, no previous studies ever leverage these resources to help defect mining tasks. In this paper, we propose a novel framework to learn one reusable deep model for code functional representation using the huge amount of publicly available task-free source code as well as their textual explanations. And then reuse it for various software defect mining tasks. Experimental results on three major defect mining tasks with real world datasets indicate that by reusing this model in specific tasks, the mining performance outperforms its counterpart that learns deep models from scratch, especially when the training data is insufficient.

Heng-Yi Li, Ming Li, Zhi-Hua Zhou

### User Preference-Aware Review Generation

There are more and more online sites that allow users to express their sentiments by writing reviews. Recently, researchers have paid attention to review generation. They generate review text under specific contexts, such as rating, user ID or product ID. The encoder-attention-decoder based methods achieve impressive performance in this task. However, these methods do not consider user preference when generating reviews. Only considering numeric contexts such as user ID or product ID, these methods tend to generate generic and boring reviews, which results in a lack of diversity when generating reviews for different users or products. We propose a user preference-aware review generation model to take account of user preference. User preference reflects the characteristics of the user and has a great impact when the user writes reviews. Specifically, we extract keywords from users’ reviews using a score function as user preference. The decoder generates words depending on not only the context vector but also user preference when decoding. Through considering users’ preferred words explicitly, we generate diverse reviews. Experiments on a real review dataset from Amazon show that our model outperforms state-of-the-art baselines according to two evaluation metrics.

Wei Wang, Hai-Tao Zheng, Hao Liu

### Mining Cluster Patterns in XML Corpora via Latent Topic Models of Content and Structure

We present two innovative machine-learning approaches to topic model clustering for the XML domain. The first approach consists in exploiting consolidated clustering techniques, in order to partition the input XML documents by their meaning. This is captured through a new Bayesian probabilistic topic model, whose novelty is the incorporation of Dirichlet-multinomial distributions for both content and structure. In the second approach, a novel Bayesian probabilistic generative model of XML corpora seamlessly integrates the foresaid topic model with clustering. Both are conceived as interacting latent factors, that govern the wording of the input XML documents. Experiments over real-world benchmark XML corpora reveal the overcoming effectiveness of the devised approaches in comparison to several state-of-the-art competitors.

Gianni Costa, Riccardo Ortale

### A Large-Scale Repository of Deterministic Regular Expression Patterns and Its Applications

Deterministic regular expressions (DREs) have been used in a myriad of areas in data management. However, to the best of our knowledge, presently there has been no large-scale repository of DREs in the literature. In this paper, based on a large corpus of data that we harvested from the Web, we build a large-scale repository of DREs by first collecting a repository after analyzing determinism of the real data; and then further processing the data by using normalized DREs to construct a compact repository of DREs, called DRE pattern set. At last we use our DRE patterns as benchmark datasets in several algorithms that have lacked experiments on real DRE data before. Experimental results demonstrate the usefulness of the repository.

Haiming Chen, Yeting Li, Chunmei Dong, Xinyu Chu, Xiaoying Mou, Weidong Min

### Determining the Impact of Missing Values on Blocking in Record Linkage

Record linkage is the process of integrating information from the same underlying entity across disparate data sets. This process, which is increasingly utilized to build accurate representations of individuals and organizations for a variety of applications, ranging from credit worthiness assessments to continuity of medical care, can be computationally intensive because it requires comparing large quantities of records over a range of attributes. To reduce the amount of computation in record linkage in big data settings, blocking methods, which are designed to limit the number of record pair comparisons that needs to be performed, are critical for scaling up the record linkage process. These methods group together potential matches into blocks, often using a subset of attributes before a final comparator function predicts which record pairs within the blocks correspond to matches. Yet data corruption and missing values adversely influence the performance of blocking methods (e.g., it may cause some matching records not to be placed in the same block). While there has been some investigation into the impact of missing values on general record linkage techniques (e.g., the comparator function), no study has addressed the impact of the missing values on blocking methods. To address this issue, in this work, we systematically perform a detailed empirical analysis of the individual and joint impact of missing values and data corruption on different blocking methods using realistic data sets. Our results show that blocking approaches that do not depend on one type of blocking attributes are more robust against missing values. In addition, our results indicate that blocking parameters must be chosen carefully for different blocking techniques.

Imrul Chowdhury Anindya, Murat Kantarcioglu, Bradley Malin
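
The effect described in the abstract above, where a missing value keeps two matching records out of the same block, can be sketched in a few lines of Python. The records, the blocking keys `surname_key` and `zip_key`, and the helper `block_pairs` are hypothetical and chosen only for this example; they are not the blocking methods evaluated in the paper.

```python
from collections import defaultdict
from itertools import combinations


def block_pairs(records, key_fn):
    """Group records by a blocking key and return the candidate pairs."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        key = key_fn(rec)
        if key is None:
            continue  # a missing blocking attribute places the record in no block
        blocks[key].append(rec_id)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def surname_key(rec):
    """First three letters of the surname, or None if it is missing."""
    return rec["surname"][:3].lower() if rec.get("surname") else None


def zip_key(rec):
    """ZIP code, or None if it is missing."""
    return rec.get("zip") or None


records = {
    "a1": {"surname": "Miller", "zip": "37201"},
    "b1": {"surname": None, "zip": "37201"},  # same person as a1, surname missing
    "a2": {"surname": "Millar", "zip": "37212"},
}

single = block_pairs(records, surname_key)       # one type of blocking attribute
multi = single | block_pairs(records, zip_key)   # union over two attribute types
```

With the surname key alone, the true match (`a1`, `b1`) never becomes a candidate pair; adding a second, independent key recovers it, which mirrors the abstract's finding that approaches not tied to one type of blocking attribute are more robust.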

### Bridging the Gap Between Research and Production with CODE

Despite the ever-increasing enthusiasm from the industry, artificial intelligence or machine learning is a much-hyped area where the results tend to be exaggerated or misunderstood. Many novel models proposed in research papers never end up being deployed to production. The goal of this paper is to highlight four important aspects which are often neglected in real-world machine learning projects, namely Communication, Objectives, Deliverables, Evaluations (CODE). By carefully considering these aspects, we can avoid common pitfalls and carry out a smoother technology transfer to real-world applications. We draw from a priori experiences and mistakes while building a real-world online advertising platform powered by machine learning technology, aiming to provide general guidelines for translating ML research results to successful industry projects.

Yiping Jin, Dittaya Wanvarie, Phu T. V. Le

### Distance2Pre: Personalized Spatial Preference for Next Point-of-Interest Prediction

Point-of-interest (POI) prediction is a key task in location-based social networks. It captures the user preference to predict POIs. Recent studies demonstrate that spatial influence is significant for prediction. The distance can be converted to a weight reflecting the relevance of two POIs or can be utilized to find nearby locations. However, previous studies almost ignore the correlation between user and distance. When people choose the next POI, they will consider the distance at the same time. Besides, spatial influence varies greatly for different users. In this work, we propose a Distance-to-Preference (Distance2Pre) network for the next POI prediction. We first acquire the user’s sequential preference by modeling check-in sequences. Then, we propose to acquire the spatial preference by modeling distances between successive POIs. This is a personalized process and can capture the relationship in user-distance interactions. Moreover, we propose two preference encoders which are a linear fusion and a non-linear fusion. Such encoders explore different ways to fuse the above two preferences. Experiments on two real-world datasets show the superiority of our proposed network.

Qiang Cui, Yuyuan Tang, Shu Wu, Liang Wang

### Using Multi-objective Optimization to Solve the Long Tail Problem in Recommender System

An improved algorithm for recommender system is proposed in this paper where not only accuracy but also comprehensiveness of recommendation items is considered. We use a weighted similarity measure based on non-dominated sorting genetic algorithm II (NSGA-II). The solution of optimal weight vector is transformed into the multi-objective optimization problem. Both accuracy and coverage are taken as the objective functions simultaneously. Experimental results show that the proposed algorithm improves the coverage while the accuracy is kept.

Jiaona Pang, Jun Guo, Wei Zhang
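
The abstract above treats accuracy and coverage as simultaneous objectives. A central building block of NSGA-II is non-dominated sorting; the Python sketch below extracts only the first non-dominated front from a set of candidate weight vectors scored on both objectives. It is not a full NSGA-II implementation (no crowding distance, crossover, or mutation), and the (accuracy, coverage) values are invented for illustration.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and better in one.

    Both objectives are maximized in this sketch.
    """
    no_worse = all(x >= y for x, y in zip(a, b))
    better = any(x > y for x, y in zip(a, b))
    return no_worse and better


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (accuracy, coverage) scores of five candidate weight vectors.
candidates = [
    (0.80, 0.30),
    (0.78, 0.45),
    (0.75, 0.40),  # dominated by (0.78, 0.45)
    (0.70, 0.60),
    (0.80, 0.25),  # dominated by (0.80, 0.30)
]
front = pareto_front(candidates)
```

Every point on the front is a different trade-off between accuracy and coverage; choosing one of them is a separate decision that the abstract does not detail.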

### Event2Vec: Learning Event Representations Using Spatial-Temporal Information for Recommendation

Event-based social networks (EBSN), such as meetup.com and plancast.com, have witnessed increased popularity and rapid growth in recent years. In EBSN, a user can choose to join any events such as a conference, house party, or drinking event. In this paper, we present a novel model, Event2Vec, which explores how representation learning for events incorporating spatial-temporal information can help event recommendation in EBSN. The spatial-temporal information represents the physical location and the time where and when an event will take place. It typically has been modeled as a bias in conventional recommendation models. However, such an approach ignores the rich semantics associated with the spatial-temporal information. In Event2Vec, the spatial-temporal influences are naturally incorporated into the learning of latent representations for events, so that Event2Vec predicts user’s preference on events more accurately. We evaluate the effectiveness of the proposed model on three real datasets; our experiments show that with a proper modeling of the spatial-temporal information, we can significantly improve event recommendation performance.

Yan Wang, Jie Tang

### Maximizing Gain over Flexible Attributes in Peer to Peer Marketplaces

Peer to peer marketplaces enable transactional exchange of services directly between people. In such platforms, those providing a service are faced with various choices. For example, in travel peer to peer marketplaces, although some amenities (attributes) in a property are fixed, others are relatively flexible and can be provided without significant effort. Providing an attribute is usually associated with a cost. Naturally, different sets of attributes may have different “gains” for a service provider. Consequently, given a limited budget, deciding which attributes to offer is challenging. In this paper, we formally introduce and define the problem of Gain Maximization over Flexible Attributes (GMFA) and study its complexity. We provide a practically efficient exact algorithm for the GMFA problem that can handle any monotonic gain function. Since the users of peer to peer marketplaces may not have access to any extra information other than existing tuples in the database, as the next part of our contribution, we introduce the notion of frequent-item based count (FBC), which utilizes nothing but the database itself. We conduct a comprehensive experimental evaluation on real data from AirBnB and a case study that confirm the efficiency and practicality of our proposal.

Abolfazl Asudeh, Azade Nazi, Nick Koudas, Gautam Das

### An Attentive Spatio-Temporal Neural Model for Successive Point of Interest Recommendation

In a successive Point of Interest (POI) recommendation problem, analyzing user behaviors and contextual check-in information in past POI visits is essential for predicting, and thus recommending, where users would likely want to visit next. Although several works, especially Matrix Factorization and/or Markov chain based methods, have been proposed to solve this problem, they rely on strong independence and conditioning assumptions. In this paper, we propose a deep Long Short Term Memory recurrent neural network model with a memory/attention mechanism for the successive Point-of-Interest recommendation problem, which captures both the sequential and the temporal/spatial characteristics in its learned representations. Experimental results on two popular Location-Based Social Networks illustrate significant improvements of our method over the state-of-the-art methods. Our method is also robust to overfitting compared with popular methods for the recommendation tasks.

Khoa D. Doan, Guolei Yang, Chandan K. Reddy

### Mentor Pattern Identification from Product Usage Logs

A typical software tool for solving complex problems tends to expose a rich set of features to its users. This creates challenges such as new users facing a steep onboarding experience and current users tending to use only a small fraction of the software’s features. This paper describes and solves an unsupervised mentor pattern identification problem from product usage logs to soften both challenges. The problem is formulated as identifying a set of users (mentors) that satisfies three mentor qualification metrics: (a) the mentor set is small, (b) every user is close to some mentor in terms of usage pattern, and (c) every feature has been used by some mentor. The proposed solution models the task as a non-convex variant of a regularized logistic regression problem and develops an alternating minimization style algorithm to solve it. Numerical experiments validate the necessity and effectiveness of mentor identification for improving the performance of a k-NN based product feature recommendation system on a real-world dataset. Further, t-SNE visuals demonstrate that the proposed algorithm achieves a trade-off that is both quantitatively and qualitatively distinct from alternative approaches to mentor identification such as Maximum Marginal Relevance and K-means.

Ankur Garg, Aman Kharb, Yash H. Malviya, J. P. Sagar, Atanu R. Sinha, Iftikhar Ahamath Burhanuddin, Sunav Choudhary
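
The three mentor qualification metrics in the abstract above can be checked for a candidate mentor set with a few lines of Python. This is only an illustrative sketch, not the paper's algorithm: the function names, the representation of a usage pattern as a set of feature names, and the use of Jaccard distance with a `max_distance` threshold as the notion of "close" are all assumptions made for the example.

```python
from typing import Dict, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Jaccard distance between two feature sets (assumed closeness measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def evaluate_mentor_set(
    usage: Dict[str, Set[str]], mentors: Set[str], max_distance: float
) -> Dict[str, object]:
    """Report the three qualification metrics for a candidate mentor set.

    usage maps each user to the set of features that user has used
    (a hypothetical stand-in for real product usage logs).
    """
    all_features = set().union(*usage.values())
    mentor_features = set().union(*(usage[m] for m in mentors))
    # (b) every user is within max_distance of at least one mentor
    users_covered = all(
        any(jaccard_distance(features, usage[m]) <= max_distance for m in mentors)
        for features in usage.values()
    )
    return {
        "size": len(mentors),                                 # (a) smaller is better
        "users_covered": users_covered,                       # (b)
        "features_covered": all_features <= mentor_features,  # (c)
    }


usage = {
    "alice": {"crop", "filter", "export"},
    "bob": {"crop", "filter"},
    "carol": {"layers", "export"},
    "dave": {"layers"},
}
report = evaluate_mentor_set(usage, {"alice", "carol"}, max_distance=0.5)
```

A checker like this only verifies a given candidate set; finding a small set that satisfies (b) and (c) is the optimization problem the paper addresses.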

### AggregationNet: Identifying Multiple Changes Based on Convolutional Neural Network in Bitemporal Optical Remote Sensing Images

The detection of multiple changes (i.e., different change types) in bitemporal remote sensing images is a challenging task. Numerous methods focus on detecting the location of change while neglecting the detailed “from-to” change types. This paper presents a supervised framework named AggregationNet to identify the specific “from-to” change types. AggregationNet takes two image patches as input and directly outputs the change types. It comprises a feature extraction part and a feature aggregation part. Deep “from-to” features are extracted by the feature extraction part, which is a two-branch convolutional neural network. The feature aggregation part is adopted to explore the temporal correlation of the bitemporal image patches. A one-hot label map is proposed to facilitate AggregationNet. One element in the label map is set to 1 and the others are set to 0. Different change types are represented by different locations of the 1 in the one-hot label map. To verify the effectiveness of the proposed framework, we perform experiments on general optical remote sensing image classification datasets as well as a change detection dataset. Extensive experimental results demonstrate the effectiveness of the proposed method.

Qiankun Ye, Xiankai Lu, Hong Huo, Lihong Wan, Yiyou Guo, Tao Fang
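
The one-hot label map described in the abstract above can be illustrated with a short sketch. The layout is an assumption made for this example: rows index the "from" class and columns index the "to" class, so the position of the single 1 identifies the change type. The function names are hypothetical, and the paper's actual map may be arranged differently.

```python
import numpy as np


def encode_change(from_class: int, to_class: int, num_classes: int) -> np.ndarray:
    """Build a num_classes x num_classes map with a single 1 at (from, to)."""
    label_map = np.zeros((num_classes, num_classes), dtype=np.int64)
    label_map[from_class, to_class] = 1
    return label_map


def decode_change(label_map: np.ndarray) -> tuple:
    """Recover (from_class, to_class) from the location of the largest entry."""
    from_class, to_class = np.unravel_index(np.argmax(label_map), label_map.shape)
    return int(from_class), int(to_class)


# Example with four hypothetical land-cover classes: class 1 changes to class 2.
label_map = encode_change(from_class=1, to_class=2, num_classes=4)
change_type = decode_change(label_map)
```

Under this assumed layout, entries on the diagonal correspond to "no change" (the class is the same at both dates), and `decode_change` can also be applied to a network's score map, since it only looks for the largest entry.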

### Detecting Micro-expression Intensity Changes from Videos Based on Hybrid Deep CNN

Facial micro-expressions, which usually last only for a fraction of a second, are challenging to detect by the human eye or by machine. They are useful for understanding the genuine emotional state of a person, and have various applications in the education, medical, surveillance and legal sectors. Existing works on micro-expressions focus on binary classification of the micro-expressions. However, detecting the changes in micro-expression intensity over time, i.e., micro-expression profiling, is not addressed in the literature. In this paper, we present a novel deep Convolutional Neural Network (CNN) based hybrid framework for micro-expression intensity change detection, together with an image pre-processing technique. The two components of our hybrid framework, namely a micro-expression stage classifier and an intensity estimator, are designed using 3D and 2D shallow deep CNNs, respectively. Moreover, we propose a fusion mechanism to improve the micro-expression intensity classification accuracy. Evaluation using the recent benchmark micro-expression datasets CASME, CASME II and SAMM demonstrates that our hybrid framework can accurately classify the various intensity levels of each micro-expression. Further, comparison with the state-of-the-art methods reveals the superiority of our hybrid approach in classifying the micro-expressions accurately.

Selvarajah Thuseethan, Sutharshan Rajasegarar, John Yearwood

### A Multi-scale Recalibrated Approach for 3D Human Pose Estimation

The major challenge for 3D human pose estimation is the ambiguity in the process of regressing 3D poses from 2D. The ambiguity is introduced by the poor exploitation of image cues, especially spatial relations. Previous works try to use a weakly-supervised method to constrain illegal spatial relations instead of leveraging image cues directly. We follow the weakly-supervised method to train an end-to-end network by first detecting 2D body joint heatmaps, and then constraining 3D regression through the 2D heatmaps. To further utilize the inherent spatial relations, we propose to use a multi-scale recalibrated approach to regress the 3D pose. The recalibrated approach is integrated into the network as an independent module, and the scale factor is altered to capture information at different resolutions. With the additional multi-scale recalibration modules, the spatial information in the pose is better exploited in the regression process. The whole network is fine-tuned for the extra parameters. Quantitative results on the Human3.6m dataset demonstrate that the performance surpasses the state of the art. Qualitative evaluation results on the Human3.6m and in-the-wild MPII datasets show the effectiveness and robustness of our approach, which can handle complex situations such as self-occlusions.

Ziwei Xie, Hailun Xia, Chunyan Feng

### Gossiping the Videos: An Embedding-Based Generative Adversarial Framework for Time-Sync Comments Generation

Recent years have witnessed the rise of the time-sync “gossiping comment”, also known as “Danmu”, which accompanies online videos. Along this line, automatic generation of Danmus may attract users through better interaction. However, this task can be extremely challenging due to informal expressions and the “semantic gap” between text and videos, as Danmus are usually not straightforward descriptions of the videos, but subjective and diverse expressions. To that end, in this paper, we propose a novel Embedding-based Generative Adversarial (E-GA) framework to generate time-sync video comments with “gossiping” behavior. Specifically, we first model the informal styles of comments via semantic embedding inspired by variational autoencoders (VAE), and then generate Danmus in a generative adversarial manner to deal with the gap between visual and textual content. Extensive experiments on a large-scale real-world dataset demonstrate the effectiveness of our E-GA framework.

Guangyi Lv, Tong Xu, Qi Liu, Enhong Chen, Weidong He, Mingxiao An, Zhongming Chen

### Self-paced Robust Deep Face Recognition with Label Noise

Deep face recognition has developed rapidly but still suffers from occlusions, illumination and pose variations, especially for face identification. The success of deep learning models in face recognition relies on large-scale, high-quality face data with accurate labels. However, in real-world applications, the collected data may be mixed with severe label noise, which significantly degrades the generalization ability of deep models. To alleviate the impact of label noise on face recognition, inspired by curriculum learning, we propose a self-paced deep learning model (SPDL) that introduces a negative $$l_1$$-norm regularizer for face recognition with label noise. During training, SPDL automatically evaluates the cleanness of samples in each batch and chooses cleaner samples for training while abandoning the noisy samples. To demonstrate the effectiveness of SPDL, we use deep convolutional neural network architectures for the task of robust face recognition. Experimental results show that our SPDL achieves superior performance on LFW, MegaFace and YTF under different levels of label noise.

Pengfei Zhu, Wenya Ma, Qinghua Hu
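
The per-batch selection of cleaner samples mentioned in the abstract above can be sketched with the textbook self-paced learning rule. That rule minimizes the weighted loss minus a threshold times the $$l_1$$-norm of the sample weights, and its closed-form solution keeps exactly the samples whose loss is below the threshold. This is an assumption about the general technique, not the paper's exact formulation; the function names and the `threshold` parameter are hypothetical.

```python
import numpy as np


def select_clean_samples(losses, threshold: float) -> np.ndarray:
    """Return 0/1 weights: 1 for samples whose loss is below the threshold.

    This is the closed-form minimizer of
        sum_i v_i * loss_i - threshold * sum_i |v_i|,  with v_i in [0, 1],
    i.e. the standard self-paced rule with a negative l1 regularizer.
    """
    losses = np.asarray(losses, dtype=float)
    return (losses < threshold).astype(float)


def self_paced_batch_loss(losses, threshold: float) -> float:
    """Average the loss over the selected (presumably cleaner) samples only."""
    losses = np.asarray(losses, dtype=float)
    weights = select_clean_samples(losses, threshold)
    kept = weights.sum()
    if kept == 0:
        return 0.0  # no sample is considered clean at this threshold
    return float((weights * losses).sum() / kept)


# Per-sample losses of one hypothetical batch; large values suggest noisy labels.
batch_losses = [0.2, 1.5, 0.4, 3.0]
weights = select_clean_samples(batch_losses, threshold=1.0)
batch_loss = self_paced_batch_loss(batch_losses, threshold=1.0)
```

In self-paced schemes the threshold is typically increased as training progresses, so that harder samples are admitted gradually.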

### Multi-Constraints-Based Enhanced Class-Specific Dictionary Learning for Image Classification

Sparse representation based on dictionary learning has been widely applied in recognition tasks. These methods only work well when the training samples are uncontaminated or contaminated by only a little noise. With increasing noise, however, these methods are not robust for image classification. To address this problem, we propose a novel multi-constraints-based enhanced class-specific dictionary learning (MECDL) approach for image classification, in which the dictionary learning framework is composed of a shared dictionary and class-specific dictionaries. For the class-specific dictionaries, we apply the Fisher discriminant criterion to obtain structured dictionaries. The sparse coefficients corresponding to the class-specific dictionaries are also subjected to the Fisher-based idea, which yields discriminative coefficients. At the same time, we apply a low-rank constraint to these dictionaries to remove large noise. For the shared dictionary, we impose a low-rank constraint on it, and the corresponding intra-class coefficients are encouraged to be as similar as possible. The experimental results on three well-known databases suggest that the proposed method can enhance the discriminative ability of the dictionary compared with state-of-the-art dictionary learning algorithms. Moreover, with the largest noise, our approach achieves a high recognition rate of over 80%.

Ze Tian, Ming Yang

### Discovering Senile Dementia from Brain MRI Using Ra-DenseNet

With the rapid development of the medical industry, there is a growing demand for disease diagnosis using machine learning technology. The recent success of deep learning brings it to a new height. This paper focuses on the application of deep learning to discover senile dementia from brain magnetic resonance imaging (MRI) data. In this work, we propose a novel deep learning model based on the Dense Convolutional Network (DenseNet), denoted as ResNeXt Adam DenseNet (Ra-DenseNet), where each block of DenseNet is modified using ResNeXt and the adapter of DenseNet is optimized by the Adam algorithm. It compresses the number of layers in DenseNet from 121 to 40 by exploiting the key characteristics of ResNeXt, which reduces running complexity and inherits the advantages of Group Convolution technology. Experimental results on a real-world MRI data set show that our Ra-DenseNet achieves a classification accuracy of 97.1% and dramatically outperforms the existing state-of-the-art baselines (i.e., LeNet, AlexNet, VGGNet, ResNet and DenseNet).

Xiaobo Zhang, Yan Yang, Tianrui Li, Hao Wang, Ziqing He

### Granger Causality for Heterogeneous Processes

Discovering temporal structures and finding causal interactions among time series have recently attracted the attention of the data mining community. Among various causal notions, graphical Granger causality is well known due to its intuitive interpretation and computational simplicity. Most of the current graphical approaches are designed for homogeneous datasets, i.e., the interacting processes are assumed to have the same data distribution. Since many applications generate heterogeneous time series, the question arises of how to leverage graphical Granger models to detect temporal causal dependencies among them. Building on generalized linear models, we propose an efficient Heterogeneous Graphical Granger Model (HGGM) for detecting causal relations among time series whose distributions come from the exponential family, which includes a wider class of common distributions, e.g., Poisson and gamma. To guarantee the consistency of our algorithm, we employ adaptive Lasso as a variable selection method. Extensive experiments on synthetic and real data confirm the effectiveness and efficiency of HGGM.

Sahar Behzadi, Kateřina Hlaváčková-Schindler, Claudia Plant

### Knowledge Graph Embedding with Order Information of Triplets

Knowledge graphs (KGs) are large-scale multi-relational directed graphs, which comprise a large number of triplets. Embedding knowledge graphs into a continuous vector space is an essential problem in knowledge extraction. Many existing knowledge graph embedding methods focus on learning rich features from entities and relations with increasingly complex feature engineering. However, they pay little attention to the order information of triplets. As a result, current methods cannot fully capture the inherent directional property of KGs. In this paper, we explore knowledge graph embedding from a new perspective, viewing a triplet as a fixed-length sequence. Based on this idea, we propose a novel recurrent knowledge graph embedding method, RKGE. It uses an order-keeping concatenation operation and a shared sigmoid layer to capture order information and discriminate fine-grained relation-related information. We evaluate our method on knowledge graph completion on benchmark data sets. Extensive experiments show that our approach significantly outperforms state-of-the-art baselines with much lower space complexity. In particular, on sparse KGs, RKGE achieves an 86.5% improvement at Hits@1 on FB15K-237. The results demonstrate that the order information of triplets is highly beneficial for knowledge graph embedding.

Jun Yuan, Neng Gao, Ji Xiang, Chenyang Tu, Jingquan Ge
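
The Hits@1 figure quoted in the abstract above refers to the Hits@k metric commonly used in knowledge graph completion: the fraction of test triplets for which the correct entity is ranked within the top k candidates. A minimal sketch of the metric follows; the function name and the example ranks are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test triplets whose correct entity has rank <= k.

    ranks holds the 1-based rank of the correct entity for each test triplet.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical ranks of the correct entity for five test triplets.
example_ranks = [1, 3, 1, 7, 2]
hits_1 = hits_at_k(example_ranks, k=1)
hits_3 = hits_at_k(example_ranks, k=3)
```

Because Hits@1 counts only exact top-ranked predictions, it is the strictest variant of the metric.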

### Knowledge Graph Rule Mining via Transfer Learning

Mining logical rules from knowledge graphs (KGs) is an important yet challenging task, especially when the relevant data is sparse. Transfer learning is an actively researched approach to the data sparsity issue, in which a predictive model is learned for the target domain from that of a similar source domain. In this paper, we propose a novel method for rule learning that employs transfer learning to address the data sparsity issue, in which the most relevant source KGs and candidate rules can be automatically selected for transfer. This is achieved by introducing a similarity measure in terms of embedding representations of entities, relations and rules. Experiments are conducted on several standard KGs. The results show that the proposed method is able to learn quality rules even with extremely sparse data, and that its predictive accuracy outperforms state-of-the-art rule learners (AMIE+ and RLvLR) and link prediction systems (TransE and HOLE).

Pouya Ghiasnezhad Omran, Zhe Wang, Kewen Wang

### Knowledge Base Completion by Inference from Both Relational and Literal Facts

Knowledge base (KB) completion predicts new facts in a KB by performing inference from the existing facts, which is very important for expanding KBs. Most previous KB completion approaches infer new facts only from the relational facts (facts containing object properties) in KBs. In fact, most KBs contain a large number of literal facts (facts containing datatype properties) besides the relational ones; these literal facts are ignored in the previous approaches. This paper studies how to take the literal facts into account when making inferences, aiming to further improve the performance of KB completion. We propose a new approach that consumes both relational and literal facts to predict new facts. Our approach extracts literal features from literal facts and combines them with path-based features extracted from relational facts; a predictive model is then trained on all the features to infer new facts. Experiments on the YAGO KB show that our approach outperforms the compared approaches that only take relational facts as input.

Zhichun Wang, Yong Huang

### EMT: A Tail-Oriented Method for Specific Domain Knowledge Graph Completion

The basic unit of a knowledge graph is the triplet, consisting of a head entity, a relation and a tail entity. Knowledge graph completion has attracted more and more attention and has made great progress. However, existing models are all verified on open-domain data sets. When applied to a specific domain, they are challenged by practical data distributions. For example, due to the poor representation of tail entities caused by their relation-oriented nature, they cannot deal with the completion of an enzyme knowledge graph. Inspired by question answering and the rectilinear propagation of light, this paper puts forward a tail-oriented method: Embedding for Multi-Tails knowledge graph (EMT). Specifically, it first represents the head and relation in question space; then it projects them into answer space via a tail-related matrix; finally, it obtains the tail entity via a translating operation in answer space. To reduce the time and space complexity of EMT, this paper includes two improved models: EMT$$^v$$ and EMT$$^s$$. Taking some of the best translation and composition models as baselines, link prediction and triplet classification on an enzyme knowledge graph sample and Kinship demonstrate our performance improvements, especially in tail prediction.

Yi Zhang, Zhijuan Du, Xiaofeng Meng

### An Interpretable Neural Model with Interactive Stepwise Influence

Deep neural networks have achieved promising prediction performance, but are often criticized for their lack of interpretability, which is essential in many real-world applications such as health informatics and political science. Meanwhile, it has been observed that many shallow models, such as linear models or tree-based models, are fairly interpretable though not accurate enough. Motivated by these observations, in this paper, we investigate how to take full advantage of the interpretability of shallow models in neural networks. To this end, we propose a novel interpretable neural model with an Interactive Stepwise Influence (ISI) framework. Specifically, in each iteration of the learning process, ISI interactively trains a shallow model with soft labels computed from a neural network, and the learned shallow model is then used to influence the neural network to gain interpretability. Thus ISI can achieve interpretability in three aspects: importance of features, impact of feature value changes, and adaptability of feature weights in the neural network learning process. Experiments on both synthetic data and two real-world datasets demonstrate that ISI can generate reliable interpretations with respect to the three aspects, while preserving prediction accuracy in comparison with other state-of-the-art methods.

Yin Zhang, Ninghao Liu, Shuiwang Ji, James Caverlee, Xia Hu

### Multivariate Time Series Early Classification with Interpretability Using Deep Learning and Attention Mechanism

Multivariate time-series early classification is an emerging topic in data mining with wide applications in areas such as biomedicine, finance, and manufacturing. Although some recent studies on this topic have delivered promising developments, few relevant works provide good interpretability. In this work, we simultaneously consider the important issues of model performance, earliness, and interpretability, and propose a deep-learning framework based on the attention mechanism for multivariate time-series early classification. In the proposed model, we use a deep-learning method to extract features among multiple variables and capture the temporal relations that exist in multivariate time-series data. Additionally, the proposed method uses the attention mechanism to identify the critical segments related to model performance, providing a basis for better understanding of the model in further decision making. We conducted experiments on three real datasets and compared the method with several alternatives. While the proposed method achieves performance and earliness comparable to those of the alternatives, more importantly, it provides interpretability by highlighting the important parts of the original data, making it easier for users to understand how the prediction is derived from the data.

En-Yu Hsu, Chien-Liang Liu, Vincent S. Tseng

### Backmatter
