main-content

The three-volume set LNAI 11439, 11440, and 11441 constitutes the thoroughly refereed proceedings of the 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2019, held in Macau, China, in April 2019.

The 137 full papers presented were carefully reviewed and selected from 542 submissions. The papers present new ideas, original research results, and practical development experiences from all KDD related areas, including data mining, data warehousing, machine learning, artificial intelligence, databases, statistics, knowledge engineering, visualization, decision-making systems, and the emerging applications. They are organized in the following topical sections: classification and supervised learning; text and opinion mining; spatio-temporal and stream data mining; factor and tensor analysis; healthcare, bioinformatics and related topics; clustering and anomaly detection; deep learning models and applications; sequential pattern mining; weakly supervised learning; recommender system; social network and graph mining; data pre-processing and feature selection; representation learning and embedding; mining unstructured and semi-structured data; behavioral data mining; visual data mining; and knowledge graph and interpretable data mining.

### Multitask Learning for Sparse Failure Prediction

Sparsity is a problem which occurs inherently in many real-world datasets. Sparsity induces an imbalance in data, which has an adverse effect on machine learning and hence reducing the predictability. Previously, strong assumptions were made by domain experts on the model parameters by using their experience to overcome sparsity, albeit assumptions are subjective. Differently, we propose a multi-task learning solution which is able to automatically learn model parameters from a common latent structure of the data from related domains. Despite related, datasets commonly have overlapped but dissimilar feature spaces and therefore cannot simply be combined into a single dataset. Our proposed model, namely hierarchical Dirichlet process mixture of hierarchical beta process (HDP-HBP), learns tasks with a common model parameter for the failure prediction model using hierarchical Dirichlet process. Our model uses recorded failure history to make failure predictions on a water supply network. Multi-task learning is used to gain additional information from the failure records of water supply networks managed by other utility companies to improve prediction in one network. We achieve superior accuracy for sparse predictions compared to previous state-of-the-art models and have demonstrated the capability to be used in risk management to proactively repair critical infrastructure.

Simon Luo, Victor W. Chu, Zhidong Li, Yang Wang, Jianlong Zhou, Fang Chen, Raymond K. Wong

### Cost Sensitive Learning in the Presence of Symmetric Label Noise

In binary classification framework, we are interested in making cost sensitive label predictions in the presence of uniform/symmetric label noise. We first observe that 0–1 Bayes classifiers are not (uniform) noise robust in cost sensitive setting. To circumvent this impossibility result, we present two schemes; unlike the existing methods, our schemes do not require noise rate. The first one uses $$\alpha$$ -weighted $$\gamma$$ -uneven margin squared loss function, $$l_{\alpha , usq}$$ , which can handle cost sensitivity arising due to domain requirement (using user given $$\alpha$$ ) or class imbalance (by tuning $$\gamma$$ ) or both. However, we observe that $$l_{\alpha , usq}$$ Bayes classifiers are also not cost sensitive and noise robust. We show that regularized ERM of this loss function over the class of linear classifiers yields a cost sensitive uniform noise robust classifier as a solution of a system of linear equations. We also provide a performance bound for this classifier. The second scheme that we propose is a re-sampling based scheme that exploits the special structure of the uniform noise models and uses in-class probability estimates. Our computational experiments on some UCI datasets with class imbalance show that classifiers of our two schemes are on par with the existing methods and in fact better in some cases w.r.t. Accuracy and Arithmetic Mean, without using/tuning noise rate. We also consider other cost sensitive performance measures viz., F measure and Weighted Cost for evaluation.

Sandhya Tripathi, Nandyala Hemachandra

### Semantic Explanations in Ensemble Learning

A combination method is an integral part of an ensemble classifier. Existing combination methods determine the combined prediction of a new instance by relying on the predictions made by the majority of base classifiers. This can result in incorrect combined predictions when the majority predict the incorrect class. It has been noted that in group decision-making, the decision by the majority, if lacking consistency in the reasons for the decision provided by its members, could be less reliable than the minority’s decision with higher consistency in the reasons of its members. Based on this observation, in this paper, we propose a new combination method, EBCM, which considers the consistency of the features, i.e. explanations of individual predictions for generating ensemble classifiers. EBCM firstly identifies the features accountable for each base classifier’s prediction, and then uses the features to measure the consistency among the predictions. Finally, EBCM combines the predictions based on both the majority and the consistency of features. We evaluated the performance of EBCM with 16 real-world datasets and observed substantial improvement over existing techniques.

Md. Zahidul Islam, Jixue Liu, Lin Liu, Jiuyong Li, Wei Kang

### Latent Gaussian-Multinomial Generative Model for Annotated Data

Traditional generative models annotate images by multiple instances independently segmented, but these models have been becoming prohibitively expensive and time-consuming along with the growth of Internet data. Focusing on the annotated data, we propose a latent Gaussian-Multinomial generative model (LGMG), which generates the image-annotations using a multimodal probabilistic models. Specifically, we use a continuous latent variable with prior of Normal distribution as the latent representation summarizing the high-level semantics of images, and a discrete latent variable with prior of Multinomial distribution as the topics indicator for annotation. We compute the variational posteriors from a mapping structure among latent representation, topics indicator and image-annotation. The stochastic gradient variational Bayes estimator on variational objective is realized by combining the reparameterization trick and Monte Carlo estimator. Finally, we demonstrate the performance of LGMG on LabelMe in terms of held-out likelihood, automatic image annotation with the state-of-the-art models.

Shuoran Jiang, Yarui Chen, Zhifei Qin, Jucheng Yang, Tingting Zhao, Chuanlei Zhang

### Investigating Neighborhood Generation Methods for Explanations of Obscure Image Classifiers

Given the wide use of machine learning approaches based on opaque prediction models, understanding the reasons behind decisions of black box decision systems is nowadays a crucial topic. We address the problem of providing meaningful explanations in the widely-applied image classification tasks. In particular, we explore the impact of changing the neighborhood generation function for a local interpretable model-agnostic explanator by proposing four different variants. All the proposed methods are based on a grid-based segmentation of the images, but each of them proposes a different strategy for generating the neighborhood of the image for which an explanation is required. A deep experimentation shows both improvements and weakness of each proposed approach.

Riccardo Guidotti, Anna Monreale, Leonardo Cariaggi

### On Calibration of Nested Dichotomies

Nested dichotomies (NDs) are used as a method of transforming a multiclass classification problem into a series of binary problems. A tree structure is induced that recursively splits the set of classes into subsets, and a binary classification model learns to discriminate between the two subsets of classes at each node. In this paper, we demonstrate that these NDs typically exhibit poor probability calibration, even when the binary base models are well-calibrated. We also show that this problem is exacerbated when the binary models are poorly calibrated. We discuss the effectiveness of different calibration strategies and show that accuracy and log-loss can be significantly improved by calibrating both the internal base models and the full ND structure, especially when the number of classes is high.

Tim Leathart, Eibe Frank, Bernhard Pfahringer, Geoffrey Holmes

### Ensembles of Nested Dichotomies with Multiple Subset Evaluation

A system of nested dichotomies (NDs) is a method of decomposing a multiclass problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, and trains a binary classifier for each split. Many methods have been proposed to perform this split, each with various advantages and disadvantages. In this paper, we present a simple, general method for improving the predictive performance of NDs produced by any subset selection techniques that employ randomness to construct the subsets. We provide a theoretical expectation for performance improvements, as well as empirical results showing that our method improves the root mean squared error of NDs, regardless of whether they are employed as an individual model or in an ensemble setting.

Tim Leathart, Eibe Frank, Bernhard Pfahringer, Geoffrey Holmes

### Topic-Level Bursty Study for Bursty Topic Detection in Microblogs

Microblogging services, such as Twitter and Sina Weibo, have gained tremendous popularity in recent years. The huge amount of user-generated information is spread on microblogs. Such user-generated contents are a mixture of different bursty topics (e.g., breaking news) and general topics (e.g., user interests). However, it is challenging to discriminate between them due to the extremely diverse and noisy user-generated text. In this paper, we introduce a novel topic model to detect bursty topics from microblogs. Our model is based on an observation that different topics usually exhibit different bursty levels at a certain time. We propose to utilize the topic-level burstiness to differentiate bursty topics and non-bursty topics and particularly different bursty topics. Extensive experiments on a Sina Weibo Dataset show that our approach outperforms the baselines and the state-of-the-art method.

Yakun Wang, Zhongbao Zhang, Sen Su, Muhammad Azam Zia

### Adaptively Transfer Category-Classifier for Handwritten Chinese Character Recognition

Handwritten character recognition (HCR) plays an important role in real-world applications, such as bank check recognition, automatic sorting of postal mail, the digitization of old documents, intelligence education and so on. Last decades have witnessed the vast amount of interest and research on handwritten character recognition, especially in the competition of HCR tasks on the specific data sets. However, the HCR task in real-world applications is much more complex than the one in HCR competition, since everyone has their own handwriting style, e.g., the HCR task on middle school students is much harder than the one on adults. Therefore, state-of-the-art methods proposed by the competitors may fail. Moreover, there is not enough labeled data to train a good model, since manually labelling data is usually tedious and expensive. So one question arises, is it possible to transfer the knowledge from related domain data to train a good recognition model for the target domain, e.g., from the handwritten character data of adults to the one of students? To this end, we propose a new neural network structure for handwritten Chinese character recognition (HCCR), in which we try to make full use of a large amount of labeled source domain data and a small number of target domain data to learn the model parameters. Furthermore, we make a transfer on the category-classifier level, and adaptively assign different weights to category-classifiers according to the usefulness of source domain data. Finally, experiments constructed from three data sets demonstrate the effectiveness of our model compared with several state-of-the-art baselines.

Yongchun Zhu, Fuzhen Zhuang, Jingyuan Yang, Xi Yang, Qing He

### Syntax-Aware Representation for Aspect Term Extraction

Aspect Term Extraction (ATE) plays an important role in aspect-based sentiment analysis. Syntax-based neural models that learn rich linguistic knowledge have proven their effectiveness on ATE. However, previous approaches mainly focus on modeling syntactic structure, neglecting rich interactions along dependency arcs. Besides, these methods highly rely on results of dependency parsing and are sensitive to parsing noise. In this work, we introduce a syntax-directed attention network and a contextual gating mechanism to tackle these issues. Specifically, a graphical neural network is utilized to model interactions along dependency arcs. With the help of syntax-directed self-attention, it could directly operate on syntactic graph and obtain structural information. We further introduce a gating mechanism to synthesize syntactic information with structure-free features. This gate is utilized to reduce the effects of parsing noise. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on three widely used benchmark datasets.

Jingyuan Zhang, Guangluan Xu, Xinyi Wang, Xian Sun, Tinglei Huang

### Short Text Similarity Measurement Based on Coupled Semantic Relation and Strong Classification Features

Measuring the similarity between short texts is made difficult by the fact that two texts that are semantically related may not contain any words in common. In this paper, we propose a novel short text similarity measure which aggregates coupled semantic relation (CSR) and strong classification features (SCF) to provide a richer semantic context. On the one hand, CSR considers both intra-relation (i.e. co-occurrence of terms based on the modified weighting strategy) and inter-relation (i.e. dependency of terms via paths that connect linking terms) between a pair of terms. On the other hand, Based on SCF for similarity measure is established based on the idea that the more similar two texts are, the more features of strong classification they share. Finally, we combine the above two techniques to address the semantic sparseness of short text. We carry out extensive experiments on real world short texts. The results demonstrate that our method significantly outperforms baseline methods on several evaluation metrics.

Huifang Ma, Wen Liu, Zhixin Li, Xianghong Lin

### A Novel Hybrid Sequential Model for Review-Based Rating Prediction

Nowadays, the online interactions between users and items become diverse, and may include textual reviews as well as numerical ratings. Reviews often express various opinions and sentiments, which can alleviate the sparsity problem of recommendations to some extent. In this paper, we address the personalized review-based rating prediction problem, namely, leveraging users’ historical reviews and corresponding ratings to predict their future ratings for items they have not interacted with before. While much effort has been devoted to this challenging problem mainly to investigate how to jointly model natural text and user personalization, most of them ignored sequential characteristics hidden in users’ review and rating sequences. To bridge this gap, we propose a novel Hybrid Review-based Sequential Model (HRSM) to capture future trajectories of users and items. This is achieved by feeding both users’ and items’ review sequences to a Long Short-Term Memory (LSTM) model that captures dynamics, in addition to incorporating a more traditional low-rank factorization that captures stationary states. The experimental results on real public datasets demonstrate that our model outperforms the state-of-the-art baselines.

Yuanquan Lu, Wei Zhang, Pan Lu, Jianyong Wang

### Integrating Topic Model and Heterogeneous Information Network for Aspect Mining with Rating Bias

Recently, there is a surge of research on aspect mining, where the goal is to predict aspect ratings of shops with reviews and overall ratings. Traditional methods assumed that aspect ratings in a specific review text are of the same level, which equal to the corresponding overall rating. However, recent research reveals a different phenomenon: there is an obvious rating bias between aspect ratings and overall ratings. Moreover, these methods usually analyze aspect ratings of reviews with topic models at textual level, while totally ignore potentially structural information among multiple entities (users, shops, reviews), which can be captured by a Heterogeneous Information Network (HIN). In this paper, we present a novel model integrating Topic model and HIN for Aspect Mining with rating bias (called THAM). Firstly, a phrase-level LDA model is designed to extract topic distributions of reviews by using textual information. Secondly, making full use of structural information, we constructs a topic propagation network, and propagate topic distributions in this heterogeneous network. Finally, by setting review as the sharing factor, the two parts are integrated into a uniform optimization framework. Experimental results on two real datasets demonstrate that THAM achieves significant performance improvement, compared to the state of the arts.

Yugang Ji, Chuan Shi, Fuzhen Zhuang, Philip S. Yu

### Dependency-Aware Attention Model for Emotion Analysis for Online News

This paper studies the emotion responses evoked by the news articles. Most work focuses on extracting effective features from text for emotion classification. As a result, the valuable information contained in the emotion labels has been largely neglected. In addition, all words are potentially conveying affective meaning yet they are not equally significant. Traditional attention mechanism can be leveraged to extract important words according to the word-label co-occurrence pattern. However, words that are important to the less popular emotions are still difficult to identify. Because emotions have intrinsic correlations, by integrating such correlations into attention mechanism, emotion triggering words can be detected more accurately. In this paper, we come up with an emotion dependency-aware attention model, which makes the best use of label information and the emotion dependency prior knowledge. The experiments on two public news datasets have proved the effectiveness of the proposed model.

Xue Zhao, Ying Zhang, Xiaojie Yuan

### Multi-task Learning for Target-Dependent Sentiment Classification

Detecting and aggregating sentiments toward people, organizations, and events expressed in unstructured social media have become critical text mining operations. Early systems detected sentiments over whole passages, whereas more recently, target-specific sentiments have been of greater interest. In this paper, we present MTTDSC, a multi-task target-dependent sentiment classification system that is informed by feature representation learnt for the related auxiliary task of passage-level sentiment classification. The auxiliary task uses a gated recurrent unit (GRU) and pools GRU states, followed by an auxiliary fully-connected layer that outputs passage-level predictions. In the main task, these GRUs contribute auxiliary per-token representations over and above word embeddings. The main task has its own, separate GRUs. The auxiliary and main GRUs send their states to a different fully connected layer, trained for the main task. Extensive experiments using two auxiliary datasets and three benchmark datasets (of which one is new, introduced by us) for the main task demonstrate that MTTDSC outperforms state-of-the-art baselines. Using word-level sensitivity analysis, we present anecdotal evidence that prior systems can make incorrect target-specific predictions because they miss sentiments expressed by words independent of target.

Divam Gupta, Kushagra Singh, Soumen Chakrabarti, Tanmoy Chakraborty

### SC-NER: A Sequence-to-Sequence Model with Sentence Classification for Named Entity Recognition

Named Entity Recognition (NER) is a basic task in Natural Language Processing (NLP). Recently, the sequence-to-sequence (seq2seq) model has been widely used in NLP task. Different from the general NLP task, 60% sentences in the NER task do not contain entities. Traditional seq2seq method cannot address this issue effectively. To solve the aforementioned problem, we propose a novel seq2seq model, named SC-NER, for NER task. We construct a classifier between the encoder and decoder. In particular, the classifier’s input is the last hidden state of the encoder. Moreover, we present the restricted beam search to improve the performance of the proposed SC-NER. To evaluate our proposed model, we construct the patent documents corpus in the communications field, and conduct experiments on it. Experimental results show that our SC-NER model achieves better performance than other baseline methods.

Yu Wang, Yun Li, Ziye Zhu, Bin Xia, Zheng Liu

### BAB-QA: A New Neural Model for Emotion Detection in Multi-party Dialogue

In this paper, we propose a new neural model BAB-QA to address the task of emotion detection in multi-party dialogues, which aims to detect emotion for each utterance in a dialogue among four label candidates: joy, sadness, anger, and neutral. A variety of models have been proposed to solve this task, but few of them manage to capture contextual information in a dialogue properly. Therefore, we adopt a Bi-directional Long Short-Term Memory network (BiLSTM) and an attention network to obtain representations of sentences and then apply a contextualization network to refine the sentence representations for classification. More importantly, we propose and incorporate a new module called QA network in our model, which is inspired by natural language inference tasks. This QA network enables our model to acquire better sentence encodings by modeling adjacent sentences in a dialogue as question-answer pairs. We evaluate our model in the benchmark EmotionX datasets provided by SocialNLP2018 and our model achieves the state-of-the-art performance.

Zilong Wang, Zhaohong Wan, Xiaojun Wan

### Unsupervised User Behavior Representation for Fraud Review Detection with Cold-Start Problem

Detecting fraud review is becoming extremely important in order to provide reliable information in cyberspace, in which, however, handling cold-start problem is a critical and urgent challenge since the case of cold-start fraud review rarely provides sufficient information for further assessing its authenticity. Existing work on detecting cold-start cases relies on the limited contents of the review posted by the user and a traditional classifier to make the decision. However, simply modeling review is not reliable since reviews can be easily manipulated. Also, it is hard to obtain high-quality labeled data for training the classifier. In this paper, we tackle cold-start problems by (1) using a user’s behavior representation rather than review contents to measure authenticity, which further (2) consider user social relations with other existing users when posting reviews. The method is completely (3) unsupervised. Comprehensive experiments on Yelp data sets demonstrate our method significantly outperforms the state-of-the-art methods.

Qian Li, Qiang Wu, Chengzhang Zhu, Jian Zhang, Wentao Zhao

### Gated Convolutional Encoder-Decoder for Semi-supervised Affect Prediction

Analyzing human reactions from text is an important step towards automated modeling of affective content. The variance in human perceptions and experiences leads to a lack of uniform, well-labeled, ground-truth datasets, hence, limiting the scope of neural supervised learning approaches. Recurrent and convolutional networks are popular for text classification and generation tasks, specifically, where large datasets are available; but are inefficient when dealing with unlabeled corpora. We propose a gated sequence-to-sequence, convolutional-deconvolutional autoencoding (GCNN-DCNN) framework for affect classification with limited labeled data. We show that compared to a vanilla CNN-DCNN network, gated networks improve performance for affect prediction as well as text reconstruction. We present a regression analysis comparing outputs of traditional learning models with information captured by hidden variables in the proposed network. Quantitative evaluation with joint, pre-trained networks, augmented with psycholinguistic features, reports highest accuracies for affect prediction, namely frustration, formality, and politeness in text.

Kushal Chawla, Sopan Khosla, Niyati Chhaya

### Complaint Classification Using Hybrid-Attention GRU Neural Network

Recently, a growing number of customers tend to complain about the services of different enterprises on the Internet to express their dissatisfaction. The correct classification of complaint texts is fairly important for enterprises to improve the efficiency of transaction processing. However, the existing literature lacks research on complaint texts. Most previous approaches of text classification fail to take advantage of the information of specific characters and negative emotions in complaint texts. Besides, some grammatical and semantic errors caused by violent mood swings of customers are another challenge. To address the problems, a novel model based on hybrid-attention GRU neural network (HATT-GRU) is proposed for complaint classification. The model constructs text vectors at character level, and it is able to extract sentiment features in complaint texts. Then a hybrid-attention mechanism is proposed to learn the importance of each character and sentiment feature, so that the model can focus on the features that contribute more to text classification. Finally, experiments are conducted on two complaint datasets from different industries. Experiments show that our model can achieve state-of-the-art results on both Chinese and English datasets compared to several text classification baselines.

Shuyang Wang, Bin Wu, Bai Wang, Xuesong Tong

### FGST: Fine-Grained Spatial-Temporal Based Regression for Stationless Bike Traffic Prediction

Currently, fully stationless bike sharing systems, such as Mobike and Ofo are becoming increasingly popular in both China and some big cities in the world. Different from traditional bike sharing systems that have to build a set of bike stations at different locations of a city and each station is associated with a fixed number of bike docks, there are no stations in stationless bike sharing systems. Thus users can flexibly check-out/return the bikes at arbitrary locations. Such a brand new bike-sharing mode better meets people’s short travel demand, but also poses new challenges for performing effective system management due to the extremely unbalanced bike usage demand in different areas and time intervals. Therefore, it is crucial to accurately predict the future bike traffic for helping the service provider rebalance the bikes timely. In this paper, we propose a Fine-Grained Spatial-Temporal based regression model named FGST to predict the future bike traffic in a stationless bike sharing system. We motivate the method via discovering the spatial-temporal correlation and the localized conservative rules of the bike check-out and check-in patterns. Our model also makes use of external factors like Point-Of-Interest(POI) informations to improve the prediction. Extensive experiments on a large Mobike trip dataset demonstrate that our approach outperforms baseline methods by a significant margin.

Hao Chen, Senzhang Wang, Zengde Deng, Xiaoming Zhang, Zhoujun Li

### Customer Segmentation Based on Transactional Data Using Stream Clustering

Customer Segmentation aims to identify groups of customers that share similar interest or behaviour. It is an essential tool in marketing and can be used to target customer segments with tailored marketing strategies. Customer segmentation is often based on clustering techniques. This analysis is typically performed as a snapshot analysis where segments are identified at a specific point in time. However, this ignores the fact that customer segments are highly volatile and segments change over time. Once segments change, the entire analysis needs to be repeated and strategies adapted. In this paper we explore stream clustering as a tool to alleviate this problem. We propose a new stream clustering algorithm which allows to identify and track customer segments over time. The biggest challenge is that customer segmentation often relies on the transaction history of a customer. Since this data changes over time, it is necessary to update customers which have already been incorporated into the clustering. We show how to perform this step incrementally, without the need for periodic re-computations. As a result, customer segmentation can be performed continuously, faster and is more scalable. We demonstrate the performance of our algorithm using a large real-life case study.

Matthias Carnein, Heike Trautmann

### Spatio-Temporal Event Detection from Multiple Data Sources

The proliferation of Internet-enabled smartphones has ushered in an era where events are reported on social media websites such as Twitter and Facebook. However, the short text nature of social media posts, combined with a large volume of noise present in such datasets makes event detection challenging. This problem can be alleviated by using other sources of information, such as news articles, that employ a precise and factual vocabulary, and are more descriptive in nature. In this paper, we propose Spatio-Temporal Event Detection (STED), a probabilistic model to discover events, their associated topics, time of occurrence, and the geospatial distribution from multiple data sources, such as news and Twitter. The joint modeling of news and Twitter enables our model to distinguish events from other noisy topics present in Twitter data. Furthermore, the presence of geocoordinates and timestamps in tweets helps find the spatio-temporal distribution of the events. We evaluate our model on a large corpus of Twitter and news data, and our experimental results show that STED can effectively discover events, and outperforms state-of-the-art techniques.

Aman Ahuja, Ashish Baghudana, Wei Lu, Edward A. Fox, Chandan K. Reddy

### Discovering All-Chain Set in Streaming Time Series

Time series chains discovery is an increasingly popular research area in time series mining. Previous studies on this topic process fixed-length time series. In this work, we focus on the issue of all-chain set mining over the streaming time series, where the all-chain set is a very important kind of the time series chains. We propose a novel all-chain set mining algorithm about streaming time series (ASMSTS) to solve this problem. The main idea behind the ASMSTS is to obtain the mining results at current time-tick based on the ones at the last one. This makes the method more efficiency in time and space than the Naïve. Our experiments illustrate that ASMSTS does indeed detect the all-chain set correctly and can offer dramatic improvements in speed and space cost over the Naive method.

Shaopeng Wang, Ye Yuan, Hua Li

### Hawkes Process with Stochastic Triggering Kernel

The impact from past to future is a vital feature in modelling time series data, which has been described by many point processes, e.g. the Hawkes process. In classical Hawkes process, the triggering kernel is assumed to be a deterministic function. However, the triggering kernel can vary with time due to the system uncertainty in real applications. To model this kind of variance, we propose a Hawkes process variant with stochastic triggering kernel, which incorporates the variation of triggering kernel over time. In this model, the triggering kernel is considered to be an independent multivariate Gaussian distribution. We derive and implement a tractable inference algorithm based on variational auto-encoder. Results from synthetic and real data experiments show that the underlying mean triggering kernel and variance band can be recovered, and using the stochastic triggering kernel is more accurate than the vanilla Hawkes process in capacity planning.

Feng Zhou, Yixuan Zhang, Zhidong Li, Xuhui Fan, Yang Wang, Arcot Sowmya, Fang Chen

### Concept Drift Based Multi-dimensional Data Streams Sampling Method

A summary can immensely reduce the time and space complexity of an algorithm. This concept is considered a research hotspot in the field of data stream mining. Data streams are characterized as having continuous data arrival, rapid speed, large scale, and cannot be completely stored in memory simultaneously. A summary is often formed in the memory to approximate the database query or data mining task. A sampling technique is a commonly used method for constructing data stream summaries. Traditional simple random sampling algorithms do not consider the conceptual drift of data distributions that change over time. Therefore, a challenging task is sampling the summary of the data distribution in multi-dimensional data streams of a concept drift. This study proposes a sampling algorithm that ensures the consistency of the data distribution with the data streams of the concept drift. First, probability statistics is used on the data stream cells in the reference window to obtain data distribution. A probability sampling is performed on the basis of this distribution. Second, the sliding window is used to continuously detect whether the data distribution has changed. If the data distribution does not change, then the original sampling data are maintained. Otherwise, the data distribution in the statistical window is restated to form a new sampling probability. The proposed algorithm ensures that the data distribution in the data profile is continually consistent with the population distribution. We compare our algorithm with the state-of-the-art algorithms on synthetic and real data sets. Experimental results demonstrate the effectiveness of our algorithm.

Ling Lin, Xiaolong Qi, Zhirui Zhu, Yang Gao

### Spatial-Temporal Multi-Task Learning for Within-Field Cotton Yield Prediction

Understanding and accurately predicting within-field spatial variability of crop yield play a key role in site-specific management of crop inputs such as irrigation water and fertilizer for optimized crop production. However, such a task is challenged by the complex interaction between crop growth and environmental and managerial factors, such as climate, soil conditions, tillage, and irrigation. In this paper, we present a novel Spatial-temporal Multi-Task Learning algorithm for within-field crop yield prediction in west Texas from 2001 to 2003. This algorithm integrates multiple heterogeneous data sources to learn different features simultaneously, and to aggregate spatial-temporal features by introducing a weighted regularizer to the loss functions. Our comprehensive experimental results consistently outperform the results of other conventional methods, and suggest a promising approach, which improves the landscape of crop prediction research fields.

Long H. Nguyen, Jiazhen Zhu, Zhe Lin, Hanxiang Du, Zhou Yang, Wenxuan Guo, Fang Jin

### Online Data Fusion Using Incremental Tensor Learning

Despite the advances in Structural Health Monitoring (SHM) which provides actionable information on the current and future states of infrastructures, it is still challenging to fuse data properly from heterogeneous sources for robust damage identification. To address this challenge, the sensor data fusion in SHM is formulated as an incremental tensor learning problem in this paper. A novel method for online data fusion from heterogeneous sources based on incrementally-coupled tensor learning has been proposed. When new data are available, decomposed component matrices from multiple tensors are updated collectively and incrementally. A case study in SHM has been developed for sensor data fusion and online damage identification, where the SHM data are formed as multiple tensors to which the proposed data fusion method is applied, followed by a one-class support vector machine for damage detection. The effectiveness of the proposed method has been validated through experiments using synthetic data and data obtained from a real-life bridge. The results have demonstrated that the proposed fusion method is more robust to noise, and able to detect, assess and localize damage better than the use of individual data sources.

Nguyen Lu Dang Khoa, Hongda Tian, Yang Wang, Fang Chen

### Co-clustering from Tensor Data

With the exponential growth of collected data in different fields like recommender system (user, items), text mining (document, term), bioinformatics (individual, gene), co-clustering which is a simultaneous clustering of both dimensions of a data matrix, has become a popular technique. Co-clustering aims to obtain homogeneous blocks leading to an easy simultaneous interpretation of row clusters and column clusters. Many approaches exist, in this paper we rely on the latent block model (LBM) which is flexible allowing to model different types of data matrices. We extend its use to the case of a tensor (3D matrix) data in proposing a Tensor LBM (TLBM) allowing different relations between entities. To show the interest of TLBM, we consider continuous and binary datasets. To estimate the parameters, a variational EM algorithm is developed. Its performances are evaluated on synthetic and real datasets to highlight different possible applications.

Rafika Boutalbi, Lazhar Labiod, Mohamed Nadif

### A Data-Aware Latent Factor Model for Web Service QoS Prediction

Accurately predicting unknown quality-of-service (QoS) data based on historical QoS records is vital in web service recommendation or selection. Recently, latent factor (LF) model has been widely and successfully applied to QoS prediction because it is accurate and scalable under many circumstances. Hence, state-of-the-art methods in QoS prediction are primarily based on LF. They improve the basic LF-based models by identifying the neighborhoods of QoS data based on some additional geographical information. However, the additional geographical information may be difficult to collect in considering information security, identity privacy, and commercial interests in real-world applications. Besides, they ignore the reliability of QoS data while unreliable ones are often mixed in. To address these issues, this paper proposes a data-aware latent factor (DALF) model to achieve highly accurate QoS prediction, where ‘data-aware’ means DALF can easily implement the predictions according to the characteristics of QoS data. The main idea is to incorporate a density peaks based clustering method into an LF model to discover the neighborhoods and unreliable ones of QoS data. Experimental results on two benchmark real-world web service QoS datasets demonstrate that DALF has better performance than the state-of-the-art models.

Di Wu, Xin Luo, Mingsheng Shang, Yi He, Guoyin Wang, Xindong Wu

### Keyword Extraction with Character-Level Convolutional Neural Tensor Networks

Keyword extraction is a critical technique in natural language processing. For this essential task we present a simple yet efficient architecture involving character-level convolutional neural tensor networks. The proposed architecture learns the relations between a document and each word within the document and treats keyword extraction as a supervised binary classification problem. In contrast to traditional supervised approaches, our model learns the distributional vector representations for both documents and words, which directly embeds semantic information and background knowledge without the need for handcrafted features. Most importantly, we model semantics down to the character level to capture morphological information about words, which although ignored in related literature effectively mitigates the unknown word problem in supervised learning approaches for keyword extraction. In the experiments, we compare the proposed model with several state-of-the-art supervised and unsupervised approaches for keyword extraction. Experiments conducted on two datasets attest the effectiveness of the proposed deep learning framework in significantly outperforming several baseline methods.

Zhe-Li Lin, Chuan-Ju Wang

### Neural Variational Matrix Factorization with Side Information for Collaborative Filtering

Probabilistic Matrix Factorization (PMF) is a popular technique for collaborative filtering (CF) in recommendation systems. The purpose of PMF is to find the latent factors for users and items by decomposing a user-item rating matrix. Most methods based on PMF suffer from data sparsity and result in poor latent representations of users and items. To alleviate this problem, we propose the neural variational matrix factorization (NVMF) model, a novel deep generative model that incorporates side information (features) of both users and items, to capture better latent representations of users and items for the task of CF recommendation. Our NVMF consists of two end-to-end variational autoencoder neural networks, namely user neural network and item neural network respectively, which are capable of learning complex nonlinear distributed representations of users and items through our proposed variational inference. We derive a Stochastic Gradient Variational Bayes (SGVB) algorithm to approximate the intractable posterior distributions. Experiments conducted on three publicly available datasets show that our NVMF significantly outperforms the state-of-the-art methods.

Teng Xiao, Hong Shen

### Variational Deep Collaborative Matrix Factorization for Social Recommendation

In this paper, we propose a Variational Deep Collaborative Matrix Factorization (VDCMF) algorithm for social recommendation that infers latent factors more effectively than existing methods by incorporating users’ social trust information and items’ content information into a unified generative framework. Unlike neural network-based algorithms, our model is not only effective in capturing the non-linearity among correlated variables but also powerful in predicting missing values under the robust collaborative inference. Specifically, we use variational auto-encoder to extract the latent representations of content and then incorporate them into traditional social trust factorization. We propose an efficient expectation-maximization inference algorithm to learn the model’s parameters and approximate the posteriors of latent factors. Experiments on two sparse datasets show that our VDCMF significantly outperforms major state-of-the-art CF methods for recommendation accuracy on common metrics.

Teng Xiao, Hui Tian, Hong Shen

### Time-Dependent Survival Neural Network for Remaining Useful Life Prediction

Remaining useful life (RUL) prediction has been a topic of practical interest in many fields involving preventive intervention, including manufacturing, medicine and healthcare. While most of the conventional approaches suffer from censored failures arising and statistically circumscribed assumptions, few attempts have been made to predict RUL by developing a survival learning machine that explores the underlying relationship between time-varying prognostic variables and failure-free survival probability. This requires a purely data-driven prediction approach, devoid of any a survival model and all statistical assumptions. To this end, we propose a time-dependent survival neural network that additively estimates a latent failure risk and performs multiple binary classifications to generate prognostics of RUL-specific probability. We train the neural network by a new survival learning criterion that minimizes the censoring Kullback-Leibler divergence and guarantees monotonicity of the resulting probability. Experiments on four datasets demonstrate the great promise of our approach in real applications.

Jianfei Zhang, Shengrui Wang, Lifei Chen, Gongde Guo, Rongbo Chen, Alain Vanasse

### ACNet: Aggregated Channels Network for Automated Mitosis Detection

Mitosis count is a critical predictor for invasive breast cancer grading using the Nottingham grading system. Nowadays mitotic count is mainly performed on high-power fields by pathologists manually under a microscope which is a highly tedious, time-consuming and subjective task. Therefore, it is necessary to develop automated mitosis detection methods that can save a large amount of time for pathologists and enhance the reliability of pathological examination. This paper proposes a powerful and effective novel framework named ACNet to count mitosis by aggregating auxiliary handcrafted features associated with tissue texture into CNN and jointly training neural network in an end-to-end way. Completed Local Binary Patterns (CLBP) features, Scale Invariant Feature Transform (SIFT) features and edge features are extracted and used in the classification task. In the process of network training, we expand the original training set by utilizing hard example mining, making our network focus on classification of the most difficult cases. We evaluate our ACNet by conducting experiments on the public MITOSIS dataset from MICCAI TUPAC 2016 competition and obtain state-of-the-art results.

Kaili Cheng, Jiarui Sun, Xuesong Chen, Yanbo Ma, Mengjie Bai, Yong Zhao

### Attention-Based Hierarchical Recurrent Neural Network for Phenotype Classification

This paper focuses on labeling phenotypes of patients in Intensive Care Unit given their records from admission to discharge. Recent works mainly rely on recurrent neural networks to process such temporal data. However, such prevalent practice, which leverages the last hidden state in the network for sequence representation, falls short when dealing with long sequences. Moreover, the memorizing strategy inside the recurrent units does not necessarily identify the key health records for each specific class. In this paper, we propose an attention-based hierarchical recurrent neural network (AHRNN) for phenotype classification. Our intuition is to remember all the past records by a hierarchical structure and make predictions based on crucial information in the label’s perspective. To the best of our knowledge, it is the first work of applying attention-based hierarchical neural networks to clinical time series prediction. Experimental results show that our model outperforms the state-of-the-arts in accuracy, time efficiency and model interpretability.

Nan Xu, Yanyan Shen, Yanmin Zhu

### Identifying Mobility of Drug Addicts with Multilevel Spatial-Temporal Convolutional Neural Network

Human identification according to their mobility patterns is of great importance for a wide spectrum of spatial-temporal based applications. For example, detecting drug addicts from normal residents in public security area. However, extracting and classifying user behaviors in massive amount of moving records is not trivial because of three challenges: (1) the complex transition records with noisy data; (2) the heterogeneity and sparsity of spatiotemporal trajectory features; and (3) extremely imbalanced data distribution of real world data. In this paper, we propose MST-CNN, a multi-level convolutional neural network with spatial and temporal features. We first embed the multiple factors on single trajectory level and then generate a behavior matrix to capture the user’s mobility patterns. Finally, a CNN module is used to extract various features with different filters and classify user type. We perform experiments on real-life mobility datasets provided by public security office, and extensive evaluation results demonstrate that our method obtains significant improvement performance in identification accuracy and outperform all baseline methods.

Canghong Jin, Haoqiang Liang, Dongkai Chen, Zhiwei Lin, Minghui Wu

### MC-eLDA: Towards Pathogenesis Analysis in Traditional Chinese Medicine by Multi-Content Embedding LDA

Traditional Chinese medicine (TCM) is well-known for its unique theory and effective treatment for complicated diseases. In TCM theory, “pathogenesis” is the cause of patient’s disease symptoms and is the basis for prescribing herbs. However, the essence of pathogenesis analysis is not well depicted by current researches. In this paper, we propose a novel topic model called Multi-Content embedding LDA (MC-eLDA), aiming to collaboratively capture the relationships of symptom-pathogenesis-herb triples, relationship between symptom-symptom, and relationship between herb-herb, which can be used in auxiliary diagnosis and treatment. By projecting discrete symptom words and herb words into two continuous semantic spaces respectively, the semantic equivalence can be encoded by exploiting the contiguity of their corresponding embeddings. Compared with previous models, topic coherence in each pathogenesis cluster can be promoted. Pathogenesis structures that previous topic modeling can not capture can be discovered by MC-eLDA. Then a herb prescription recommendation method is conducted based on MC-eLDA. Experimental results on two real-world TCM medical cases datasets demonstrate the effectiveness of the proposed model for analyzing pathogenesis as well as helping make diagnosis and treatment in clinical practice.

Ying Zhang, Wendi Ji, Haofen Wang, Xiaoling Wang, Jin Chen

### Enhancing the Healthcare Retrieval with a Self-adaptive Saturated Density Function

The proximity based information retrieval models usually use the same pre-define density function for all of terms in the collection to estimate their influence distribution. In healthcare domain, however, different terms in the same document have different influence distributions, the same term in different documents also has different influence distributions, and the pre-defined density function may not completely match the terms’ actual influence distributions. In this paper, we define a saturated density function to measure the best suitable density function that fits the given term’s influence distribution, and propose a self-adaptive approach on saturated density function building for each term in various circumstance. Particularly, our approach utilizing Gamma process is an unsupervised model with no requirements for external resources. Then, we construct a density based weighting method for the purpose of evaluating the effectiveness of our approach. Finally, we conduct our experiment on five standard CLEF and TREC datasets, and the experimental results show that our approach is promising and outperforms the pre-defined density functions in healthcare retrieval.

Yang Song, Wenxin Hu, Liang He, Liang Dou

### CytoFA: Automated Gating of Mass Cytometry Data via Robust Skew Factor Analzyers

Cytometry plays an important role in clinical diagnosis and monitoring of lymphomas, leukaemia, and AIDS. However, analysis of modern-day cytometric data is challenging. Besides its high-throughput nature and high dimensionality, these data typically exhibit complex characteristics such as multimodality, asymmetry, heavy-tailness and other non-normal characteristics. This paper presents cytoFA, a novel data mining approach capable of clustering and performing dimensionality reduction of high-dimensional cytometry data. Our approach is also robust against non-normal features including heterogeneity, skewness, and outliers (dead cells) that are typical in flow and mass cytometry data. Based on a statistical approach with well-studied properties, cytoFA adopts a mixtures of factor analyzers (MFA) to learn latent nonlinear low-dimensional representations of the data and to provide an automatic segmentation of the data into its comprising cell populations. We also introduce a double trimming approach to help identify atypical observations and to reduce computation time. The effectiveness of our approach is demonstrated on two large mass cytometry data, outperforming existing benchmark algorithms. We note that while the approach is motivated by cytometric data analysis, it is applicable and useful for modelling data from other fields.

Sharon X. Lee

### Consensus Graph Learning for Incomplete Multi-view Clustering

Multi-view data clustering is a fundamental task in current machine learning, known as multi-view clustering. Existing multi-view clustering methods mostly assume that each data instance is sampled in all views. However, in real-world applications, it is common that certain views miss number of data instances, resulting in incomplete multi-view data. This paper concerns the task of clustering of incomplete multi-view data. We propose a novel Graph-based Incomplete Multi-view Clustering (GIMC) to perform this task. GIMC can effectively construct a complete graph for each view with the help of other view(s), and automatically weight each constructed graph to learn a consensus graph, which gives the final clusters. An alternating iterative optimization algorithm is proposed to optimize the objective function. Experimental results on real-world datasets show that the proposed method outperforms state-of-the-art baseline methods markedly.

Wei Zhou, Hao Wang, Yan Yang

### Beyond Outliers and on to Micro-clusters: Vision-Guided Anomaly Detection

Given a heatmap for millions of points, what patterns exist in the distributions of point characteristics, and how can we detect them and separate anomalies in a way similar to human vision? In this paper, we propose a vision-guided algorithm, EagleMine, to recognize and summarize point groups in the feature spaces. EagleMine utilizes a water-level tree to capture group structures according to vision-based intuition at multiple resolutions, and adopts statistical hypothesis tests to determine the optimal groups along the tree. Moreover, EagleMine can identify anomalous micro-clusters (i.e., micro-size groups), which exhibit very similar behavior but deviate away from the majority. Extensive experiments are conducted for large graph scenario, and show that our method can recognize intuitive node groups as human vision does, and achieves the best performance in summarization compared to baselines. In terms of anomaly detection, EagleMine also outperforms state-of-the-art graph-based methods by significantly improving accuracy in synthetic and microblog datasets.

Wenjie Feng, Shenghua Liu, Christos Faloutsos, Bryan Hooi, Huawei Shen, Xueqi Cheng

### Clustering of Mixed-Type Data Considering Concept Hierarchies

Most clustering algorithms have been designed only for pure numerical or pure categorical data sets while nowadays many applications generate mixed data. It arises the question how to integrate various types of attributes so that one could efficiently group objects without loss of information. It is already well understood that a simple conversion of categorical attributes into a numerical domain is not sufficient since relationships between values such as a certain order are artificially introduced. Leveraging the natural conceptual hierarchy among categorical information, concept trees summarize the categorical attributes. In this paper we propose the algorithm ClicoT (CLustering mixed-type data Including COncept Trees) which is based on the Minimum Description Length (MDL) principle. Profiting of the conceptual hierarchies, ClicoT integrates categorical and numerical attributes by means of a MDL based objective function. The result of ClicoT is well interpretable since concept trees provide insights of categorical data. Extensive experiments on synthetic and real data set illustrate that ClicoT is noise-robust and yields well interpretable results in a short runtime.

Sahar Behzadi, Nikola S. Müller, Claudia Plant, Christian Böhm

### DMNAED: A Novel Framework Based on Dynamic Memory Network for Abnormal Event Detection in Enterprise Networks

Abnormal event detection is a crucial step towards discovering insider threat in enterprise networks. However, most existing anomaly detection approaches fail to capture latent correlations between disparate events in different domains due to the lack of a panoramic view or the disability of iterative attention. In light of this, this paper presents DMNAED, a novel framework based on dynamic memory network for abnormal event detection in enterprise networks. Inspired by question answering systems in natural language processing, DMNAED considers the event to be inspected as a question, and a sequence of multi-domain historical events serve as a context. Through an iterative attention process, DMNAED captures the context-question interrelation and aggregates relevant historical events to make more accurate anomaly detection. The experimental results on the CERT insider threat dataset r4.2 demonstrate that DMNAED exhibits more stable and superior performance compared with three baseline methods in identifying aberrant events in multi-user and multi-domain environments.

Xueshuang Ren, Liming Wang

### NeoLOD: A Novel Generalized Coupled Local Outlier Detection Model Embedded Non-IID Similarity Metric

Traditional generalized local outlier detection model (TraLOD) unifies the abstract methods and steps for classic local outlier detection approaches that are able to capture local behavior to improve detection performance compared to global outlier detection techniques. However, TraLOD still suffers from an inherent limitation for rational data: it uses traditional (Euclidean) similarity metric to pick out the context/reference set ignoring the effect of attribute structure. i.e., it is with the fundamental assumption that attributes and attribute values are independent and identically distributed (IID). To address the issue above, this paper introduces a novel Non-IID generalized coupled local outlier detection model (NeoLOD) and its instance (NeoLOF) for identifying local outliers with strong couplings. Concretely, this paper mainly includes three aspects: (i) captures the underlying attribute relations automatically by using the Bayesian network. (ii) proposes a novel Non-IID similarity metric to capture the intra-coupling and inter-coupling between attributes and attribute values. (iii) unifies the generalized local outlier detection model by incorporating the Non-IID similarity metric and instantiates a novel NeoLOF algorithm. Results obtained from 13 data sets show the proposed similarity metric can utilize the attribute structure effectively and NeoLOF can improve the performance in local outlier detection tasks.

Fan Meng, Yang Gao, Jing Huo, Xiaolong Qi, Shichao Yi

### Dynamic Anomaly Detection Using Vector Autoregressive Model

Identifying vandal users or attackers hidden in dynamic online social network data has been shown a challenging problem. In this work, we develop a dynamic attack/anomaly detection approach using a novel combination of the graph spectral features and the restricted Vector Autoregressive (rVAR) model. Our approach utilizes the time series modeling method on the non-randomness metric derived from the graph spectral features to capture the abnormal activities and interactions of individuals. Furthermore, we demonstrate how to utilize Granger causality test on the fitted rVAR model to identify causal relationships of user activities, which could be further translated to endogenous and/or exogenous influences for each individual’s anomaly measures. We conduct empirical evaluations on the Wikipedia vandal detection dataset to demonstrate efficacy of our proposed approach.

Yuemeng Li, Aidong Lu, Xintao Wu, Shuhan Yuan

### A Convergent Differentially Private k-Means Clustering Algorithm

Preserving differential privacy (DP) for the iterative clustering algorithms has been extensively studied in the interactive and the non-interactive settings. However, existing interactive differentially private clustering algorithms suffer from a non-convergence problem, i.e., these algorithms may not terminate without a predefined number of iterations. This problem severely impacts the clustering quality and the efficiency of the algorithm. To resolve this problem, we propose a novel iterative approach in the interactive settings which controls the orientation of the centroids movement over the iterations to ensure the convergence by injecting DP noise in a selected area. We prove that, in the expected case, our approach converges to the same centroids as Lloyd’s algorithm in at most twice the iterations of Lloyd’s algorithm. We perform experimental evaluations on real-world datasets to show that our algorithm outperforms the state-of-the-art of the interactive differentially private clustering algorithms with a guaranteed convergence and better clustering quality to meet the same DP requirement.

Zhigang Lu, Hong Shen