2018 | Book

Advances in Knowledge Discovery and Data Mining

22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part I

Edited by: Dinh Phung, Vincent S. Tseng, Prof. Geoffrey I. Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

This three-volume set, LNAI 10937, 10938, and 10939, constitutes the thoroughly refereed proceedings of the 22nd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2018, held in Melbourne, VIC, Australia, in June 2018.

The 164 full papers were carefully reviewed and selected from 592 submissions. The volumes present papers focusing on new ideas, original research results and practical development experiences from all KDD related areas, including data mining, data warehousing, machine learning, artificial intelligence, databases, statistics, knowledge engineering, visualization, decision-making systems and the emerging applications.

Table of Contents

Frontmatter

Classification and Supervised Machine Learning

Frontmatter

Classifier Risk Estimation Under Limited Labeling Resources

Evaluating a trained system is an important component of machine learning. Labeling test data for large scale evaluation of a trained model can be extremely time consuming and expensive. In this paper we propose strategies for estimating performance of a classifier using as little labeling resource as possible. Specifically, we assume a labeling budget is given and the goal is to get a good estimate of the classifier performance using the provided labeling budget. We propose strategies to get a precise estimate of classifier accuracy under this restricted labeling budget scenario. We show that these strategies can reduce the variance in estimation of classifier accuracy by a significant amount compared to simple random sampling (over 65% in several cases). In terms of labeling resource, the reduction in number of samples required (compared to random sampling) to estimate the classifier accuracy with only 1% error is as high as 60% in some cases.

Anurag Kumar, Bhiksha Raj
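
As an illustration of the idea in the abstract above (spending a fixed labeling budget more carefully than simple random sampling), the following is a minimal sketch of a stratified accuracy estimate. Stratifying by classifier confidence with proportional allocation is an assumption of this sketch; the abstract does not name the specific strategies used. The names `stratified_accuracy_estimate`, `confidences`, and `oracle` are hypothetical.

```python
import random


def stratified_accuracy_estimate(confidences, oracle, budget, n_strata=4, seed=0):
    """Estimate classifier accuracy from roughly `budget` labels.

    confidences: classifier confidence for each test item (hypothetical input).
    oracle(i):   returns True if the prediction for item i is correct;
                 each call stands for one paid human label.
    """
    rng = random.Random(seed)
    n = len(confidences)
    # Sort item indices by confidence and cut them into equal-size strata.
    order = sorted(range(n), key=lambda i: confidences[i])
    strata = [order[k * n // n_strata:(k + 1) * n // n_strata]
              for k in range(n_strata)]
    strata = [s for s in strata if s]  # drop empty strata for tiny inputs

    estimate = 0.0
    for s in strata:
        # Proportional allocation: labels per stratum follow stratum size.
        # Rounding means the total can differ slightly from `budget`.
        m = max(1, min(len(s), round(budget * len(s) / n)))
        sample = rng.sample(s, m)
        stratum_accuracy = sum(oracle(i) for i in sample) / m
        # Weight each stratum's accuracy by its share of the test set.
        estimate += (len(s) / n) * stratum_accuracy
    return estimate
```

When correctness correlates with confidence, each stratum is more homogeneous than the full test set, which is what lowers the variance of the estimate relative to simple random sampling.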

Social Stream Classification with Emerging New Labels

As an important research topic with well-recognized practical value, classification of social streams has grown in popularity along with social data, such as the tweet stream generated by Twitter users in chronological order. A salient, and perhaps also the most interesting, feature of such user-generated content is its never-failing novelty, which, unfortunately, challenges most traditional pre-trained classification models, as they are built on a fixed label set and therefore fail to identify new labels as they emerge. In this paper, we study the problem of classification of social streams with emerging new labels, and propose a novel ensemble framework, integrating an instance-based learner and a label-based learner by completely-random trees. The proposed framework can not only classify known labels in the multi-label scenario, but also detect emerging new labels and update itself in the data stream. Extensive experiments on a real-world stream data set from Weibo, a Chinese micro-blogging platform, demonstrate the superiority of our approach over the state-of-the-art methods.

Xin Mu, Feida Zhu, Yue Liu, Ee-Peng Lim, Zhi-Hua Zhou

Exploiting Anti-monotonicity of Multi-label Evaluation Measures for Inducing Multi-label Rules

Exploiting dependencies between labels is considered to be crucial for multi-label classification. Rules are able to expose label dependencies such as implications, subsumptions or exclusions in a human-comprehensible and interpretable manner. However, the induction of rules with multiple labels in the head is particularly challenging, as the number of label combinations which must be taken into account for each rule grows exponentially with the number of available labels. To overcome this limitation, algorithms for exhaustive rule mining typically use properties such as anti-monotonicity or decomposability in order to prune the search space. In the present paper, we examine whether commonly used multi-label evaluation metrics satisfy these properties and therefore are suited to prune the search space for multi-label heads.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz

Modeling Label Interactions in Multi-label Classification: A Multi-structure SVM Perspective

Multi-label classification has attracted much interest due to its wide applicability. Modeling label interactions and investigating their impact on classifier quality are crucial aspects of multi-label classification. In this paper, we propose a multi-structure SVM (called MSSVM) which allows the user to hypothesize multiple label interaction structures and helps to identify their importance in improving generalization performance. We design an efficient optimization algorithm to solve the proposed MSSVM. Extensive empirical evaluation provides fresh and interesting insights into the following questions: (a) How do label interactions affect multiple performance metrics typically used in multi-label classification? (b) Do higher order label interactions significantly impact a given performance metric for a particular dataset? (c) Can we make useful suggestions on the label interaction structure? and (d) Is it always beneficial to model label interactions in multi-label classification?

Anusha Kasinikota, P. Balamurugan, Shirish Shevade

Sentiment Classification Using Neural Networks with Sentiment Centroids

Neural networks (NN) have demonstrated a powerful ability to extract text features automatically for sentiment classification in recent years. Although semantic and syntactic features are well studied, global category information has been mostly ignored within the NN based framework. Samples with the same sentiment category should have similar vectors in the representation space. Motivated by this, we propose a novel global sentiment centroids based neural framework, which incorporates the sentiment category features. The centroids assist the NN in extracting discriminative category features from a global perspective. We apply our approach to several real large-scale sentiment-labeled datasets, and the extensive experiments show that our model not only obtains more powerful sentiment feature representations, but also achieves some state-of-the-art results with a simple neural network structure.

Maoquan Wang, Shiyun Chen, Liang He

Random Pairwise Shapelets Forest

A shapelet is a discriminative subsequence of a time series. An advanced time series classification method integrates shapelets with random forests. However, this approach has several limitations. First, a random shapelet forest incurs a large training cost for split threshold searching. Second, a single shapelet provides limited information for only one branch of the decision tree, resulting in insufficient accuracy and interpretability. Third, the randomized ensemble reduces interpretability. To address these limitations, this paper presents Random Pairwise Shapelets Forest (RPSF). RPSF combines a pair of shapelets from different classes to construct a random forest. It is more efficient because it omits the threshold search, and more effective because it includes additional information from different classes. Moreover, a discriminability metric, Decomposed Mean Decrease Impurity (DMDI), is proposed to identify the influential region for every class. Extensive experiments show that RPSF improves the accuracy and training speed of the shapelet forest. Case studies demonstrate the interpretability of our method.

Mohan Shi, Zhihai Wang, Jidong Yuan, Haiyang Liu

A Locally Adaptive Multi-Label k-Nearest Neighbor Algorithm

In the field of multi-label learning, ML-kNN is the first lazy learning approach and one of the most influential approaches. Its main idea is to adapt the k-NN method to multi-label data, where the maximum a posteriori rule is used to adaptively adjust the decision boundary for each unseen instance. In ML-kNN, all test instances which get the same number of votes among the k nearest neighbors have the same probability of being assigned a label, which may cause improper decisions since it ignores the local difference of samples. In real-world data sets, instances with (or without) label l from different locations may have different numbers of neighbors with the label l. In this paper, we propose a locally adaptive Multi-Label k-Nearest Neighbor method to address this problem, which takes the local difference of samples into account. We show how a simple modification to the posterior probability expression previously used in the ML-kNN algorithm allows us to take the local difference into account. Experimental results on benchmark data sets demonstrate that our approach has superior classification performance with respect to other kNN-based algorithms.

Dengbao Wang, Jingyuan Wang, Fei Hu, Li Li, Xiuzhen Zhang
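
For context on the abstract above, the maximum a posteriori rule of the original ML-kNN algorithm is commonly written as shown here. The notation is an assumption of this note and is not taken from the listed paper, whose modified expression is not reproduced in the abstract.

$$\hat{y}_x(l) = \arg\max_{b \in \{0,1\}} P(H_b^l)\, P\big(E_{C_x(l)}^l \mid H_b^l\big)$$

Here $H_1^l$ is the event that instance $x$ has label $l$ (and $H_0^l$ that it does not), $C_x(l)$ is the number of the $k$ nearest neighbors of $x$ that carry label $l$, and $E_j^l$ is the event that exactly $j$ of the $k$ neighbors carry label $l$. Because $x$ enters the expression only through the count $C_x(l)$, any two instances with the same count receive the same decision, which is the limitation the abstract describes.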

Classification with Reject Option Using Conformal Prediction

In this paper, we propose a practically useful means of interpreting the predictions produced by a conformal classifier. The proposed interpretation leads to a classifier with a reject option that allows the user to limit the number of erroneous predictions made on the test set, without any need to reveal the true labels of the test objects. The method described in this paper works by estimating the cumulative error count on a set of predictions provided by a conformal classifier, ordered by their confidence. Given a test set and a user-specified parameter k, the proposed classification procedure outputs the largest possible number of predictions containing on average at most k errors, while refusing to make predictions for test objects where it is too uncertain. We conduct an empirical evaluation using benchmark datasets, and show that we are able to provide accurate estimates for the error rate on the test set.

Henrik Linusson, Ulf Johansson, Henrik Boström, Tuve Löfström
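
The procedure described in the abstract above (order predictions by confidence, accept the longest prefix whose expected error count stays within k, reject the rest) can be sketched as follows. This is a simplified illustration under stated assumptions: the per-prediction error probability `p_error` is taken as a given input, whereas the paper derives its estimates from a conformal classifier. The function name is hypothetical.

```python
def accept_with_error_budget(predictions, k):
    """Keep the largest confidence-ordered prefix with expected errors <= k.

    predictions: list of (label, p_error) pairs, where p_error is an
                 assumed estimate of the probability that the label is wrong.
    Returns (accepted_labels, n_rejected).
    """
    # Most confident predictions (lowest error probability) come first.
    ranked = sorted(predictions, key=lambda pair: pair[1])
    accepted = []
    expected_errors = 0.0
    for label, p_error in ranked:
        if expected_errors + p_error > k:
            break  # all remaining predictions are less confident: reject them
        expected_errors += p_error
        accepted.append(label)
    return accepted, len(ranked) - len(accepted)
```

For example, with error probabilities 0.01, 0.05, 0.3 and 0.5 and k = 0.4, the first three predictions are accepted (cumulative expected errors 0.36) and the fourth is rejected.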

Target Learning: A Novel Framework to Mine Significant Dependencies for Unlabeled Data

To mine significant dependencies among predictive attributes, much work has been carried out to learn Bayesian network classifiers (BNC$_\mathcal{T}$s) from a labeled training data set $\mathcal{T}$. However, if BNC$_\mathcal{T}$ does not capture the “right” dependencies that would be most relevant to an unlabeled testing instance, performance degrades. To address this issue we propose a novel framework, called target learning, that takes each unlabeled testing instance as a target and builds an “unstable” Bayesian model BNC$_\mathcal{P}$ for it. To make BNC$_\mathcal{P}$ and BNC$_\mathcal{T}$ complementary to each other and work efficiently in combination, the same learning strategy is applied to build them. Experimental comparison on 32 large data sets from the UCI machine learning repository shows that, for BNCs with different degrees of dependence, target learning always helps improve the generalization performance with minimal additional computation.

Limin Wang, Shenglei Chen, Musa Mammadov

Automatic Chinese Reading Comprehension Grading by LSTM with Knowledge Adaptation

Owing to the subjectivity of graders and the complexity of assessment standards, grading is a tough problem in the field of education. This paper presents an algorithm for automatic grading of open-ended Chinese reading comprehension questions. Due to the high complexity of feature engineering and the lack of consideration for word order in frequency-based word embedding models, we utilize a long short-term memory recurrent neural network to extract semantic features from student answers automatically. In addition, we apply knowledge adaptation from a web corpus to student answers, and represent the students’ responses as vectors which are fed into the memory network. In this way, both the workload of teachers and the subjectivity in reading comprehension grading can be reduced substantially. Moreover, the automatic grading methods for Chinese reading comprehension will be more thorough. The experimental results on five Chinese and two English data sets demonstrate superior performance over the compared baselines.

Yuwei Huang, Xi Yang, Fuzhen Zhuang, Lishan Zhang, Shengquan Yu

Data Mining with Algorithmic Transparency

In this paper, we investigate whether decision trees can be used to interpret a black-box classifier without knowing the learning algorithm and the training data. Decision trees are known for their transparency and high expressivity. However, they are also notorious for their instability and tendency to grow excessively large. We present a classifier reverse engineering model that outputs a decision tree to interpret the black-box classifier. There are two major challenges. One is to build such a decision tree with controlled stability and size, and the other is that probing the black-box classifier is limited for security and economic reasons. Our model addresses the two issues by simultaneously minimizing sampling cost and classifier complexity. We present our empirical results on four real datasets, and demonstrate that our reverse engineering learning model can effectively approximate and simplify the black box classifier.

Yan Zhou, Yasmeen Alufaisan, Murat Kantarcioglu

Cost-Sensitive Reference Pair Encoding for Multi-Label Learning

Label space expansion for multi-label classification (MLC) is a methodology that encodes the original label vectors to higher dimensional codes before training and decodes the predicted codes back to the label vectors during testing. The methodology has been demonstrated to improve the performance of MLC algorithms when coupled with off-the-shelf error-correcting codes for encoding and decoding. Nevertheless, such a coding scheme can be complicated to implement, and cannot easily satisfy a common application need of cost-sensitive MLC—adapting to different evaluation criteria of interest. In this work, we show that a simpler coding scheme based on the concept of a reference pair of label vectors achieves cost-sensitivity more naturally. In particular, our proposed cost-sensitive reference pair encoding (CSRPE) algorithm contains cluster-based encoding, weight-based training and voting-based decoding steps, all utilizing the cost information. Furthermore, we leverage the cost information embedded in the code space of CSRPE to propose a novel active learning algorithm for cost-sensitive MLC. Extensive experimental results verify that CSRPE performs better than state-of-the-art algorithms across different MLC criteria. The results also demonstrate that the CSRPE-backed active learning algorithm is superior to existing algorithms for active MLC, and further justify the usefulness of CSRPE.

Yao-Yuan Yang, Kuan-Hao Huang, Chih-Wei Chang, Hsuan-Tien Lin

Fuzzy Integral Optimization with Deep Q-Network for EEG-Based Intention Recognition

Non-invasive brain-computer interfaces using electroencephalography (EEG) signals promise a convenient approach empowering humans to communicate with and even control the outside world only with intentions. Herein, we propose to analyze EEG signals using a fuzzy integral with deep reinforcement learning optimization to aggregate two aspects of information contained within EEG signals, namely local spatio-temporal and global temporal information, and demonstrate its benefits in EEG-based human intention recognition tasks. The EEG signals are first transformed into a 3D format preserving both topological and temporal structures, followed by distinctive local spatio-temporal feature extraction by a 3D-CNN, as well as global temporal feature extraction by an RNN. Next, a fuzzy integral with respect to fuzzy measures optimized by deep reinforcement learning is used to integrate the two kinds of extracted information and make a final decision. The proposed approach retains the topological and temporal structures of EEG signals and merges them in a more efficient way. Experiments on a public EEG-based movement intention dataset demonstrate the effectiveness and superior performance of our proposed method.

Dalin Zhang, Lina Yao, Sen Wang, Kaixuan Chen, Zheng Yang, Boualem Benatallah

Heterogeneous Domain Adaptation Based on Class Decomposition Schemes

This paper introduces a novel classification algorithm for heterogeneous domain adaptation. The algorithm projects both the target and source data into a common feature space of the class decomposition scheme used. The distinctive features of the algorithm are: (1) it does not impose any assumptions on the data other than sharing the same class labels; (2) it allows adaptation of multiple source domains at once; and (3) it can help improve the topology of the projected data for class separability. The algorithm provides two built-in classification rules and allows applying any other classification model.

Firat Ismailoglu, Evgueni Smirnov, Ralf Peeters, Shuang Zhou, Pieter Collins

A Deep Neural Spoiler Detection Model Using a Genre-Aware Attention Mechanism

The fast-growing volume of online activity and user-generated content increases the chances of users being exposed to spoilers. To address this problem, several spoiler detection models have been proposed. However, most of the previous models rely on hand-crafted domain-specific features, which limits the generalizability of the models. In this paper, we propose a new deep neural spoiler detection model that uses a genre-aware attention mechanism. Our model consists of a genre encoder and a sentence encoder. The genre encoder is used to extract a genre feature vector from given genres using a convolutional neural network. The sentence encoder is used to extract sentence feature vectors from a given sentence using a bi-directional gated recurrent unit. We also propose a genre-aware attention layer based on the attention mechanism that utilizes genre information for detecting spoilers which vary by genres. Using a sentence feature, our proposed model determines whether a given sentence is a spoiler. The experimental results on a spoiler dataset show that our proposed model which does not use hand-crafted features outperforms the state-of-the-art spoiler detection baseline models. We also conduct a qualitative analysis on the relations between spoilers and genres, and highlight the results through an attention weight visualization.

Buru Chang, Hyunjae Kim, Raehyun Kim, Deahan Kim, Jaewoo Kang

Robust Semi-Supervised Learning on Multiple Networks with Noise

Graph-regularized semi-supervised learning has been effectively used for classification when (i) data instances are connected through a graph, and (ii) labeled data is scarce. Leveraging multiple relations (or graphs) between the instances can improve the prediction performance; however, noisy and/or irrelevant relations may degrade it. As a result, an effective weighting scheme needs to be put in place for robustness. In this paper, we propose iMUNE, a robust and effective approach for multi-relational graph-regularized semi-supervised classification that is immune to noise. Under a convex formulation, we infer weights for the multiple graphs as well as a solution (i.e., labeling). We provide a careful analysis of the inferred weights, based on which we devise an algorithm that filters out irrelevant and noisy graphs and produces weights proportional to the informativeness of the remaining graphs. Moreover, iMUNE is linearly scalable w.r.t. the number of edges. Through extensive experiments on various real-world datasets, we show the effectiveness of our method, which yields superior results under different noise models, and under an increasing number of noisy graphs and intensity of noise, as compared to a list of baselines and state-of-the-art approaches.

Junting Ye, Leman Akoglu

ε-Distance Weighted Support Vector Regression

We propose a novel support vector regression approach called ε-Distance Weighted Support Vector Regression (ε-DWSVR). ε-DWSVR specifically addresses a challenging issue in support vector regression: how to deal with the situation when the distribution of the internal data in the ε-tube is different from that of the boundary data containing support vectors. The proposed ε-DWSVR optimizes the minimum margin and the mean of functional margin simultaneously to tackle this issue. To solve the new optimization problem arising from ε-DWSVR, we adopt dual coordinate descent (DCD) with kernel functions for medium-scale problems and also employ averaged stochastic gradient descent (ASGD) to make ε-DWSVR scalable to larger problems. We report promising results obtained by ε-DWSVR in comparison with five popular regression methods on sixteen UCI benchmark datasets.

Ge Ou, Yan Wang, Lan Huang, Wei Pang, George Macleod Coghill
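
To make the ε-tube mentioned in the abstract above concrete, the following minimal sketch shows the ε-insensitive loss of standard support vector regression and a split of data points into those strictly inside the tube and those on or outside its boundary. It illustrates standard SVR only, not the ε-DWSVR objective; the function names and the numerical tolerance are assumptions of this sketch.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Standard SVR loss: zero inside the tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def split_by_tube(targets, predictions, epsilon, tol=1e-12):
    """Return (inside, boundary_or_outside) index lists.

    A point is 'inside' when its residual is strictly smaller than epsilon
    (up to `tol`); the remaining points are candidates for support vectors.
    """
    inside, boundary_or_outside = [], []
    for i, (y, f) in enumerate(zip(targets, predictions)):
        if abs(y - f) < epsilon - tol:
            inside.append(i)
        else:
            boundary_or_outside.append(i)
    return inside, boundary_or_outside
```

In standard SVR only the second group influences the fitted function, which is why a mismatch between the two distributions, as described in the abstract, can matter.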

Healthcare, BioInformatics and Related Topics (Application)

Frontmatter

Corrosion Prediction on Sewer Networks with Sparse Monitoring Sites: A Case Study

Sewer corrosion is a widespread and costly issue for water utilities. Knowing the corrosion status of a sewer network could help the water utility to improve efficiency and save costs in sewer pipe maintenance and rehabilitation. However, inspecting the corrosion status of all sewer pipes is impractical. To prioritize sewer pipes in terms of corrosion risk, the water utility requires a corrosion prediction model built on influential factors that cause sewer corrosion, such as hydrogen sulphide (H₂S) and temperature. Unfortunately, monitoring sites for these influential factors are very sparse on the sewer network, so reliable prediction has often been hampered by insufficient observations: it is a challenge to predict the H₂S distribution and sewer corrosion levels on the entire sewer network with a limited number of monitoring sites. This work leverages a Bayesian nonparametric method, the Gaussian Process, to integrate the physical model developed by domain experts, the sparse H₂S and temperature monitoring records, and the sewer geometry to predict corrosion risk levels on the entire sewer network. A case study has been conducted on a real data set of a water utility in Australia. The evaluation results demonstrate the effectiveness of the model and suggest promising applications for water utilities, including prioritizing high corrosion areas and recommending chemical dosing profiles.

Jianjia Zhang, Bin Li, Xuhui Fan, Yang Wang, Fang Chen

CAPED: Context-Aware Powerlet-Based Energy Disaggregation

Energy disaggregation is the task of decomposing a household’s total electricity consumption into individual appliances, which is becoming increasingly important in energy reservation research. In this paper, we propose a novel algorithm that takes the context of the disaggregation task into consideration. First, we design a new method to efficiently extract each appliance’s typical consumption patterns, i.e. powerlets. When performing the disaggregation task, we model it as an optimization problem and incorporate context information into the cost function. Experiments on two public datasets have demonstrated the superiority of our algorithm over the state-of-the-art work. The mean improvements in disaggregation accuracy are about 13.7% and 4.8%.

Jingyue Gao, Yasha Wang, Xu Chu, Yuanduo He, Ziqing Mao

Rolling Forecasting Forward by Boosting Heterogeneous Kernels

The problem discussed in this paper stems from a project of cellular network traffic prediction, the primary step of network planning striving to serve the continuously soaring network traffic best with limited resource. The traffic prediction emphasizes two aspects: (1) how to exploit the potential value of physical and electronic properties for tens of thousands of wireless stations, which may partly determine the allocation of traffic load in some intricate way; (2) the lack of sufficient and high-quality historical records, for the appropriate training of long-term predictions, further aggravated by frequent reconfigurations in daily operation. To solve this problem, we define a general framework to accommodate several variants of multi-step forecasting, via decomposing the problem into a series of single-step vector-output regression tasks. They can further be augmented by miscellaneous attributive information, in the form of boosted multiple kernels. Experiments on multiple telecom datasets show that the solution outperforms conventional time series methods on accuracy, especially for long horizons. Those attributes describing the macroscopic factors, such as the network type, topology, locations, are significantly helpful for longer horizons, whereas the immediate values in the near future are mainly determined by their recent records.

Di Zhang, Yunquan Zhang, Qiang Niu, Xingbao Qiu

IDLP: A Novel Label Propagation Framework for Disease Gene Prioritization

Prioritizing disease genes means identifying potential disease-causing genes for a given phenotype, which can be applied to reveal the inherited basis of human diseases and facilitate drug development. Our work is motivated by the label propagation algorithm and by the false-positive protein-protein interactions that exist in the dataset. To the best of our knowledge, false-positive protein-protein interactions have not been considered before in disease gene prioritization. Label propagation has been successfully applied to prioritize disease-causing genes in previous network-based methods. These network-based methods use basic label propagation, i.e. random walk, on networks to prioritize disease genes in different ways. However, none of these methods can deal with the situation in which many false-positive protein-protein interactions exist in the dataset, because the PPI network is used as a fixed input in previous methods. This important characteristic of the data source may cause a large deviation in results. We conduct extensive experiments over OMIM datasets, and our proposed method IDLP has demonstrated its effectiveness compared with eight state-of-the-art approaches.

Yaogong Zhang, Yuan Wang, Jiahui Liu, Xiaohu Liu, Yuxiang Hong, Xin Fan, Yalou Huang
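
The "basic label propagation, i.e. random walk" on a fixed network that the abstract above attributes to earlier methods can be sketched as a random walk with restart. This illustrates that baseline only, not IDLP itself; the function name, the restart probability, and the toy graph in the usage note are assumptions of this sketch.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, n_iter=100):
    """Score every node by its proximity to the seed nodes.

    adjacency: dict mapping each node to a list of its neighbors
               (a hypothetical, fixed interaction network).
    seeds:     set of nodes known to be associated with the phenotype.
    """
    nodes = list(adjacency)
    # The walker starts, and restarts, uniformly over the seed nodes.
    start = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    scores = dict(start)
    for _ in range(n_iter):
        updated = {v: restart * start[v] for v in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                # An isolated node keeps its own probability mass.
                updated[u] += (1.0 - restart) * scores[u]
                continue
            share = (1.0 - restart) * scores[u] / len(neighbors)
            for v in neighbors:
                updated[v] += share
        scores = updated
    return scores
```

On a toy path graph `{"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}` with seed `{"a"}`, the scores decrease with distance from the seed. Because the adjacency structure is a fixed input, a spurious edge propagates score just like a genuine one, which is the weakness the abstract points out.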

Deep Learning for Forecasting Stock Returns in the Cross-Section

Many studies have been undertaken by using machine learning techniques, including neural networks, to predict stock returns. Recently, a method known as deep learning, which achieves high performance mainly in image recognition and speech recognition, has attracted attention in the machine learning field. This paper implements deep learning to predict one-month-ahead stock returns in the cross-section in the Japanese stock market and investigates the performance of the method. Our results show that deep neural networks generally outperform shallow neural networks, and the best networks also outperform representative machine learning models. These results indicate that deep learning shows promise as a skillful machine learning method to predict stock returns in the cross-section.

Masaya Abe, Hideki Nakayama

Vine Copula-Based Asymmetry and Tail Dependence Modeling

Financial variables such as asset returns in the massive market contain various hierarchical and horizontal relationships that form complicated dependence structures. Modeling these structures is challenging due to the stylized facts of market data. Many research works in recent decades have shown that copulas are an effective method to describe relations among variables. Vine structures were introduced to represent the decomposition of multivariate copula functions. However, the model construction of vine structures is still a tough problem owing to the geometrical data, conditional independence assumptions and the stylized facts. In this paper, we introduce a new bottom-up method to construct regular vine structures and apply the model to 12 currencies over 16 years as a case study to analyze the asymmetric and fat tail features. The out-of-sample performance of our model is evaluated by Value at Risk, a widely used industrial benchmark. The experimental results show that our model and its intrinsic design significantly outperform industry baselines, and provide financially interpretable knowledge and profound insights into the dependence structures of multiple variables with complex dependencies and characteristics.

Jia Xu, Longbing Cao

Detecting Forged Alcohol Non-invasively Through Vibrational Spectroscopy and Machine Learning

Alcoholic spirits are a common target for counterfeiting and adulteration, with potential costs to public health, the taxpayer and brand integrity. Current methods to authenticate spirits include examinations of superficial appearance and consistency, or require the tester to open the bottle and remove a sample. The former is inexact, while the latter is not suitable for widespread screening or for high-value spirits, which lose value once opened. We study whether non-invasive near infrared spectroscopy, in combination with traditional and time series classification methods, can correctly classify the alcohol content (a key factor in determining authenticity) of synthesised spirits sealed in real bottles. Such an experimental setup could allow for a portable, cheap to operate, and fast authentication device. We find that ethanol content can be classified with high accuracy; however, methanol content proved difficult to classify with the algorithms evaluated.

James Large, E. Kate Kemsley, Nikolaus Wellner, Ian Goodall, Anthony Bagnall

Research and Application of Mapping Relationship Based on Learning Attention Mechanism

The study of the interactions between different or the same variables of financial markets is an interesting topic. Many efforts have been devoted to investigating this issue. However, there has been little work studying the relationship of the various attributes within a stock, although this relationship is essential for a deeper understanding of a stock’s internal mechanisms. In this paper, we explore using a sequence-to-sequence model to extract the relationship between any two attributes of a stock. We not only give a qualitative description of the relationship between a stock’s attributes, but also quantify the relationship through the model. The experimental results show that there are certain correlations between the internal attributes of a stock, among which the correlations between Close & %Tuv and between %Chg & %Tuv are more prominent. In addition, we also conducted anomaly detection on network public opinion information, and identified the starting points of abnormal events in combination with network news information. By comparing the starting points of the events and the changes in the relationship between stock attributes, we concluded that there is a certain regularity between them.

Wanwan Jiang, Lingyu Xu, Jie Yu, Gaowei Zhang

Human Identification via Unsupervised Feature Learning from UWB Radar Data

This paper presents an automated approach to distinguish the identity of multiple residents in smart homes. Without using any intrusive video surveillance devices or wearable tags, we achieve the goal of human identification by processing and analyzing the signals received from an ultra-wideband (UWB) radar installed in indoor environments. Because the UWB signals are very noisy and unstable, we employ unsupervised feature learning techniques to automatically learn local, discriminative features that can incorporate intra-class variations of the same identity, and yet reflect the differences between human identities. The learned features are then used to train an SVM classifier and recognize the identity of residents. We validate our proposed solution via extensive experiments using real data collected in real-life situations. Our findings show that feature learning based on K-means clustering, coupled with whitening and pooling, achieves the highest accuracy when only limited training data is available. This shows that the proposed feature learning and classification framework combined with the UWB radar technology provides an effective solution to human identification in multi-resident smart homes.

Jie Yin, Son N. Tran, Qing Zhang

Prescriptive Analytics Through Constrained Bayesian Optimization

Prescriptive analytics leverages predictive data mining algorithms to prescribe appropriate changes that alter a predicted outcome from an undesired class to a desired one. As an example, based on the conversation of a reformed addict on a message board, prescriptive analytics may predict the intervention required. We develop a novel prescriptive analytics solution by formulating a constrained Bayesian optimization problem to find the smallest change that we need to make on an actionable set of features so that, with sufficient confidence, an instance can be changed from an undesirable class to the desirable class. We use two public health datasets, a multi-year CDC dataset on disease prevalence across the 50 US states and alcohol-related data from Reddit, to demonstrate the usefulness of our results.

Haripriya Harikumar, Santu Rana, Sunil Gupta, Thin Nguyen, Ramachandra Kaimal, Svetha Venkatesh

Neighborhood Constraint Matrix Completion for Drug-Target Interaction Prediction

Identifying drug-target interactions is an important step in drug discovery, but only a small fraction of the interactions have been validated, and experimental determination is both expensive and time-consuming. Therefore, there is a strong demand for computational methods that can predict potential drug-target interactions to guide experimental verification. In this paper, we propose a novel algorithm for drug-target interaction prediction, named Neighborhood Constraint Matrix Completion (NCMC). Unlike previous methods, we exploit the low-rank property of the adjacency matrix of the existing drug-target interaction network to predict new interactions. Moreover, because known entries are rare, we introduce similarity information for drugs and targets and propose a neighborhood constraint to regularize the unknown cases. Furthermore, we formulate the whole task as a convex optimization problem and solve it with a fast proximal gradient descent framework, which converges quickly to a globally optimal solution. Finally, we extensively evaluated our method on four real datasets, and NCMC demonstrated its effectiveness compared with five other state-of-the-art approaches.

Xin Fan, Yuxiang Hong, Xiaohu Liu, Yaogong Zhang, Maoqiang Xie

Detecting Hypopnea and Obstructive Apnea Events Using Convolutional Neural Networks on Wavelet Spectrograms of Nasal Airflow

We present a novel approach for detecting hypopnea and obstructive apnea events during sleep, using a single channel nasal airflow from polysomnography recordings, applying a Convolutional Neural Network (CNN) to a 2-D image wavelet spectrogram of the nasal signal. We compare this approach to directly training a 1-D CNN on the raw nasal airflow signal. The evaluation was conducted on a large dataset consisting of 69,264 examples from 1,507 subjects. Our results showed that both approaches achieved good accuracy, with the 2-D CNN outperforming the 1-D CNN. The higher accuracy and the less complex architecture of the 2-D CNN show that converting biological signals into spectrograms and using them in conjunction with CNNs is a promising method for sleep apnea recognition.

Stephen McCloskey, Rim Haidar, Irena Koprinska, Bryn Jeffries

Deep Ensemble Classifiers and Peer Effects Analysis for Churn Forecasting in Retail Banking

Modern customer analytics offers retailers a variety of unprecedented opportunities to enhance customer intelligence solutions by tracking individual clients and their peers and studying clientele behavioral patterns. While telecommunication providers have been actively utilizing peer network data to improve their customer analytics for a number of years, there yet exists a very limited knowledge on the peer effects in retail banking. We introduce modern deep learning concepts to quantify the impact of social network variables on bank customer attrition. Furthermore, we propose a novel deep ensemble classifier that systematically integrates predictive capabilities of individual classifiers in a meta-level model, by efficiently stacking multiple predictions using convolutional neural networks. We evaluate our methodology in application to customer retention in a retail financial institution in Canada.

Yuzhou Chen, Yulia R. Gel, Vyacheslav Lyubchich, Todd Winship

GBTM: Graph Based Troubleshooting Method for Handling Customer Cases Using Storage System Log

Present-day computing environments consist of many hardware and software components that are connected to each other in complex ways. Hence, when such a system fails, it is very difficult to detect the exact module that caused the problem. In such a situation, an automated technique that can narrow the failure down to (at least) a set of modules that may be responsible would be very useful for support engineers. This paper takes an important step in that direction. We propose a graph-based troubleshooting methodology that explores storage system logs (EMS) of around 4500 customer cases to troubleshoot customer problems. We provide support engineers with a ranked list of modules, which can significantly narrow down the troubleshooting process for around 95% of cases with only a 10% false positive rate, whereas the competing baseline MonitorRank covers only 74% of cases with a 23% false positive rate.

Subhendu Khatuya, Ajay Bakhshi, Jayanta Basak, Niloy Ganguly, Bivas Mitra

Fusion of Modern and Tradition: A Multi-stage-Based Deep Network Approach for Head Detection

Detecting humans in video is becoming essential for monitoring crowd behavior. Head detection has proven to be a promising way to detect and track crowds. In this paper, a novel learning strategy, called Deep Motion Information Network (DMIN), is proposed for head detection. The concept of DMIN is to borrow the traditional, well-developed head detection approaches, which are composed of multiple stages, and then replace each stage in the pipeline with a cascade of deep sub-networks that simulates the function of that stage. This learning strategy can lead to many benefits, such as avoiding much trial and error in designing deep networks, achieving global optimization for each stage, and reducing the amount of training data needed. The proposed approach is validated using the PETS2009 dataset. The results show that the proposed approach can achieve an impressive speedup in addition to a significant improvement in recall. A very high F-score of 85% is achieved using the proposed network, which is far higher than that of other methods proposed in the literature.

Fu-Chun Hsu, Chih-Chieh Hung

Learning Treatment Regimens from Electronic Medical Records

Appropriate treatment regimens play a vital role in improving patient health status. Although some achievements have been made, few of the recent studies of learning treatment regimens have exploited different kinds of patient information due to the difficulty in adopting heterogeneous data to many data mining methods. Moreover, current studies seem too rigid with fixed intervals of treatment periods corresponding to the varying lengths of hospital stay. To this end, this work proposes a generic data-driven framework which can derive group-treatment regimens from electronic medical records by utilizing a mixed-variate restricted Boltzmann machine and incorporating medical domain knowledge. We conducted experiments on coronary artery disease as a case study. The obtained results show that the framework is promising and capable of assisting physicians in making clinical decisions.

Khanh Hung Hoang, Tu Bao Ho

Human, Behaviour and Interactions (Application)

Frontmatter

Mining POI Alias from Microblog Conversations

In location-based analysis for microblogs, it is important to know if two toponyms refer to the same point-of-interest, i.e., alias. However, existing online knowledge bases are often incomplete or inaccurate for toponym alias data, especially for those used in informal conversations. In this paper, we propose a method for extracting compatible toponyms from microblog conversations. We first extract a number of coordinate-associated toponyms, then use compatibility measures to identify compatible toponyms. We propose three compatibility measures, namely, geographical closeness, surface name similarity, and association similarity. We show that by combining these measures and using particle swarm optimization for weight tuning, we can reach a high matching accuracy. The finding of this paper can be useful for improving location-based analysis as well as extending existing knowledge bases.

Yihong Zhang, Lina Yao

DyPerm: Maximizing Permanence for Dynamic Community Detection

In this paper, we propose DyPerm, the first dynamic community detection method which optimizes a novel community scoring metric, called permanence. DyPerm incrementally modifies the community structure by updating those communities where the editing of nodes and edges has been performed, keeping the rest of the network unchanged. We present strong theoretical guarantees to show how/why mere updates on the existing community structure lead to permanence maximization in dynamic networks, which in turn decreases the computational complexity drastically. Experiments on both synthetic and six real-world networks with given ground-truth community structure show that DyPerm achieves (on average) 35% gain in accuracy (based on NMI) compared to the best method among four baseline methods. DyPerm also turns out to be 15 times faster than its static counterpart.

Prerna Agarwal, Richa Verma, Ayush Agarwal, Tanmoy Chakraborty

Mining User Behavioral Rules from Smartphone Data Through Association Analysis

The increasing popularity of smart mobile phones and their powerful sensing capabilities have enabled the collection of rich contextual information and mobile phone usage records through the device logs. This paper formulates the problem of mining behavioral association rules of individual mobile phone users utilizing their smartphone data. Association rule learning is the most popular technique to discover rules utilizing large datasets. However, it is well known that a large proportion of the association rules generated are redundant. This redundancy not only makes the rule set unnecessarily large but also makes the decision-making process more complex and ineffective. In this paper, we propose an approach that effectively identifies the redundancy in associations and extracts a concise set of behavioral association rules that are non-redundant. The effectiveness of the proposed approach is examined using real mobile phone datasets of individual users.

Iqbal H. Sarker, Flora D. Salim

A Context-Aware Evaluation Method of Driving Behavior

As Uber-like chauffeured car services become more and more popular, many drivers have joined the market without special training. To ensure the safety and efficiency of transportation services, it is important to accurately evaluate the driving performance of individual drivers. Most existing methods depend on statistics of abnormal driving events extracted from individual vehicles. However, the occurrence of abnormal events can be affected by various factors, such as road conditions, time of day and weather. Judging a driver’s performance by merely counting abnormal events, without considering the driving context, can be biased. In this paper, we analyze the influence of driving context on driving behaviors and propose a context-aware evaluation method. Instead of treating all occurrences of driving events as the same, we adopt TF-IDF to determine the risk weight of a driving event in a specific driving context. Based on the risk-weighted statistics, we evaluate the driving performance precisely and normalize it using the Z score model. An evaluation system is implemented. We evaluate the effectiveness of our method on a real dataset with 3-year traces of 1000 drivers. The normalized score determined by our method has a greater correlation (0.611) with the accident records than the number of abnormal driving events does (0.523).

Yikai Zhai, Tianyu Wo, Xuelian Lin, Zhou Huang, Junyu Chen
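
The risk-weighting idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the IDF-style weight below, the event and context names, and all counts are assumptions made for illustration; this is not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def risk_weights(trips_per_context, event_trips):
    """IDF-style weight (assumed form): an event that is common in a
    context (e.g. hard braking in rain) counts for less than a rare one."""
    return {
        (event, ctx): math.log(trips_per_context[ctx] / (1 + n))
        for (event, ctx), n in event_trips.items()
    }


def driver_scores(events_by_driver, weights):
    """Sum the risk weights of every (event, context) pair a driver produced."""
    return {d: sum(weights[ev] for ev in evs) for d, evs in events_by_driver.items()}


def z_normalize(scores):
    """Standardize raw scores so drivers can be compared on one scale."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {d: 0.0 for d in scores}
    return {d: (s - mu) / sigma for d, s in scores.items()}


# Hypothetical counts: trips observed per context, and trips with the event.
trips_per_context = {"rain": 100, "clear": 100}
event_trips = {("hard_brake", "rain"): 49, ("hard_brake", "clear"): 9}
weights = risk_weights(trips_per_context, event_trips)

# Three hypothetical drivers, each with two hard-brake events.
events_by_driver = {
    "a": [("hard_brake", "rain"), ("hard_brake", "rain")],
    "b": [("hard_brake", "clear"), ("hard_brake", "clear")],
    "c": [("hard_brake", "rain"), ("hard_brake", "clear")],
}
z = z_normalize(driver_scores(events_by_driver, weights))
print(z)
```

All three drivers have the same raw event count, yet driver "b" receives the highest normalized risk score because hard braking in clear weather is rarer in this toy data than hard braking in rain.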

Measurement of Users’ Experience on Online Platforms from Their Behavior Logs

Explicit measurement of experience, as mostly practiced, takes the form of satisfaction scores obtained by asking questions to users. Obtaining response from every user is not feasible, the responses are conditioned on the questions, and provide only a snapshot, while experience is a journey. Instead, we measure experience values from users’ click actions (events), thereby measuring for every user and for every event. The experience values are obtained without-asking-questions, by combining a recurrent neural network (RNN) with value elicitation from event-sequence. The platform environment is modeled using an RNN, recognizing that a user’s sequence of actions has a temporal dependence structure. We then elicit value of a user’s experience as a latent construct in this environment. We offer two methods: one based on rules crafted from consumer behavior theories, and another data-driven approach using fixed point iteration, similar to that used in model-based reinforcement learning. Evaluation and comparison with baseline show that experience values by themselves provide a good basis for predicting conversion behavior, without feature engineering.

Deepali Jain, Atanu R. Sinha, Deepali Gupta, Nikhil Sheoran, Sopan Khosla

Mining Human Periodic Behaviors Using Mobility Intention and Relative Entropy

Human periodic behaviors are essential to many applications, and much research shows that human behaviors are periodic. However, existing work on human periodicity reports limited improvements in using periodicity of locations and unsatisfactory accuracy for oscillation of human periodic behaviors. To address these challenges, in this paper we propose a Mobility Intention and Relative Entropy (MIRE) model. We use mobility intentions extracted from the dataset by tensor decomposition to characterize users’ history records, and use sub-sequences of the same mobility intention to mine human periodic behaviors. A new periodicity detection algorithm based on relative entropy is then proposed. The experimental results on real-world datasets demonstrate that the proposed MIRE model can properly mine human periodic behaviors. The comparison results also indicate that the MIRE model significantly outperforms state-of-the-art periodicity detection algorithms.

Feng Yi, Libo Yin, Hui Wen, Hongsong Zhu, Limin Sun, Gang Li

Context-Uncertainty-Aware Chatbot Action Selection via Parameterized Auxiliary Reinforcement Learning

We propose a context-uncertainty-aware chatbot and a reinforcement learning (RL) model to train the chatbot. The proposed model is named Parameterized Auxiliary Asynchronous Advantage Actor Critic (PA4C). We utilize a user simulator to simulate the uncertainty of users’ utterance based on real data. Our PA4C model interacts with simulated users to gradually adapt to different users’ utterance confidence in a conversation context. Compared with naive rule-based approaches, our chatbot trained via the PA4C model avoids hand-crafted action selection and is more robust to user utterance variance. The PA4C model optimizes conventional RL models with action parameterization and auxiliary tasks for chatbot training, which address the problems of a large action space and zero-reward states. We evaluate the PA4C model over training a chatbot for calendar event creation tasks. Experimental results show that our model outperforms the state-of-the-art RL models in terms of success rate, dialogue length, and episode reward.

Chuandong Yin, Rui Zhang, Jianzhong Qi, Yu Sun, Tenglun Tan

Learning Product Embedding from Multi-relational User Behavior

Network embedding is a very important method for learning low-dimensional representations of vertices in networks, which is quite useful in many tasks such as label classification and visualization. However, most existing network embedding methods can only learn embeddings from a single-relational network, which contains only one type of edge relationship between two nodes. In the real world, especially in product networks, much information is presented in multi-relational networks. Based on user behavior, edges in a product network have many types: “co-purchasing”, “co-viewing”, “view after purchasing” and so on. Therefore, we propose a novel network embedding method that aims to embed a multi-relational product network into a low-dimensional vector space. The results show that our method leads to better performance on label classification and visualization tasks in product networks.

Zhao Zhang, Weizheng Chen, Xiaoxuan Ren, Yan Zhang

Vulnerability Assessment of Metro Systems Based on Dynamic Network Structure

Invulnerable metro systems are essential for the safety and efficiency of urban transportation services. Therefore, it is of significant interest to systematically assess the vulnerability of metro systems. To this end, in this paper, we assess the vulnerability of metro systems with a data-driven framework in which dynamic travel patterns are considered. Specifically, we use effective attack strategies based on the topology structure of metro networks. The network structure depends on not only connectivity among metro stations but also dynamic passenger flow patterns. Thus, two data-driven metrics, satisfaction rate (SR) and satisfaction rate with path cost (SRPC), are proposed to quantify the vulnerability of metro networks after our attack strategies. Finally, we conduct experiments on Shanghai metro system. The results indicate that the metro system is vulnerable to malicious attacks while it shows strong robustness to random failures. Our results also highlight weak-points and bottlenecks in the system, which may bear practical managerial implications for policymakers to improve the reliability and robustness of the metro systems and the public transportation services.

Jun Pu, Chuanren Liu, Jianghua Zhao, Ke Han, Yuanchun Zhou

Visual Relation Extraction via Multi-modal Translation Embedding Based Model

Visual relation, such as “person holds dog” is an effective semantic unit for image understanding, as well as a bridge to connect computer vision and natural language. Recent work has been proposed to extract the object features in the image with the aid of respective textual description. However, very little work has been done to combine the multi-modal information to model the subject-predicate-object relation triplets to obtain deeper scene understanding. In this paper, we propose a novel visual relation extraction model named Multi-modal Translation Embedding Based Model to integrate the visual information and respective textual knowledge base. For that, our proposed model places objects of the image as well as their semantic relationships in two different low-dimensional spaces where the relation can be modeled as a simple translation vector to connect the entity descriptions in the knowledge graph. Moreover, we also propose a visual phrase learning method to capture the interactions between objects of the image to enhance the performance of visual relation extraction. Experiments are conducted on two real world datasets, which show that our proposed model can benefit from incorporating the language information into the relation embeddings and provide significant improvement compared to the state-of-the-art methods.

Zhichao Li, Yuping Han, Yajing Xu, Sheng Gao
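
The idea of modeling a relation as a translation vector, mentioned in the abstract above, can be sketched as follows. This is a simplified, TransE-style illustration in a single embedding space, which is an assumption on my part; the abstract describes two separate spaces and does not specify the scoring function. The toy vectors and the names `person`, `holds`, `dog` and `car` are hypothetical.

```python
import math


def translation_distance(subject, relation, obj):
    """Euclidean distance between (subject + relation) and obj.
    A small distance means the triplet is plausible under the
    translation assumption subject + relation ≈ obj."""
    return math.sqrt(sum((s + r - o) ** 2 for s, r, o in zip(subject, relation, obj)))


# Hypothetical 2-D embeddings chosen by hand for illustration.
person = [0.0, 1.0]
holds = [1.0, 0.0]
candidates = {"dog": [1.0, 1.0], "car": [3.0, -1.0]}

# Rank candidate objects for the partial triplet (person, holds, ?).
ranked = sorted(candidates, key=lambda name: translation_distance(person, holds, candidates[name]))
best = ranked[0]
print(best)
```

In a trained model the vectors would be learned so that observed triplets have small distances; here they are fixed by hand only to show how the score ranks candidates.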

Anomaly Detection and Analytics

Frontmatter

Sub-trajectory- and Trajectory-Neighbor-based Outlier Detection over Trajectory Streams

Precise and efficient anomaly detection over trajectory streams is critical for many real-time applications. However, due to the uncertainty and complexity of the behaviors of objects over trajectory streams, this problem has not been well solved. In this paper, we propose a novel detection algorithm, called STN-Outlier, for real-time applications, where a set of fine-grained behavioral features is extracted from sub-trajectories instead of points and a novel distance function is designed to measure the behavior similarity between two trajectories. Additionally, an optimized framework (TSX) is introduced to reduce the CPU cost of STN-Outlier. The performance experiments demonstrate that STN-Outlier successfully captures more fine-grained behaviors than the state-of-the-art methods; in addition, the TSX framework outperforms the baseline solutions in terms of CPU time in all cases.

Zhihua Zhu, Di Yao, Jianhui Huang, Hanqiang Li, Jingping Bi

An Unsupervised Boosting Strategy for Outlier Detection Ensembles

Ensemble techniques have been applied to the unsupervised outlier detection problem in some scenarios. Challenges are the generation of diverse ensemble members and the combination of individual results into an ensemble. For the latter challenge, some methods tried to design smaller ensembles out of a wealth of possible ensemble members, to improve the diversity and accuracy of the ensemble (relating to the ensemble selection problem in classification). We propose a boosting strategy for combinations showing improvements on benchmark datasets.

Guilherme O. Campos, Arthur Zimek, Wagner Meira Jr.

DeepAD: A Generic Framework Based on Deep Learning for Time Series Anomaly Detection

This paper presents a generic anomaly detection approach for time-series data. Existing anomaly detection approaches have several drawbacks such as a large number of false positives, parameters tuning difficulties, the need for a labeled dataset for training, use-case restrictions, or difficulty of use. We propose DeepAD, an anomaly detection framework that leverages a plethora of time-series forecasting models in order to detect anomalies more accurately, irrespective of the underlying complex patterns to be learnt. Our solution does not rely on the labels of the anomalous class for training the model, nor for optimizing the threshold based on highest detection given the labels in the training data. We compare our framework against EGADS framework on real and synthetic data with varying time-series characteristics. Results show significant improvements on average of 25% and up to 40-50% in F1-score, precision, and recall on the Yahoo Webscope Benchmark.

Teodora Sandra Buda, Bora Caglayan, Haytham Assem

Anomaly Detection Technique Robust to Units and Scales of Measurement

Existing anomaly detection methods are sensitive to units and scales of measurement. Their performances vary significantly if feature values are measured in different units or scales. In many data mining applications, units and scales of feature values may not be known. This paper introduces a new anomaly detection technique using unsupervised stochastic forest, called ‘usfAD’, which is robust to units and scales of measurement. Empirical results show that it produces more consistent results than five state-of-the-art anomaly detection techniques across a wide range of synthetic and benchmark datasets.

Sunil Aryal

Automated Explanations of User-Expected Trends for Aggregate Queries

Recently, a deeper level of data exploration has emerged that enables users to infer anomalies in their queries. This exploration level strives to explain why a particular anomaly exists within a query result by providing a set of explanations. These explanations are precisely a set of alterations that, when applied to the original query, cause the anomalies to disappear. Trends are pattern changes in business applications generated from SQL aggregate queries. Additionally, a user-expected trend is a particular pattern change in the data that was supposed to happen according to business studies. In this paper, we generalize this process to automatically produce explanations for user-expected trends. We propose the User Trend Explanations (UTE) framework, which provides insightful explanations by taking a set of user-specified points (called a prospective trend) and finding a top explanation that produces this trend. We develop a notion of uniformity of a predicate on a given output, and implement a set of algorithms to search the data space efficiently and effectively. The key idea is harnessing the linear search space rather than the exponential space to enable accurate explanations that are possible with tuples. Our experiments on real datasets show that UTE provides significant improvements compared with state-of-the-art related algorithms.

Ibrahim A. Ibrahim, Xue Li, Xin Zhao, Sanad Al Maskari, Abdullah M. Albarrak, Yanjun Zhang

Social Spammer Detection: A Multi-Relational Embedding Approach

Since relations are the main form of data in social networks, social spammer detection badly needs a framework that is relation-dependent but content-independent. Some recent detection methods transform the social relations into a set of topological features, such as degree, k-core, etc. However, the multiple heterogeneous relations and the direction within each relation have not been fully explored for identifying social spammers. In this paper, we attempt to adopt the Multi-Relational Embedding (MRE) approach for learning latent features of the social network. The MRE model is able to fuse multiple kinds of different relations and also learn two latent vectors for each relation, indicating the sending role and the receiving role of every user, respectively. Experimental results on a real-world multi-relational social network demonstrate that the latent features extracted by our MRE model can improve the detection performance remarkably.

Jun Yin, Zili Zhou, Shaowu Liu, Zhiang Wu, Guandong Xu

Opinion Mining and Sentiment Analysis

Frontmatter

Learning to Rank Items of Minimal Reviews Using Weak Supervision

Customer reviews and star ratings are widely used on E-commerce and reviewing sites for the public to express their opinions. To help the online public make decisions, items (e.g., products, services, movies, books) are typically represented and ordered by an aggregated star rating from all reviews. Existing approaches simply average star ratings or use other statistical functions to aggregate star ratings. However, these approaches rely on the existence of large numbers of reviews to work effectively. On the other hand, many new items have few reviews. In this paper, we argue that at the core of review aggregation is ranking items, hence, we cast the problem of ranking a set of items as a learning to rank (L2R) problem to address the issue of reviews scarcity. We devise a rank-oriented loss function to directly optimize the ranking of groups of items. Standard L2R models require ranking labels for training, but item ranking ground-truth information is not always available. Therefore, we propose to aggregate star ratings for items with large numbers of reviews to automatically generate weak supervision ranking labels for training. We further propose to extract features from review contents, rating distributions and helpfulness information to train the ranking model. Extensive experiments on an Amazon dataset showed that our model is very effective compared to state-of-the-art heuristic aggregation approaches, regression and standard L2R approaches.

Yassien Shaalan, Xiuzhen Zhang, Jeffrey Chan

Multimodal Mixture Density Boosting Network for Personality Mining

Knowing people’s personalities is useful in various real-world applications, such as personnel selection. Traditionally, we have had to rely on qualitative methodologies, e.g. surveys or psychology tests, to determine a person’s traits. However, recent advances in machine learning have made it possible to automate this process by inferring personalities from textual data. Despite their success, text-based methods ignore facial expressions and the way people speak, which can also carry important information about human characteristics. In this work, a personality mining framework is proposed to exploit all the information from videos, including visual, auditory, and textual perspectives. Using a state-of-the-art cascade network built on advanced gradient boosting algorithms, our proposed methodology can achieve lower prediction errors than most current machine learning algorithms. Our multimodal mixture density boosting network performs especially well on datasets with small sample sizes, which is useful for learning problems in psychology, where big data is often not available.

Nhi N. Y. Vo, Shaowu Liu, Xuezhong He, Guandong Xu

Identifying Singleton Spammers via Spammer Group Detection

Opinion spam is a well-recognized threat to the credibility of online reviews. Existing approaches to detecting spam reviews or spammers examine review content, reviewer behavior and the reviewer-product network, and often operate on the assumption that spammers write at least several if not many fake reviews. On the other hand, spammers set up multiple sockpuppet IDs and write one-time, singleton spam reviews to avoid detection. It is reported that for most review sites, a large portion, sometimes over 90%, of reviewers are singletons (identified by the reviewer ID). Singleton spammers are difficult to catch due to the scarcity of behavioral clues. In this paper, we argue that the key to detecting singleton spammers (and their fake reviews) is to detect group spam attacks by inferring the hidden collusiveness among them. To address the lack of explicit behavioral signals for singleton reviewers, we propose to infer the hidden reviewer-product associations by completing the review-product matrix using the product and review metadata and text. Experiments on three real-life Yelp datasets established that our approach can effectively detect, via group detection, singleton spammers who are often missed by existing approaches.

Dheeraj Kumar, Yassien Shaalan, Xiuzhen Zhang, Jeffrey Chan

Adaptive Attention Network for Review Sentiment Classification

Document-level sentiment classification is an important NLP task. The state of the art shows that attention mechanism is particularly effective on document-level sentiment classification. Despite the success of previous attention mechanism, it neglects the correlations among inputs (e.g., words in a sentence), which can be useful for improving the classification result. In this paper, we propose a novel Adaptive Attention Network (AAN) to explicitly model the correlations among inputs. Our AAN has a two-layer attention hierarchy. It first learns an attention score for each input. Given each input’s embedding and attention score, it then computes a weighted sum over all the words’ embeddings. This weighted sum is seen as a “context” embedding, aggregating all the inputs. Finally, to model the correlations among inputs, it computes another attention score for each input, based on the input embedding and the context embedding. These new attention scores are our final output of AAN. In document-level sentiment classification, we apply AAN to model words in a sentence and sentences in a review. We evaluate AAN on three public data sets, and show that it outperforms state-of-the-art baselines.

Chuantao Zong, Wenfeng Feng, Vincent W. Zheng, Hankz Hankui Zhuo
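
The two-layer attention hierarchy described in the abstract above can be sketched in a few lines. The abstract does not state how the attention scores are computed, so the dot-product scoring, the softmax normalization, the `query` vector, and the toy embeddings below are all assumptions for illustration rather than the authors' AAN implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adaptive_attention(embeddings, query):
    """Two attention layers:
    1. score each input against a query vector (assumed scoring),
    2. build a context embedding as the weighted sum of all inputs,
    3. re-score each input against that context embedding."""
    first = softmax([dot(e, query) for e in embeddings])
    dim = len(embeddings[0])
    context = [sum(w * e[i] for w, e in zip(first, embeddings)) for i in range(dim)]
    second = softmax([dot(e, context) for e in embeddings])
    return first, context, second


# Hypothetical 2-D word embeddings and query vector.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
first, context, second = adaptive_attention(embeddings, query)
print(first, context, second)
```

The second set of scores depends on every input through the context embedding, which is how this sketch lets correlations among inputs influence the final attention weights.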

Cross-Domain Sentiment Classification via a Bifurcated-LSTM

Sentiment classification plays a vital role in current online commercial transactions because it is critical to understand users’ opinions and feedback on businesses or products. Cross-domain sentiment classification can adapt a well-trained classifier from one source domain to other target domains, which reduces the time and effort of training new classifiers in these domains. Existing cross-domain sentiment classification methods require data or other information in target domains in order to train their models. However, collecting and processing new corpora requires a very heavy workload. Besides, the data in target domains may be private and not always available for training. To address these issues, motivated by multi-task learning, we design a Bifurcated-LSTM which takes advantage of attention-based LSTM classifiers along with an augmented dataset and orthogonal constraints. This Bifurcated-LSTM can extract domain-invariant sentiment features from the source domain to perform sentiment analysis in different target domains. We conduct extensive experiments on seven classic types of product reviews, and the results show that our system leads to significant performance improvement.

Jinlong Ji, Changqing Luo, Xuhui Chen, Lixing Yu, Pan Li

Backmatter

Title: Advances in Knowledge Discovery and Data Mining
Edited by: Dinh Phung, Vincent S. Tseng, Prof. Geoffrey I. Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi
Publisher: Springer International Publishing
Electronic ISBN: 978-3-319-93034-3
Print ISBN: 978-3-319-93033-6
DOI: https://doi.org/10.1007/978-3-319-93034-3