
2021 | Book

Database Systems for Advanced Applications

26th International Conference, DASFAA 2021, Taipei, Taiwan, April 11–14, 2021, Proceedings, Part III

Edited by: Christian S. Jensen, Ee-Peng Lim, De-Nian Yang, Wang-Chien Lee, Vincent S. Tseng, Vana Kalogeraki, Jen-Wei Huang, Chih-Ya Shen

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"


About this book

The three-volume set LNCS 12681-12683 constitutes the proceedings of the 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021, held in Taipei, Taiwan, in April 2021.

The total of 156 papers presented in this three-volume set was carefully reviewed and selected from 490 submissions.

The topic areas for the selected papers include information retrieval, search and recommendation techniques; RDF, knowledge graphs, semantic web, and knowledge management; and spatial, temporal, sequence, and streaming data management, while the dominant keywords are network, recommendation, graph, learning, and model. These topic areas and keywords shed light on the direction in which DASFAA research is moving.

Due to the COVID-19 pandemic, this event was held virtually.

Table of Contents

Frontmatter

Recommendation

Frontmatter

Gated Sequential Recommendation System with Social and Textual Information Under Dynamic Contexts

Recommendation systems are widely used in research and industry to improve consumer satisfaction. In recent years, many research papers have leveraged abundant data from heterogeneous information sources to capture diverse preferences and improve overall accuracy. Some notable papers proposed extracting users’ preferences from information that accompanies ratings, such as reviews or social relations. However, their combinations are generally static and less expressive, because they do not consider the dynamic contexts of users’ purchases and choices. In this paper, we propose the Heterogeneous Information Sequential Recommendation System (HISR), a dual-GRU structure that models the sequential dynamics behind customer behaviors and combines preference features from review text and social attentional relations under dynamic contexts. A novel gating layer is applied to dynamically select and explicitly combine the two views of data. Moreover, in the social attention module, temporal textual information is brought in as a clue to dynamically select friends who are helpful for contextual purchase intentions, as an implicit combination. We validate our proposed method on two large subsets of the real-world local business dataset Yelp, and our method outperforms state-of-the-art methods on related tasks including social, sequential, and heterogeneous recommendation.

Haoyu Geng, Shuodian Yu, Xiaofeng Gao
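
The gating idea in the abstract above can be illustrated with a minimal sketch. It assumes a generic sigmoid gate over the concatenation of two feature vectors; the function name `gated_fusion`, the parameters `gate_weights` and `gate_bias`, and the example values are hypothetical and are not taken from the HISR paper.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(text_vec, social_vec, gate_weights, gate_bias):
    """Fuse two equal-length feature vectors with a per-dimension gate.

    gate_weights holds one row per output dimension; each row has
    len(text_vec) + len(social_vec) entries (hypothetical parameters).
    """
    if len(text_vec) != len(social_vec):
        raise ValueError("both views must have the same dimension")
    concat = list(text_vec) + list(social_vec)
    fused = []
    for k in range(len(text_vec)):
        pre_activation = sum(w * x for w, x in zip(gate_weights[k], concat)) + gate_bias[k]
        g = sigmoid(pre_activation)  # g near 1 favors the text view, near 0 the social view
        fused.append(g * text_vec[k] + (1.0 - g) * social_vec[k])
    return fused


# With all-zero gate parameters every gate equals 0.5, so the two views are averaged.
example = gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])
```

In a trained model the gate parameters would be learned, so the mixing ratio could change with the input context rather than staying fixed.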

SRecGAN: Pairwise Adversarial Training for Sequential Recommendation

Sequential recommendation is essentially a learning-to-rank task under special conditions. Bayesian Personalized Ranking (BPR) has proved effective for such a task by maximizing the margin between observed and unobserved interactions. However, there exist unobserved positive items that are very likely to be selected in the future. Treating those items as negatives misleads the model and limits how far its potential can be exploited. To alleviate this problem, we present a novel approach, Sequential Recommendation GAN (SRecGAN), which learns to capture users’ latent interests and to predict the next item in a pairwise adversarial manner. It can be interpreted as playing a minimax game, where the generator learns a similarity function and tries to diminish the distance between the observed samples and their unobserved counterparts, whereas the discriminator tries to maximize their margin. This intense adversarial competition provides increasing learning difficulty and constantly pushes the boundaries of performance. Extensive experiments on three real-world datasets demonstrate the superiority of our method over some strong baselines and prove the effectiveness of adversarial training in sequential recommendation.

Guangben Lu, Ziheng Zhao, Xiaofeng Gao, Guihai Chen
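
For readers unfamiliar with the BPR objective mentioned in the abstract above, here is a minimal sketch. It assumes the standard BPR formulation, the negative log-sigmoid of the score difference between an observed and an unobserved item; the function names and scores are illustrative and do not reproduce SRecGAN itself.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(pos - neg)) over paired scores.

    pos_scores[i] is the model score of an observed item, and
    neg_scores[i] is the score of an unobserved item for the same user.
    """
    if not pos_scores or len(pos_scores) != len(neg_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    # -log(sigmoid(d)) equals softplus(-d), which avoids overflow for large |d|.
    total = sum(softplus(n - p) for p, n in zip(pos_scores, neg_scores))
    return total / len(pos_scores)
```

The loss shrinks as observed items are scored further above unobserved ones, which is the margin that the abstract describes.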

SSRGAN: A Generative Adversarial Network for Streaming Sequential Recommendation

Studying sequential recommendation in streaming settings becomes meaningful because large volumes of user-item interactions are generated in chronological order. Although a few streaming update strategies have been developed, they cannot be applied to sequential recommendation, because they can hardly capture long-term user preferences by updating the model only with randomly sampled new instances. Besides, some latent information is ignored because the existing streaming update strategies are designed for individual interactions and do not consider the interaction subsequence. In this paper, we propose Streaming Sequential Recommendation with Generative Adversarial Network (SSRGAN) to solve the streaming sequential recommendation problem. To maintain long-term memory and keep sequential information, we use a reservoir-based streaming storage mechanism and exploit an active subsequence selection strategy to update the model. Moreover, to improve the effectiveness and efficiency of online model training, we propose a novel GAN-based negative sampling strategy to generate the most informative negative samples and use Gumbel-Softmax to overcome the gradient block problem. We conduct extensive experiments on two real-world datasets, and the results show the superiority of our approach in streaming sequential recommendation.

Yao Lv, Jiajie Xu, Rui Zhou, Junhua Fang, Chengfei Liu
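
Two building blocks named in the abstract above, reservoir-based storage and Gumbel-Softmax, can be sketched generically. This sketch assumes classic reservoir sampling (Algorithm R) and a forward-only Gumbel-Softmax sample; the function names are hypothetical, and SSRGAN's actual storage and subsequence selection may differ.

```python
import math
import random


def reservoir_update(reservoir, capacity, item, seen_before, rng):
    """Offer one stream item to a fixed-size reservoir (Algorithm R).

    seen_before is the number of stream items offered before this one.
    Every item seen so far stays in the reservoir with equal probability.
    """
    if len(reservoir) < capacity:
        reservoir.append(item)
        return
    slot = rng.randrange(seen_before + 1)
    if slot < capacity:
        reservoir[slot] = item


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw a relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

In a training setup the Gumbel-Softmax step would be written in an automatic differentiation framework so that gradients can flow through the relaxed sample; the pure-Python version here only shows the forward computation.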

Topological Interpretable Multi-scale Sequential Recommendation

Sequential recommendation attempts to predict the next items based on users’ historical sequences. However, the items to be predicted next depend on the user’s long-, mid-, or short-term interest. Modeling user interest at multiple scales in an interpretable way poses a great challenge in sequential recommendation. Hence, we propose a framework based on topological data analysis to model target items’ explicit dependency on previous items or item chunks at different time scales, which are easily converted into sequential patterns. First, we propose a topological transformation layer to map each user interaction sequence into persistent homology organized in a multi-scale interest tree. Then, this multi-scale interest tree is encoded to represent natural inclusion relations across scales through a recurrent aggregation process, namely the tree aggregation block. Next, we add this block to the vanilla transformer, referred to as the recurrent tree transformer, and utilize this new transformer to generate a unified user interest representation. The last fully connected layer is utilized to model the interaction between this unified representation and the item embedding. Comprehensive experiments are conducted on two public benchmark datasets. The average performance improvement on both datasets is 5% over state-of-the-art baselines.

Tao Yuan, Shuzi Niu, Huiyuan Li

SANS: Setwise Attentional Neural Similarity Method for Few-Shot Recommendation

Recommender systems generate personalized recommendations for users based on their historical data. However, if some users have few interactions in the training data, i.e., few-shot users, recommendations for them will be inaccurate. In this paper, we propose a setwise attentional neural similarity method (SANS) for the few-shot recommendation problem. Unlike general recommendation algorithms, we eliminate direct representations of few-shot users. First, a neural similarity method is proposed to effectively estimate the correlation between items. Then, we propose a setwise attention mechanism to obtain recommendation scores by aggregating the correlations between a candidate item and items in a candidate user’s historical interactions. To facilitate model training in the few-shot scenario, training samples are generated by episode sampling, and each training sample is assigned an adaptive weight to emphasize the importance of few-shot users. We simulate the few-shot recommendation problem on three real-world datasets, and extensive results show that SANS outperforms state-of-the-art recommendation algorithms in few-shot recommendation.

Zhenghao Zhang, Tun Lu, Dongsheng Li, Peng Zhang, Hansu Gu, Ning Gu

Semi-supervised Factorization Machines for Review-Aware Recommendation

Textual reviews, as a useful supplement to the interaction data, have been widely used to enhance the performance of recommender systems, especially when the interaction data is sparse. However, existing solutions to review-aware recommendation focus only on learning more informative features from reviews and ignore the insufficient number of training examples, resulting in limited performance improvements. To this end, we propose a co-training style semi-supervised review-aware recommendation model, called Collaborative Factorization Machines (CoFM), to augment the training dataset as well as increase its informativeness. Our CoFM employs two FMs as base predictors, each of which labels unlabeled examples for its peer predictor in the learning process. Specifically, a user-leaded FM and an item-leaded FM are separately built using different reviews to increase the diversity between the two predictors. Furthermore, to exploit unlabeled data safely, the labeling confidence is estimated by validating the influence of the labeling of unlabeled examples on the labeled ones. The final prediction is made by linearly blending the outputs of the two predictors. Extensive experiments on three real-world benchmarks demonstrate the superiority of CoFM over several state-of-the-art review-aware and semi-supervised recommendation schemes.

Junheng Huang, Fangyuan Luo, Jun Wu

DCAN: Deep Co-Attention Network by Modeling User Preference and News Lifecycle for News Recommendation

Personalized news recommendation systems aim to alleviate information overload and provide users with personalized reading suggestions. In general, each news item has its own lifecycle, depicted by a bell-shaped curve of clicks, which is highly likely to influence users’ choices. However, existing methods typically depend on capturing user preference to make recommendations while ignoring the importance of the news lifecycle. To fill this gap, we propose a Deep Co-Attention Network (DCAN) that models user preference and news lifecycle for news recommendation. The core of DCAN is a Co-Attention Net that fuses the user preference attention and news lifecycle attention together to model the dual influence of users’ clicked news. In addition, in order to learn a comprehensive news representation, a Multi-Path CNN is proposed to extract multiple patterns from the news title, content, and entities. Moreover, to better capture user preference and model the news lifecycle, we present a User Preference LSTM and a News Lifecycle LSTM to extract sequential correlations from news representations and additional features. Extensive experimental results on two real-world news datasets demonstrate the significant superiority of our method and validate the effectiveness of our Co-Attention Net by means of visualization.

Lingkang Meng, Chongyang Shi, Shufeng Hao, Xiangrui Su

Considering Interaction Sequence of Historical Items for Conversational Recommender System

Unlike traditional recommender systems based on content-based and collaborative filtering, conversational recommender systems (CRS) can converse dynamically with users to capture fine-grained preferences. Although several efforts have been made for CRS, they neglect the importance of interaction sequences, which capture the ‘context’ of users’ activities based on actions they have performed recently. Therefore, we propose a framework that considers the interaction Sequence of historical items for Conversational Recommendation (SeqCR). Specifically, SeqCR first scores candidate items through the sequence of items that users have interacted with. Then it generates the recommendation list and the attributes to be asked based on the scores. We restrict candidate attributes to those of high-scoring (high-relevance) items, which effectively reduces the search space of attributes and allows user preferences to be hit more quickly and accurately. Finally, SeqCR utilizes the policy network to decide whether to recommend or ask. We conduct extensive experiments on two datasets from MovieLens 10M and Yelp in multi-round conversational recommendation scenarios. Empirical results demonstrate that our SeqCR significantly outperforms the state-of-the-art methods.

Xintao Tian, Yongjing Hao, Pengpeng Zhao, Deqing Wang, Yanchi Liu, Victor S. Sheng

Knowledge-Aware Hypergraph Neural Network for Recommender Systems

Knowledge graphs (KGs) have been widely studied and employed as auxiliary information to alleviate the cold start and sparsity problems of collaborative filtering in recommender systems. However, most of the existing KG-based recommendation models suffer from two drawbacks: insufficient modeling of high-order correlations among users, items, and entities, and simple aggregation strategies that fail to preserve the relational information in the neighborhood. In this paper, we propose a Knowledge-aware Hypergraph Neural Network (KHNN) framework to tackle these issues. First, the knowledge-aware hypergraph structure, which is composed of hyperedges, is employed for modeling users, items, and entities in the knowledge graph with explicit hybrid high-order correlations. Second, we propose a novel knowledge-aware hypergraph convolution method to efficiently aggregate different knowledge-based neighbors in a hyperedge. Moreover, it can conduct the embedding propagation of high-order correlations explicitly and efficiently in the knowledge-aware hypergraph. Finally, we apply the proposed model to three real-world datasets, and the empirical results demonstrate that KHNN outperforms other state-of-the-art methods.

Binghao Liu, Pengpeng Zhao, Fuzhen Zhuang, Xuefeng Xian, Yanchi Liu, Victor S. Sheng

Personalized Dynamic Knowledge-Aware Recommendation with Hybrid Explanations

Explainable recommendation is attracting more and more attention in both industry and research communities. While some existing models utilize reviews to improve the performance of recommender systems, most of them assume that a user’s preference is static and that each review’s importance is user-independent. However, it is intuitive that a user’s preference changes dynamically and that reviews from similar users should be given more importance, as they share similar tastes. Moreover, existing models achieve explainability either at the feature level, which is too concise, or at the review level, which is too redundant. To deal with these problems, we propose a Personalized Dynamic Knowledge-aware Recommender (PDKR) for dynamic user modeling and personalized item modeling. In particular, we model a user’s preference with defined entities and relations in sequential knowledge graphs and capture its dynamics with a novel interval-aware Gated Recurrent Unit (GRU). Furthermore, by leveraging a self-attention mechanism, we can not only learn each review’s user-specific importance, but also provide tailored explanations for each user at both the feature level and the review level. We conduct extensive experiments on three benchmark datasets from Amazon and Yelp, and the results show that PDKR outperforms all the state-of-the-art recommendation approaches in the rating prediction task while providing more effective explanations.

Hao Sun, Zijian Wu, Yue Cui, Liwei Deng, Yan Zhao, Kai Zheng

Graph Attention Collaborative Similarity Embedding for Recommender System

We present Graph Attention Collaborative Similarity Embedding (GACSE), a new recommendation framework that exploits collaborative information in the user-item bipartite graph for representation learning. Our framework consists of two parts: the first learns explicit graph collaborative filtering information, such as user-item associations, through embedding propagation with an attention mechanism, and the second learns implicit graph collaborative information, such as user-user and item-item similarities, through an auxiliary loss. We design a new loss function that combines BPR loss with an adaptive margin and a similarity loss for similarity learning. Extensive experiments on three benchmarks show that our model is consistently better than the latest state-of-the-art models.

Jinbo Song, Chao Chang, Fei Sun, Zhenyang Chen, Guoyong Hu, Peng Jiang

Learning Disentangled User Representation Based on Controllable VAE for Recommendation

User purchasing behaviour is always driven by complex latent factors, which are highly disentangled in the real world. Learning a latent factorized representation of users can uncover the user intentions behind the observed data (i.e., user-item interactions) and improve the robustness and interpretability of the recommender system. However, existing collaborative filtering methods that learn disentangled representations struggle to balance the trade-off between reconstruction quality and disentanglement. In this paper, we propose a controllable variational autoencoder framework for collaborative filtering. Specifically, we apply a modified Proportional-Integral-Derivative (PID) control to the $\beta$-VAE objective to automatically tune the hyperparameter $\beta$ using the output of the Kullback-Leibler divergence as feedback. We further introduce item embeddings to guide the system to learn representations related to real-world concepts using a factorized Gaussian distribution. Experimental results show that our model achieves a significant improvement over state-of-the-art baselines. We further evaluate our model’s effectiveness in controlling the trade-off between reconstruction error and disentanglement quality in recommendation.

Yunyi Li, Pengpeng Zhao, Deqing Wang, Xuefeng Xian, Yanchi Liu, Victor S. Sheng
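
The feedback loop described in the abstract above, tuning the KL weight from the observed KL divergence, can be illustrated with a minimal sketch. It assumes a plain proportional-integral controller with clamping, not the modified PID control of the paper; the class name `BetaController`, the gains, and the target value are hypothetical.

```python
class BetaController:
    """Proportional-integral controller for the KL weight of a VAE objective.

    Assumption for illustration: when the observed KL divergence exceeds the
    target, the weight is raised to penalize KL more strongly, and vice versa.
    """

    def __init__(self, target_kl, kp=0.01, ki=0.001, beta_min=0.0, beta_max=1.0):
        self.target_kl = target_kl
        self.kp = kp
        self.ki = ki
        self.beta_min = beta_min
        self.beta_max = beta_max
        self.integral = 0.0  # running sum of past errors

    def step(self, observed_kl):
        """Return the weight to use for the next training step."""
        error = observed_kl - self.target_kl
        self.integral += error
        beta = self.kp * error + self.ki * self.integral
        # Clamp so that the weight stays inside a valid range.
        return min(self.beta_max, max(self.beta_min, beta))


controller = BetaController(target_kl=10.0)
first_beta = controller.step(20.0)   # KL above target: weight becomes positive
second_beta = controller.step(20.0)  # error persists: integral term raises it further
```

A production controller would usually add anti-windup handling for the integral term; that is omitted here to keep the sketch short.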

DFCN: An Effective Feature Interactions Learning Model for Recommender Systems

Data features in real industrial recommendation scenarios are diverse, high-dimensional, and sparse. Effective feature crossing can improve recommendation performance and is therefore of great significance. Manual feature engineering is no longer applicable due to its high cost and low efficiency. Factorization machines introduce second-order feature interactions to enhance learning ability. Deep neural networks (DNNs) have good nonlinear combination ability and can learn high-order feature interactions. However, the implicit feature interactions that DNNs learn at the bit-wise level are not always effective. In this paper, we propose a novel factorization cross network (FCN), which is based on factorization and learns explicit feature crossing through a neural network. FCN can learn low- and high-order feature interactions at the vector-wise level with linear time complexity. We introduce a deep residual network (DRN) to learn implicit feature interactions. We further use learnable parameters to combine FCN and DRN, and name the new model the deep factorization cross network (DFCN). DFCN can automatically learn low- and high-order explicit and implicit feature interaction information. We have carried out comprehensive experiments on three real-world datasets. Experimental results demonstrate the effectiveness of DFCN, which performs best compared with other competitive models.

Wei Yang, Tianyu Hu
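
The second-order interactions of factorization machines mentioned in the abstract above can be computed in time linear in the number of features. The sketch below shows the well-known reformulation for a standard factorization machine, not FCN or DFCN; the function names and the toy input are illustrative.

```python
def fm_pairwise_naive(x, v):
    """Sum over i < j of <v[i], v[j]> * x[i] * x[j], in quadratic time."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_linear(x, v):
    """Same quantity using 0.5 * sum_f ((sum_i v[i][f] x[i])^2 - sum_i (v[i][f] x[i])^2)."""
    num_factors = len(v[0])
    total = 0.0
    for f in range(num_factors):
        weighted = [v[i][f] * x[i] for i in range(len(x))]
        sum_then_square = sum(weighted) ** 2
        square_then_sum = sum(w * w for w in weighted)
        total += sum_then_square - square_then_sum
    return 0.5 * total


features = [1.0, 2.0, 3.0]
factors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one latent vector per feature
```

Both functions return the same value for the same input; the second avoids the double loop over feature pairs, which matters when the feature vector is high-dimensional and sparse.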

Tell Me Where to Go Next: Improving POI Recommendation via Conversation

Next Point-of-Interest (POI) recommendation estimates user preference on POIs according to past check-in history, suffering from the intrinsic limitation of obtaining dynamic user preferences. Conversational Recommendation System (CRS), which can collect dynamic user preferences through conversation, brings a solution to the above limitation. However, none of the existing CRS methods consider the spatio-temporal factors in the action selection phase, which are essential for POI conversational recommendation. In this paper, we propose a new Spatio-Temporal Conversational Recommendation System (STCRS) to fuse the spatio-temporal and dialogue information for next POI recommendation. Specifically, STCRS first learns the spatio-temporal information in the user’s check-in history. Then reinforcement learning is used to decide which action (asking for an attribute or recommending POIs) to take at the next turn to achieve successful POI recommendation within as few turns as possible. Finally, our extensive experiments on two real-world datasets demonstrate significant improvements over the state-of-the-art methods.

Changheng Li, Yongjing Hao, Pengpeng Zhao, Fuzhen Zhuang, Yanchi Liu, Victor S. Sheng

MISS: A Multi-user Identification Network for Shared-Account Session-Aware Recommendation

The user’s interactions with the system within a given time frame are organized into a session. The task of session-aware recommendation aims to predict the next interaction based on user’s historical sessions and current session. Though existing methods have achieved promising results, they still have drawbacks in some aspects. First, most existing deep learning methods model a session as a sequence, but neglect the complex transition relationships between items. Second, a single account is usually regarded as a single user by default, where the scenario of multiple users sharing the same account is ignored. To this end, we propose a Multi-user Identification network named MISS for the Shared-account Session-aware recommendation problem. MISS consists of two core components: one is the Dwell Graph Neural Network (DGNN), which incorporates item dwell time into the gated graph neural network to capture user interest drift across sessions. The other is a Multi-user Identification (MI) module, which draws on the attention mechanism to distinguish behaviors of different users under the same account. To verify the effectiveness of MISS, we construct two data sets with shared account characteristics from real-world smart TV watching logs. Extensive experiments conducted on the two data sets demonstrate that MISS evidently outperforms the state-of-the-art recommendation methods.

Xinyu Wen, Zhaohui Peng, Shanshan Huang, Senzhang Wang, Philip S. Yu

VizGRank: A Context-Aware Visualization Recommendation Method Based on Inherent Relations Between Visualizations

Visualization recommendation systems measure the importance of visualizations to make suggestions. While considering each visualization individually may be enough to gauge its importance in specific scenarios, it ignores the relations between visualizations under a visual analysis context. This paper studies a more general method called VizGRank, which models the relations between visualizations as a graph and then calculates the importance of visualizations by adopting a graph-based algorithm. In this model, the relations derived from the visual encoding of the visualizations and the underlying data schema are used for recommendation. Due to the lack of public benchmarks, the effectiveness of the model is evaluated on the synthetic results from an existing public benchmark, IDEBench, as a workaround. However, since the existing benchmark is specific and synthetic and does not completely reflect realistic scenarios of visualization recommendation, a new benchmark for visualization recommendation is designed and constructed by collecting real public datasets. Extensive experiments on both the public benchmark and the new benchmark demonstrate that VizGRank better captures the relative importance of visualizations and outperforms the existing state-of-the-art method.

Qianfeng Gao, Zhenying He, Yinan Jing, Kai Zhang, X. Sean Wang
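
The abstract above does not name the graph-based algorithm it adopts. As one plausible illustration of ranking nodes by their relations, the sketch below uses a basic PageRank power iteration; this choice, the function name `pagerank`, and the toy graph are assumptions and should not be read as VizGRank's actual procedure.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Basic PageRank by power iteration over a directed graph.

    nodes: list of node identifiers; edges: list of (source, target) pairs.
    Rank held by nodes without outgoing edges is spread evenly over all nodes.
    """
    outgoing = {n: [] for n in nodes}
    for src, dst in edges:
        outgoing[src].append(dst)
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not outgoing[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in outgoing[src]:
                new_rank[dst] += damping * rank[src] / len(outgoing[src])
        rank = new_rank
    return rank


# Hypothetical graph: three visualizations, with "a" related to by both others.
scores = pagerank(["a", "b", "c"], [("b", "a"), ("c", "a"), ("a", "b")])
```

In this toy graph, visualization "a" receives the highest score because both other nodes point to it.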

Deep User Representation Construction Model for Collaborative Filtering

Model-based collaborative filtering (CF) methods can be divided into user-item methods and item-item methods. In most cases, both can be seen as modeling the user-item interaction, and the only difference between them is that they adopt different ways to build user representations. User-item methods obtain user representations by directly assigning each user a real-valued vector and do not consider users’ historical item information. However, users’ historical item information can reflect users’ preferences to some extent and can alleviate the problem of data sparsity. Ignoring this information may lead to incomplete construction of user representations and vulnerability to data sparsity. Although existing item-item methods address this problem by using the users’ historical items to build the user representations, they always use the same vector to represent the same historical item for different users, which may limit the expressiveness and further improvement of the models. In this paper, we propose the Deep User Representation Construction Model (DURCM) to construct user representations in a more effective and robust way. Specifically, unlike existing item-item methods that directly use historical item vectors to build user representations, we first adopt a conversion module to convert a user’s historical item vectors into personalized item vectors, so that even the same item has different expressions for different users. Second, we design a special attention module to automatically assign weights to these personalized item vectors when constructing the users’ final representations. We conduct comprehensive experiments on four real-world datasets, and the results verify the effectiveness of our proposed methods.

Daomin Ji, Zhenglong Xiang, Yuanxiang Li

DiCGAN: A Dilated Convolutional Generative Adversarial Network for Recommender Systems

Generative Adversarial Networks (GANs) have recently been introduced into the domain of recommendation due to their ability to learn the distribution of users’ preferences. However, most existing GAN-based recommendation methods only exploit the user-item interactions and fail to leverage the information among a user’s interacted items. On the other hand, Convolutional Neural Networks (CNNs) have shown their power in learning high-order correlations. In this paper, combining the strengths of both GAN and CNN, we propose a Dilated Convolutional Generative Adversarial Network (DiCGAN) for recommendation, in which we first embed the interacted items of each user into an image in a latent space, and then use several dilated convolutional filters and a vertical convolutional filter to capture the high-order correlations among the interacted items. Moreover, an attention module is employed before convolution to generate attention maps for adaptive feature refinement. Experiments on several public datasets verify the superiority of DiCGAN over several baselines in terms of top-N recommendation. Furthermore, our experimental results show that the larger and sparser the dataset, the more significant the performance gain of DiCGAN, demonstrating the effectiveness of the CNN component in extracting high-order correlations from interaction data for better performance.

Zhiqiang Guo, Chaoyang Wang, Jianjun Li, Guohui Li, Peng Pan

RE-KGR: Relation-Enhanced Knowledge Graph Reasoning for Recommendation

A knowledge graph (KG) has been widely adopted to improve recommendation performance. The multi-hop user-item connections in a KG can provide reasons for recommending an item to a user. However, existing methods do not effectively leverage the relations of entities and interpretable paths in a KG. To address this limitation, in this paper, we propose a novel recommendation framework called relation-enhanced knowledge graph reasoning for recommendation (RE-KGR) that combines recommendation and explainability by reasoning user-item interaction paths (UIIPs). First, instead of applying an alignment algorithm for preprocessing, RE-KGR directly learns the semantic representation of entities from structured knowledge by stacking relation-based convolutional layers to take full advantage of the KG. Moreover, RE-KGR infers user preferences by calculating the sum of all UIIPs between users and items. Finally, RE-KGR selects several UIIPs with the highest probabilities as possible reasons for the recommendations. Extensive experiments on three real-world datasets demonstrate that our proposed method significantly outperforms several state-of-the-art baselines and achieves superior performance and explainability.

Ming He, Hanyu Zhang, Han Wen

LGCCF: A Linear Graph Convolutional Collaborative Filtering with Social Influence

Collaborative filtering (CF) is the dominant technique in personalized recommendation. It models user-item interactions to select the relevant items for a user, and it is widely applied in real recommender systems. Recently, graph convolutional network (GCN) has been incorporated into CF, and it achieves better performance in many recommendation scenarios. However, existing works usually suffer from limited performance due to data sparsity and high computational costs in large user-item graphs. In this paper, we propose a linear graph convolutional CF (LGCCF) framework that incorporates the social influence as side information to help improve recommendation and address the aforementioned issues. Specifically, LGCCF integrates the user-item interactions and the social influence into a unified GCN model to alleviate data sparsity. Furthermore, in the graph convolutional operations of LGCCF, we remove the nonlinear transformations and replace them with linear embedding propagations to overcome training difficulty and improve the recommendation performance. Finally, extensive experiments conducted on two real datasets show that LGCCF consistently outperforms the state-of-the-art recommendation methods.

Ming He, Han Wen, Hanyu Zhang
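
The linear embedding propagation described in the LGCCF abstract above can be sketched as a single neighbor-aggregation step without weight matrices or nonlinearities. The sketch assumes symmetric degree normalization over a graph that merges user-item and social edges; that normalization, the function name `linear_propagate`, and the toy graph are illustrative assumptions and are not taken from the paper.

```python
import math


def linear_propagate(embeddings, neighbors):
    """One linear propagation step: each node sums its neighbors' embeddings,
    scaled by 1 / sqrt(deg(node) * deg(neighbor)). No activation is applied.

    embeddings: dict mapping node -> list of floats
    neighbors:  dict mapping node -> list of adjacent nodes (undirected graph)
    """
    propagated = {}
    for node, vector in embeddings.items():
        aggregated = [0.0] * len(vector)
        for nb in neighbors.get(node, []):
            norm = 1.0 / math.sqrt(len(neighbors[node]) * len(neighbors[nb]))
            for k, value in enumerate(embeddings[nb]):
                aggregated[k] += norm * value
        propagated[node] = aggregated
    return propagated


# Hypothetical graph: user u1 interacted with item i1 and is socially linked to u2.
graph = {"u1": ["i1", "u2"], "i1": ["u1"], "u2": ["u1"]}
initial = {"u1": [1.0, 0.0], "i1": [0.0, 1.0], "u2": [1.0, 1.0]}
after_one_layer = linear_propagate(initial, graph)
```

Stacking several such steps lets information from multi-hop neighbors reach a node while keeping each layer a cheap linear operation.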

Sirius: Sequential Recommendation with Feature Augmented Graph Neural Networks

Many practical recommender systems recommend personalized items for different users by mining user-item interaction sequences. The interaction sequences, as a whole, imply the manifold collaborative relations among users and items. Further, from the view of users, the item orders and time intervals between interactions can expose the evolution of user interests, and from the view of items, the attributes of the items in interaction sequences may reveal the variation of item popularity. However, most existing recommendation models ignore this valuable information and cannot fully explore the intrinsic implications of interaction sequences. In this paper, we propose a method named Sirius, which develops GNNs (Graph Neural Networks) to model the collaborative relations and capture the dynamics of time and attribute features in sequences. We give the workflow of the Sirius method and describe the implementation of graph construction, item embedding generation, sequence embedding generation, and next-item prediction. Finally, we give an example of Sirius recommendations, which visually shows the impact of feature information on the recommendation results. At present, Sirius has been adopted by MX Player, one of India’s largest streaming platforms, recommending movies for thousands of users.

Xinzhou Dong, Beihong Jin, Wei Zhuo, Beibei Li, Taofeng Xue

Combining Meta-path Instances into Layer-Wise Graphs for Recommendation

In the recommendation area, the concept of meta-path is well known for inferring explicit and effective relationships between nodes such as users and items. To extract useful information from the instances of meta-paths, existing methods embed meta-path instances separately. However, they ignore the complicated semantics presented by multiple instances. These complicated semantics not only provide additional information but also affect the semantics of single instances. Without considering the complicated semantics, the information extracted from the instances may be incomplete and less effective. To solve the problem, we propose to learn the complicated semantics by combining meta-path instances into layer-wise graphs (instance-graphs) for recommendation. Following the idea, we develop an Instance-Graph based Recommendation method (IGR). IGR combines meta-path instances into layer-wise instance-graphs. Then, the instance-graphs are investigated layer by layer to generate effective embeddings. Finally, these embeddings are discriminatively merged into user/item embeddings to make predictions. Extensive experimental results show that IGR outperforms various state-of-the-art recommendation methods.

Mingda Qian, Bo Li, Xiaoyan Gu, Zhuo Wang, Feifei Dai, Weiping Wang

GCAN: A Group-Wise Collaborative Adversarial Networks for Item Recommendation

Recommendation systems aim to provide personalized recommendations for different users. Recently, recommendation systems based on Generative Adversarial Networks (GANs) have attracted considerable attention. In previous research, GANs have shown potential and flexibility in learning latent features of users’ preferences. However, GANs are hard to train to convergence and waste much computation filling in empty data, especially when facing the data sparsity problem. In this paper, we propose a new group-wise framework, namely Group-wise Collaborative Adversarial Networks (GCAN), to solve the data sparsity problem and enable GANs to converge faster. We combine GAN with traditional collaborative filtering methods to generate recommendations (CAN), and then propose binary masking and sample shifting to achieve GCAN. Binary masking separates binary user-item interactions and abstracts group-wise relationships from these binary vectors, while sample shifting is designed to avoid an incorrect learning process. A noise corruption parameter is then introduced, with experiments showing the robustness of GCAN. We compare GCAN with other baseline methods on the Yelp and SC datasets, where GCAN achieves state-of-the-art performance for personalized item recommendation.

Xuehan Sun, Tianyao Shi, Xiaofeng Gao, Xiang Li, Guihai Chen

Emerging Applications

Frontmatter

PEEP: A Parallel Execution Engine for Permissioned Blockchain Systems

Unlike blockchain systems in public settings, the stricter trust model in permissioned blockchains opens an opportunity for pursuing higher throughput. Recently, as consensus protocols have improved significantly, the existing serial execution of transactions has become a key factor limiting overall performance. However, it is not easy to extend the concurrency control protocols widely used in database systems to blockchain systems. In particular, there are two challenges in achieving parallel execution of transactions in blockchain: (i) the final results of different replicas may diverge, since most protocols only promise that the effect of transactions is equivalent to some serial order, but this order may vary for every concurrent execution; and (ii) almost all state trees that are used to manage blockchain states do not support fast concurrent updates. In view of the above challenges, we propose a parallel execution engine called PEEP for permissioned blockchain systems. Specifically, PEEP employs a deterministic concurrency mechanism to obtain a predetermined serial order for parallel execution, and offers parallel update operations on the state tree, which can be implemented on any radix tree with the Merkle property. Finally, extensive experiments show that PEEP greatly outperforms existing serial execution.

Zhihao Chen, Xiaodong Qi, Xiaofan Du, Zhao Zhang, Cheqing Jin
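
The idea of a predetermined serial order for parallel execution can be illustrated with a minimal sketch. It is not PEEP's actual mechanism, which the abstract does not detail. It assumes that each transaction's read and write sets are known before execution and that the block order is the agreed serial order; the function name `schedule_batches` is hypothetical. Transactions that conflict with an earlier one are pushed to a later batch, so every replica that runs the same function on the same block derives the same schedule.

```python
def schedule_batches(transactions):
    """Group transactions into batches that can run in parallel.

    transactions: list of (read_set, write_set) pairs in block order
    (assumed to be known up front, which is an assumption of this sketch).
    Returns a list of batches, each a list of transaction indices.
    Running the batches one after another is equivalent to the block order.
    """
    level = []  # level[i] = index of the batch that transaction i lands in
    for i, (reads, writes) in enumerate(transactions):
        lvl = 0
        for j in range(i):
            reads_j, writes_j = transactions[j]
            # Conflict: write-write, write-read, or read-write on a shared key.
            conflict = (writes & (reads_j | writes_j)) or (reads & writes_j)
            if conflict:
                lvl = max(lvl, level[j] + 1)
        level.append(lvl)

    batches = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        batches[lvl].append(i)
    return batches


block = [
    ({"a"}, {"b"}),  # tx 0
    ({"c"}, {"d"}),  # tx 1: independent of tx 0
    ({"b"}, {"e"}),  # tx 2: reads what tx 0 writes
    ({"d"}, {"b"}),  # tx 3: conflicts with tx 0, tx 1 and tx 2
]
batches = schedule_batches(block)
```

Because the schedule depends only on the block contents, replicas do not need to coordinate to agree on it. The quadratic conflict check is kept for readability.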

URIM: Utility-Oriented Role-Centric Incentive Mechanism Design for Blockchain-Based Crowdsensing

Crowdsensing is a prominent paradigm that collects data by outsourcing to individuals with sensing devices. However, most existing crowdsensing systems are based on centralized architecture which suffers from poor data quality, high service charge, single point of failure, etc. Some studies have explored decentralized architectures and implementations for crowdsensing based on blockchain, while incentive mechanisms for worker participation and miner participation, which serve as a crucial role in blockchain-based crowdsensing systems (BCSs), are ignored. To address this issue, we propose an incentive mechanism design named URIM to maximize participants’ utilities, which consists of worker-centric and miner-centric incentive mechanisms for BCSs. For the worker-centric incentive mechanism, we model it as a reverse auction, in which dynamic programming is utilized to select workers, and payments are determined based on the Vickrey-Clarke-Groves scheme. We also prove this incentive mechanism is computationally efficient, individually rational and truthful. For the miner-centric incentive mechanism, we model interactions among the requester and miners as a Stackelberg game and adopt the backward induction to analyze its equilibrium at which the utilities of the requester and miners are optimized. Finally, we demonstrate the significant performance of URIM through extensive simulations.

Zheng Xu, Chaofan Liu, Peng Zhang, Tun Lu, Ning Gu

PAS: Enable Partial Consensus in the Blockchain

Permissioned blockchain enables distributed collaboration among organizations that may not trust each other. However, existing systems cannot efficiently support the ordering and execution of transactions in different workflows in parallel, which seriously affects system scalability and performance in terms of throughput and latency. In this paper, we present a partial consensus mechanism named PAS to achieve fault tolerance and parallelism of transaction processing. In PAS, transactions in different workflows only need to be confirmed by the involved subset of nodes, which significantly enhances the system performance and scalability. Specifically, we introduce a novel data structure, called the hierarchical consensus tree (HCT). It is maintained in each node and used to coordinate the consensus process. HCT guarantees that the consistency reached in different sets of nodes is eventually agreed on by all nodes without conflicts and rollbacks. Since there are many valid HCTs with different system improvements, we introduce an optimization problem, named OHCT, to obtain an HCT with respect to the optimal enhancement. We prove OHCT is NP-hard and propose a general framework with efficient algorithms to address it. Finally, we implement PAS on PBFT-based Hyperledger Fabric and conduct extensive experiments to show the performance and scalability of PAS.

Zihuan Xu, Siyuan Han, Lei Chen

Redesigning the Sorting Engine for Persistent Memory

Emerging persistent memory (PM, also termed as non-volatile memory) technologies can promise large capacity, non-volatility, byte-addressability and DRAM-comparable access latency. Such amazing features have inspired a host of PM-based storage systems and applications that store and access data directly in PM. Sorting is an important function for many systems, but how to optimize sorting for PM-based systems has not been systematically studied yet. In this paper, we conduct extensive experiments for many existing sorting methods, including both conventional sorting algorithms adapted for PM and recently-proposed PM-friendly sorting techniques, on a real PM platform. The results indicate that these sorting methods all have drawbacks for various workloads. Some of the results are even counterintuitive compared to running on a DRAM-simulated platform in their papers. To the best of our knowledge, we are the first to perform a systematic study on the sorting issue for persistent memory. Based on our study, we propose an adaptive sorting engine, namely SmartSort, to optimize the sorting performance for different conditions. The experimental results demonstrate that SmartSort remarkably outperforms existing sorting methods in a variety of cases.

Yifan Hua, Kaixin Huang, Shengan Zheng, Linpeng Huang

ImputeRNN: Imputing Missing Values in Electronic Medical Records

Electronic Medical Records (EMRs), which record visits of patients to the hospital, are the main resources for medical data analysis. However, the many missing values in EMRs limit model capability in various healthcare studies. Recently, many imputation methods have been proposed to address this challenging problem, but they fail to take medical bias into account. Medical bias is a ubiquitous phenomenon in which medical data are missing not at random, because doctors tend to measure features related to the patient’s disease. It reflects the physical conditions of patients, which helps impute missing data with accurate and practical values. In this paper, we propose a novel joint recurrent neural network (RNN) model called ImputeRNN, which considers medical bias for EMR imputation. We model the medical bias by an additional RNN based on a mask (missing or not) matrix, whose hidden vectors are incorporated into the model as contexts by a fusion layer. Extensive experiments on two real-world EMR datasets demonstrate that ImputeRNN outperforms state-of-the-art methods on imputation and downstream prediction tasks.

Jiawei Ouyang, Yuhao Zhang, Xiangrui Cai, Ying Zhang, Xiaojie Yuan
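
As a small illustration of the mask (missing or not) matrix mentioned in the abstract, the sketch below derives one from a list of visits. The record layout, the convention that 1 means observed and 0 means missing, and the helper names are assumptions made for this example; the RNN that consumes the mask is not shown.

```python
def build_mask(visits):
    """Return a matrix with 1 where a value was measured and 0 where it is missing.

    visits: list of visits, each a list of feature values, with None for
    a feature that was not measured (an assumed encoding).
    """
    return [[0 if value is None else 1 for value in visit] for visit in visits]


def observation_rate(mask):
    """Fraction of visits in which each feature was measured."""
    n_visits = len(mask)
    n_features = len(mask[0])
    return [sum(row[f] for row in mask) / n_visits for f in range(n_features)]


# Three visits, two features (e.g. temperature and heart rate; hypothetical values).
visits = [
    [36.6, None],
    [None, None],
    [37.1, 80],
]
mask = build_mask(visits)
rates = observation_rate(mask)
```

A feature that is measured far more often for some patients than for others is the kind of signal the abstract calls medical bias: the pattern of zeros and ones carries information in its own right.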

Susceptible Temporal Patterns Discovery for Electronic Health Records via Adversarial Attack

The recent advancements in deep neural networks (DNNs) are revolutionizing the healthcare domain. Although many studies build medical DNN models based on historical Electronic Health Records (EHR) and have achieved promising performance in many clinical prediction tasks, recent studies show that DNNs are vulnerable to adversarial attacks. Much of the interest in adversarial examples has stemmed from their ability to shed light on possible limitations of DNNs. However, while related research has received sustained attention in the computer vision community, how to design adversarial examples for EHR data remains rarely investigated. To address this problem, we propose a novel approach for generating EHR adversarial examples, named TSAttack, which explores the temporal structure contained in EHR to achieve an effective and efficient attack. Based on the generated EHR adversarial examples, we further propose a procedure to discover susceptible temporal patterns (STP) in a patient’s medical records, which provide clinical decision support for dynamic monitoring. Extensive experiments on the real-world longitudinal EHR database MIMIC-III demonstrate that our approach is effective and yields better performance in adversarial settings.

Rui Zhang, Wei Zhang, Ning Liu, Jianyong Wang

A Decision Support System for Heart Failure Risk Prediction Based on Weighted Naive Bayes

Heart failure (HF) affects the health of millions of people worldwide and the early detection of HF risk plays a vital role in prevention and prompt treatment. Various decision support systems based on machine learning have been presented recently to predict HF. However, the existing systems usually assumed that all features add equal weight to the prediction result, which could not properly simulate the diagnostic status. In this study, a decision support system is proposed for HF prediction using MSE Back Propagation Method (MSEBPM) and weighted naive Bayes. First, the feature selection method eliminates irrelevant features to improve accuracy and decrease computational times. Second, the proposed MSEBPM computes a weight vector for features based on their contributions, trying to minimize the MSE loss of the predicted class probabilities. Finally, the trained weight vector is applied to the weighted naive Bayes model for HF risk prediction. The proposed system is evaluated with a published dataset of 899 patients, and compared with conventional data mining techniques and other state-of-the-art systems. The results show that our proposed system leads to 82.96% accuracy in HF risk prediction, which suggests that it could be used to early detect HF in the clinic.

Kehui Song, Shenglong Yu, Haiwei Zhang, Ying Zhang, Xiangrui Cai, Xiaojie Yuan
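
The role of the feature weight vector in a weighted naive Bayes classifier can be shown with a short sketch. It uses the common formulation in which each feature's log-likelihood is multiplied by its weight, with categorical features and Laplace smoothing. These choices, the toy data, and the function names are assumptions for illustration; the weights here are set by hand rather than learned with the MSE-based procedure the abstract describes.

```python
import math
from collections import Counter, defaultdict


def fit_counts(rows, labels):
    """Collect the counts needed by a categorical naive Bayes model."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_proba(row, model, weights):
    """Class probabilities with P(c | x) proportional to P(c) * prod_i P(x_i | c) ** w_i."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(x_i | c).
            p = (value_counts[(i, c)][value] + 1) / (n_c + len(feature_values[i]))
            score += weights[i] * math.log(p)
        log_scores[c] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}


# Toy data: (blood pressure level, smoker) -> 1 if at risk, 0 otherwise (hypothetical).
rows = [("high", "yes"), ("high", "no"), ("low", "no"), ("low", "no")]
labels = [1, 1, 0, 0]
model = fit_counts(rows, labels)

uniform = predict_proba(("high", "no"), model, weights=[0.0, 0.0])
first_only = predict_proba(("high", "no"), model, weights=[1.0, 0.0])
```

With all weights at zero the prediction falls back to the class priors, and with all weights at one it reduces to ordinary naive Bayes, so the weight vector interpolates how much each feature is allowed to contribute.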

Inheritance-Guided Hierarchical Assignment for Clinical Automatic Diagnosis

Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making. Considering that manual diagnosis could be error-prone and time-consuming, many intelligent approaches based on clinical text mining have been proposed to perform automatic diagnosis. However, these methods may not achieve satisfactory results due to the following challenges. First, most of the diagnosis codes are rare, and the distribution is extremely unbalanced. Second, existing methods struggle to capture the correlation between diagnosis codes. Third, the lengthy clinical note leads to the excessive dispersion of key information related to codes. To tackle these challenges, we propose a novel framework that combines inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis. Specifically, we propose a hierarchical joint prediction strategy to address the challenge of unbalanced code distribution. Then, we utilize graph convolutional neural networks to obtain the correlation and semantic representations of medical ontology. Furthermore, we introduce multi-attention mechanisms to extract crucial information. Finally, extensive experiments on the MIMIC-III dataset clearly validate the effectiveness of our method.

Yichao Du, Pengfei Luo, Xudong Hong, Tong Xu, Zhe Zhang, Chao Ren, Yi Zheng, Enhong Chen

BPTree: An Optimized Index with Batch Persistence on Optane DC PM

Intel Optane DC Persistent Memory (PM) is the first commercially available PM product. Although it meets many hypotheses about PM made in previous studies, some other design considerations have been observed in subsequent tests. For instance, 1) the internal data access granularity in Optane DC PM is 256B, so accesses smaller than 256B cause read/write amplification; 2) the locking overhead is amplified when PM operations are included in the critical section or the lock is placed on PM. In this paper, we propose a novel persistent index called BPTree to fit these new features. The core idea of BPTree is to buffer multiple writes in DRAM first, and later persist them in batches to PM to reduce the write amplification. We add a buffer layer in BPTree to enable the batch persistence, and design a GC-friendly log structure on PM to guarantee the buffer’s durability. To improve the scalability, we also implement a hybrid concurrency control strategy to ensure most of the operations on PM are lock-free, and move the lock from PM to DRAM for the operations that must be locked. Our experiments on Optane DC PM show that BPTree reduces 256B PM writes by a factor of 1.95–2.48x compared to the state-of-the-art persistent indexes. Moreover, BPTree has better scalability in the concurrent environment.

Chenchen Huang, Huiqi Hu, Aoying Zhou
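
The effect of the 256B access granularity can be made concrete with some simple arithmetic. The sketch below is a simplified model, not a description of the device or of BPTree: it assumes every write touches whole 256B blocks and ignores any buffering inside the hardware. The function names are hypothetical.

```python
def media_bytes_written(offset, size, granularity=256):
    """Bytes touched on the medium by one write, assuming whole-block access."""
    if size <= 0:
        return 0
    first_block = offset // granularity
    last_block = (offset + size - 1) // granularity
    return (last_block - first_block + 1) * granularity


def write_amplification(writes, granularity=256):
    """Ratio of bytes touched on the medium to bytes requested.

    writes: list of (offset, size) pairs.
    """
    requested = sum(size for _, size in writes)
    touched = sum(media_bytes_written(o, s, granularity) for o, s in writes)
    return touched / requested


# Four separate 64B updates that land in four different 256B blocks.
scattered = [(0, 64), (256, 64), (512, 64), (768, 64)]
# The same 256 bytes buffered first and persisted as one aligned write.
batched = [(0, 256)]

scattered_wa = write_amplification(scattered)
batched_wa = write_amplification(batched)
```

Under this model the scattered updates touch four times as many bytes as they request, while the batched write touches exactly what it requests, which is the motivation the abstract gives for buffering writes in DRAM and persisting them in batches.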

An Improved Dummy Generation Approach for Enhancing User Location Privacy

Location-based services (LBS), which provide personalized and timely information, entail privacy concerns such as unwanted leak of current user locations to potential stalkers. Existing works have proposed dummy generation techniques by creating a cloaking region (CR) such that the user’s location is at a fixed distance from the center of CR. Hence, if the adversary somehow knows the location of the center of CR, the user’s location would be vulnerable to attack. We propose an improved dummy generation approach for facilitating improved location privacy for mobile users. Our performance study demonstrates that our proposed approach is indeed effective in improving user location privacy.

Shadaab Siddiqie, Anirban Mondal, P. Krishna Reddy

Industrial Papers

Frontmatter

LinkLouvain: Link-Aware A/B Testing and Its Application on Online Marketing Campaign

Many online marketing campaigns aim to promote user interaction. The average treatment effect (ATE) of campaign strategies needs to be monitored throughout the campaign. A/B testing is usually conducted for this purpose, but user interaction can introduce interference into normal A/B testing. With the help of link prediction, we design a network A/B testing method, LinkLouvain, that minimizes graph interference and gives an accurate and sound estimate of the campaign’s ATE. In this paper, we analyze the network A/B testing problem under a real-world online marketing campaign, describe our proposed LinkLouvain method, and evaluate it on real-world data. Our method achieves significant performance compared with others and is deployed in the online marketing campaign.

Tianchi Cai, Daxi Cheng, Chen Liang, Ziqi Liu, Lihong Gu, Huizhi Xie, Zhiqiang Zhang, Xiaodong Zeng, Jinjie Gu
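
For readers unfamiliar with the quantities involved, the sketch below shows a difference-in-means ATE estimate together with cluster-level treatment assignment, a standard way to limit interference between connected users. It is not the LinkLouvain method: the cluster ids are assumed to be given (the paper's link-prediction and clustering steps are not reproduced), and the function names are hypothetical.

```python
import random
from statistics import mean


def assign_by_cluster(cluster_of_user, seed=0):
    """Assign treatment per cluster so connected users share the same arm.

    cluster_of_user: dict mapping user id -> cluster id (assumed precomputed).
    Returns a dict mapping user id -> True (treated) or False (control).
    """
    rng = random.Random(seed)
    clusters = sorted(set(cluster_of_user.values()))
    arm_of_cluster = {c: rng.random() < 0.5 for c in clusters}
    return {user: arm_of_cluster[c] for user, c in cluster_of_user.items()}


def estimate_ate(outcomes, treated):
    """Difference in mean outcome between treated and control units."""
    treated_outcomes = [y for y, t in zip(outcomes, treated) if t]
    control_outcomes = [y for y, t in zip(outcomes, treated) if not t]
    if not treated_outcomes or not control_outcomes:
        raise ValueError("both arms need at least one unit")
    return mean(treated_outcomes) - mean(control_outcomes)


assignment = assign_by_cluster({"u1": 0, "u2": 0, "u3": 1, "u4": 1})

# Hypothetical outcomes, e.g. interactions per user during the campaign.
ate = estimate_ate(outcomes=[3, 5, 1, 1], treated=[True, True, False, False])
```

When users interact, a treated user can influence a control user and bias the plain per-user estimate. Keeping tightly linked users in the same arm reduces that leakage.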

An Enhanced Convolutional Inference Model with Distillation for Retrieval-Based QA

A common solution of automatic question-answering (QA) systems is retrieving the most similar question for a given user query from a QA knowledge base. Even though some models have got promising performance on this task, it may be hard for them to achieve a balance between accuracy and efficiency. In this paper, we propose an enhanced convolutional inference model with StructBert distillation, called StructBert-ECIM, to achieve such balance.

Shuangyong Song, Chao Wang, Xiao Pu, Zehui Wang, Huan Chen

Familia: A Configurable Topic Modeling Framework for Industrial Text Engineering

In this paper, we propose a configurable topic modeling framework named Familia. Familia supports an important line of topic models that are widely applicable in text engineering scenarios. In order to relieve the burden on software engineers without knowledge of Bayesian networks, Familia is able to conduct automatic parameter inference for a variety of topic models. Simply by changing the data organization of Familia, software engineers are able to easily explore a broad spectrum of existing topic models or even design their own topic models, and find the one that best suits the problem at hand. With its superior extendability, Familia has a novel sampling mechanism that strikes a balance between effectiveness and efficiency of parameter inference. Furthermore, Familia is essentially a big topic modeling framework that supports parallel parameter inference and distributed parameter storage. The utility and necessity of Familia are demonstrated in real-life industrial applications. Familia would significantly enlarge software engineers’ arsenal of topic models and pave the way for utilizing highly customized topic models in real-life problems. The source code of Familia has been released on GitHub at https://github.com/baidu/Familia/ .

Di Jiang, Yuanfeng Song, Rongzhong Lian, Siqi Bao, Jinhua Peng, Huang He, Hua Wu, Chen Zhang, Lei Chen

Generating Personalized Titles Incorporating Advertisement Profile

Advertisement (Ad) title plays a significant role in the effectiveness of online commercial advertising. However, it’s difficult for most advertisers to think of attractive titles for their products. By mining keywords from current ad material, traditional retrieval methods and neural text generation models have been applied to solve this problem. However, few of them focus on personalized ad titles generation. Ad titles from different advertisers can be very diversified, and there is massive previous advertising data available, which can tell the style, content, and vocabulary of specific advertisers. Based on massive previous advertising data and current ad material, we propose an Ad-Profile-based Title Generation Network (APTGN) to automatically generate personalized titles for ads. The model utilizes massive advertising data and current ad material to construct a profile for each ad, which is further integrated into the generation model to help recognize the preferences of specific ads. Automatic evaluation metrics and online A/B testing both show that our model significantly outperforms all the baselines, increasing the adoption rate of recommendation titles by 27.22%. Through our deployed model, once an advertiser needs to customize an ad title for their products, satisfactory titles can be recommended automatically without bothering to write any words.

Jingbing Wang, Zhuolin Hao, Minping Zhou, Jiaze Chen, Hao Zhou, Zhenqiao Song, Jinghao Wang, Jiandong Yang, Shiguang Ni

Parasitic Network: Zero-Shot Relation Extraction for Knowledge Graph Populating

The relation tuple is the basic unit of the knowledge graph. Conventional relation extraction methods can only identify a limited set of relation classes and cannot recognize unseen relation types that have no pre-labeled training data. In this paper, we explore zero-shot relation extraction to overcome this challenge. The only requisite information about an unseen type is the label name. We propose a Parasitic Neural Network (PNN), where unseen types are parasitic on seen types to get automatic annotation and training. The model learns a mapping between the feature representations of text samples and the distributions of unseen types in a shared semantic space. Experimental results show that our model significantly outperforms others on the unseen relation extraction task and achieves an improvement of more than 20% without any manual annotations or additional resources. This model, with good performance and fast implementation, can support industrial knowledge graph population.

Shengbin Jia, E. Shijia, Ling Ding, Xiaojun Chen, LingLing Yao, Yang Xiang

Graph Attention Networks for New Product Sales Forecasting in E-Commerce

Aiming to discover competitive new products, sales forecasting has been playing an increasingly important role in real-world E-Commerce systems. Current methods either only utilize historical sales records with time series based models, or train powerful classifiers (e.g., DNN and GBDT) with subtle feature engineering. Despite their effectiveness, they have limited ability to make predictions for new products due to the sparsity of product-related features. From observations on real-world data, we find that some additional time series features (e.g., brand and category) implying product characteristics also play vital roles in new product sales forecasting. Hence, we organize them as a new kind of dense feature called CPV (Category-Property-Value) and propose a Time Series aware Heterogeneous Graph (TSHG) to integrate CPVs and product-based time series into a unified framework for fine-grained interaction. Furthermore, we propose a novel Graph Attention Networks based new product Sales Forecasting model (GASF) that jointly exploits high-order structure and time series features derived from TSHG for new product sales forecasting with graph attention networks. Moreover, a multi trend attention (MTA) mechanism is also proposed to solve temporal shifting and spatial inconsistency between the time series of products and CPVs. Extensive experiments on an industrial dataset and online system demonstrate the effectiveness of our proposed approaches.

Chuanyu Xu, Xiuchong Wang, Binbin Hu, Da Zhou, Yu Dong, Chengfu Huo, Weijun Ren

Transportation Recommendation with Fairness Consideration

Recent years have witnessed the widespread use of online map services to recommend transportation routes involving multiple transport modes, such as bus, subway, and taxi. However, existing transportation recommendation services mainly focus on improving the overall user click-through rate that is dominated by mainstream user groups, and thus may result in unsatisfactory recommendations for users with diversified travel needs. In other words, different users may receive unequal services. To this end, in this paper, we first identify two types of unfairness in transportation recommendation, (i) the under-estimate unfairness which reflects lower recommendation accuracy (i.e., the quality), and (ii) the under-recommend unfairness which indicates lower recommendation volume (i.e., the quantity) for users who travel in certain regions and during certain time periods. Then, we propose the Fairness-Aware Spatiotemporal Transportation Recommendation (FASTR) framework to mitigate the transportation recommendation bias. In particular, based on a multi-task wide and deep learning model, we propose the dual-focal mechanism for under-estimate mitigation and tailor-designed spatiotemporal fairness metrics and regularizers for under-recommend mitigation. Finally, extensive experiments on two real-world datasets verify the effectiveness of our approach to handle these two types of unfairness.

Ding Zhou, Hao Liu, Tong Xu, Le Zhang, Rui Zha, Hui Xiong

Constraint-Adaptive Rule Mining in Large Databases

Decision rules are widely used due to their interpretability, efficiency, and stability in various applications, especially for financial tasks, such as fraud detection and loan assessment. In many scenarios, there is high demand for generating decision rules under some specific constraints. However, the performance, efficiency, and adaptivity of previous methods, which take no consideration of these constraints, are far from satisfactory in these scenarios, especially when the constraints are relatively tight. In this paper, to deal with this problem, we propose a constraint-adaptive rule mining algorithm named CARM (Constraint Adaptive Rule Mining), which is a novel decision tree based model. To provide a practical balance between purity and constraint fitness when building the trees, an adaptive criterion is designed and applied to better meet the constraints. Besides, a rule extraction and pruning process is applied to satisfy the constraints and further alleviate the overfitting problem. In addition, to improve the coverage, an iterative covering framework is proposed in this paper. Experiments on both public and business data sets show that the proposed method is able to achieve better performance, competitive efficiency, as well as low rule complexity compared with other methods.

Meng Li, Ya-Lin Zhang, Qitao Shi, Xinxing Yang, Qing Cui, Longfei Li, Jun Zhou

Demo Papers

Frontmatter

FedTopK: Top-K Queries Optimization over Federated RDF Systems

Recently, how to evaluate SPARQL queries over federated RDF systems has become a hot research topic. However, most existing studies mainly focus on implementing and optimizing the basic queries over federated SPARQL systems, and few of them discuss top-k queries. To remedy this defect, this demo designs a system named FedTopK that can support top-k queries over federated RDF systems. FedTopK employs a cost-based optimal query plan generation algorithm and a query plan execution optimization strategy to minimize the top-k query cost. In addition, FedTopK uses a query decomposition optimization scheme that allows merging triple patterns with the same multi-sources into one subquery to reduce the number of remote accesses. Experimental studies over real federated RDF datasets show that the demo is efficient.

Ningchao Ge, Zheng Qin, Peng Peng, Lei Zou

Shopping Around: CoSurvey Helps You Make a Wise Choice

When shopping online, customers usually compare commodities with each other before making their purchase decision. In addition to the product price, they also care about word-of-mouth. However, marketing strategies from various e-commerce platforms, along with the diverse online commodities, make it difficult for customers to distinguish the most cost-effective products. Present cross-platform commodity comparison applications merely focus on product prices, without jointly considering the reviews. In this demonstration, we developed a web-based application, CoSurvey, which matches commodities from various e-commerce platforms and analyzes product comment sentiment on the basis of the proposed Attention-BiLSTM-CNN Model. The model uses an attention-based Bi-LSTM network to learn sentence sequence information, uses a CNN to learn sentence structure information, and uses a multilayer perceptron (MLP) to learn meta-information. The meta-information in the comment sentiment analysis task includes the comment’s number of likes, reviewer level, additional images, delivery time, and sentence length. Besides the keyword query, CoSurvey provides customers with a survey of cross-platform product price trends and the evolution of comment sentiment. High-concurrency requirements and load balancing are also addressed.

Qinhui Chen, Liping Hua, Junjie Wei, Hui Zhao, Gang Zhao

IntRoute: An Integer Programming Based Approach for Best Bus Route Discovery

An efficient data-driven public transportation system can improve urban potency. In this research, we propose IntRoute, an Integer Programming (IP) based approach to optimize bus route planning. Specifically, IntRoute first contracts bus stops via clustering and then derives a new bus route via a mixed integer linear program (ILP). This two-phase strategy brings three major merits, i.e., a single bus route without any transfer, minimal total travel time, and an efficient optimization algorithm for large-scale problems. Experimental results show that IntRoute significantly reduces the average commuting time in Sydney from 31.53 min to 18.06 min.

Chang-Wei Sung, Xinghao Yang, Chung-Shou Liao, Wei Liu

NRCP-Miner: Towards the Discovery of Non-redundant Co-location Patterns

Co-location pattern mining, which refers to discovering neighboring spatial features in geographic space, is an interesting and important task in spatial data mining. However, in practice, the usefulness of prevalent (interesting) co-location patterns generated by traditional frameworks is strongly limited by their huge number, which may affect the user’s subsequent decisions. To address this issue, in this demonstration, we present a novel schema, named NRCP-Miner, aiming at redundancy reduction for prevalent co-location patterns, i.e., discovering non-redundant co-location patterns by utilizing the spatial distribution information of co-location instances. NRCP-Miner can effectively remove the redundant patterns contained in prevalent co-location patterns, and thus further assists the user in making subsequent decisions. We evaluated the efficiency of NRCP-Miner compared with related state-of-the-art approaches.

Xuguang Bao, Jinjie Lu, Tianlong Gu, Liang Chang, Lizhen Wang

ARCA: A Tool for Area Calculation Based on GPS Data

In this paper, we develop a tool to efficiently and effectively calculate agricultural machinery’s working area based on farming machinery’s GPS data. The tool works as follows. First, we pre-process GPS data by removing duplicate data, abnormal data and invalid data. Data projection is performed using the Gauss-Krüger projection, and the minimum value after projection is used for data transforming and shifting. Second, the tool performs trajectory fitting for the farming machinery. Finally, an area calculation algorithm is developed to compute the farming machinery’s working area based on the trajectory data produced in the first two steps. The algorithm achieves an error rate of 0.29%, and takes 0.03 s to process about 60 GPS records collected in one minute.

Sujing Song, Jie Sun, Jianqiu Xu

LSTM Based Sentiment Analysis for Cryptocurrency Prediction

Recent studies in big data analytics and natural language processing develop automatic techniques in analyzing sentiment in the social media information. In addition, the growing user base of social media and the high volume of posts also provide valuable sentiment information to predict the price fluctuation of the cryptocurrency. This research is directed to predicting the volatile price movement of cryptocurrency by analyzing the sentiment in social media and finding the correlation between them. While previous work has been developed to analyze sentiment in English social media posts, we propose a method to identify the sentiment of the Chinese social media posts from the most popular Chinese social media platform Sina-Weibo. We develop the pipeline to capture Weibo posts, describe the creation of the crypto-specific sentiment dictionary, and propose a long short-term memory (LSTM) based recurrent neural network along with the historical cryptocurrency price movement to predict the price trend for future time frames. The conducted experiments demonstrate the proposed approach outperforms the state of the art auto regressive based model by 18.5% in precision and 15.4% in recall.

Xin Huang, Wenbin Zhang, Xuejiao Tang, Mingli Zhang, Jayachander Surbiryala, Vasileios Iosifidis, Zhen Liu, Ji Zhang

SQL-Middleware: Enabling the Blockchain with SQL

As blockchain technology develops, it shows broad prospects as a new type of data management system. However, the usability of blockchain is restricted by its data modeling method. In addition, every blockchain system has its own native but naive interfaces, so developing on different blockchain systems leads to low development efficiency and high development costs. In this study, we construct SQL-Middleware for blockchain systems to solve these problems. SQL-Middleware first performs relational modeling of blockchain data, mapping the blockchain data into a relational table. On the basis of this modeling, SQL-Middleware encapsulates a set of SQL interfaces for blockchain systems, thus unifying the interface access methods of different blockchain systems. Finally, we implement SQL-Middleware on the open source blockchain system CITA. The demonstration shows that SQL-Middleware greatly improves the data management capabilities of blockchain and simplifies the steps for accessing it.

Xing Tong, Haibo Tang, Nan Jiang, Wei Fan, Yichen Gao, Sijia Deng, Zhao Zhang, Cheqing Jin, Yingjie Yang, Gang Qin
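
The idea of mapping blockchain data into a relational table and then querying it with SQL, as summarized in the abstract above, can be illustrated with a minimal sketch. It does not reproduce SQL-Middleware or any CITA interface: the block layout, the `chain_tx` table, its columns, and the `load_chain` helper are all hypothetical, and an in-memory SQLite database stands in for the middleware layer.

```python
import sqlite3

# Hypothetical chain data: each block carries a list of transactions.
blocks = [
    {"height": 1, "hash": "0xaa",
     "txs": [{"hash": "0x01", "sender": "alice", "value": 5}]},
    {"height": 2, "hash": "0xbb",
     "txs": [{"hash": "0x02", "sender": "bob", "value": 7},
             {"hash": "0x03", "sender": "alice", "value": 1}]},
]

def load_chain(conn, blocks):
    """Flatten blocks and their transactions into one relational table."""
    conn.execute(
        "CREATE TABLE chain_tx ("
        "tx_hash TEXT PRIMARY KEY, block_height INTEGER, "
        "block_hash TEXT, sender TEXT, value INTEGER)"
    )
    for block in blocks:
        conn.executemany(
            "INSERT INTO chain_tx VALUES (?, ?, ?, ?, ?)",
            [(tx["hash"], block["height"], block["hash"],
              tx["sender"], tx["value"]) for tx in block["txs"]],
        )

conn = sqlite3.connect(":memory:")
load_chain(conn, blocks)

# Once the data is relational, ordinary SQL replaces chain-specific calls.
totals = conn.execute(
    "SELECT sender, SUM(value) FROM chain_tx GROUP BY sender ORDER BY sender"
).fetchall()
```

A single flattened table keeps the sketch short; a real mapping would likely need separate tables for blocks, transactions, and contract state.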

Loupe: A Visualization Tool for High-Level Execution Plans in SystemDS

The declarative programming language in SystemDS makes it easier for users to implement machine learning algorithms. SystemDS can generate execution jobs for different data processing engines, including MapReduce and Spark. The GUI of a data processing engine typically visualizes the low-level execution process (e.g., RDD transformations in Spark). However, the low-level description in the Spark GUI does not show the relationship between DML operations and RDD primitives. In this work, we propose Loupe, a tool that visualizes high-level execution plans in SystemDS to help users understand the execution process. This paper introduces the design of the tool and demonstrates a visualization case.

Zhizhen Xu, Zihao Chen, Chen Xu

Ph.D Consortium

Frontmatter

Algorithm Fairness Through Data Inclusion, Participation, and Reciprocity

Learning algorithms have become the basis of decision making and the modern tool of assessment in all spheres of human endeavour. Consequently, competing arguments about the reliability of learning algorithms persist in the global AI debate, driven by concerns about algorithmic biases such as data inclusiveness bias, homogeneity assumptions in data structuring, and coding bias, which result from human-imposed bias and variance, among other causes. Recent evidence (for example, the misclassification of people of colour in computer vision and face recognition) shows that these concerns are warranted. Evidence suggests that algorithmic bias is typically introduced into a learning algorithm during the assembly of a dataset, for instance in how the data are collected, digitized, structured, adapted, and entered into a database according to human-designed cataloguing criteria. Therefore, addressing algorithm fairness, bias, and variance in artificial intelligence implies addressing training set bias. We propose a framework of data inclusiveness, participation, and reciprocity.

Olalekan J. Akintande

Performance Issues in Scheduling of Real-Time Transactions

Multi-site, real-time transactional data-analysis applications, and the underlying research efforts to improve their performance, have received renewed attention from researchers in the last four years. This attention reveals numerous unanswered and highly relevant issues and challenges that require a multi-disciplinary research approach to solve the core problems of database transaction processing. Our focus is to cover most of the issues and challenges with transaction scheduling algorithms in one place and to present the current state of research. At a high level, the domains covered are real-time priority assignment heuristics, real-time concurrency control protocols, and real-time commit processing. The article also points towards immediate future directions that require action from the modern data-driven research community.

Sarvesh Pandey, Udai Shanker

Semantic Integration of Heterogeneous and Complex Spreadsheet Tables

A great number of companies and institutions use spreadsheets for managing, publishing and sharing their data. Though effective, spreadsheets are mainly designed for being interpreted by humans, and the automatic extraction of their content and interpretation is a complex task. The task becomes even harder when tables present different kinds of mistakes and their layout is complex. In this paper, we outline the approach that we wish to develop during the PhD for answering the research question “how to semi-automatically extract coherent semantic information from heterogeneous and complex spreadsheets?”.

Sara Bonfitto

Abstract Model for Multi-model Data

In recent years, many so-called multi-model database management systems have emerged, mainly as extensions of existing single-model systems, regardless of whether these were originally relational or NoSQL. These new database systems place new demands on their users. From the point of view of conceptual and logical representation, the approaches widely used so far, especially ER and UML, prove insufficient in many respects because of the specific properties of multi-model data. In addition, it is difficult to query data that is represented in various and often overlapping data models at the logical level.

Pavel Čontoš

User Preference Translation Model for Next Top-k Items Recommendation with Social Relations

Recommendation systems predict the interests of users by analyzing their historical preferences. Collaborative filtering-based approaches usually ignore sequential information, and sequential recommendation usually focuses on next-item prediction. In this work, we address the next top-k recommendation problem. We propose the User Preference Translation Model (UPTM), which incorporates item influence embedding and social relations between users. In addition, we will also address the cold start problem in UPTM.

Hao-Shang Ma, Jen-Wei Huang

Backmatter

Title: Database Systems for Advanced Applications
Edited by: Christian S. Jensen
Ee-Peng Lim
De-Nian Yang
Wang-Chien Lee
Vincent S. Tseng
Vana Kalogeraki
Jen-Wei Huang
Chih-Ya Shen
Publisher: Springer International Publishing
Electronic ISBN: 978-3-030-73200-4
Print ISBN: 978-3-030-73199-1
DOI: https://doi.org/10.1007/978-3-030-73200-4