2021 | Book

Database Systems for Advanced Applications. DASFAA 2021 International Workshops

BDQM, GDMA, MLDLDSA, MobiSocial, and MUST, Taipei, Taiwan, April 11–14, 2021, Proceedings

Editors: Christian S. Jensen, Ee-Peng Lim, De-Nian Yang, Chia-Hui Chang, Jianliang Xu, Wen-Chih Peng, Jen-Wei Huang, Chih-Ya Shen

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This volume constitutes the papers of several workshops held in conjunction with the 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021, which took place in Taipei, Taiwan, in April 2021.

The 29 revised full papers presented in this book were carefully reviewed and selected from 84 submissions.

DASFAA 2021 presents the following five workshops:

6th International Workshop on Big Data Quality Management (BDQM 2021)

5th International Workshop on Graph Data Management and Analysis (GDMA 2021)

First International Workshop on Machine Learning and Deep Learning for Data Security Applications (MLDLDSA 2021)

6th International Workshop on Mobile Data Management, Mining, and Computing on Social Network (MobiSocial 2021)

2021 International Workshop on Mobile Ubiquitous Systems and Technologies (MUST 2021)

Due to the COVID-19 pandemic, this event was held virtually.

Frontmatter

The 6th International Workshop on Big Data Quality Management

ASQT: An Efficient Index for Queries on Compressed Trajectories

Abstract

Nowadays, the number of GPS-equipped devices is increasing dramatically, and they generate raw trajectory data constantly. Location-based services that use trajectory data are becoming increasingly popular in many fields. However, the amount of raw trajectory data is usually too large: it is expensive to store, and the cost of transmitting and processing it is high. The common way to address these problems is to compress trajectories with compression algorithms. This paper proposes a highly efficient spatial index named ASQT, an adaptive quadtree index. Based on ASQT, we propose a range query processing algorithm and a top-k similarity query processing algorithm. ASQT can effectively speed up both trajectory range query processing and similarity query processing on compressed trajectories. Extensive experiments on a real dataset show the superiority of our methods.

Binghao Wang, Hongbo Yin, Kaiqi Zhang, Dailiang Jin, Hong Gao

ROPW: An Online Trajectory Compression Algorithm

Abstract

GPS sensors are ubiquitous in smartphones, vehicles, and wearable devices, and they collect large amounts of valuable trajectory data by tracking moving objects. Analysis of this trajectory data can benefit many practical applications, such as route planning and transportation optimization. However, the unprecedented scale of GPS data poses a challenge to the effective storage of trajectories, which makes trajectory compression (also called trajectory sampling) necessary. The latest compression methods usually perform unsatisfactorily in terms of time and space complexity or compression rate, which leads to rapid exhaustion of memory, computing, storage, and energy resources. In response to this problem, this paper proposes an error-bounded online trajectory compression algorithm (the ROPW algorithm) that traverses the sliding window backwards. The algorithm significantly improves the trajectory compression rate, and its average time complexity and space complexity are O(N log N) and O(1), respectively. Finally, we conducted experiments on three real data sets to verify that the ROPW algorithm performs very well in terms of compression rate and time efficiency.

Sirui Li, Kaiqi Zhang, Hongbo Yin, Dan Yin, Hongquan Zu, Hong Gao
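
To make the idea of error-bounded, window-based online compression in the ROPW abstract above more concrete, here is a minimal sketch of the classic opening-window heuristic. It is not the ROPW algorithm itself (the abstract only states that ROPW traverses the sliding window backwards); the function names, the 2D point format, and the use of point-to-segment distance as the error measure are assumptions made for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def opening_window(points, eps):
    """Keep a subset of points so that every dropped point lies within
    eps of the segment that replaces it (classic opening-window scheme)."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    anchor, end = 0, 2
    while end < len(points):
        violated = any(
            point_segment_distance(points[i], points[anchor], points[end]) > eps
            for i in range(anchor + 1, end)
        )
        if violated:
            # Close the window at the last point that still satisfied the bound.
            anchor = end - 1
            kept.append(points[anchor])
            end = anchor + 2
        else:
            end += 1
    kept.append(points[-1])
    return kept


track = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
compressed = opening_window(track, eps=0.1)
```

On this L-shaped track the sketch keeps only the two endpoints and the corner point. Its worst-case cost is quadratic in the window length, which is one reason more refined window strategies are studied.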

HTF: An Effective Algorithm for Time Series to Recover Missing Blocks

Abstract

With the popularity of time series analysis, failures during data recording, transmission, and storage make missing blocks in time series a problem to be solved. Therefore, it is of great significance to study effective methods to recover missing blocks in time series for better analysis and mining. In this paper, we focus on the situation of continuous missing blocks in multivariate time series. Aiming at the blackout missing block pattern, we propose a method called hankelized tensor factorization (HTF), based on singular spectrum analysis (SSA). After the hankelization of the time series, this method decomposes the intermediate result into the product of a time-evolving embedding, a time-delaying embedding, and a hidden-variable embedding of the multivariate variables in a low-dimensional space, to learn the essence of the time series. In an experimental benchmark containing 5 data sets, the recovery effect of HTF and other baseline methods under three missing block patterns is compared to evaluate the performance of HTF. Results show that when the missing block pattern is blackout, the HTF method achieves the best recovery effect, and it also achieves good results for the other missing patterns.

Haijun Zhang, Hong Gao, Dailiang Jin
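
The hankelization step mentioned in the HTF abstract above can be illustrated with a minimal sketch. It assumes a univariate series and a hypothetical window length `window`; the function name `hankelize` is illustrative and not taken from the paper, and the tensor factorization that follows hankelization is not shown.

```python
def hankelize(series, window):
    """Build the Hankel (trajectory) matrix used in singular spectrum analysis.

    Row i holds the lagged window series[i : i + window], so entry (i, j)
    equals series[i + j] and every anti-diagonal is constant.
    """
    n = len(series)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and len(series)")
    return [list(series[i:i + window]) for i in range(n - window + 1)]


H = hankelize([1, 2, 3, 4, 5], window=3)
# H == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Because each value of the series appears on a whole anti-diagonal, a missing block in the series shows up as a band of missing entries in the matrix, which is what a low-rank factorization can then try to fill in.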

LAA: Inductive Community Detection Algorithm Based on Label Aggregation

Abstract

Community detection, the task of discovering nodes that share the same attributes and are densely connected, has proved to be a useful tool for network analysis. However, the existing approaches are transductive: even when only the structure or attributes of the original network change, retraining is required to get the results. The rapid change and explosive growth of information make real-world applications expect inductive community detection models that can obtain results quickly. In this paper, we propose the Label Aggregation Algorithm (LAA), an inductive community detection algorithm based on label aggregation. Like the traditional label propagation algorithm, LAA uses labels to indicate the community to which a node belongs. The difference is that LAA takes advantage of the information aggregation ability of network representation learning to generate nodes’ final labels by aggregating the labels propagated from local neighbors. The experimental results show that LAA has excellent generalization capabilities for the overlapping community detection task.

Zhuocheng Ma, Dan Yin, Chanying Huang, Qing Yang, Haiwei Pan

Modeling and Querying Similar Trajectory in Inconsistent Spatial Data

Abstract

Querying clean spatial data is well studied in the database area. However, methods for querying clean data often fail to process queries on inconsistent spatial data. We develop a framework for querying similar trajectories in inconsistent spatial data. For any given entity, our method provides a way to query its similar trajectories in the inconsistent spatial data. We propose a dynamic programming algorithm and a threshold filter for the probability mass function. The algorithm with the filter reduces the high cost of query processing by directly using existing similar trajectory query algorithms designed for clean data. The effectiveness and efficiency of our algorithm are verified by experiments.

Weijia Feng, Yuran Geng, Ran Li, Maoyu Jin, Qiyi Tan

The 5th International Workshop on Graph Data Management and Analysis

ESTI: Efficient k-Hop Reachability Querying over Large General Directed Graphs

Abstract

As a fundamental task in graph data mining, answering k-hop reachability queries is useful in many applications such as the analysis of social networks and biological networks. Most of the existing methods for processing such queries can only deal with directed acyclic graphs (DAGs). However, cycles are ubiquitous in many real-world graphs. Furthermore, existing methods may require unacceptable indexing space or expensive online search time when the input graph becomes very large. In order to solve k-hop reachability queries for large general directed graphs, we propose a practical and efficient method named ESTI (Extended Spanning Tree Index). It constructs an extended spanning tree in the offline phase and speeds up online querying based on three carefully designed pruning rules over the built index. Extensive experiments show that ESTI significantly outperforms the state of the art in online querying, while ensuring a linear index size and stable index construction time.

Yuzheng Cai, Weiguo Zheng
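
For readers unfamiliar with the query type in the ESTI abstract above, the following sketch answers a k-hop reachability query with a plain bounded breadth-first search. This is only the index-free baseline that such indexes aim to beat, not ESTI or its pruning rules; the adjacency-dictionary format and the function name are assumptions.

```python
from collections import deque


def k_hop_reachable(adj, s, t, k):
    """Return True if t can be reached from s using at most k edges."""
    if s == t:
        return True
    visited = {s}
    frontier = deque([(s, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # hop budget used up, do not expand this node
        for nxt in adj.get(node, ()):
            if nxt == t:
                return True
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


# A small directed graph with a cycle a -> b -> c -> a.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
within_three = k_hop_reachable(graph, "a", "d", 3)
within_two = k_hop_reachable(graph, "a", "d", 2)
```

The visited set makes the search safe on cyclic graphs, but each query may still touch a large part of the graph, which is the online cost an offline index is meant to reduce.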

NREngine: A Graph-Based Query Engine for Network Reachability

Abstract

A quick and intuitive understanding of network reachability is of great significance for network optimization and network security management. In this paper, we propose a query engine called NREngine for network reachability when considering the network security policies. NREngine constructs a knowledge graph based on the network security policies and designs an algorithm over the graph for the network reachability. Furthermore, for supporting a user-friendly interface, we also propose a structural query language named NRQL in NREngine for the network reachability query. The experimental results show that NREngine can efficiently support a variety of network reachability query services.

Wenjie Li, Lei Zou, Peng Peng, Zheng Qin

An Embedding-Based Approach to Repairing Question Semantics

Abstract

A question with complete semantics, that is, one that contains all the basic semantic elements, can be answered correctly. In practice, however, questions are not always complete, owing to the ambiguity of users’ intentions. Unfortunately, there is very little research on this issue. In this paper, we present an embedding-based approach to completing question semantics, inspired by knowledge graph completion and based on our proposed representation of a complete basic question as a unique type, a unique subject, and multiple possible constraints. Firstly, we propose a back-and-forth-based matching method to recognize the question type, as well as a word2vec-based method to extract all constraints via the question subject and what is semantically relevant to it in knowledge bases. Secondly, we introduce a time-aware recommendation to choose the best candidates from the vast number of possible constraints, capturing users’ intents precisely. Finally, we present constraint-independence-based attention to generate complete questions naturally. Experiments verify the effectiveness of our approach.

Haixin Zhou, Kewen Wang

Hop-Constrained Subgraph Query and Summarization on Large Graphs

Abstract

We study the problem of hop-constrained relation discovery in a graph, i.e., finding the structural relation between a source node s and a target node t within k hops. Previously studied \(s-t\) graph problems, such as distance query and path enumeration, fail to reveal the \(s-t\) relation as a big picture. In this paper, we propose the k-hop \(s-t\) subgraph query, which returns the subgraph containing all paths from s to t within k hops. Since the subgraph may be too large to be well understood by the users, we further present a graph summarization method to uncover the key structure of the subgraph. Experiments show the efficiency of our algorithms against the existing path enumeration based method, and the effectiveness of the summarization.

Yu Liu, Qian Ge, Yue Pang, Lei Zou
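
The k-hop s-t subgraph query described in the abstract above can be sketched with two breadth-first searches, one forward from s and one backward from t. This is a simplified illustration rather than the authors' algorithm: it keeps every edge that lies on some s-to-t walk of at most k hops, which may be a superset of the answer if paths are required to be simple. The edge-list format and function names are assumptions.

```python
from collections import deque


def bfs_hops(adj, source):
    """Hop distance from source to every node reachable in adj."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def k_hop_st_edges(edges, s, t, k):
    """Edges (u, v) that lie on some s-to-t walk of at most k hops."""
    fwd, rev = {}, {}
    for u, v in edges:
        fwd.setdefault(u, []).append(v)
        rev.setdefault(v, []).append(u)
    from_s = bfs_hops(fwd, s)  # hops needed to reach u from s
    to_t = bfs_hops(rev, t)    # hops needed to reach t from v
    return {
        (u, v)
        for u, v in edges
        if u in from_s and v in to_t and from_s[u] + 1 + to_t[v] <= k
    }


edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "c"), ("c", "t"), ("a", "x")]
two_hop = k_hop_st_edges(edges, "s", "t", 2)
three_hop = k_hop_st_edges(edges, "s", "t", 3)
```

With k = 2 only the route through `a` survives; raising k to 3 also admits the route through `b` and `c`, while the dead-end edge to `x` is never included.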

Ad Click-Through Rate Prediction: A Survey

Abstract

Ad click-through rate (CTR) prediction, an essential task for charging advertisers in the field of e-commerce, provides users with appropriate advertisements according to their interests, as inferred from user clicks, in order to increase the click-through rate. The performance of CTR models plays a crucial role in advertising. Recently, many approaches to improving the performance of CTR prediction have been proposed. In this paper, we present a survey that comprehensively analyzes state-of-the-art CTR models by model type. Finally, we summarize some practical challenges and then discuss open problems of CTR prediction.

Liqiong Gu

An Attention-Based Approach to Rule Learning in Large Knowledge Graphs

Abstract

This paper presents a method for rule learning in large knowledge graphs. It consists of an effective sampling of large knowledge graphs (KGs) based on the attention mechanism. The attention-based sampling is designed to reduce the search space of rule extraction and thus to improve efficiency of rule learning for a given target predicate. An implementation ARL (Attention-based Rule Learner) of rule learning for KGs is obtained by combining the new sampling with the advanced rule miner AMIE+. Experiments have been conducted to demonstrate the efficiency and efficacy of our method for both rule learning and KG completion, which show that ARL is very efficient for rule learning in large KGs while the precision is still comparable to major baselines.

Minghui Li, Kewen Wang, Zhe Wang, Hong Wu, Zhiyong Feng

The 1st International Workshop on Machine Learning and Deep Learning for Data Security Applications

Multi-scale Gated Inpainting Network with Patch-Wise Spacial Attention

Abstract

Recently, deep-model-based image inpainting methods have achieved promising results in the realm of image processing. However, the existing methods produce fuzzy textures and distorted structures because they ignore the semantic relevance and feature continuity of the hole region. To address this challenge, we propose a detailed depth generation model (GS-Net) equipped with a Multi-Scale Gated Holes Feature Inpainting module (MG) and a Patch-wise Spacial Attention module (PSA). Initially, the MG module fills the hole area globally and concatenates the result to the input feature map. Then, the module utilizes a multi-scale gated strategy to adaptively guide the information propagation at different scales. We further design the PSA module, which optimizes the local feature mapping relations step by step to clarify the image texture information. In addition to preserving the semantic correlation among the features of the holes, the method can effectively predict the missing part of the holes while keeping the global style consistent. Finally, we extend the spatially discounted weight to irregular holes and assign higher weights to the spatial points near the effective areas to strengthen the constraint on the hole center. Extensive experimental results on Places2 and CelebA reveal the superiority of the proposed approach.

Xinrong Hu, Junjie Jin, Mingfu Xiong, Junping Liu, Tao Peng, Zili Zhang, Jia Chen, Ruhan He, Xiao Qin

An Improved CNNLSTM Algorithm for Automatic Detection of Arrhythmia Based on Electrocardiogram Signal

Abstract

Arrhythmia is one of the most common types of cardiovascular disease and poses a significant threat to human health. An electrocardiogram (ECG) assessment is the most commonly used method for the clinical judgment of arrhythmia. Using deep learning to detect an ECG automatically can improve the speed and accuracy of such judgment. In this paper, an improved arrhythmia classification method named CNN-BiLSTM, based on convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM), is proposed that can automatically identify four types of ECG signals: normal beat (N), premature ventricular contraction (V), left bundle branch block beat (L), and right bundle branch block beat (R). Compared with traditional CNN and BiLSTM models, CNN-BiLSTM can extract the features and dependencies before and after data processing better to achieve a higher classification accuracy. The results presented in this paper demonstrate that an arrhythmia classification method based on CNN-BiLSTM achieves a good performance and has potential for application.

Jingyao Zhang, Fengying Ma, Wei Chen

Cross-Domain Text Classification Based on BERT Model

Abstract

The diversity of structures and categories makes information security data difficult to handle. With the popularization of big data technology, cross-domain text classification becomes increasingly important for the information security domain. In this paper, we propose a new text classification structure based on the BERT model. Firstly, the BERT model is used to generate sentence vectors for the text, and then we construct the similarity matrix by calculating the cosine similarity. Finally, k-means and mean-shift clustering are used to extract the data feature structure. Through this structure, clustering operations are performed on the benchmark data set and on actual problems, so that the text information can be classified and effective clustering results can be obtained. At the same time, clustering evaluation indicators are used to verify the performance of the model on these datasets. Experimental results demonstrate the effectiveness of the proposed structure in terms of two indices, the Silhouette coefficient and the Calinski-Harabasz index.

Kuan Zhang, Xinhong Hei, Rong Fei, Yufan Guo, Rui Jiao
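
The similarity-matrix step in the abstract above can be sketched as follows. The short vectors stand in for BERT sentence vectors, which in practice have hundreds of dimensions; the function name and the handling of zero vectors are assumptions, and the clustering stage is not shown.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between equal-length vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0 or norms[j] == 0:
                continue  # similarity with a zero vector is left at 0.0
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim


# Toy stand-ins for sentence vectors.
sentence_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
S = cosine_similarity_matrix(sentence_vectors)
```

Each row of `S` can then be treated as a feature vector describing how one sentence relates to all others, which is the kind of input a clustering algorithm such as k-means can work on.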

Surface Defect Detection Method of Hot Rolling Strip Based on Improved SSD Model

Abstract

In order to reduce the influence of surface defects on the performance and appearance of hot-rolled steel strip, a surface defect detection method combining an attention mechanism and a multi-feature fusion network was proposed. In this method, the traditional SSD model was used as the basic framework, and the ResNet50 network after knowledge distillation was selected as the feature extraction network. The low-level and high-level features were fused so that they complement each other and improve detection accuracy. In addition, a channel attention mechanism was introduced to filter and retain important information, which reduced the network computation and improved the network detection speed. The experimental results showed that the accuracy of the RAF-SSD model for surface defect detection of hot-rolled steel strip was significantly higher than that of traditional deep learning models, and the detection speed was 12.9% higher than that of the SSD model, which can meet the real-time requirements of industrial detection.

Xiaoyue Liu, Jie Gao

Continuous Keystroke Dynamics-Based User Authentication Using Modified Hausdorff Distance

Abstract

Continuous keystroke dynamics-based user authentication methods are among the most promising means of user authentication in computer systems. Such methods do not require specialized equipment and allow a change of user to be detected at any time during a user session. In this paper, we explore new approaches to solving the problem based on the Hausdorff distance and its modifications, including a new method, the sum of maximum coordinate deviations. We compare the proposed methods to existing ones that are based on distance functions defined in feature space, statistical criteria, and neural networks. Based on the experiments, we observe that the proposed method based on the sum of maximum coordinate deviations with k nearest feature vector selection reports the highest accuracy of all reviewed methods.

Maksim Zhuravskii, Maria Kazachuk, Mikhail Petrovskiy, Igor Mashechkin
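
As background for the abstract above, the classic Hausdorff distance between two finite sets of feature vectors can be computed as in the sketch below. This shows only the standard, unmodified distance; the paper's modifications, including the sum of maximum coordinate deviations, are not described in the abstract and are not reproduced here. The toy 2D points and function names are assumptions.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy feature vectors, e.g. a stored profile versus a new sample.
profile = [(0.0, 0.0), (1.0, 0.0)]
sample = [(0.0, 0.0), (4.0, 0.0)]
d_forward = directed_hausdorff(profile, sample)
d_backward = directed_hausdorff(sample, profile)
d = hausdorff(profile, sample)
```

The two directed values differ (1.0 and 3.0 here), and the symmetric distance takes the larger one, so a single outlying vector can dominate the result. That sensitivity is a common motivation for modified variants.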

Deep Learning-Based Dynamic Community Discovery

Abstract

Recurrent neural networks (RNNs) have been effective methods for time series analyses. The network representation learning model and method based on deep learning can excellently analyze and predict the community structure of social networks. However, the node relationships of complex social networks in the real world often change over time. Therefore, this study proposes a dynamic community discovery method based on a recurrent neural network, which includes (1) spatio-temporal structure reconstruction strategy; (2) spatio-temporal feature extraction model; (3) dynamic community discovery method. Recurrent neural networks can be used to obtain the time features of the community network and help us build the network time feature extraction model. In this study, the recurrent neural network model is introduced into the time series feature learning of dynamic networks. This research constructs a network spatiotemporal feature learning model combining RNN, convolutional neural networks (CNN), and auto-encoder (AE), and then uses it to explore the dynamic community structure on the spatiotemporal feature vector. The experiment chose the Email-Enron data set of the Stanford Network Analysis Platform (SNAP) website to evaluate the method. The experimental results show that the proposed method has higher modularity than Auto-encoder in the dynamic community discovery of the real social network data set. Therefore, the dynamic community discovery method based on the recurrent neural network can be applied to analyze social networks, extract the time characteristics of social networks, and further improve the modularity of the community structure.

Ling Wu, Yubin Ouyang, Cheng Shi, Chi-Hua Chen

6th International Workshop on Mobile Data Management, Mining, and Computing on Social Network

Deep Attributed Network Embedding Based on the PPMI

Abstract

The attributed network embedding aims to learn the latent low-dimensional representations of nodes, while preserving the neighborhood relationship of nodes in the network topology as well as the similarities of attribute features. In this paper, we propose a deep model based on the positive point-wise mutual information (PPMI) for attributed network embedding. In our model, attribute features are transformed into an attribute graph, such that attribute features and network topology can be handled in the same way. Then, we perform the random surfing and calculate the PPMI on the attribute/topology graph to effectively maintain the structural characteristics and the high-order proximity information. The node representations are learned by a shared Auto-Encoder. Besides, the local pairwise constraint is used in the shared Auto-Encoder to improve the quality of node representations. Extensive experimental results on four real-world networks show the superior performance of the proposed model over the 10 baselines.

Kunjie Dong, Tong Huang, Lihua Zhou, Lizhen Wang, Hongmei Chen
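
The PPMI transformation named in the abstract above can be written down in a few lines. The sketch assumes a generic non-negative co-occurrence matrix as input; in the paper this matrix comes from random surfing on the attribute and topology graphs, which is not reproduced here, and the function name is illustrative.

```python
import math


def ppmi(counts):
    """Positive pointwise mutual information of a co-occurrence matrix.

    PMI(i, j) = log(p(i, j) / (p(i) * p(j))); negative values and
    undefined entries (zero counts) are mapped to 0.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out = []
        for j, c in enumerate(row):
            if c == 0:
                out.append(0.0)
                continue
            pmi = math.log((c * total) / (row_sums[i] * col_sums[j]))
            out.append(max(0.0, pmi))
        result.append(out)
    return result


diagonal = ppmi([[2, 0], [0, 2]])
uniform = ppmi([[1, 1], [1, 1]])
```

In the diagonal example each node co-occurs only with itself, giving a PMI of log 2 on the diagonal; in the uniform example co-occurrence is exactly what independence predicts, so every entry is 0.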

Discovering Spatial Co-location Patterns with Dominant Influencing Features in Anomalous Regions

Abstract

As one of the important exogenous factors that induce malignant tumors, environmental pollution poses a major threat to human health. In recent years, more and more studies have begun to use data mining techniques to explore the relationship between the two. However, these studies tend to search for universally applicable patterns over the entire space, which incurs high time and space costs and yields untargeted results. Therefore, this paper first divides the spatial data set and then, taking into account the attenuation of pollution influence with increasing distance, proposes the concept of high-impact anomalous spatial co-location region mining. In these regions, industrial pollution sources and malignant tumor patients have a higher co-location degree. To better guide practical work, the pollution factors in a pattern that have a decisive influence on the occurrence of malignant tumors are explored. Finally, a highly targeted new method is proposed to explore the dominant influencing factors when multiple pollution sources act on a certain tumor disease at the same time. Extensive experiments have been conducted on real and synthetic data sets. The results show that our method greatly improves the efficiency of mining while obtaining effective conclusions.

Lanqing Zeng, Lizhen Wang, Yuming Zeng, Xuyang Li, Qing Xiao

Activity Organization Queries for Location-Aware Heterogeneous Information Network

Abstract

Activity organization query for Location-Based Social Networks (LBSNs) is an important research problem, which aims at selecting a suitable group of people relevant to the query user and activity according to their social and spatial information. However, current activity organization queries mainly consider simplistic and direct relationships to measure the relevance. Although Heterogeneous Information Networks capture complex relationships by meta-structures, the relevance is seldom measured from both social and spatial aspects and does not take the distinctiveness of meta-structures into account. To fill this gap, we first propose a new relevance measurement, named SIMER, to more accurately measure the connection strength. Then, we formulate a new query, named MHS2Q, which considers the social and spatial factors as well as the distinctiveness of meta-structures. Furthermore, we extend MHS2Q to Subsequent MHS2Q, to consider a series of queries with varying spatial constraints. We design an efficient algorithm, MS2MU, to answer the (Subsequent) MHS2Q; it exploits a new index structure named d-Table to speed up the computation for subsequent queries and a pruning strategy, MSR-pruning, to avoid unnecessary computation. Experiments on real LBSNs show that MS2MU is more effective at retrieving a social group that is both relevant and socially tight to the query.

C. P. Kankeu Fotsing, Ya-Wen Teng, Sheng-Hao Chiang, Yi-Shin Chen, Bay-Yuan Hsu

Combining Oversampling with Recurrent Neural Networks for Intrusion Detection

Abstract

Previous studies on intrusion detection focus on analyzing features from existing datasets. With various types of fast-changing attacks, we need to adapt to new features for effective protection. Since real network traffic is very imbalanced, it is essential to train appropriate classifiers that can deal with rare cases. In this paper, we propose to combine oversampling techniques with deep learning methods for intrusion detection in imbalanced network traffic. First, after preprocessing with data cleaning and normalization, we use feature importance weights generated from ensemble decision trees to select important features. Then, the Synthetic Minority Oversampling Technique (SMOTE) is used to create synthetic samples from the minority class. Finally, we use Recurrent Neural Networks (RNNs), including Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), for classification. In our experimental results, oversampling improves the performance of intrusion detection for both machine learning and deep learning methods. The best performance is obtained on the CIC-IDS2017 dataset using the LSTM classifier, with an F1-score of 98.9%, and on the CSE-CIC-IDS2018 dataset using GRU, with an F1-score of 98.8%. This shows the potential of our proposed approach in detecting new types of intrusion from imbalanced real network traffic.

Jenq-Haur Wang, Tri Wanda Septian
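
The oversampling step named in the abstract above can be sketched in pure Python. The code follows the general idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours); it is a simplified illustration, not the authors' pipeline or a library implementation, and the function name, parameters, and toy data are assumptions.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority class: the four corners of the unit square.
rare = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(rare, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class instead of merely duplicating existing rows.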

Multi-head Attention with Hint Mechanisms for Joint Extraction of Entity and Relation

Abstract

In this paper, we propose a joint extraction model of entity and relation from raw texts without relying on additional NLP features, parameter threshold tuning, or entity-relation templates as previous studies do. Our joint model combines the language modeling for entity recognition and multi-head attention for relation extraction. Furthermore, we exploit two hint mechanisms for the multi-head attention to boost the convergence speed and the F1 score of relation extraction. Extensive experiment results show that our proposed model significantly outperforms baselines by having higher F1 scores on various datasets. We also provide ablation tests to analyze the effectiveness of components in our model.

Chih-Hsien Fang, Yi-Ling Chen, Mi-Yen Yeh, Yan-Shuo Lin

Maximum (L, K)-Lasting Cores in Temporal Social Networks

Abstract

Extracting dense structures in a social network is a fundamental task in graph mining and can find many real-world applications. The temporal social network augments the conventional social network with the temporal dimension, and extracting dense structures enables us to understand the period of time for which the dense structures exist. In this paper, we propose the new notion of (L, K)-lasting core, which is a densely connected subgraph lasting for a sufficiently long period of time in the temporal social network. We propose a polynomial-time algorithm to obtain the maximum (L, K)-lasting core with various processing strategies to boost the efficiency. We conduct extensive experiments on multiple datasets to validate the effectiveness and efficiency of the proposed approach. The experimental results show that our proposed approaches outperform the other baseline approaches in terms of solution quality and efficiency.

Wei-Chun Hung, Chih-Ying Tseng

The 2021 International Workshop on Mobile Ubiquitous Systems and Technologies

A Tablet-Based Game Tool for Cognition Training of Seniors with Mild Cognitive Impairment

Abstract

The purpose of this study is to examine the acceptability and effectiveness of a cognitive training game application targeting elderly adults with Mild Cognitive Impairment. This kind of impairment is one of the earliest stages of dementia and Alzheimer's disease. Ten serious games were designed and developed on the Android platform to train cognitive functions such as Attention, Visual Memory, Observation, Acoustic Memory, Language, Calculations, Orientation and Sensory Awareness. In particular, this paper examines the feasibility of playing such games with the participation of a group of seniors (N = 6) in a pilot study. A usability assessment was also performed by collecting qualitative and quantitative data. Another dimension investigated was the possibility of using this game tool as an alternative to traditional methods for evaluating cognitive functions. The results show that participants could learn quickly and understand the game mechanics. They also found the games easy to use and showed high enjoyment.

Georgios Skikos, Christos Goumopoulos

AntiPhiMBS-Auth: A New Anti-phishing Model to Mitigate Phishing Attacks in Mobile Banking System at Authentication Level

Abstract

In the era of digital banking, the advent of the latest technologies, utilization of social media, and mobile technologies became prime parts of our digital lives. Unfortunately, phishers exploit digital channels to collect login credentials from users and impersonate them to log on to the victim systems to accomplish phishing attacks. This paper proposes a novel anti-phishing model for Mobile Banking System at the authentication level (AntiPhiMBS-Auth) that averts phishing attacks in the mobile banking system. This model employs a novel concept of a unique id for authentication and application id that is known to users, banking app, and mobile banking system only. Phishers and phishing apps do not know the unique id or the application id, and consequently, this model mitigates the phishing attack in the mobile banking system. This paper utilized a process meta language (PROMELA) to specify system descriptions and security properties and built a verification model of AntiPhiMBS-Auth. The verification model of AntiPhiMBS-Auth is successfully verified using a simple PROMELA interpreter (SPIN). The SPIN verification results prove that the proposed AntiPhiMBS-Auth is error-free, and financial institutions can implement the verified model for mitigating the phishing attacks in the mobile banking system at the authentication level.

Tej Narayan Thakur, Noriaki Yoshiura

BU-Trace: A Permissionless Mobile System for Privacy-Preserving Intelligent Contact Tracing

Abstract

The coronavirus disease 2019 (COVID-19) pandemic has caused an unprecedented global health crisis. Digital contact tracing, as a transmission intervention measure, has shown its effectiveness in pandemic control. Despite intensive research on digital contact tracing, existing solutions can hardly meet users’ requirements on privacy and convenience. In this paper, we propose BU-Trace, a novel permissionless mobile system for privacy-preserving intelligent contact tracing based on QR code and NFC technologies. First, a user study is conducted to investigate and quantify the user acceptance of a mobile contact tracing system. Second, a decentralized system is proposed to enable contact tracing while protecting user privacy. Third, an intelligent behavior detection algorithm is designed to ease the use of our system. We implement BU-Trace and conduct extensive experiments in several real-world scenarios. The experimental results show that BU-Trace achieves a privacy-preserving and intelligent mobile system for contact tracing without requesting location or other privacy-related permissions.

Zhe Peng, Jinbin Huang, Haixin Wang, Shihao Wang, Xiaowen Chu, Xinzhi Zhang, Li Chen, Xin Huang, Xiaoyi Fu, Yike Guo, Jianliang Xu

A Novel Road Segment Representation Method for Travel Time Estimation

Abstract

Road segment representation is important for travel time estimation, route recovery, and traffic anomaly detection. Recent works mainly capture the topology of the road network with graph neural networks, while the dynamic character of topological relationships is usually ignored; in particular, the relationship between road segments evolves over time. To obtain road segment representations based on dynamic spatial information, we propose a model named temporal and spatial deep graph infomax network (ST-DGI). It not only captures the topological relationships of roads but also produces road segment representations for different time intervals. Meanwhile, the global traffic status/flow also affects the traffic situation of local road segments. Our model learns the relationship between them by maximizing the mutual information between the road segment (local) representation and the traffic status/flow (global) representation. This unsupervised learning also makes road segment representations more distinguishable, which helps downstream applications. Extensive experiments are conducted on two important traffic datasets. Compared with state-of-the-art models, the experimental results demonstrate the superior effectiveness of our model.

Wei Liu, Jiayu He, Haiming Wang, Huaijie Zhu, Jian Yin

Privacy Protection for Medical Image Management Based on Blockchain

Abstract

With the rapid development of medical research and the advance of information technology, Electronic Health Records (EHR) have attracted considerable attention in recent years due to their easy storage, convenient access, and good shareability. Medical images are one of the most frequently used data formats in EHR data; they are closely tied to patients' personal data and carry much highly sensitive information such as patient names, ID numbers, diagnostic information, and telephone numbers. A recent survey reveals that about 24.3 million medical images have been leaked from 50 countries all over the world. Moreover, these medical images can easily be modified or lost during transmission, which seriously hinders EHR data sharing. Blockchain is an emerging technology that integrates reliable storage, high security, and tamper resistance. In this paper, we propose a privacy protection model that integrates data desensitization and multiple signatures based on blockchain to protect patients’ medical image data. We evaluate the performance of our proposed method through extensive experiments, and the results show that it achieves desirable performance.

Yifei Li, Yiwen Wang, Ji Wan, Youzhi Ren, Yafei Li

Approximate Nearest Neighbor Search Using Query-Directed Dense Graph

Abstract

High-dimensional approximate nearest neighbor search (ANNS) has drawn much attention over decades due to its importance in machine learning and massive data processing. Recently, graph-based ANNS has become more and more popular thanks to its outstanding search performance. While various graph-based methods use different graph construction strategies, the widely accepted principle is to make the graph as sparse as possible to reduce the search cost. In this paper, we observe that a sparse graph incurs significant cost in the high recall regime (close or equal to 100%). To address this, we propose to judiciously control the minimum angle between neighbors of each point to create denser graphs. To reduce the search cost, we perform K-means clustering on the neighbors of each point using cosine similarity and only evaluate neighbors whose centroids are close to the query in angular similarity, i.e., query-directed search. A PQ-like method is adopted to optimize the space and time performance in evaluating the similarity between the centroids and the query. Extensive experiments over a collection of real-life datasets are conducted, and empirical results show that up to 2.2x speedup is achieved in the high recall regime.

Hongya Wang, Zeng Zhao, Kaixiang Yang, Hui Song, Yingyuan Xiao
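
The query-directed step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the deterministic centroid initialization, the use of directions relative to the current point, and the `top_clusters` parameter are all assumptions made for illustration, and the PQ-like optimization is omitted.

```python
import math

def normalize(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def cosine(a, b):
    # Inputs are assumed to be unit vectors, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(directions, k, iters=10):
    """Cluster unit vectors by cosine similarity; returns (centroids, labels)."""
    # Assumption: deterministic init from the first k directions.
    centroids = [list(d) for d in directions[:k]]
    labels = [0] * len(directions)
    for _ in range(iters):
        # Assign each direction to its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(d, centroids[c]))
            for d in directions
        ]
        # Recompute each centroid as the normalized mean of its members.
        for c in range(len(centroids)):
            members = [d for d, lab in zip(directions, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return centroids, labels

def query_directed_neighbors(point, neighbors, query, k=2, top_clusters=1):
    """Return only the neighbors whose cluster centroid points toward the query."""
    # Assumption: angles are measured on directions relative to the current point.
    dirs = [normalize([n - p for n, p in zip(nb, point)]) for nb in neighbors]
    centroids, labels = spherical_kmeans(dirs, k)
    qdir = normalize([q - p for q, p in zip(query, point)])
    ranked = sorted(
        range(len(centroids)),
        key=lambda c: cosine(qdir, centroids[c]),
        reverse=True,
    )
    keep = set(ranked[:top_clusters])
    return [nb for nb, lab in zip(neighbors, labels) if lab in keep]

point = [0, 0]
neighbors = [[1, 0], [-1, 0], [1, 0.1], [-1, 0.1]]
selected = query_directed_neighbors(point, neighbors, query=[5, 0])
print(selected)
```

In this toy example the four neighbors split into a right-pointing and a left-pointing cluster, and only the cluster facing the query is evaluated. In a real index the clustering would be done once at construction time rather than on every query.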

Backmatter

Title: Database Systems for Advanced Applications. DASFAA 2021 International Workshops
Editors: Christian S. Jensen, Ee-Peng Lim, De-Nian Yang, Chia-Hui Chang, Dr. Jianliang Xu, Wen-Chih Peng, Jen-Wei Huang, Chih-Ya Shen
Publisher: Springer International Publishing
Electronic ISBN: 978-3-030-73216-5
Print ISBN: 978-3-030-73215-8
DOI: https://doi.org/10.1007/978-3-030-73216-5

