
## About this book

This book constitutes the refereed proceedings of the 31st Australasian Database Conference, ADC 2020, held in Melbourne, VIC, Australia, in February 2020. The 14 full and 5 short papers presented were carefully reviewed and selected from 30 submissions. The Australasian Database Conference is an annual international forum for sharing the latest research advancements and novel applications of database systems, data-driven applications and data analytics between researchers and practitioners from around the globe, particularly Australia and New Zealand.

## Table of contents

### Semantic Round-Tripping in Conceptual Modelling Using Restricted Natural Language

Abstract
Conceptual modelling plays an important role in information system design and is one of its key activities. The modelling process usually involves domain experts and knowledge engineers who work together to bring out the required knowledge for building the information system. The most popular modelling approaches to develop these models include entity relationship modelling, object role modelling, and object-oriented modelling. These conceptual models are usually constructed graphically but are often difficult for domain experts to understand. In this paper we show how a restricted natural language can be used for writing a precise and consistent specification that is automatically translated into a description logic representation from which a conceptual model can be derived. This conceptual model can be rendered graphically and then verbalised again in the same restricted natural language as the specification. This process can be achieved with the help of a bi-directional grammar that allows for semantic round-tripping between the representations.
Bayzid Ashik Hossain, Rolf Schwitter

### PAIC: Parallelised Attentive Image Captioning

Abstract
Most encoder-decoder architectures generate the image description sentence based on recurrent neural networks (RNNs). However, the RNN decoder trained by Back Propagation Through Time (BPTT) is inherently time-consuming and suffers from the vanishing gradient problem. To overcome these difficulties, we propose a novel Parallelised Attentive Image Captioning Model (PAIC) that purely employs the optimised attention mechanism to decode natural sentences without using RNNs. At each decoding phase, our model can precisely localise different areas of the image using the well-defined spatial attention module, while capturing the word sequence with the well-attested multi-head self-attention model. In contrast to RNNs, the proposed PAIC can efficiently exploit the parallel computation advantages of GPU hardware for training, and further facilitate gradient propagation. Extensive experiments on MS-COCO demonstrate that the proposed PAIC significantly reduces the training time, while achieving competitive performance compared to conventional RNN-based models.
Ziwei Wang, Zi Huang, Yadan Luo

### Efficient kNN Search with Occupation in Large-Scale On-demand Ride-Hailing

Abstract
Intelligent ride-hailing systems, e.g., DiDi and Uber, have become essential travel tools for customers and have fostered many studies of location-based queries on road networks. Under the large demand for ride-hailing, the non-occupied vehicles might be insufficient for newly arriving user requests. However, occupied vehicles that are about to arrive at their destinations could be candidates to serve requests close to those destinations. Consequently, in our work, we study the k Nearest Neighbor search for moving objects with occupation, denoted as the Approachable kNN (AkNN) query, which to the best of our knowledge is the first study to consider the occupation of moving objects in relevant fields. In particular, we first propose a simple Dijkstra-based algorithm for the AkNN query. Then we improve the solution by developing a grid-based Destination-Oriented index, derived from GLAD [9], for the occupied and non-occupied moving objects. Accordingly, we propose an efficient grid-based expand-and-bound algorithm for the approachable kNN search and conduct extensive experiments on real-world data. The results demonstrate the effectiveness and efficiency of our proposed solutions.
Mengqi Li, Dan He, Xiaofang Zhou
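
As a rough illustration of the simple Dijkstra-based baseline mentioned in the abstract, the sketch below ranks vehicles by the network distance they must travel before reaching a query location. It assumes an undirected weighted road graph and that an occupied vehicle must first reach its drop-off node before heading to the rider. The function names `dijkstra` and `approachable_knn` and the vehicle tuple layout are hypothetical and not taken from the paper, and the grid-based index is not modelled.

```python
import heapq

INF = float("inf")

def dijkstra(graph, source):
    """Shortest-path distances from source in a weighted graph {u: {v: w}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def approachable_knn(graph, query, vehicles, k):
    """Return the k cheapest (cost, vehicle_id) pairs for a query node.

    vehicles: list of (vehicle_id, position, destination), where destination
    is None for a non-occupied vehicle.
    Assumption: an occupied vehicle travels position -> destination -> query.
    """
    # The graph is assumed undirected, so distances from the query are symmetric.
    from_query = dijkstra(graph, query)
    scored = []
    for vid, pos, dest in vehicles:
        if dest is None:
            cost = from_query.get(pos, INF)
        else:
            remaining = dijkstra(graph, pos).get(dest, INF)
            cost = remaining + from_query.get(dest, INF)
        scored.append((cost, vid))
    return heapq.nsmallest(k, scored)
```

Running one Dijkstra search per occupied vehicle is deliberately naive; it only serves to make the cost definition concrete, which is the part an index would then accelerate.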

### Trace-Based Approach for Consistent Construction of Activity-Centric Process Models from Data-Centric Process Models

Abstract
In recent years, the artifact-centric paradigm, a data-centric approach to business process modeling, has gained momentum. Compared to the traditional activity-centric paradigm, which focuses on process control-flow and treats data as simple black boxes that act as input and output to activities, the artifact-centric paradigm provides equal support to both control-flow and data. Most existing process modeling is activity-centric, although artifact-centric modeling enables higher process flexibility and reusability. This is mainly due to the existence of numerous notations, tools and technologies that provide increased support for activity-centric process modeling and execution. Therefore, this paper proposes a trace-based approach to transform artifact-centric process models into activity-centric process models and to analyse the consistency of transformed and base models. A case study is used to demonstrate the feasibility of the proposed approach.
Jyothi Kunchala, Jian Yu, Sira Yongchareon, Guiling Wang

### Approximate Fault Tolerance for Sensor Stream Processing

Abstract
Some distributed stream processing systems store their internal states (e.g., partial aggregation results) in non-volatile storage to guarantee fault tolerance, but such checkpointing has a negative effect on system performance. To solve this problem, an existing method supports an approximate guarantee of fault tolerance by omitting some checkpoints based on user-specified thresholds. However, it is difficult for a user to set appropriate thresholds because it is unclear how the thresholds affect the final output. Hence, we propose a method to support approximate fault tolerance for sensor stream processing. Because our method uses the error bounds and the confidence threshold of recovery as user-specified thresholds, a user can set these thresholds intuitively according to his/her service level agreement (SLA). Our method models the correlation between sensing data by using a multivariate Gaussian distribution, and reduces backup data if such data can be recovered from the partial backup data and the probabilistic model. In this paper, we focus on average, sum, max, and min queries and propose a greedy-based backup selection algorithm. We evaluate the validity and efficiency of our approach by using synthetic data. Our experimental study shows that our approach achieves both a reduction of backup data and approximate recovery that satisfies the SLA.
Daiki Takao, Kento Sugiura, Yoshiharu Ishikawa

### Function Interpolation for Learned Index Structures

Abstract
Range indexes such as B-trees are widely recognised as effective data structures for enabling fast retrieval of records by the query key. While such classical indexes offer optimal worst-case guarantees, recent research suggests that average-case performance might be improved by alternative machine learning-based models such as deep neural networks. This paper explores an alternative approach by modelling the task as one of function approximation via interpolation between compressed subsets of keys. We explore the Chebyshev and Bernstein polynomial bases, and demonstrate substantial benefits over deep neural networks. In particular, our proposed function interpolation models exhibit a memory footprint two orders of magnitude smaller than that of neural network models, and a 30–40% accuracy improvement over neural networks trained for the same amount of time, while keeping query time generally on par with neural network models.
Naufal Fikri Setiawan, Benjamin I. P. Rubinstein, Renata Borovica-Gajic
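
To make the idea of treating an index as function approximation more concrete, the following sketch fits a Chebyshev polynomial that maps a key to its position in a sorted array, then corrects the prediction with a bounded binary search. It is only an illustration under stated assumptions: it uses NumPy's least-squares `Chebyshev.fit` on an evenly spaced subsample of keys rather than the paper's interpolation scheme, and the helper names `build_index` and `lookup`, the degree, and the sampling rate are all hypothetical.

```python
import bisect
import numpy as np
from numpy.polynomial import Chebyshev

def build_index(sorted_keys, degree=8, sample_every=16):
    """Fit key -> position on a subsample and record the worst-case error."""
    keys = np.asarray(sorted_keys, dtype=float)
    positions = np.arange(len(keys))
    # Subsample the keys (always keeping the last one) to mimic a compressed subset.
    idx = np.unique(np.append(positions[::sample_every], len(keys) - 1))
    model = Chebyshev.fit(keys[idx], positions[idx], degree)
    # The maximum prediction error over all keys bounds the local search window.
    max_err = int(np.ceil(np.max(np.abs(model(keys) - positions))))
    return model, max_err

def lookup(sorted_keys, model, max_err, key):
    """Return the position of key in sorted_keys, or None if it is absent."""
    guess = int(round(float(model(key))))
    lo = max(0, guess - max_err - 1)
    hi = min(len(sorted_keys), guess + max_err + 2)
    # Binary search restricted to the window around the predicted position.
    i = bisect.bisect_left(sorted_keys, key, lo, hi)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return None
```

Storing the worst-case error alongside the model is what keeps lookups of stored keys exact regardless of how well the polynomial fits; a poor fit only widens the search window. The sketch assumes unique keys in a plain Python list.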

### DEFINE: Friendship Detection Based on Node Enhancement

Abstract
Network representation learning (NRL) is important for a variety of tasks such as link prediction. Learning low-dimensional vector representations for node enhancement based on node attributes and network structures can improve link prediction performance. Node attributes are important factors in forming networks, like psychological factors and appearance features affecting friendship networks. However, little to no work has detected friendship using the NRL technique in a way that combines students’ psychological features and perceived traits based on facial appearance. In this paper, we propose a framework named DEFINE (Node Enhancement based Friendship Detection) to detect students’ friend relationships, which combines students’ psychological factors and facial perception information. To detect friend relationships accurately, DEFINE uses the NRL technique, which considers network structure and the additional attribute information for nodes. DEFINE transforms them into low-dimensional vector spaces while preserving the inherent properties of the friendship network. Experimental results on real-world friendship network datasets illustrate that DEFINE outperforms other state-of-the-art methods.
Hanxiao Pan, Teng Guo, Hayat Dino Bedru, Qing Qing, Dongyu Zhang, Feng Xia

### Semi-supervised Cross-Modal Hashing with Graph Convolutional Networks

Abstract
Cross-modal hashing for large-scale approximate neighbor search has attracted great attention recently because of its significant computational and storage efficiency. However, it is still challenging to generate high-quality binary codes to preserve inter-modal and intra-modal semantics, especially in a semi-supervised manner. In this paper, we propose a semi-supervised cross-modal discrete code learning framework. This is the very first work of applying asymmetric graph convolutional networks (GCNs) for scalable cross-modal retrieval. Specifically, the architecture contains multiple GCN branches, each of which is for one data modality to extract modality-specific features and then to generate unified binary hash codes across different modalities, so that the underlying correlations and similarities across modalities are simultaneously preserved into the hash values. Moreover, the branches are built with asymmetric graph convolutional layers, which employ randomly sampled anchors to tackle the scalability and out-of-sample issue in graph learning, and reduce the complexity of cross-modal similarity calculation. Extensive experiments conducted on benchmark datasets demonstrate that our method can achieve superior retrieval performance in comparison with the state-of-the-art methods.
Jiasheng Duan, Yadan Luo, Ziwei Wang, Zi Huang

### Typical Snapshots Selection for Shortest Path Query in Dynamic Road Networks

Abstract
Finding shortest paths in road networks is an important query in daily life, and various index structures have been constructed to speed up query answering. However, these indexes can hardly work in real-life scenarios because traffic conditions change dynamically, which makes pathfinding slower than in a static environment. In order to speed up path query answering in dynamic road networks, we propose a framework to support these indexes. Firstly, we view the dynamic graph as a series of static snapshots. After that, we propose two kinds of methods to select the typical snapshots. The first kind is time-based and only considers the temporal information. The second kind is graph representation-based and considers more insights: an edge-based variant captures road continuity, and a vertex-based variant reflects regional traffic fluctuation. Finally, we propose snapshot matching to find the most similar typical snapshot for the current traffic condition and use its index to answer the query directly. Extensive experiments on a real-life road network and traffic conditions validate the effectiveness of our approach.
Mengxuan Zhang, Lei Li, Wen Hua, Xiaofang Zhou

### A Survey on Map-Matching Algorithms

Abstract
Map-matching is an essential preprocessing step for most trajectory-based applications. Although it has been an active topic for more than two decades and, driven by emerging applications, is still under development, there is a lack of recent categorisation of existing solutions and of analysis of future research directions. In this paper, we review the current status of the map-matching problem and survey the existing algorithms. We propose a new categorisation of the solutions according to their map-matching models and working scenarios. In addition, we experimentally compare three representative methods from different categories to reveal how the matching model affects performance. The experiments are also conducted on multiple real datasets with different settings to demonstrate the influence of other factors in the map-matching problem, such as trajectory quality, data compression and matching latency.
Pingfu Chao, Yehong Xu, Wen Hua, Xiaofang Zhou

### Gaussian Embedding of Large-Scale Attributed Graphs

Abstract
Graph embedding methods transform high-dimensional and complex graph contents into low-dimensional representations. They are useful for a wide range of graph analysis tasks including link prediction, node classification, recommendation and visualization. Most existing approaches represent graph nodes as point vectors in a low-dimensional embedding space, ignoring the uncertainty present in the real-world graphs. Furthermore, many real-world graphs are large-scale and rich in content (e.g. node attributes). In this work, we propose GLACE, a novel, scalable graph embedding method that preserves both graph structure and node attributes effectively and efficiently in an end-to-end manner. GLACE effectively models uncertainty through Gaussian embeddings, and supports inductive inference of new nodes based on their attributes. In our comprehensive experiments, we evaluate GLACE on real-world graphs, and the results demonstrate that GLACE significantly outperforms state-of-the-art embedding methods on multiple graph analysis tasks.
Bhagya Hettige, Yuan-Fang Li, Weiqing Wang, Wray Buntine

### Geo-Social Temporal Top-k Queries in Location-Based Social Networks

Abstract
With recent advancements in location-acquisition techniques and smart phone devices, social networks such as Foursquare, Facebook and Twitter are acquiring the location dimension while minimizing the gap between the physical world and virtual social networking. This, in turn, has resulted in the generation of geo-tagged data at unprecedented scale and has enabled users to fully capture and share their geo-locations with timestamps on social media. Typical location-based social media allow users to check in at a location of interest using smart devices; the check-in is then published on the social network, and this information can be exploited for recommendation. In this paper, we propose a new type of query called the Geo-Social Temporal Top-k ($GSTT_k$) query, which enriches the semantics of the conventional spatial query by introducing social relevance and a temporal component. In addition, we propose three different schemes to answer such a query. Finally, we conduct an exhaustive evaluation of the proposed schemes and demonstrate the effectiveness of the proposed approaches.
Ammar Sohail, Muhammad Aamir Cheema, David Taniar

### Effective and Efficient Community Search in Directed Graphs Across Heterogeneous Social Networks

Abstract
Communities in social networks are useful for many real applications, like product recommendation. This fact has driven the recent research interest in retrieving communities online. Although some effort has been put into community search, users’ information has not been well exploited for effective search. Meanwhile, existing approaches for retrieval of communities are not efficient when applied to huge social networks. Motivated by this, in this paper, we propose a novel approach for retrieving communities online, which makes full use of users’ relationship information across heterogeneous social networks. We first investigate an online technique to match pairs of users in different social networks and create a new social network, which contains more complete information. Then, we propose k-Dcore, a novel framework for retrieving effective communities in the directed social network. Finally, we construct an index to search communities efficiently for queries. Extensive experiments demonstrate the efficiency and effectiveness of our proposed solution in directed graphs, based on heterogeneous social networks.
Zezhong Wang, Ye Yuan, Xiangmin Zhou, Hongchao Qin

### Entity Extraction with Knowledge from Web Scale Corpora

Abstract
Entity extraction is an important task in text mining and natural language processing. A popular method for entity extraction is by comparing substrings from free text against a dictionary of entities. In this paper, we present several techniques as a post-processing step for improving the effectiveness of the existing entity extraction technique. These techniques utilise models trained with the web-scale corpora which makes our techniques robust and versatile. Experiments show that our techniques bring a notable improvement on efficiency and effectiveness.
Zeyi Wen, Zeyu Huang, Rui Zhang

### Graph-Based Relation-Aware Representation Learning for Clothing Matching

Abstract
Learning mix-and-match relationships between fashion items is a promising yet challenging task for modern fashion recommender systems, which requires inferring complex fashion compatibility patterns from a large number of fashion items. Previous work mainly utilises metric learning techniques to model the compatibility relationships, such that compatible items are closer to each other than incompatible ones in the latent space. However, these techniques ignore the contextual information of the fashion items for compatibility prediction. In this paper, we propose a Graph-based Type-Relational Neural Network (GTR-NN) framework, which first generates item representations through a multi-layer ChebNet considering k-hop neighbour information, and then outputs a compatibility score by predicting the binary label of an edge between two nodes under a specific type relation. Extensive experiments on two fashion-related tasks demonstrate the effectiveness and superior performance of our model.
Yang Li, Yadan Luo, Zi Huang

### Evaluating Random Walk-Based Network Embeddings for Web Service Applications

Abstract
Network embedding models automatically learn low-dimensional, neighborhood-preserving graph representations in vector space. Even though these models have shown improved performance in various applications such as link prediction and classification compared to traditional graph mining approaches, they are still difficult to interpret. Most works rely on visualization for the interpretation. Moreover, it is challenging to quantify how well these models can preserve the topological properties of real networks such as clustering, degree centrality and betweenness. In this paper, we study the performance of recent unsupervised network embedding models in Web service applications. Specifically, we investigate and analyze the performance of recent random walk-based embedding approaches including node2vec, DeepWalk, LINE and HARP in capturing the properties of Web service networks and compare the performance of the models for basic Web service prediction tasks. We base the study on the Web service networks constructed in our previous works. We evaluate the models with respect to the precision with which they unpack specific topological properties of the networks. We investigate the influence of each topological property on the accuracy of the prediction task. We conduct our experiment using the popular ProgrammableWeb dataset. The results presented in this work are expected to provide insight into the application of network embedding in the service computing domain, especially for applications that aim at exploiting machine learning models.
Olayinka Adeleye, Jian Yu, Ji Ruan, Quan Z. Sheng

### Query-Oriented Temporal Active Intimate Community Search

Abstract
Most existing research on finding local communities focuses mainly on the network structure or the attributes of the social users. Some recent works considered users’ topical activeness in detecting communities. However, not enough attention is paid to the degree of temporal topical interactions among the members in the retrieved communities. We propose a method to search for a temporal active intimate community in which community members are densely connected, participate actively, and have active temporal interactions among them with respect to a given query consisting of a set of query nodes (users) and a set of attributes. Experiments on real datasets demonstrate the effectiveness of our proposed approach.
Md Musfique Anwar

### A Contextual Semantic-Based Approach for Domain-Centric Lexicon Expansion

Abstract
This paper presents a contextual semantic-based approach for expansion of an initial lexicon containing domain-centric seed words. Starting with a small lexicon containing some domain-centric seed words, the proposed approach models text corpus as a weighted word-graph, where the initial weight of a node (word) represents the contextual semantic-based association between the node and the target domain, and the weight of an edge represents the co-occurrence frequency of the respective nodes. The semantic-based association between a node and the target domain is calculated as a function of three contextual semantic-based association metrics. Thereafter, a random walk-based modified PageRank algorithm is applied on the weighted graph to rank and select the most relevant terms for domain-centric lexicon expansion. The proposed approach is evaluated over five datasets, and found to perform significantly better than three baselines and three state-of-the-art approaches.
Muhammad Abulaish, Mohd Fazil, Tarique Anwar
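
The ranking step described in the abstract can be pictured with a small weighted PageRank sketch. It is not the authors' modified algorithm: it assumes a personalised PageRank in which the initial node weights act as the teleport distribution and co-occurrence counts weight the transitions. The function name `weighted_pagerank`, the damping factor, and the toy vocabulary in the usage note are hypothetical.

```python
def weighted_pagerank(edges, node_weight, damping=0.85, iterations=50):
    """Rank words on an undirected weighted word graph.

    edges: {(u, v): cooccurrence_count}; every endpoint must be in node_weight.
    node_weight: {word: association score with the target domain}, sum > 0.
    """
    nodes = sorted(node_weight)
    # Build a symmetric adjacency map from the undirected edge list.
    adj = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    total = sum(node_weight.values())
    # Assumption: teleport probability is proportional to the initial node weight.
    teleport = {n: node_weight[n] / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for u in nodes:
            out = sum(adj[u].values())
            if out == 0:
                # Isolated word: spread its rank according to the teleport vector.
                for n in nodes:
                    new_rank[n] += damping * rank[u] * teleport[n]
                continue
            for v, w in adj[u].items():
                new_rank[v] += damping * rank[u] * w / out
        rank = new_rank
    return rank
```

For example, with seed weights `{"virus": 1.0, "infection": 0.5, "vaccine": 0.5, "football": 0.01}` and co-occurrence edges linking the first three words strongly, the scores sum to one and the weakly connected, off-domain word ends up at the bottom of the ranking, so a cut-off on the sorted scores would select the expansion terms.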

### Data-Driven Hierarchical Neural Network Modeling for High-Pressure Feedwater Heater Group

Abstract
This paper proposes a data-driven hierarchical neural network modeling method for a high-pressure feedwater heater group (HPFHG) in the power generation industry. An HPFHG is usually made up of several cascaded high-pressure feedwater heaters (HPFH). The challenge of modeling an HPFHG is to formulate not only the HPFHG as a whole but also its components at the same time. Physical modeling techniques based on dynamic thermal calculation can hardly be applied in practice because the necessary parameters are lacking. Based on big operating data, modeling by neural networks is feasible. However, traditional artificial neural networks are black boxes, which makes it difficult to describe the subsystems or inner components of a system. To tackle this problem, the proposed modeling approach is inspired by the physical cascade structure of the HPFHG. Experimental results show that our modeling approach is effective for the entire HPFHG as well as for every single component.
Jiao Yin, Mingshan You, Jinli Cao, Hua Wang, MingJian Tang, Yong-Feng Ge

### Early Detection of Diabetic Eye Disease from Fundus Images with Deep Learning

Abstract
Diabetes is a life-threatening disease that affects various human body organs, including the retina of the eye. Advanced Diabetic Eye Disease (DED) leads to permanent vision loss; thus, early detection of DED symptoms is essential to prevent disease escalation and enable timely treatment. To date, the research challenges in early DED detection can be summarised as follows: firstly, changes in the eye anatomy during the early stage are frequently untraceable by the human eye due to the subtle nature of the features, and secondly, the large volume of fundus images puts a significant strain on limited specialist resources, rendering manual analysis practically infeasible. Thus, Deep Learning-based methods have been applied to facilitate early DED detection and address the issues currently faced. Although promising, highly accurate detection of early anatomical changes in the eye using Deep Learning remains a challenge in wide-scale practical application. Consequently, in this research we aim to address the three main research gaps and propose a framework for an early automated DED detection system on fundus images through Deep Learning.
Rubina Sarki, Khandakar Ahmed, Hua Wang, Sandra Michalska, Yanchun Zhang

### Backmatter
