About this book

This volume constitutes the refereed proceedings of the three workshops held at the 31st International Conference on Database and Expert Systems Applications, DEXA 2020, in September 2020: the 11th International Workshop on Biological Knowledge Discovery from Data, BIOKDD 2020; the 4th International Workshop on Cyber-Security and Functional Safety in Cyber-Physical Systems, IWCFS 2020; and the 2nd International Workshop on Machine Learning and Knowledge Graphs, MLKgraphs2020. Due to the COVID-19 pandemic, the conference and workshops were held virtually.

The 10 papers were thoroughly reviewed and selected from 15 submissions. They discuss a range of topics including knowledge discovery, biological data, cyber security, cyber-physical systems, machine learning, knowledge graphs, information retrieval, databases, and artificial intelligence.



Biological Knowledge Discovery from Data


An In-Memory Cognitive-Based Hyperdimensional Approach to Accurately Classify DNA-Methylation Data of Cancer

With Next Generation DNA Sequencing (NGS) techniques, we are witnessing rapid growth in genomic data. In this work, we focus on the NGS DNA methylation experiment, which aims to shed light on the biological process that controls the functioning of the genome and whose modifications are closely investigated in cancer studies for biomarker discovery. Because public DNA methylation data is abundant and has a very large number of features, new and efficient classification techniques are in high demand. We therefore propose an energy-efficient, in-memory, cognitive-based hyperdimensional approach for classifying DNA methylation data of cancer. The approach is based on brain-inspired Hyperdimensional (HD) computing, which operates on hypervectors rather than single numerical values. This makes it capable of recognizing complex patterns with great robustness against errors, even with noisy data, much as the human brain can. We run our experiments on three cancer datasets (breast, kidney, and thyroid carcinomas) extracted from the Genomic Data Commons portal, the main repository of tumoral genomic and clinical data, and obtain very promising results in terms of accuracy (97.7% for breast, 98.43% for kidney, and 100% for thyroid) and low computational time. To validate our approach, we compare it with another state-of-the-art classification algorithm for DNA methylation data. Finally, the processed data and software are freely released at https://github.com/fabio-cumbo/HD-Classifier to aid field experts in the detection and diagnosis of cancer.
Fabio Cumbo, Emanuel Weitschek
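
The hyperdimensional classification idea described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' HD-Classifier and does not model the in-memory hardware aspect; the dimensionality, the two-level quantization of methylation values, the bind/bundle encoding, and all names (`HDClassifier`, `random_hv`, `bind`, `bundle`, `similarity`) are assumptions chosen for illustration. Each sample is encoded as one bipolar hypervector, class prototypes are built by majority vote, and a new sample is assigned to the most similar prototype.

```python
import random

DIM = 2000  # hypervector size (assumption; HD systems often use about 10,000)


def random_hv(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Element-wise product: associates a feature with its quantized level."""
    return [x * y for x, y in zip(a, b)]


def bundle(hvs):
    """Element-wise majority vote (ties resolve to +1)."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]


def similarity(a, b):
    """Normalized dot product; equals cosine similarity for bipolar vectors."""
    return sum(x * y for x, y in zip(a, b)) / DIM


class HDClassifier:
    """Toy HD classifier for samples whose values lie in [0, 1]."""

    def __init__(self, n_features, n_levels=2, seed=0):
        rng = random.Random(seed)
        self.n_levels = n_levels
        self.feature_hvs = [random_hv(rng) for _ in range(n_features)]
        self.level_hvs = [random_hv(rng) for _ in range(n_levels)]
        self.prototypes = {}

    def encode(self, sample):
        """Bind each feature to its level, then bundle into one hypervector."""
        bound = []
        for feature_hv, value in zip(self.feature_hvs, sample):
            level = min(int(value * self.n_levels), self.n_levels - 1)
            bound.append(bind(feature_hv, self.level_hvs[level]))
        return bundle(bound)

    def fit(self, samples, labels):
        """Build one prototype hypervector per class."""
        by_class = {}
        for sample, label in zip(samples, labels):
            by_class.setdefault(label, []).append(self.encode(sample))
        self.prototypes = {y: bundle(hvs) for y, hvs in by_class.items()}

    def predict(self, sample):
        """Return the class whose prototype is most similar to the sample."""
        hv = self.encode(sample)
        return max(self.prototypes,
                   key=lambda y: similarity(hv, self.prototypes[y]))


# Made-up toy data: 8 methylation-like values per sample, not real GDC data.
toy_samples = [
    [0.9, 0.8, 0.95, 0.85, 0.1, 0.2, 0.05, 0.15],
    [0.1, 0.2, 0.1, 0.3, 0.9, 0.8, 0.7, 0.95],
]
toy_labels = ["tumor", "normal"]
clf = HDClassifier(n_features=8)
clf.fit(toy_samples, toy_labels)
print(clf.predict([0.7, 0.9, 0.8, 0.6, 0.3, 0.1, 0.2, 0.4]))
```

Because information is spread across all components of a hypervector, flipping a small fraction of components changes the similarity scores only slightly, which is the source of the robustness to noise that the abstract mentions.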

TopicsRanksDC: Distance-Based Topic Ranking Applied on Two-Class Data

In this paper, we introduce a novel approach named TopicsRanksDC for ranking topics based on the distance between the two clusters generated by each topic. We assume that our data consists of text documents, each associated with one of two classes. Our approach ranks each topic contained in these text documents by its significance for separating the two classes. First, the algorithm detects topics using Latent Dirichlet Allocation (LDA). The words defining each topic are represented as two clusters, each associated with one of the classes. We compute four distance metrics: Single Linkage, Complete Linkage, Average Linkage, and the distance between the centroids. We compare the results for LDA topics and random topics. The results show that LDA topics rank much higher than random topics. The results of the TopicsRanksDC tool are promising for future work on enabling search engines to suggest related topics.
Malik Yousef, Jamal Al Qundus, Silvio Peikert, Adrian Paschke
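
The four cluster distances named in the abstract above have standard definitions, which the following Python sketch computes for two small clusters. How TopicsRanksDC represents topic words as points and how it combines the distances into a rank is not stated here, so the 2-D points and the function name `cluster_distances` are hypothetical.

```python
import math
from itertools import product


def cluster_distances(cluster_a, cluster_b):
    """Return the four inter-cluster distances for two lists of points."""
    # All pairwise Euclidean distances between a point in A and a point in B.
    pairwise = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

    # Centroid of a cluster: coordinate-wise mean of its points.
    centroid_a = [sum(c) / len(cluster_a) for c in zip(*cluster_a)]
    centroid_b = [sum(c) / len(cluster_b) for c in zip(*cluster_b)]

    return {
        "single": min(pairwise),                     # closest pair
        "complete": max(pairwise),                   # farthest pair
        "average": sum(pairwise) / len(pairwise),    # mean over all pairs
        "centroid": math.dist(centroid_a, centroid_b),
    }


# Hypothetical clusters: one per class for a single topic.
class_one = [(0.0, 0.0), (0.0, 2.0)]
class_two = [(3.0, 0.0), (3.0, 2.0)]
for name, value in cluster_distances(class_one, class_two).items():
    print(f"{name}: {value:.4f}")
```

Under any of these metrics, a topic whose two clusters lie far apart separates the classes more clearly than one whose clusters overlap, which is the intuition behind ranking topics by distance.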

Cyber-Security and Functional Safety in Cyber-Physical Systems


YASSi: Yet Another Symbolic Simulator Large (Tool Demo)

Safety-critical systems have finally made their way into our daily life. While recent industrial and academic research has already improved the design cycle for such systems, ensuring their functionality remains an open question. Such systems, which are composed of hardware as well as software components, have to be checked, since any wrong behavior of the system could end up harming human life. To this end, program analysis techniques can be applied to ensure that the program works as intended and that no unwanted behavior is executed. However, approaches such as static or dynamic program analysis, which are widely applied for this purpose, still lead to a large number of false positive results. To overcome such limitations, an alternative approach called symbolic execution has been proposed. In this work, we present a tool called YASSi which implements this approach. YASSi symbolically executes programs written in C/C++. It can therefore be applied to several tasks needed for checking safety-critical properties of programs, such as (1) assertion checking, (2) reachability analysis, or (3) stimuli generation for digital circuits.
Sebastian Pointner, Pablo Gonzalez-de-Aledo, Robert Wille

Variational Optimization of Informational Privacy

Datasets containing sensitive information cannot be publicly shared because several types of attacks pose a privacy risk. The data perturbation approach preserves privacy by adding random noise, but this distorts useful data. Studying and optimizing the privacy-utility tradeoff remains a challenge, especially when the statistical distributions of the data are unknown. This study introduces a novel information theoretic framework for studying the privacy-utility tradeoff that is suitable for multivariate data and for cases with unknown statistical distributions. We consider an information theoretic approach that quantifies privacy leakage by the mutual information between sensitive data and released data. At the core of the privacy-preserving framework lies a variational Bayesian fuzzy model that approximates the uncertain mapping between released noise-added data and private data, and this model is employed for the variational approximation of informational privacy. The suggested privacy-preserving framework consists of three components: 1) Optimal Noise Adding Mechanism; 2) Modeling of Uncertain Mapping Between Released Noise Added Data and Private Data; and 3) Variational Approximation of Information Privacy.
Mohit Kumar, David Brunner, Bernhard A. Moser, Bernhard Freudenthaler

An Architecture for Automated Security Test Case Generation for MQTT Systems

The Message Queuing Telemetry Transport (MQTT) protocol is among the preferred publish/subscribe protocols used for Machine-to-Machine (M2M) communication and the Internet of Things (IoT). Although the MQTT protocol itself is quite simple, the concurrent iteration of brokers and clients and its intrinsic non-determinism, coupled with the diversity of platforms and programming languages in which the protocol is implemented and run, make the necessary task of security testing challenging. We address precisely this problem by proposing an architecture for security test generation for systems relying on the MQTT protocol. This architecture enables automated test case generation to reveal vulnerabilities and discrepancies between different implementations. As a desired consequence, when implemented, our architectural design can be used to uncover erroneous behaviours that entail latent security risks in MQTT broker and client implementations. In this paper we describe the key components of our architecture, our prototypical implementation using a random test case generator, core design decisions, and the use of security attacks in testing. Moreover, we present first evaluations of the architectural design and the prototypical implementation, with encouraging initial results.
Hannes Sochor, Flavio Ferrarotti, Rudolf Ramler

Mode Switching from a Security Perspective: First Findings of a Systematic Literature Review

With increased interoperability of cyber-physical systems (CPSs), security becomes increasingly critical for many of these systems. Mode switching is known from domains such as aviation and automotive, and we envision using this mechanism to develop resilient systems that continue to function correctly even under malicious attack. If vulnerabilities are detected or already known, modes can be switched to reduce the attack surface and to minimize attackers' range of activity. We propose engineering CPSs with multi-modal software architectures to bridge the interval between the time when zero-day vulnerabilities become known and the time when corresponding updates become available. Affected companies, operators, and people will thus be able to protect themselves and their customers without having to wait for security updates. This paper presents first findings of a systematic literature review (SLR) on mode switching from a security perspective.
Michael Riegler, Johannes Sametinger

Exploiting MQTT-SN for Distributed Reflection Denial-of-Service Attacks

Distributed Denial-of-Service attacks are a dramatically increasing threat to Internet-based services and connected devices. In the form of reflection attacks, they abuse other systems to perform the actual attack, often with an additional amplification factor. In this work we describe a reflection attack exploiting the industrial Message Queuing Telemetry Transport for Sensor Networks (MQTT-SN) protocol, which theoretically allows an unlimited amplification rate. This poses a significant risk not only for the organizations running an MQTT-SN broker but also for possible targets of such Distributed Reflection Denial-of-Service (DRDoS) attacks. Countermeasures are limited, as the underlying weakness is rooted in the specification of MQTT-SN itself.
Hannes Sochor, Flavio Ferrarotti, Rudolf Ramler

Machine Learning and Knowledge Graphs


Exploring the Influence of Data Aggregation in Parking Prediction

Parking occupancy is influenced by many external factors that make the availability prediction task difficult. We want to investigate how information from different data sources, such as events, weather, and geographical entities, interrelates in affecting parking prediction and thereby forms a knowledge graph for the parking prediction problem.
In this paper, we tackle this problem by answering the following questions: What is the effect of the external features on different models? Is there a correlation between the amount of historical training data and external features? These questions are evaluated by applying three well-known time series forecasting models: long short-term memory, convolutional neural network, and multilayer perceptron. Additionally, we introduce gradient boosted regression trees with handcrafted features. Experimental results on two real-world datasets show that external features have a significant effect throughout the experiments and that the extent of their effectiveness varies across training histories and tested models. The findings show that the models are able to outperform recent work in the parking prediction literature. Furthermore, a comparison of the feature-engineered gradient boosted decision trees with other potential models shows their advantage in the field of time series forecasting.
Shereen Elsayed, Daniela Thyssens, Shabanaz Chamurally, Arslan Tariq, Hadi Samer Jomaa

Building Knowledge Graph in Spark Without SPARQL

Knowledge graphs, powerful assets for enhancing search and various kinds of data integration, are becoming essential in both academia and industry. In this paper we demonstrate that the abilities of knowledge graphs extend well beyond search and data integration. We do so in two ways: 1) we show how to build a knowledge graph in Spark instead of using the SPARQL language, and how to explore data in DataFrames and GraphFrames; and 2) we present the Spark knowledge graph as a bridge between logical thinking and graph thinking for data mining.
Alex Romanova

Open Information Extraction for Knowledge Graph Construction

An open information extraction approach for knowledge graph construction is presented. The motivation for the work is that large quantities of scholarly documents are available within many domains of discourse, and the resulting challenge is to identify the most relevant articles concerning a particular topic. The proposed approach takes a document corpus and identifies triples within this corpus, which are then processed to generate a literature knowledge graph. The extraction of triples is conducted using an open information extraction approach. The proposed OIE4KGC approach was evaluated using a bespoke clinical research methodology dataset and a benchmark dataset. An F-score of 51% was achieved on the clinical research methodology dataset and an F-score of 37% on the benchmark dataset.
Iqra Muhammad, Anna Kearney, Carrol Gamble, Frans Coenen, Paula Williamson
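
For readers unfamiliar with the metric reported in the abstract above, the following Python sketch shows how an F-score is commonly computed for extracted triples. It assumes the balanced F1 variant and exact matching of (subject, relation, object) triples; the abstract does not state which variant or matching criterion OIE4KGC uses, and the example triples and the function name `triple_f_score` are made up for illustration.

```python
def triple_f_score(predicted, gold):
    """Return (precision, recall, F1) for two sets of triples, exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical triples, not taken from either evaluation dataset.
gold_triples = {
    ("trial", "uses", "randomisation"),
    ("trial", "has", "control arm"),
    ("outcome", "measured by", "questionnaire"),
    ("protocol", "defines", "sample size"),
}
predicted_triples = {
    ("trial", "uses", "randomisation"),
    ("trial", "has", "control arm"),
    ("trial", "reports", "funding"),
}
p, r, f1 = triple_f_score(predicted_triples, gold_triples)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Exact matching is strict for open information extraction, where the same fact can be phrased in many ways, so scores computed this way tend to be lower than with a relaxed matching criterion.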

