2021 | Book

Database and Expert Systems Applications - DEXA 2021 Workshops

BIOKDD, IWCFS, MLKgraphs, AI-CARES, ProTime, AISys 2021, Virtual Event, September 27–30, 2021, Proceedings

Edited by: Gabriele Kotsis, A Min Tjoa, Ismail Khalil, Dr. Bernhard Moser, Atif Mashkoor, Johannes Sametinger, Dr. Anna Fensel, Prof. Jorge Martinez-Gil, Lukas Fischer, Gerald Czech, Florian Sobieczky, Sohail Khan

Publisher: Springer International Publishing

Book series: Communications in Computer and Information Science

About this book

This volume constitutes the refereed proceedings of the workshops held at the 32nd International Conference on Database and Expert Systems Applications, DEXA 2021, in September 2021: the 12th International Workshop on Biological Knowledge Discovery from Data (BIOKDD 2021), the 5th International Workshop on Cyber-Security and Functional Safety in Cyber-Physical Systems (IWCFS 2021), the 3rd International Workshop on Machine Learning and Knowledge Graphs (MLKgraphs 2021), the 1st International Workshop on Artificial Intelligence for Clean, Affordable and Reliable Energy Supply (AI-CARES 2021), the 1st International Workshop on Time Ordered Data (ProTime 2021), and the 1st International Workshop on AI System Engineering: Math, Modelling and Software (AISys 2021). Due to the COVID-19 pandemic, the conference and its workshops were held virtually.

The 23 papers were carefully reviewed and selected from 50 submissions. They cover a range of topics, including knowledge discovery, biological data, cybersecurity, cyber-physical systems, machine learning, knowledge graphs, information retrieval, databases, and artificial intelligence.

Table of contents

Frontmatter
Correction to: A Comparative Study of Deep Learning Approaches for Day-Ahead Load Forecasting of an Electric Car Fleet
Ahmad Mohsenimanesh, Evgueniy Entchev, Alexei Lapouchnian, Hajo Ribberink

Cyber-Security and Functional Safety in Cyber-Physical Systems

Frontmatter
Mode Switching for Secure Web Applications – A Juice Shop Case Scenario
Abstract
Switching modes is a general mechanism used in many domains. We have previously suggested using it for security purposes, to make systems more resilient when vulnerabilities are known or attacks are under way. OWASP provides several deliberately vulnerable web applications for testing and training security skills. Our idea is to apply mode switching to one of these applications to demonstrate its usefulness in increasing security, and we have chosen Juice Shop as our sample application. In this paper, (i) we suggest a multi-modal architecture for web applications; (ii) we present Juice Shop as our web application scenario; and (iii) we present first reflections on how mode switching can reduce attack surfaces and thus increase resilience.
Michael Riegler, Johannes Sametinger
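The multi-modal idea in the Juice Shop paper above can be pictured as a small state machine in which each mode exposes a different set of features. The sketch below is a minimal illustration under our own assumptions, not the authors' architecture: the mode names, the feature sets, and the rule "move to the least restrictive mode that disables the vulnerable feature, never relaxing the current mode" are all hypothetical, and the count of reachable features is only a crude proxy for attack surface.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical modes, ordered from least to most restrictive.
    NORMAL = "normal"
    ELEVATED = "elevated"
    LOCKDOWN = "lockdown"


# Hypothetical mapping from each mode to the features that stay reachable.
ENABLED_FEATURES = {
    Mode.NORMAL: {"browse", "search", "login", "checkout", "feedback", "file_upload"},
    Mode.ELEVATED: {"browse", "search", "login", "checkout"},
    Mode.LOCKDOWN: {"browse"},
}
ORDER = [Mode.NORMAL, Mode.ELEVATED, Mode.LOCKDOWN]


class ModeSwitcher:
    def __init__(self) -> None:
        self.mode = Mode.NORMAL

    def is_enabled(self, feature: str) -> bool:
        return feature in ENABLED_FEATURES[self.mode]

    def attack_surface(self) -> int:
        # Crude proxy: number of features reachable in the current mode.
        return len(ENABLED_FEATURES[self.mode])

    def on_vulnerability(self, feature: str) -> None:
        # Move to the least restrictive mode, starting from the current one,
        # in which the vulnerable feature is disabled. If no such mode exists
        # (the feature is enabled everywhere), the mode stays unchanged.
        for mode in ORDER[ORDER.index(self.mode):]:
            if feature not in ENABLED_FEATURES[mode]:
                self.mode = mode
                return


def handle_request(switcher: ModeSwitcher, feature: str) -> int:
    # HTTP-style status code: 403 when the feature is switched off.
    return 200 if switcher.is_enabled(feature) else 403


switcher = ModeSwitcher()
switcher.on_vulnerability("file_upload")  # e.g. a newly known upload flaw
print(switcher.mode, handle_request(switcher, "file_upload"))
```

Starting the search at the current mode keeps a later, milder alert from silently undoing an earlier lockdown.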
A Conceptual Model for Mitigation of Root Causes of Uncertainty in Cyber-Physical Systems
Abstract
Cyber-Physical Systems (CPS) are widely used in different domains. Major application domains of CPS include healthcare, transportation, manufacturing, industrial control systems, autopilot avionics, and robotics. Uncertainty is one of the major issues that challenge the reliability of a CPS. Various approaches have been proposed in the literature to deal with uncertainty. However, few studies have analyzed the root causes of uncertainty or suggested corresponding mitigation strategies. Motivated by this gap, we propose a conceptual model for mitigating the root causes of uncertainty in CPS. We also outline some potential directions for future research.
Mah Noor Asmat, Saif Ur Rehman Khan, Atif Mashkoor
Security-Based Safety Hazard Analysis Using FMEA: A DAM Case Study
Abstract
Safety and security are among the most significant properties of a Cyber-Physical System (CPS). They are interlaced concepts that affect each other: in the last decade, there have been many cases where a security breach resulted in safety hazards. Yet very few studies in the literature address integrated safety and security risk assessment, even though both need to be considered concurrently rather than one after the other. To close this gap, we aim to: (i) perform hazard analysis of a cyber-physical system, namely a dam case study, using Failure Mode and Effects Analysis (FMEA), and (ii) perform risk identification, risk analysis, and mitigation for this case. As a result, we extracted the potential failure modes, failure causes, failure effects, and risk priority numbers. In addition, we identified the safety requirements for the modes of the system under study.
Irum Inayat, Muhammad Farooq, Zubaria Inayat, Muhammad Abbas
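In classic FMEA, the risk priority number (RPN) mentioned in the dam abstract above is the product of three ratings, each usually on a 1 to 10 scale: severity, occurrence, and detection (where a higher detection rating means the failure is harder to detect). The sketch below computes and ranks RPNs; the dam-related failure modes and all ratings are hypothetical placeholders, not the values reported in the paper.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    description: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (almost certainly detected) to 10 (undetectable)

    def rpn(self) -> int:
        # Risk priority number: product of the three ratings (1 to 1000).
        for rating in (self.severity, self.occurrence, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError("FMEA ratings must lie between 1 and 10")
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    # Highest RPN first: these failure modes are candidates for mitigation first.
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)


# Hypothetical security-induced failure modes of a dam control system.
modes = [
    FailureMode("Gate controller", "spoofed open command", 9, 4, 6),
    FailureMode("Water-level sensor", "tampered reading", 8, 5, 7),
    FailureMode("SCADA HMI", "stolen operator credentials", 7, 6, 4),
]

for m in prioritize(modes):
    print(f"{m.rpn():4d}  {m.component}: {m.description}")
```

The multiplicative form means a hard-to-detect failure (high detection rating) can outrank a more severe but easily noticed one, which is one reason RPN rankings are usually reviewed alongside the raw severity ratings.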
Privacy Preserving Machine Learning for Malicious URL Detection
Abstract
Phishing remains the most prominent attack, causing losses of billions of dollars for organizations and users every year. Attackers use phishing to obtain sensitive information from users, install malware, and gain control over their systems. Currently, web browsers counter this attack with blacklists, but blacklisting fails to detect newly generated malicious websites. Recently, machine-learning-based URL classification, with trained models deployed on the server side, has emerged as an effective way to detect new malicious URLs and offer detection as a service to users. Besides detection itself, another concern is the privacy of the user's query: when malicious URL detection is offered as a service, the server can learn which URL is being checked. To address query privacy, we propose privacy-enabled malicious URL detection.
In this work, we focus on privacy-enabled malicious URL detection based on fully homomorphic encryption (FHE), using three methods: (i) a deep neural network (DNN), (ii) logistic regression, and (iii) a hybrid approach. In the hybrid approach, feature extraction is done by a DNN and classification by a logistic regression model, which gives practical performance. We designed the models based on a split (client/server) architecture. We present experiments with models trained on a dataset of 100,000 URLs (50,000 legitimate and 50,000 phishing URLs). Our experiments show that malicious URL detection in the encrypted domain is practical in terms of accuracy and efficiency.
Imtiyazuddin Shaik, Nitesh Emmadi, Harshal Tupsamudre, Harika Narumanchi, Rajan Mindigal Alasingara Bhattachar
Remote Attestation of Bare-Metal Microprocessor Software: A Formally Verified Security Monitor
Abstract
Remote attestation is a protocol to verify that a remote algorithm satisfies security properties, allowing a dynamic root of trust to be established. Modern architectures for remote attestation combine signature or MAC primitives with hardware monitors to enforce the confidentiality of secrets.
Our work builds on VRASED, a verified hardware/software co-design for remote attestation. Its proof is established using formal methods, and it is implemented on a simple embedded device based on a single-core microcontroller. A heavy modification of the core, together with a hardware monitor, enforces the security properties.
We propose to extend this method to microprocessors whose cores cannot be modified. In this paper, we tackle this problem using the microprocessor's debug interface and demonstrate that the same security properties still hold.
Jonathan Certes, Benoît Morgan
Provenance and Privacy in ProSA
A Guided Interview on Privacy-Aware Provenance
Abstract
Consciously collecting (research) data and respecting privacy seem, at first glance, to be mutually exclusive. However, this does not have to be the case. Before we can address this conflict and its resolution, we want to understand what the terms privacy, provenance, and research data management actually mean, not in their formal definitions but in the community's understanding. We intend to explore how well the theoretical definitions are known in science and industry. To this end, we interviewed 20 people, both scientists and non-scientists, and evaluated their answers to discuss the relevance of combining provenance and privacy in research data management. We found that provenance is generally understood as the origin of data or physical objects, and privacy often refers to the protection of personal data. All participants have a very good understanding of their own research data, which in most cases rests on well-developed research data management. Nevertheless, some uncertainty remains, especially regarding provenance and privacy.
Tanja Auge, Nic Scharlau, Andreas Heuer

Machine Learning and Knowledge Graphs

Frontmatter
Placeholder Constraint Evaluation in Simulation Graphs
Abstract
Simulations can be represented as a graph structure of components. Placeholders, where components can be added to the simulation, contain dependency constraint knowledge that is stored in a graph database. In this paper, we present an application for automatically guided simulation creation: a three-step process that evaluates the constraints of placeholders and thereby suggests suitable components.
Stefan Nadschläger, Markus Jäger, Daniel Hofer, Josef Küng
Walk Extraction Strategies for Node Embeddings with RDF2Vec in Knowledge Graphs
Abstract
As Knowledge Graphs (KGs) are symbolic constructs, specialized techniques have to be applied to make them compatible with data mining techniques. RDF2Vec is an unsupervised technique that creates task-agnostic numerical representations of the nodes in a KG by extending successful language modeling techniques. The original work proposed the Weisfeiler-Lehman kernel to improve the quality of the representations. In this work, however, we show that the Weisfeiler-Lehman kernel does little to improve walk embeddings in the context of a single Knowledge Graph. As an alternative, we examine five strategies for extracting information complementary to basic random walks and compare them on several benchmark datasets, showing that research in this field is still relevant for node classification tasks.
Bram Steenwinckel, Gilles Vandewiele, Pieter Bonte, Michael Weyns, Heiko Paulheim, Petar Ristoski, Filip De Turck, Femke Ongenae
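RDF2Vec turns a knowledge graph into "sentences" by extracting walks from each node and training word2vec on them. The sketch below shows only the basic random-walk extraction that the RDF2Vec paper above uses as its baseline, not the Weisfeiler-Lehman kernel or any of the five alternative strategies; the toy graph, the function names, and the fixed seed are our own assumptions, and the word2vec training step is omitted.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    # Map each subject to its outgoing (predicate, object) edges.
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj


def random_walks(adj, root, depth, n_walks, seed=0):
    """Extract n_walks random walks of at most `depth` hops starting at root.

    Each walk alternates node and predicate tokens:
    (root, p1, n1, p2, n2, ...), i.e. one "sentence" for word2vec.
    """
    rng = random.Random(seed)  # seeded so the walks are reproducible
    walks = []
    for _ in range(n_walks):
        walk, node = [root], root
        for _ in range(depth):
            edges = adj.get(node)  # .get avoids inserting missing keys
            if not edges:  # dead end: no outgoing edges
                break
            pred, obj = rng.choice(edges)
            walk += [pred, obj]
            node = obj
        walks.append(tuple(walk))
    return walks


# Hypothetical toy graph.
triples = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:livesIn", "ex:Ghent"),
    ("ex:Bob", "ex:worksAt", "ex:Ghent"),
    ("ex:Ghent", "ex:locatedIn", "ex:Belgium"),
]
adj = build_adjacency(triples)
for walk in sorted(set(random_walks(adj, "ex:Alice", depth=4, n_walks=10))):
    print(" ".join(walk))
```

Walks that share long suffixes (here, everything ending in `ex:Ghent ex:locatedIn ex:Belgium`) carry little extra information, which is one motivation for the alternative extraction strategies the paper compares.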
Bridging Semantic Web and Machine Learning: First Results of a Systematic Mapping Study
Abstract
Both symbolic and subsymbolic AI research have seen a recent surge driven by innovative approaches, such as neural networks and knowledge graphs. Further opportunities lie in combining these two paradigms in ways that benefit from their complementary strengths. Accordingly, there is much research at the confluence of these two areas, and a number of efforts have already been made to survey and analyze the resulting field. However, to our knowledge, none of these surveys relies on a methodology that aims at an evidence-based characterization of the area while also being reproducible. To fill this gap, in this paper we report on our ongoing work applying a systematic mapping study methodology to better characterise systems in this area. Given the breadth of the area, we scope the study to systems that combine semantic web technologies and machine learning, which we call SWeML Systems. While the study is still ongoing, we report here on its design and the first results obtained.
Laura Waltersdorfer, Anna Breit, Fajar J. Ekaputra, Marta Sabou
On the Quality of Compositional Prediction for Prospective Analytics on Graphs
Abstract
Recently, micro-learning has been successfully applied to various scenarios, such as graph optimization (e.g. power grid management). In these approaches, ad-hoc models of local data are built instead of one large model of the overall dataset. Micro-learning is typically useful for incremental, what-if/prospective scenarios, where step-by-step decisions have to be made based on local properties. A common feature of these applications is that the predicted properties (such as the speed of a bus line) are compositions of smaller parts (e.g. the speed on each inter-station segment along the line). However, little is known about the quality of such predictions when generalized at a larger scale.
In this paper, we propose a generic technique that embeds machine learning into graph-based compositional prediction. It allows 1) predicting the behaviour of composite objects from the predictions for their sub-parts and appropriate composition rules, and 2) producing rich prospective analytics scenarios, in which new objects never observed before can be predicted from their simpler parts. We show that the quality of such predictions competes with that of macro-learning, while enabling prospective scenarios. We assess our work on a real-size operational bus network dataset.
Gauthier Lyan, David Gross Amblard, Jean-Marc Jezequel
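The compositional prediction described above can be illustrated with bus travel times: per-segment micro-models are queried and their outputs combined by a composition rule. In this minimal sketch, assumed purely for illustration, each micro-model is just the mean of observed travel times (a stand-in for a learned local regressor), the stop names, observations, and distances are hypothetical, and the composition rule sums segment times; mean speed is then derived from the composed totals rather than by averaging segment speeds.

```python
from statistics import mean

# Hypothetical observed travel times (seconds) per inter-station segment.
observations = {
    ("A", "B"): [60, 70, 65],
    ("B", "C"): [120, 110],
    ("C", "D"): [90, 95, 100],
    ("B", "D"): [150],
}
# Hypothetical segment lengths in metres.
distances_m = {("A", "B"): 400, ("B", "C"): 900, ("C", "D"): 600, ("B", "D"): 1300}


def segment_model(segment):
    # Micro-model for one segment: mean observed travel time.
    return mean(observations[segment])


def predict_line(stops):
    """Compose a line-level prediction from its segment-level predictions."""
    segments = list(zip(stops, stops[1:]))
    total_time = sum(segment_model(s) for s in segments)  # times add up
    total_dist = sum(distances_m[s] for s in segments)
    # Speed is derived from the composed totals; a plain average of segment
    # speeds would weight every segment equally regardless of its duration.
    return total_time, total_dist / total_time


time_s, speed = predict_line(["A", "B", "C", "D"])
print(f"{time_s:.0f} s, {speed:.2f} m/s")
```

`predict_line(["A", "B", "D"])` works as well, even though that route never appears as a whole in the observations, because both of its segments have micro-models; this is the prospective use case in miniature.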
Semantic Influence Score: Tracing Beautiful Minds Through Knowledge Diffusion and Derivative Works
Abstract
Judging articles by raw citation counts (or similar metrics) is biased by the "rich get richer" effect and keeps propagating a perceived notion of paper quality and influence among scientific communities. This preferential-attachment view, which overlooks important factors such as context and the age of a paper, has recently been criticized. In this paper, we propose the Semantic Influence Score (SIS), an unbiased alternative to metrics that rely on raw citation counts. We compute the semantic influence of a paper on its derivative works by building a multilevel influence network that takes into account references, domain intersection, and the influence scores of the articles in the network. SIS thus provides a robust alternative to the widely used raw citation count, i.e., the number of citations a paper receives.
Pragnya Sridhar, Deepika Karanji, Gambhire Swati Sampatrao, Sravan Danda, Snehanshu Saha

AI System Engineering: Math, Modelling and Software

Frontmatter
Robust and Efficient Bio-Inspired Data-Sampling Prototype for Time-Series Analysis
Abstract
Data acquisition is crucial for efficient AI systems. We present a bio-inspired prototype implementation of discrepancy-based adaptive threshold-based sampling on a low-cost microcontroller. Our measurement results demonstrate that adaptive threshold-based sampling can be performed using only the onboard components of the microcontroller. To measure the sampling precision, we used sinusoidal signals output by a waveform generator and compared the reconstructed signals with the exact signals defined by the set parameters. These measurements show that, even with such low-cost components, discrepancy-based adaptive threshold-based sampling can be performed with high precision.
Michael Lunglmayr, Günther Lindorfer, Bernhard Moser
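Threshold-based (send-on-delta) sampling, the family the prototype above belongs to, keeps a new sample only when the signal has moved by at least a threshold since the last kept value. The sketch below shows the plain fixed-threshold variant with zero-order-hold reconstruction, tested on a sinusoid; it is a simpler baseline and not the discrepancy-based adaptive scheme implemented in the paper. The threshold value, the test signal, and the function names are our own assumptions.

```python
import math


def send_on_delta(samples, delta):
    """Keep (index, value) whenever the signal has moved by at least delta
    since the last kept value; the first sample is always kept."""
    if not samples:
        return []
    events = [(0, samples[0])]
    last = samples[0]
    for i, x in enumerate(samples[1:], start=1):
        if abs(x - last) >= delta:
            events.append((i, x))
            last = x
    return events


def reconstruct(events, n):
    """Zero-order hold: repeat each kept value until the next event."""
    out = [0.0] * n
    for k, (i, v) in enumerate(events):
        end = events[k + 1][0] if k + 1 < len(events) else n
        out[i:end] = [v] * (end - i)
    return out


# A 5-period sinusoid sampled 1000 times, standing in for a waveform generator.
signal = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
events = send_on_delta(signal, delta=0.1)
recon = reconstruct(events, len(signal))
max_err = max(abs(a - b) for a, b in zip(signal, recon))
print(len(events), "events, max error", round(max_err, 3))
```

By construction, every skipped sample lies less than `delta` from the held value, so the zero-order-hold error stays below `delta`; the threshold therefore directly trades the number of kept samples against precision.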
Membership-Mappings for Data Representation Learning: Measure Theoretic Conceptualization
Abstract
A fuzzy theoretic analytical approach was recently introduced that leads to efficient and robust models while automatically addressing the typical issues associated with parametric deep models. However, a formal conceptualization of fuzzy theoretic analytical deep models is still not available. Using a measure-theoretic basis, this paper introduces the notion of membership-mapping for representing data points through attribute values (motivated by fuzzy theory). One property of the membership-mapping that can be exploited for data representation learning is that it provides an interpolation on the given data points in the data space. An analytical approach to the variational learning of a membership-mappings based data representation model is considered.
Mohit Kumar, Bernhard Moser, Lukas Fischer, Bernhard Freudenthaler
Membership-Mappings for Data Representation Learning: A Bregman Divergence Based Conditionally Deep Autoencoder
Abstract
This paper proposes using membership-mappings as the building blocks of deep models. It presents an alternative form of deep autoencoder, the Bregman Divergence Based Conditionally Deep Autoencoder, which consists of layers that each learn a data representation at a certain level of abstraction through a membership-mappings based autoencoder. A multi-class classifier is also presented that employs a parallel composition of conditionally deep autoencoders to learn a data representation for each class. Experiments demonstrate the competitive performance of the proposed framework in classifying high-dimensional feature vectors and in making the classification robust.
Mohit Kumar, Bernhard Moser, Lukas Fischer, Bernhard Freudenthaler
Data Catalogs: A Systematic Literature Review and Guidelines to Implementation
Abstract
In enterprises, data is usually distributed across multiple data sources and stored in heterogeneous formats. Harmonizing and integrating this data is a prerequisite for leveraging it in AI initiatives. Recently, data catalogs have emerged as a promising solution to semantically classify and organize data sources across different environments and to enrich raw data with metadata. Data catalogs therefore make it possible to create a single, clear, and easily accessible interface for training and testing computational models. Despite a lively discussion among practitioners, there is little research on data catalogs. In this paper, we systematically review the existing literature and answer the following questions: (1) What are the conceptual components of a data catalog? and (2) Which guidelines can be recommended for implementing a data catalog? The results help practitioners implement a data catalog to accelerate AI initiatives and give researchers a compilation of future research directions.
Lisa Ehrlinger, Johannes Schrott, Martin Melichar, Nicolas Kirchmayr, Wolfram Wöß
Task-Specific Automation in Deep Learning Processes
Abstract
Recent advances in deep learning facilitate the training, testing, and deployment of models through so-called pipelines. These pipelines are typically orchestrated with general-purpose machine learning frameworks (e.g., TensorFlow Extended), in which developers manually call the individual steps for each task-specific application. The diversity of task- and technology-specific requirements in deep learning projects increases the orchestration effort. Recent approaches automate orchestration with machine learning, but they are still immature and do not support task-specific applications. Hence, we claim that partially automating pipeline orchestration with respect to specific tasks and technologies decreases the overall development effort. We verify this claim with the ALOHA tool flow, in which task-specific glue code is generated automatically. The gains of the ALOHA tool flow pipeline are evaluated with respect to human effort, computing performance, and security.
Georg Buchgeher, Gerald Czech, Adriano Souza Ribeiro, Werner Kloihofer, Paolo Meloni, Paola Busia, Gianfranco Deriu, Maura Pintor, Battista Biggio, Cristina Chesta, Luca Rinelli, David Solans, Manuel Portela

Time Ordered Data

Frontmatter
Approximate Fault Tolerance for Edge Stream Processing
Abstract
Existing distributed stream processing systems generally guarantee fault tolerance by switching to standby machines and reprocessing lost data. In edge computing environments, however, this conventional approach requires duplicating each edge, and the duplication cost increases sharply as the system scales. To solve this problem, we propose an approach that supports approximate fault tolerance without edge duplication. We focus on environmental monitoring applications and exploit the correlation between sensors. In this paper, we assume that each edge estimates missing data from the observed data and aggregates them approximately. We provide a method to estimate the outputs of failed edges that takes into account the uncertainty of the processing results at each edge. Our method allows the server to continue processing without waiting for failed edges to recover. We also demonstrate the validity of our method through experiments on synthetic data.
Daiki Takao, Kento Sugiura, Yoshiharu Ishikawa
Deep Learning Rule for Efficient Changepoint Detection in the Presence of Non-Linear Trends
Abstract
This study presents our ongoing research on designing new methods for changepoint detection in industrial environments using a variant of the CUSUM method. Changepoint detection refers to identifying the location at which some aspect of a given time series changes. The key difference from a state-of-the-art time series prediction technique (based on an LSTM) is that our method can handle anomalies masked by non-trivial trends. We have evaluated our proposal on a systematic series of test data and on an example set with wear-induced anomalies.
Salma Mahmoud, Jorge Martinez-Gil, Patrick Praher, Bernhard Freudenthaler, Alexander Girkinger
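For context, the classic CUSUM detector that the changepoint paper above builds a variant of accumulates deviations from a target level and raises an alarm once the cumulative sum exceeds a threshold. The sketch below is the textbook two-sided tabular form, not the authors' deep-learning-based rule; the target, the slack `k`, the threshold `h`, and the test signal are illustrative assumptions.

```python
def cusum(xs, target, k, h):
    """Two-sided tabular CUSUM.

    target: in-control mean; k: slack (allowance) per sample;
    h: decision threshold. Returns the index of the first alarm, or None.
    """
    s_hi = s_lo = 0.0
    for i, x in enumerate(xs):
        s_hi = max(0.0, s_hi + (x - target - k))  # accumulates upward shifts
        s_lo = max(0.0, s_lo + (target - x - k))  # accumulates downward shifts
        if s_hi > h or s_lo > h:
            return i
    return None


# Hypothetical signal with a step change in the mean at index 50.
signal = [0.0] * 50 + [1.0] * 50
print(cusum(signal, target=0.0, k=0.25, h=2.0))  # prints 52
```

Fed a slowly rising linear trend with no abrupt change, for example `[0.01 * i for i in range(100)]` with the same parameters, this detector also raises an alarm, which illustrates how trends can mimic or mask changepoints for plain CUSUM.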
Time Series Pattern Discovery by Deep Learning and Graph Mining
Abstract
The outstanding success of CNN image classification has led to its use as an instrument for time series classification. Powerful graph clustering methods are able to uncover relationships between entities. In this study, we propose a time series pattern discovery approach that combines independent CNN image classification with graph mining. Our experiments are based on electroencephalography (EEG) channel signal data from a study of the behavior of alcoholic and control subjects. For image classification, we transformed vectors into images using Gramian Angular Fields (GAF), and for graph mining we built time series graphs on pairs of vectors with high cosine similarity. We uncovered EEG time series patterns that not only confirm differences in stimulus reactions between subjects in the Alcoholic and Control groups but also indicate similarities or dissimilarities between EEG channel signals recorded at different scalp positions.
Alex Romanova
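The two ingredients named in the EEG abstract above can be sketched in a few lines: a Gramian Angular Summation Field (GASF), which rescales a series to [-1, 1], maps each value x̃ to an angle φ = arccos(x̃), and forms the matrix cos(φ_i + φ_j); and a graph that links vectors whose cosine similarity reaches a threshold. This is a plain-Python illustration under our own assumptions (the summation variant of GAF, the threshold, and the channel values are hypothetical), not the author's pipeline, which feeds GAF images to a CNN.

```python
import math
from itertools import combinations


def gasf(xs):
    """Gramian Angular Summation Field of a 1-D series (list of floats)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Rescale to [-1, 1]; clamp to guard against rounding just outside the range.
    scaled = [max(-1.0, min(1.0, ((x - hi) + (x - lo)) / (hi - lo))) for x in xs]
    phi = [math.acos(v) for v in scaled]  # polar-coordinate angles
    return [[math.cos(a + b) for b in phi] for a in phi]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def similarity_graph(series, threshold):
    """Undirected edges (sorted name pairs) between vectors with cosine >= threshold."""
    return {
        (a, b)
        for a, b in combinations(sorted(series), 2)
        if cosine(series[a], series[b]) >= threshold
    }


# Hypothetical EEG channel vectors (electrode names used only as labels).
channels = {"F3": [1.0, 2.0, 3.0], "F4": [2.0, 4.0, 6.1], "O1": [3.0, -1.0, 0.0]}
print(similarity_graph(channels, threshold=0.95))
image = gasf(channels["F3"])  # 3 x 3 matrix a CNN could consume as an image
```

The GASF matrix is symmetric by construction, and its diagonal equals cos(2φ_i) = 2x̃_i² - 1, so the original rescaled series can be recovered from the image up to sign.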

Biological Knowledge Discovery from Big Data

Frontmatter
Integrating Gene Ontology Based Grouping and Ranking into the Machine Learning Algorithm for Gene Expression Data Analysis
Abstract
Recent advances in high-throughput technologies have resulted in large gene expression datasets for several phenotypes. By comparing gene expression levels under different conditions, such as disease vs. control, treated vs. untreated, or drug A vs. drug B, one can identify biomarkers. Unlike traditional gene selection approaches, integrative gene selection approaches incorporate domain knowledge from external biological resources during gene selection, which improves interpretability and predictive performance. In this respect, Gene Ontology provides cellular component, molecular function, and biological process terms for the products of each gene. In this study, we present a Gene Ontology based feature selection approach for gene expression data analysis. In our approach, we use the ontology information as grouping (term) information and embed it into a machine learning algorithm that selects the most significant groups (terms) of the ontology. These groups are then used to build the machine learning model that performs the classification task. The output of the tool is the set of significant ontology groups for two-class classification of the gene expression data. This knowledge allows researchers to perform more advanced gene expression analyses. We tested our approach on 8 different gene expression datasets. In our experiments, we observed that the tool successfully found significant Ontology terms that can be used to build a classification model. We believe that our tool will help geneticists identify affected genes in transcriptomic data, and that this information could enable the design of platforms to assist diagnosis, assess patients' prognoses, and create patient treatment plans.
Malik Yousef, Ahmet Sayıcı, Burcu Bakir-Gungor
SVM-RCE-R-OPT: Optimization of Scoring Function for SVM-RCE-R
Abstract
Classifying gene expression data is challenging because of its high dimensionality and relatively small sample size. Different feature selection approaches have been used to overcome this issue, SVM-RCE being one of the more successful ones. This study continues two previous studies, SVM-RCE and SVM-RCE-R. SVM-RCE-R introduced a new scoring function for the clusters and showed that performance improved for some combinations of weights. The aim of this study is to find the optimal weights for the scoring function of SVM-RCE-R using optimization approaches. We found that optimizing the weights of the scoring function improves the performance of the SVM-RCE- in most cases, and in some cases increases accuracy and AUC by as much as 10%. With better performance, the algorithm is more likely to extract a subset of genes related to the class of a microarray sample.
Malik Yousef, Amhar Jabeer, Burcu Bakir-Gungor

Artificial Intelligence for Clean, Affordable and Reliable Energy Supply

Frontmatter
Short-Term Renewable Energy Forecasting in Greece Using Prophet Decomposition and Tree-Based Ensembles
Abstract
Energy production from renewable sources exhibits inherent uncertainty due to its intermittent nature. Nevertheless, the unified European energy market promotes an increasing penetration of renewable energy sources (RES) by regional energy system operators. Consequently, RES forecasting can assist in integrating these volatile energy sources, since it leads to higher reliability and reduced ancillary operational costs for power systems. This paper presents a new dataset for solar and wind energy generation forecasting in Greece and introduces a feature engineering pipeline that enriches the dimensional space of the dataset. In addition, we propose a novel method that uses the Prophet model, an end-to-end forecasting tool that accounts for several kinds of nonlinear trends, to decompose the energy time series before a tree-based ensemble provides short-term predictions. The performance of the system is measured with representative evaluation metrics and by estimating the model's generalization under an industry-provided scheme of absolute error thresholds. The proposed hybrid model is compared with baseline persistence models, tree-based regression ensembles, and the Prophet model, and outperforms them with both lower error rates and a more favorable error distribution.
Argyrios Vartholomaios, Stamatis Karlos, Eleftherios Kouloumpris, Grigorios Tsoumakas
A Comparative Study of Deep Learning Approaches for Day-Ahead Load Forecasting of an Electric Car Fleet
Abstract
The charging of electric cars affects the performance, efficiency, and required capacity of the electric grid, especially when a large fleet of electric cars located close together charges simultaneously from the same local transformer. Accurate load forecasting is therefore required for the reliable and efficient operation of a power system. In this study, three deep learning algorithms, namely long short-term memory (LSTM), bidirectional long short-term memory (BiLSTM), and gated recurrent units (GRU), are employed and compared for forecasting the aggregate charging load of a fleet of electric cars. The models were trained and tested on a real-world dataset collected from 1000 electric vehicles across Canada during 2017–2019. The bidirectional long short-term memory algorithm achieves the lowest mean absolute error, mean absolute percentage error, and root mean square error of the compared methods and is the best suited for forecasting the load of an electric car fleet.
Ahmad Mohsenimanesh, Evgueniy Entchev, Alexei Lapouchnian, Hajo Ribberink
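The three error metrics used in the load-forecasting comparison above have standard definitions, sketched below in plain Python: MAE is the mean absolute error, RMSE the square root of the mean squared error, and MAPE the mean absolute error relative to the actual value, in percent. The load values and function names are hypothetical.

```python
import math


def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")


def mae(actual, predicted):
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical hourly fleet load (kW) and a model's forecast.
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 40.0]
print(f"MAE={mae(actual, forecast):.2f} kW  RMSE={rmse(actual, forecast):.2f} kW  "
      f"MAPE={mape(actual, forecast):.1f}%")
```

MAPE is undefined for hours with zero actual load (for example, when no car is charging), so this sketch raises an error in that case; the abstract does not say how the study handled such hours. RMSE penalizes large misses more heavily than MAE, which matters when peak-load errors are the costly ones.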
Backmatter
Metadata
Copyright year
2021
Electronic ISBN
978-3-030-87101-7
Print ISBN
978-3-030-87100-0
DOI
https://doi.org/10.1007/978-3-030-87101-7