
2022 | Book

Database and Expert Systems Applications - DEXA 2022 Workshops

33rd International Conference, DEXA 2022, Vienna, Austria, August 22–24, 2022, Proceedings

Edited by: Prof. Dr. Gabriele Kotsis, A Min Tjoa, Ismail Khalil, Dr. Bernhard Moser, Prof. Dr. Alfred Taudes, Atif Mashkoor, Johannes Sametinger, Jorge Martinez-Gil, Florian Sobieczky, Lukas Fischer, Rudolf Ramler, Maqbool Khan, Gerald Czech

Publisher: Springer International Publishing

Book series: Communications in Computer and Information Science


About this book

This volume constitutes the refereed proceedings of the workshops held at the 33rd International Conference on Database and Expert Systems Applications, DEXA 2022, in Vienna, Austria, in August 2022: the 6th International Workshop on Cyber-Security and Functional Safety in Cyber-Physical Systems (IWCFS 2022); the 4th International Workshop on Machine Learning and Knowledge Graphs (MLKgraphs 2022); the 2nd International Workshop on Time Ordered Data (ProTime2022); the 2nd International Workshop on AI System Engineering: Math, Modelling and Software (AISys2022); the 1st International Workshop on Distributed Ledgers and Related Technologies (DLRT2022); and the 1st International Workshop on Applied Research, Technology Transfer and Knowledge Exchange in Software and Data Science (ARTE2022).

The 40 papers were thoroughly reviewed and selected from 62 submissions, and discuss a range of topics including: knowledge discovery, biological data, cyber security, cyber-physical systems, machine learning, knowledge graphs, information retrieval, databases, and artificial intelligence.

Table of Contents

Frontmatter

AI System Engineering: Math, Modelling and Software

Frontmatter
Unboundedness of Linear Regions of Deep ReLU Neural Networks

Recent work on adversarial attacks on ReLU neural networks has shown that unbounded regions, and regions with a sufficiently large volume, can be prone to containing adversarial samples. Finding the representation of linear regions and identifying their properties are challenging tasks. In practice, one works with deep neural networks and high-dimensional input data, which leads to polytopes represented by an extensive number of inequalities and hence demands high computational resources. Any such approach should be scalable, feasible, and numerically stable. We discuss an algorithm that finds the H-representation of each region of a neural network and identifies whether the region is bounded.
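As a hedged illustration of the boundedness question (not the authors' algorithm): a linear region given in H-representation {x : Ax ≤ b} is unbounded exactly when its recession cone {d : Ad ≤ 0} contains a nonzero direction, which can be probed coordinate by coordinate with small linear programs.

```python
# Sketch: test whether a polyhedron {x : A x <= b} is unbounded by searching
# its recession cone {d : A d <= 0} (which does not depend on b) for a
# nonzero direction, probing each coordinate with an LP.
import numpy as np
from scipy.optimize import linprog

def is_unbounded(A):
    n = A.shape[1]
    for i in range(n):
        for sign in (1.0, -1.0):
            c = np.zeros(n)
            c[i] = -sign              # linprog minimizes, so negate to maximize
            res = linprog(c, A_ub=A, b_ub=np.zeros(A.shape[0]),
                          bounds=[(-1, 1)] * n, method="highs")
            if res.status == 0 and -res.fun > 1e-9:
                return True           # nonzero recession direction found
    return False

# The half-plane {x1 >= 0} is unbounded; the unit box is bounded.
print(is_unbounded(np.array([[-1.0, 0.0]])))                 # True
box = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
print(is_unbounded(box))                                     # False
```

For regions of deep networks the number of inequalities is large, which is exactly the scalability concern the abstract raises; this toy check is only meant to make the boundedness criterion concrete.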

Anton Ponomarchuk, Christoph Koutschan, Bernhard Moser
Applying Time-Inhomogeneous Markov Chains to Math Performance Rating

In this paper, we present a case study at the intersection of mathematics education and probability theory, with one field providing the use case and the other providing the methods and tests. The application concerns the motivation for, and execution of, automated difficulty classification of mathematical tasks and of users' skills based on the Elo rating system. The basic method is extended to achieve numerically fast converging ranks, as opposed to the usual weak convergence of Elo numbers. The advantage over comparable state-of-the-art ranking methods is demonstrated by rendering the system an inhomogeneous Markov chain: the usual Elo ranking system, for equal skills (chess, math, ...), defines an asymptotically stationary time-inhomogeneous Markov process with a weakly convergent probability law. Our main objective is to modify this process by using an experimentally determined, optimally decreasing learning rate to achieve fast and reliable numerical convergence. The time scale on which these ranking numbers converge may then serve as the basis for the digital applicability of established theories of learning psychology, such as the spiral principle and Cognitive Load Theory. We argue that the algorithm developed and tested in this way lays the foundation for easier and better digital assignment of tasks to individual students, and we outline how it is to be researched and tested in more detail in the future.
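A minimal sketch of the mechanism described, with illustrative parameters (the decay schedule and K-factor are assumptions, not the authors' values): an Elo-style update whose learning rate K decreases over time, making the rating chain time-inhomogeneous and numerically convergent rather than indefinitely fluctuating.

```python
# Toy Elo update with a decreasing learning rate K(t) = K0 / sqrt(t + 1).
import math
import random

def expected_score(r_student, r_task):
    # standard Elo logistic expectation
    return 1.0 / (1.0 + 10 ** ((r_task - r_student) / 400.0))

def update(r_student, r_task, solved, t, k0=32.0):
    k = k0 / math.sqrt(t + 1)          # decreasing learning rate
    delta = k * ((1.0 if solved else 0.0) - expected_score(r_student, r_task))
    return r_student + delta, r_task - delta

random.seed(0)
student, task = 1500.0, 1500.0
for t in range(2000):
    solved = random.random() < expected_score(student, task)
    student, task = update(student, task, solved, t)
print(round(student + task))  # total rating is conserved: 3000
```

Because the update adds delta to the student and subtracts it from the task, the total rating mass is conserved while the shrinking K damps the oscillations of the classical constant-K Elo process.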

Eva-Maria Infanger, Gerald Infanger, Zsolt Lavicza, Florian Sobieczky
A Comparative Analysis of Anomaly Detection Methods for Predictive Maintenance in SME

Predictive maintenance is a crucial strategy in smart industries and plays an important role in small and medium-sized enterprises (SMEs) in reducing unexpected breakdowns. Machine failures are due to unexpected events or anomalies in the system. Different anomaly detection methods for the shop floor are available in the literature. However, current research lacks SME-specific results on the comparison between, and investment in, the available predictive maintenance (PdM) techniques. This applies specifically to anomaly detection, the crucial first step in the PdM workflow. In this paper, we compare and analyze multiple anomaly detection methods for predictive maintenance in the SME domain. The main focus of the study is to provide an overview of different unsupervised anomaly detection algorithms that will enable researchers and developers to select appropriate algorithms for SME solutions. The anomaly detection algorithms are applied to a data set to compare the performance of each algorithm. Currently, the study is limited to unsupervised algorithms due to limited resources and data availability. Multiple metrics are applied to evaluate these algorithms. The experimental results show that Local Outlier Factor and One-Class SVM performed better than the rest of the algorithms.

Muhammad Qasim, Maqbool Khan, Waqar Mehmood, Florian Sobieczky, Mario Pichler, Bernhard Moser
A Comparative Study Between Rule-Based and Transformer-Based Election Prediction Approaches: 2020 US Presidential Election as a Use Case

Social media platforms (SMPs) attract people from all over the world because they allow users to discuss and share their opinions about any topic, including politics. The widespread use of these SMPs has radically transformed modern politics. Election campaigns and political discussions are increasingly held on them. Studying these discussions aids in predicting the outcomes of political events. In this study, we analyze and predict the 2020 US Presidential Election using Twitter data. Almost 2.5 million tweets were collected and categorized as location-considered (LC) (USA only) or location-unconsidered (LUC) (location not mentioned or outside the USA). Two different sentiment analysis (SA) approaches are employed: dictionary-based SA and transformer-based SA. We investigate whether the deployment of deep learning techniques can improve prediction accuracy. Furthermore, we predict a vote share for each candidate at the LC and LUC levels. The predicted results are then compared with the predictions of five polls as well as the real results of the election. The results show that dictionary-based SA outperformed all five polls as well as the transformers, with an MAE of 0.85 at both the LC and LUC levels, and an RMSE of 0.867 and 0.858 at the LC and LUC levels, respectively.
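A hedged illustration of the evaluation step: per-candidate positive sentiment counts are turned into a two-party vote share and scored with MAE and RMSE against the official result. The tweet counts below are made up for the example; only the metric definitions follow standard practice.

```python
# Vote-share prediction from sentiment counts, scored with MAE and RMSE.
import math

def vote_share(pos_a, pos_b):
    total = pos_a + pos_b
    return pos_a / total * 100, pos_b / total * 100

predicted = vote_share(1_300_000, 1_200_000)   # hypothetical positive-tweet counts
actual = (51.3, 46.8)                          # 2020 popular vote % (Biden, Trump)
mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / 2
rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / 2)
print(f"MAE={mae:.2f}, RMSE={rmse:.2f}")
```

With these invented counts the predicted share is (52.0, 48.0), giving MAE 0.95; the paper's reported MAE of 0.85 would correspond to a tighter match between sentiment-derived shares and the official tally.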

Asif Khan, Huaping Zhang, Nada Boudjellal, Lin Dai, Arshad Ahmad, Jianyun Shang, Philipp Haindl
Detection of the 3D Ground Plane from 2D Images for Distance Measurement to the Ground

Obtaining 3D ground plane equations from remote sensing data is crucial in scene-understanding tasks (e.g., deriving camera parameters or the distance of an object to the ground plane). Equations describing the orientation of the ground plane of a scene in 2D or 3D space can be reconstructed from the output of 2D or 3D sensors such as RGB-D cameras, time-of-flight cameras, or LiDAR sensors. In our work, we propose a modular and simple pipeline for 3D ground plane detection from 2D RGB images for subsequent estimation of the distance of a given object to the ground plane. As the proposed algorithm can be applied to 2D RGB images provided by common devices such as surveillance cameras, we provide evidence that it has the potential to advance automated surveillance systems, such as devices used for fall detection, without the need to change existing hardware.
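One concrete step of such a pipeline, sketched in isolation: once a ground-plane equation a·x + b·y + c·z + d = 0 has been recovered, the distance of an object point to the plane is a closed-form computation (the plane and point below are examples, not the paper's data).

```python
# Point-to-plane distance for a plane given as (a, b, c, d).
import math

def distance_to_plane(point, plane):
    a, b, c, d = plane
    x, y, z = point
    return abs(a * x + b * y + c * z + d) / math.sqrt(a * a + b * b + c * c)

ground = (0.0, 1.0, 0.0, 0.0)                      # y = 0, i.e. a flat floor
print(distance_to_plane((2.0, 1.8, 5.0), ground))  # 1.8 (metres above the floor)
```

In a fall-detection setting, a sudden drop of this distance for a tracked person toward zero is the kind of signal the abstract alludes to.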

Ozan Cakiroglu, Volkmar Wieser, Werner Zellinger, Adriano Souza Ribeiro, Werner Kloihofer, Florian Kromp
Towards Practical Secure Privacy-Preserving Machine (Deep) Learning with Distributed Data

A methodology for practical, secure, privacy-preserving distributed machine (deep) learning is proposed that addresses the core issues of fully homomorphic encryption, differential privacy, and scalable fast machine learning. Considering that private data is distributed and that training data may directly or indirectly contain information about private data, an architecture and a methodology are suggested for (1) mitigating the impracticality of fully homomorphic encryption (arising from its large computational overhead) via very fast gate-by-gate bootstrapping and a learning scheme that requires homomorphic computation of only efficient-to-evaluate functions; (2) addressing the privacy-accuracy tradeoff of differential privacy by optimizing the noise-adding mechanism; (3) defining an information-theoretic measure of privacy leakage for the design and analysis of privacy-preserving schemes; and (4) addressing the optimal model-size determination and computationally fast training of scalable machine (deep) learning with an alternative approach based on variational learning. A biomedical example demonstrates the application potential of the proposed methodology.
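One ingredient of point (2), sketched in isolation: the Laplace mechanism of differential privacy, which trades accuracy for privacy by adding noise calibrated to a query's sensitivity. The epsilon and sensitivity values are example choices, not the paper's optimized mechanism.

```python
# Laplace mechanism: noise scale = sensitivity / epsilon.
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    scale = sensitivity / epsilon       # larger epsilon -> less noise, less privacy
    return true_value + rng.laplace(0.0, scale)

rng = np.random.default_rng(0)
true_mean = 4.2                         # e.g. a statistic over private records
noisy = [laplace_mechanism(true_mean, sensitivity=0.1, epsilon=1.0, rng=rng)
         for _ in range(10_000)]
print(abs(np.mean(noisy) - true_mean) < 0.05)   # noise averages out: True
```

The privacy-accuracy tradeoff the abstract mentions is visible here: each individual release is perturbed, while repeated releases concentrate around the true value, which is why the noise mechanism must be designed with the total privacy budget in mind.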

Mohit Kumar, Bernhard Moser, Lukas Fischer, Bernhard Freudenthaler

Applied Research, Technology Transfer and Knowledge Exchange in Software and Data Science

Frontmatter
Collaborative Aspects of Solving Rail-Track Multi-sensor Data Fusion

Multi-sensor data fusion presents a challenge when the data collected from different remote sensors exhibit diverging properties such as point density, noise, and outliers. To overcome a time-consuming, manual sensor-registration procedure for rail-track data, the TransMVS COMET project was initiated as a joint collaboration between the company Track Machines Connected and the research institute Software Competence Center Hagenberg. One of the project aims was to develop a semi-automated and robust data fusion workflow that combines multi-sensor data and extracts the underlying matrix transformation solving the multi-sensor registration problem. In addition, the buildup and transfer of knowledge with respect to 3D point cloud data analysis and registration was desired. Within a highly interactive approach, a semi-automated workflow fulfilling all requirements was developed, relying on a close collaboration between the partners. The knowledge gained within the project was transferred in multiple partner meetings, leading to a knowledge leap in 3D point cloud data analysis and registration for both parties.

Florian Kromp, Fabian Hinterberger, Datta Konanur, Volkmar Wieser
From Data to Decisions - Developing Data Analytics Use-Cases in Process Industry

Nowadays, large amounts of data are generated in the manufacturing industry. In order to make these data usable for data-driven analysis tasks such as smart data discovery, a suitable system needs to be developed in a multi-stage process, starting with data acquisition and storage, data processing and analysis, suitable definition of use cases and project goals, and finally the utilization and integration of the analysis results into the productive system. Experience from different industrial projects shows that close interaction between all these sub-tasks over the whole process, and intensive and steady knowledge transfer between domain experts and data experts, are essential for successful implementation. This paper proposes a stakeholder-aware methodology for developing data-driven analytics use cases by combining an optimal project-development strategy with a generic data analytics infrastructure. The focus lies on including all stakeholders in every part of the use-case development. Using the example of a concrete industry project, in which we work towards a system for monitoring the process stability of the whole machinery on the customer's side, we present best-practice guidance and lessons learned for this kind of digitalization process in industry.

Johannes Himmelbauer, Michael Mayr, Sabrina Luftensteiner
Challenges in Mass Flow Estimation on Conveyor Belts in the Mining Industry: A Case Study

This paper presents a case study in indirect mass flow estimation of bulk material on conveyor belts, based on measuring the net electric energy demand of the drive motor. The aim is to replace traditional, expensive measurement hardware, which brings benefits such as lower overall costs and the ability to operate under harsh environmental conditions such as dust, vibrations, weather, humidity, or temperature fluctuations. The data-driven model uses a dynamic estimation of the idle power in order to take time-varying influences into account. The case study was developed in close collaboration between industrial and scientific partners. Experiences gained from a first field prototype were incorporated to create an improved prototype setup, including a modular software infrastructure for automatically capturing all relevant measurement data. We discuss some of the challenges in development, such as data quality, as well as our experiences in academia-industry collaboration. The presented case study showcases the importance of bringing research into real-world applications to generate technology innovations.
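A hedged sketch of the estimation idea only: net motor power minus a dynamically tracked idle power is taken as proportional to the transported mass flow. The idle-power tracker (a running minimum) and the calibration constant are illustrative assumptions, not the paper's actual model.

```python
# Indirect mass-flow estimate from drive-motor power readings.
def estimate_mass_flow(power_readings, k_calib=2.0):
    """power_readings in kW; returns mass-flow estimates in t/h."""
    idle = float("inf")
    flows = []
    for p in power_readings:
        idle = min(idle, p)                 # running minimum as a crude idle-power estimate
        flows.append(k_calib * (p - idle))  # surplus power attributed to material load
    return flows

# Empty belt at ~5 kW, then material raises the demand to ~9 kW.
readings = [5.0, 5.1, 5.0, 9.0, 9.2, 9.1]
print([round(f, 1) for f in estimate_mass_flow(readings)])  # [0.0, 0.2, 0.0, 8.0, 8.4, 8.2]
```

A real system would let the idle estimate adapt over time (temperature, belt wear), which is exactly the time-varying influence the abstract highlights.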

Bernhard Heinzl, Christian Hinterreiter, Michael Roßbory, Christian Hinterdorfer
A Table Extraction Solution for Financial Spreading

Financial spreading is a necessary exercise for financial institutions, underpinning the analysis of financial data for decisions like investment advisories, credit appraisals, and more. It refers to the collection of data from financial statements, where extraction is still largely manual. In today's fast-paced banking environment, inefficient manual data extraction is a major obstacle, as it is time-consuming and error-prone. In this paper, we therefore address the problem of automatically extracting data for financial spreading. More specifically, we propose a solution to extract financial tables, including the balance sheet, income statement, and cash flow statement, from financial reports in Portable Document Format (PDF). First, we propose a new extraction diagram to detect and extract financial tables from documents like annual reports; second, we build a system to extract the tables using machine learning and post-processing algorithms; and third, we propose an evaluation method for assessing the performance of the extraction system.

Duc-Tuyen Ta, Siwar Jendoubi, Aurélien Baelde
Synthetic Data in Automatic Number Plate Recognition

Machine learning has proven to be an enormous asset in industrial settings time and time again. While these methods are responsible for some of the most impressive technical advancements in recent years, machine learning, and in particular deep learning, still heavily relies on big datasets containing all the information necessary to learn a particular task. However, the procurement of useful data often imposes costly adjustments in production in the case of internal collection, or has copyright implications in the case of external collection. In some cases, the collection fails due to insufficient data quality or simply availability. Moreover, privacy can be an ethical as well as a legal concern. A promising approach that deals with all of these challenges is to generate data artificially. Unlike real-world data, purely synthetic data does not prompt privacy considerations, allows for better quality control, and in many cases the number of synthetic datapoints is theoretically unlimited. In this work, we explore the utility of synthetic data in industrial settings by outlining several use cases in the field of Automatic Number Plate Recognition. In all cases, synthetic data has the potential to improve the results of the respective deep learning algorithms, substantially reduce the time and effort of data acquisition and preprocessing, and eliminate privacy concerns in a field as sensitive as Automatic Number Plate Recognition.

David Brunner, Fabian Schmid
An Untold Tale of Scientific Collaboration: SCCH and ACT

In the last two decades, the application of artificial intelligence (AI) in different fields has increased significantly. As an interdisciplinary field, AI is improving in efficiency and applicability thanks to a vast body of high-quality research and its adaptation to multiple use cases. Industry is therefore investing in AI-based companies to address its use cases efficiently. This paper discusses the collaboration between two research-based companies with different expertise, one in data and software science and the other in tribology. We show how both companies benefit from such a collaboration to tackle the problem at hand.

Somayeh Kargaran, Anna-Christina Glock, Bernhard Freudenthaler, Manuel Freudenberger, Martin Jech
On the Creation and Maintenance of a Documentation Generator in an Applied Research Context

Reverse engineering-based documentation generation extracts facts from software artefacts to generate suitable representations at another level of abstraction. Although the tool perspective in documentation generation has been studied before by many others, these studies mostly report on constructive aspects from case studies, e.g. how tools are built and evaluated. However, we believe a long-term perspective is important to cover issues that arise after the initial deployment of a tool. In this paper, we present challenges and observations made during the prototyping, development, and maintenance of a documentation generator in an applied research project. Insights are drawn from different project phases over a period of four years and cover topics related to tool implementation as well as to knowledge transfer in an applied research project. A key observation is that maintenance of the system to be documented often triggers maintenance effort on the documentation generator.

Bernhard Dorninger, Michael Moser, Josef Pichler, Michael Rappl, Jakob Sautter
Towards the Digitalization of Additive Manufacturing

Additive manufacturing (AM) is a trending technology that is being adopted by many companies around the globe. The high level of product customization this technology provides, together with its link to key green targets such as reducing emissions and material waste, makes AM a very attractive vehicle for the transition to more adaptive and sustainable manufacturing. However, this level of customization and fast acceptance raise new needs and challenges in monitoring and digitalizing AM product life cycles and processes, which are essential features of a flexible factory that addresses adaptive and first-time-right manufacturing by exploiting knowledge gathered through deep analysis of large amounts of data. Organizing and transferring such amounts of information is particularly complex in AM, given not just its volume but also its heterogeneity. This work proposes a common methodology, matched with specific data formats, to integrate all the information from AM processes into industrial digital frameworks. The scenario considered deals with the AM of metallic parts, an especially complex process due to the thermal properties of metals and the difficulty of predicting defects during their manipulation. These factors make metal AM particularly challenging for stability and repeatability reasons, but at the same time a hot topic within AM research, due to the large impact of such customized production in sectors like aeronautics, automotive, or medicine. We also present a dataset, developed following the proposed methodology, that constitutes the first publicly available dataset of multi-process metal AM components.

Carlos González-Val, Christian Eike Precker, Santiago Muíños-Landín
Twenty Years of Successful Translational Research: A Case Study of Three COMET Centers

The term ‘translational research’ traditionally refers, in medicine, to the transfer of scientific results into practice, namely from ‘bench to bedside’. Only in recent years have there been attempts to define Translational (research in) Computer Science (TCS), as distinct from applied or basic research and mere commercialisation. In practice, however, funding programs for academia-industry collaboration in several European countries – like the Austrian COMET Program and its predecessors, which include opportunities for Computer Science institutions – have already provided a unique framework for TCS for over two decades. Although the COMET Program was initially set up as a means of temporary funding, a majority of the participating institutions have managed to stay in the program over several funding periods – turning it into a de facto long-term funding framework that provides a successful structure for academia-industry collaboration. The main aims of this paper are to (i) identify the key factors in the success of individual Competence Centers and (ii) show how they maintain fruitful relationships with industry and other academic partners.

Katja Bühler, Cornelia Travniceck, Veronika Nowak, Edgar Weippl, Lukas Fischer, Rudolf Ramler, Robert Wille
Data Integration, Management, and Quality: From Basic Research to Industrial Application

Data integration, data management, and data quality assurance are essential tasks in any data science project. However, these tasks are often not treated with the same priority as core data analytics tasks, such as the training of statistical models. One reason is that data analytics generates directly reportable results, whereas data management is only the precondition, without a clear notion of its corporate value. Yet the success of both aspects is strongly connected, and in practice many data science projects fail because too little emphasis is put on the integration, management, and quality assurance of the data to be analyzed. In this paper, we motivate the importance of data integration, data management, and data quality by means of four industrial use cases that highlight key challenges in industrial applied-research projects. Based on the use cases, we present our approach to successfully conducting such projects: how to start the project by asking the right questions, and how to apply and develop appropriate tools that solve the aforementioned challenges. To this end, we summarize our lessons learned and open research challenges to facilitate further research in this area.

Lisa Ehrlinger, Christian Lettner, Werner Fragner, Günter Gsellmann, Susanne Nestelberger, Franz Rauchenzauner, Stefan Schützeneder, Martin Tiefengrabner, Jürgen Zeindl
Building a YouTube Channel for Science Communication

Sharing the results of research projects with the public and transferring solutions into practice is an integral part of research work. Typical channels are scientific publications, presentations at industry events, or project websites. However, in the last one and a half years, we have created videos and shared video material on our YouTube channel as a new way of disseminating results in our project setting. We have observed that many research projects do not follow this path, do so only with a very limited number of videos, or provide videos of rather poor quality. In this article, we share our experience and our first steps, together with open issues for the new and additional videos we are planning to produce. We would also like to encourage discussion in the research community on whether this is a worthwhile direction to pursue in the future.

Frank Elberzhager, Patrick Mennig, Phil Stüpfert
Introduction of Visual Regression Testing in Collaboration Between Industry and Academia

Cyber-physical systems (CPSs) connect the computing world with the physical world. Their increasing scale, complexity, and heterogeneity make it more and more challenging to test them to guarantee reliability, security, and safety. In this paper, we report on a successful industry-academia collaboration to improve an existing hardware-in-the-loop (HIL) test system with visual regression testing capabilities. For this purpose, we followed an existing model for technology transfer from academia to industry and show its applicability in the context of CPSs. In addition, we provide a list of specific collaboration challenges that should be considered in this context for successful industry-academia collaboration, and we discuss some key success factors of our collaboration, emphasizing that system knowledge and domain expertise about the CPS are of great importance.

Thomas Wetzlmaier, Claus Klammer, Hermann Haslauer
Vibration Analysis for Rotatory Elements Wear Detection in Paper Mill Machine

Vibration analysis (VA) techniques have aroused great interest in the industrial sector during the last decades. In particular, VA is widely used to detect failures of rotatory components such as rolling bearings, gears, etc. In the present work, we propose a novel data-driven methodology to process vibration-related data in order to detect rotatory component failures in advance, using spectral data. Vibration-related data is first transformed to the frequency domain. Then, a feature called severity is calculated from the spectra. Based on the relation of this feature to the production condition variables, a specific prediction model is trained. These models are used to estimate thresholds for the severity values. If the real severity value exceeds the estimated threshold, the spectra associated with it are analyzed thoroughly to determine whether the data shows any evidence of failure. The proposed data processing system is validated in a real failure context, using data monitored in a paper mill machine. We conclude that a maintenance plan based on the proposed methodology would enable predicting the failure of a rotatory component in advance.
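An illustrative sketch of the pipeline's first two steps (the paper's exact severity definition and threshold model are not given here): transform a vibration signal to the frequency domain, compute a scalar "severity" as RMS spectral energy in a band of interest, and compare it against a threshold derived from healthy-operation data.

```python
# FFT-based severity feature for vibration monitoring.
import numpy as np

def severity(signal, fs, band=(50.0, 500.0)):
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.sqrt(np.mean(spectrum[mask] ** 2)))  # band-limited RMS

fs = 2000
t = np.arange(0, 1.0, 1.0 / fs)
healthy = np.sin(2 * np.pi * 100 * t)                 # nominal rotation harmonic
worn = healthy + 0.8 * np.sin(2 * np.pi * 300 * t)    # extra harmonic from wear
threshold = 1.2 * severity(healthy, fs)               # margin over healthy level
print(severity(worn, fs) > threshold)                 # True: wear detected
```

In the paper's setting the threshold is not a fixed margin but is predicted from production condition variables, so that changes in operating regime are not mistaken for wear.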

Amaia Arregi, Iñaki Inza, Iñigo Bediaga
Introducing Data Science Techniques into a Company Producing Electrical Appliances

Industry-academia collaboration in the field of software engineering poses many challenges. In this paper, we describe our experience in introducing data science techniques into a company producing electrical appliances. During the collaboration, we worked on-site together with the engineers of the company, focusing on steady communication with the domain experts and setting up regular meetings with the stakeholders. The continuous exchange of expertise and domain knowledge was a key factor in our collaboration. This paper presents the adopted collaboration approach, the technology transfer process, and the results of the collaboration, and discusses lessons learned.

Tim Kreuzer, Andrea Janes
A Technology Transfer Portal to Promote Industry-Academia Collaboration in South-Tyrol

Technology transfer is a complex and multifaceted activity whose main goal is to promote the transfer of knowledge from academia to industry. In this context, one of the most challenging parts of technology transfer activities is informing stakeholders from industry about the availability of academic results. Traditionally, this occurs through academic publications, and companies with a research department already use this knowledge source. Nonetheless, Small and Medium Enterprises (SMEs) often do not have the time or the resources to study and interpret results from academia. This paper describes a technology transfer Web portal that promotes technology transfer offers in an industry-friendly format. The portal aims at fostering innovation and collaboration between academia and industry.

Roberto Confalonieri, Andrea Janes
Fast and Automatic Object Registration for Human-Robot Collaboration in Industrial Manufacturing

We present an end-to-end framework for fast retraining of object detection models in human-robot collaboration. Our Faster R-CNN based setup covers the whole workflow of automatic image generation and labeling, model retraining on-site, and inference on an FPGA edge device. The intervention of a human operator is reduced to providing the new object together with its label and starting the training process. Moreover, we present a new loss, the intraspread-objectosphere loss, to tackle the problem of open-world recognition. Though it fails to completely solve the problem, it significantly reduces the number of false positive detections of unknown objects.

Manuela Geiß, Martin Baresch, Georgios Chasparis, Edwin Schweiger, Nico Teringl, Michael Zwick

Distributed Ledgers and Related Technologies

Frontmatter
Sending Spies as Insurance Against Bitcoin Pool Mining Block Withholding Attacks

Theoretical studies show that block withholding attacks are a considerable weakness of pool mining in Proof-of-Work consensus networks. Several defense mechanisms against the attack have been proposed in the past, with a novel approach of sending sensors suggested by Lee and Kim in 2019. In this work, we extend their approach by including mutual attacks among multiple pools as well as a deposit system for miners forming a pool. Our analysis shows that block withholding attacks can be made economically irrational when miners joining a pool are required to provide deposits, which can be confiscated in case of malicious behavior. We investigate minimal thresholds and optimal deposit requirements for various scenarios and conclude that this defense mechanism is only successful when collected deposits are not redistributed to the miners.

Isamu Okada, Hannelore De Silva, Krzysztof Paruch
Risks in DeFi-Lending Protocols - An Exploratory Categorization and Analysis of Interest Rate Differences

According to well-established principles of risk management, interest rates reflect different levels of risk. Understanding this level of risk is crucial for investors when making investment decisions. In this paper, risk factors in DeFi lending are identified and categorized within a framework that focuses not only on technical but also on financial aspects. In a subsequent step, the influence of unsystematic risk factors on interest rates is explored to tentatively assess the validity of the literature-derived framework. Our observations indicate that operational risks emanating from the underlying layer-1 solution do seem to have a strong influence. Furthermore, first indications of the validity of scalability challenges and smart contract risks were found. For oracle risks as well as governance risks, our approach yielded no results.

Marco Huber, Vinzenz Treytl
Battling the Bullwhip Effect with Cryptography

In real-world supply chains, it is often observed that orders placed with suppliers fluctuate more than sales to customers, and that this deviation builds up in the upstream direction of the supply chain. This bullwhip effect arises because local decision-making based on the orders of the immediate customer leads to overreaction. The literature shows that supply-chain-wide sharing of order or inventory information can help to stabilize the system and reduce inventories and stockouts. However, sharing this information can make a stakeholder vulnerable in other areas, such as bargaining over prices. To overcome this dilemma, we propose the use of cryptographic methods like secure multiparty computation or homomorphic encryption to compute and share average order/inventory levels without leaking sensitive data of individual actors. Integrating this information into the stylized beer game supply chain model, we show that the bullwhip effect is reduced even under this limited information sharing. Besides presenting results regarding the achieved savings in supply chain costs, we describe how blockchain technology can be used to implement such a novel supply chain management system.
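A toy sketch of the cryptographic idea (additive secret sharing, one of the secure multiparty computation techniques mentioned): each supply-chain actor splits its confidential inventory level into random shares, so the average can be computed without any party revealing its individual value.

```python
# Additive secret sharing over Z_M: shares are uniformly random, only their
# sum (mod M) reveals anything, and that sum is exactly the aggregate needed.
import random

MOD = 10 ** 9

def make_shares(secret, n_parties):
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)   # shares sum to secret mod M
    return shares

random.seed(1)
inventories = [120, 340, 95]                      # private per-actor levels
n = len(inventories)
# Each actor distributes one share to every participant.
all_shares = [make_shares(v, n) for v in inventories]
# Each participant sums the shares it received and publishes only that sum.
partial_sums = [sum(col) % MOD for col in zip(*all_shares)]
avg = (sum(partial_sums) % MOD) / n
print(avg)   # 185.0 == (120 + 340 + 95) / 3, with no individual level revealed
```

This is exactly the "limited information sharing" regime of the abstract: actors learn the chain-wide average needed to damp the bullwhip effect, but not each other's inventories.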

Martin Hrušovský, Alfred Taudes
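
The privacy-preserving averaging step described above can be illustrated with additive secret sharing, one of the secure multiparty computation techniques the paper mentions. This is a toy sketch under a semi-honest-party assumption, not the authors' implementation: each actor splits its order quantity into random shares, and only the sum of all shares (hence the average) is ever reconstructed.

```python
import random

MODULUS = 2**31 - 1  # a prime larger than any plausible order total

def share(value, n_parties):
    """Split a secret value into n additive shares modulo a prime."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_average(orders):
    """Each actor distributes one share per party; only the sum of all
    shares (and hence the average) is ever reconstructed, never an
    individual order quantity."""
    n = len(orders)
    distributed = [share(o, n) for o in orders]
    # Party j sums the shares it received from every actor and publishes it.
    partial_sums = [sum(d[j] for d in distributed) % MODULUS for j in range(n)]
    return (sum(partial_sums) % MODULUS) / n

print(secure_average([120, 95, 140, 110]))  # → 116.25
```

No single published value reveals an individual order, yet the supply-chain-wide average is exact.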
Reporting of Cross-Border Transactions for Tax Purposes via DLT

Finding the right balance between effective cross-border exchange of tax information and limiting tax authorities' access to sensitive data of foreign taxpayers is among the key issues for international tax policy. The legal protection of private and commercially sensitive information, as well as the need to demonstrate that data is foreseeably relevant before requesting it, are amongst the main backstops to a symmetrical data flow between purely domestic and cross-border tax information. Our paper suggests a technological solution that strikes a balance between privacy and cross-border transparency. All taxpayers involved in a cross-border transaction need to report the transactional data to their domestic tax authorities. The tax authorities transform the standardized transactional data with a hashing algorithm and upload the resulting hash to a shared, permissioned blockchain platform. If both parties to a transaction reported it to their domestic authorities, two identical hashes would appear on the blockchain, raising no concern. If one of the parties fails to report, only one hash would appear, demonstrating non-reporting on one side of the border. This would give sufficient grounds to consider the sensitive information underpinning the transaction foreseeably relevant for establishing tax liability, leading to a traditional exchange of information between the authorities. On this basis, a failure to report would be detected for both income and VAT purposes, thereby substantially reducing the possibility of tax evasion.

Ivan Lazarov, Quentin Botha, Nathalia Oliveira Costa, Jakob Hackel
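
The hash-matching mechanism described above can be sketched as follows. The record fields, the canonical JSON serialization, and the choice of SHA-256 are illustrative assumptions, not the paper's specification:

```python
import hashlib
import json

def transaction_hash(record):
    """Hash a standardized transaction record; both authorities must use
    the same canonical serialization for matching hashes to emerge."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def unmatched_hashes(hashes_a, hashes_b):
    """Hashes that appear on only one side of the border indicate a
    transaction reported in one jurisdiction but not the other."""
    return set(hashes_a) ^ set(hashes_b)

# Hypothetical standardized record; the actual reporting schema differs.
tx = {"seller": "AT-U12345678", "buyer": "DE-987654321",
      "amount": "1000.00", "date": "2022-03-01"}
h_at = transaction_hash(tx)   # reported to the Austrian authority
h_de = transaction_hash(tx)   # reported to the German authority

assert unmatched_hashes([h_at], [h_de]) == set()   # both reported: no concern
assert unmatched_hashes([h_at], []) == {h_at}      # one-sided: non-reporting flag
```

Only hashes reach the shared ledger, so the underlying transactional data stays with the domestic authorities.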
Securing File System Integrity and Version History Via Directory Merkle Trees and Blockchains

In our data-driven world, the secure storage of information becomes more and more important. Digital data is especially affected, as digital records can easily be manipulated in the absence of special securing mechanisms. Hence, for digital archives, a verifiable mechanism to guarantee data integrity is of great importance. While such a mechanism must be able to rule out manipulation, in many scenarios data updates are desirable. In this case, the version history of the data must be traceable. In this paper, we propose an approach based on blockchains and Merkle trees that fulfills both criteria: it provides a verifier with a proof of data integrity while allowing traceability of changes in the stored data.

Andreas Lackner, Seyed Amid Moeinzadeh Mirhosseini, Stefan Craß
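
The integrity check underlying the approach can be sketched with a flat Merkle tree over file contents (a simplification of the paper's directory Merkle trees; the file names and the use of SHA-256 are illustrative assumptions):

```python
import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Compute the Merkle root over a list of leaf byte strings."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical file snapshot; its root would be anchored on the blockchain.
files_v1 = [b"report.pdf|v1", b"data.csv|v1", b"notes.txt|v1"]
root_v1 = merkle_root(files_v1)

# A verifier recomputes the root from the current state: any manipulation
# of a file changes the root and is therefore detected.
assert merkle_root(files_v1) == root_v1
assert merkle_root([b"report.pdf|TAMPERED", b"data.csv|v1", b"notes.txt|v1"]) != root_v1
```

Anchoring a new root per version on the blockchain yields the tamper-evident version history the abstract describes.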
Taxation of Blockchain Staking Rewards: Propositions Based on a Comparative Legal Analysis

Blockchain technology is seen as an essential part of a decentralized society. Given the relatively high energy consumption of proof-of-work consensus algorithms, proof-of-stake blockchains are becoming increasingly popular for reaching consensus on the current state of the blockchain. With proof-of-stake, a randomized process determines which node is allowed to validate the next block of transactions. In order to participate, coin holders must stake their coins and receive staking rewards in return. Yet, there is still high uncertainty around the taxation of these staking rewards. This analysis compares the taxation of staking rewards in Germany, Austria, and Switzerland and derives four propositions under the premise of reaching a higher degree of tax neutrality. The legal comparison illustrates the heterogeneity in terms of taxing staking rewards. To achieve a more neutral taxation, (1) staking should only qualify as a business activity under clearly defined and restrictive circumstances. (2) Whenever staking does not represent a main business activity, staking rewards should not be taxed upon receipt. (3) Instead, they should be taxed upon disposal. (4) Moreover, staking should not cause further tax consequences for staked coins.

Pascal René Marcel Kubin
Comparison Framework for Blockchain Interoperability Implementations

Blockchain interoperability has gained importance in practice, is increasingly discussed in the literature, and serves as a basis for new use cases in areas such as manufacturing and financial services. However, many of the blockchain interoperability solutions discussed in the literature are still in the design phase, are unpopular, or have a small developer community. Therefore, this study proposes a comparison framework and examines implemented public blockchain interoperability solutions, focusing on data from published GitHub repositories. The results show that these implementations vary significantly in terms of popularity, their developer communities, and their source code, indicating differences in quality. The insights gained in this work facilitate the selection of an appropriate implementation to enable blockchain interoperability use cases.

Alexander Neulinger

Cyber-security and Functional Safety in Cyber-physical Systems

Frontmatter
Towards Strategies for Secure Data Transfer of IoT Devices with Limited Resources

Many Cyber-Physical Systems (CPSs) and Internet of Things (IoT) devices are constrained in terms of computation speed, memory, power, area, and bandwidth. As they interact with the physical world, various aspects such as safety, security, and privacy should be considered while processing personal data. Systems should continue operating even under harsh conditions and when network connections (e.g., to the cloud) are lost. If that happens and the storage capacity is limited, sensor data may be overwritten irrevocably. This paper presents preliminary ideas and the planned research methodology for examining and defining strategies to secure the transfer of data from resource-constrained IoT devices to edge devices and the cloud, and to mitigate data loss when a device loses its connection.

Nasser S. Albalawi, Michael Riegler, Jerzy W. Rozenblit
Application of Validation Obligations to Security Concerns

Our lives depend increasingly on safety- and security-critical systems, so formal techniques are advocated for engineering such systems. One such technique is validation obligations, which enable formalizing requirements early in development to ensure their correctness. Furthermore, validation obligations help keep requirements consistent in an evolving model and create assurances about the model's completeness. Although the technique was initially proposed for safety properties, this paper shows how validation obligations also enable us to reason about security concerns, through an example from the medical domain.

Sebastian Stock, Atif Mashkoor, Alexander Egyed
Mode Switching for Secure Edge Devices

Many devices in various domains operate in different modes. We have suggested using mode switching for security purposes to make systems more resilient when vulnerabilities become known or attacks are carried out. We demonstrate the usefulness of mode switching in the context of industrial edge devices. These devices are used in industry to connect industrial machines such as cyber-physical systems to the Internet and/or the vendor's network to allow condition monitoring and big data analytics. The connection to the Internet poses security threats to edge devices and, thus, to the machines they connect to. In this paper, (i) we suggest a multi-modal architecture for edge devices; (ii) we present an application scenario; and (iii) we offer first reflections on how mode switching can reduce attack surfaces and, thus, increase resilience.

Michael Riegler, Johannes Sametinger, Christoph Schönegger

Machine Learning and Knowledge Graphs

Frontmatter
A Lifecycle Framework for Semantic Web Machine Learning Systems

Semantic Web Machine Learning Systems (SWeMLS) characterise applications that combine symbolic and subsymbolic components in innovative ways. Such hybrid systems are expected to benefit from both domains and to reach new performance levels for complex tasks. While existing taxonomies in this field focus on building blocks and patterns for describing the interaction within the final systems, typical lifecycles describing the steps of the entire development process have not yet been introduced. Thus, we present our SWeMLS lifecycle framework, which provides a unified view of Semantic Web, Machine Learning, and their interaction in a SWeMLS. We further apply the framework in a case study based on three systems described in the literature. This work should facilitate the understanding, planning, and communication of SWeMLS designs and process views.

Anna Breit, Laura Waltersdorfer, Fajar J. Ekaputra, Tomasz Miksa, Marta Sabou
Enhancing TransE to Predict Process Behavior in Temporal Knowledge Graphs

Temporal knowledge graphs allow process data to be stored in a natural way since they also model the time aspect. An example of such data is registration processes in the area of intellectual property protection. A common question in such settings is how to predict the future behavior of a (yet unfinished) process. However, traditional process mining techniques require structured data, which is typically not available in this form in such communication-intensive domains. In addition, there exist a number of knowledge graph embedding methods based on neural networks, which are too performance-demanding for large real-world graphs. In this paper, we propose several extensions for preprocessing process data to be embedded in the traditional triple-based TransE knowledge graph embedding model in order to predict process behavior in temporal knowledge graphs. We evaluate our approach on a real-world trademark registration process in a patent office and show its improved performance compared to the TransE base model.

Aleksei Karetnikov, Lisa Ehrlinger, Verena Geist
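
For context, the triple-based TransE model on which the extensions build scores a triple (h, r, t) by the distance between the translated head h + r and the tail t. A minimal sketch with random embeddings (dimensions and values are purely illustrative):

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE plausibility score of a triple (h, r, t): the distance
    ||h + r - t||. Lower scores indicate more plausible triples."""
    return np.linalg.norm(h + r - t, ord=norm)

rng = np.random.default_rng(1)
dim = 8
h = rng.normal(size=dim)                           # head entity embedding
r = rng.normal(size=dim)                           # relation embedding
t_true = h + r + rng.normal(scale=0.01, size=dim)  # almost-exact translation
t_rand = rng.normal(size=dim)                      # unrelated tail entity

# A tail that is (almost) the translation of the head scores better.
assert transe_score(h, r, t_true) < transe_score(h, r, t_rand)
```

During training, embeddings are adjusted so that observed triples obtain low scores; prediction then ranks candidate tails by this score.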
An Explainable Multimodal Fusion Approach for Mass Casualty Incidents

During a Mass Casualty Incident, it is essential to make effective decisions to save lives and care for the injured. This paper presents work in progress on the design and development of an explainable decision support system, intended for medical personnel and caregivers, that capitalises on multiple modalities to achieve situational awareness and pre-hospital life support. Our novelty is two-fold: first, we use state-of-the-art techniques for combining static and time-series data in deep recurrent neural networks, and second, we increase the trustworthiness of the system by enriching it with neurosymbolic explainability capabilities.

Zoe Vasileiou, Georgios Meditskos, Stefanos Vrochidis, Nick Bassiliades

Time Ordered Data

Frontmatter
Log File Anomaly Detection Based on Process Mining Graphs

In the process industry, it is quite common that manufacturing machines repeat a certain order of process steps consistently. These process steps are often declared in programs and have a linear workflow. During the production process, logs are generated for monitoring and further analysis, in which not only production steps but also incidents and other deviations are recorded. As the manual monitoring of such processes is quite time-consuming and tedious, the demand for automatic anomaly and deviation detection is rising. A potential approach is the use of Process Mining, although it does not meet all requirements on its own. In this paper, we propose a new approach based on spectral gap analysis for the detection of anomalies in log files, using the adjacency matrix generated by Process Mining techniques. Furthermore, the experiments section covers its application to a linear process and to non-linear processes with deviating paths.

Sabrina Luftensteiner, Patrick Praher
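
The core idea can be illustrated as follows. The directly-follows construction and the definition of the gap as the difference between the two largest eigenvalue magnitudes are our simplifying assumptions for the sketch, not the paper's exact formulation:

```python
import numpy as np

def adjacency_from_traces(traces, activities):
    """Directly-follows adjacency matrix as produced by basic process
    discovery: A[i, j] = 1 if activity j ever directly follows i."""
    idx = {a: i for i, a in enumerate(activities)}
    A = np.zeros((len(activities), len(activities)))
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            A[idx[a], idx[b]] = 1.0
    return A

def spectral_gap(A):
    """Gap between the two largest eigenvalue magnitudes; a shift in this
    gap between reference and observed logs can flag deviating paths."""
    mags = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
    return mags[0] - mags[1]

acts = ["start", "heat", "mix", "cool", "end"]
normal = [["start", "heat", "mix", "cool", "end"]] * 3
# A deviating run introduces a repeated step and skips the cooling phase.
deviant = normal + [["start", "heat", "heat", "mix", "end"]]

print(spectral_gap(adjacency_from_traces(normal, acts)),
      spectral_gap(adjacency_from_traces(deviant, acts)))
```

For a purely linear process the adjacency matrix is nilpotent (gap 0); the self-loop introduced by the deviating run changes the spectrum, which is what the anomaly detector exploits.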
A Scalable Microservice Infrastructure for Fleet Data Management

Modern Internet of Things solutions using edge devices produce large amounts of raw data. In order to utilize this data, it needs to be processed, aggregated, and categorized to enable decision making for management and end users. This data management is a non-trivial task, as the computational load is directly proportional to the amount of data. To tackle this issue, we provide an extensible and scalable microservice architecture that can receive, normalize, and filter the raw data and persist it at different levels of aggregation, as well as for time series analysis.

Rainer Meindl, Konstantin Papesh, David Baumgartner, Emmanuel Helm
Learning Entropy: On Shannon vs. Machine-Learning-Based Information in Time Series

The paper discusses Learning-based information (L) and Learning Entropy (LE) in contrast to classical Shannon probabilistic information (I) and probabilistic entropy (H). It is shown that L corresponds to the recently introduced Approximate Individual Sample-point Learning Entropy (AISLE). For data series, then, LE should be defined as the mean value of L, which is finally in proper accordance with Shannon's concept of entropy H. The distinction of L from I is explained via the real-time anomaly detection of individual time series data points (states). First, the principal distinction between the information concepts I and L is demonstrated with respect to the law governing the data, which L considers explicitly (while I does not). Second, it is shown that L has the potential to be applied to much shorter datasets than I, because the learning system is pre-trained and able to generalize from a smaller dataset. Then, floating-window trajectories of the covariance matrix norm, the trajectory of the approximate variance fractal dimension, and especially the windowed Shannon entropy trajectory are compared to LE on multichannel EEG featuring an epileptic seizure. The results on real time series show that L, i.e., AISLE, can be a useful counterpart to Shannon entropy, also allowing for a more detailed search for anomaly onsets (change points).

Ivo Bukovsky, Ondrej Budik
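
The windowed Shannon entropy trajectory used as a baseline in the comparison can be sketched as follows; the window length, histogram bin count, and toy signal are illustrative choices, not the paper's experimental setup:

```python
import numpy as np

def windowed_shannon_entropy(x, window, bins=16):
    """Shannon entropy H = -sum(p * log2(p)) over a sliding window,
    with p estimated from a histogram of the windowed samples."""
    trajectory = []
    for i in range(len(x) - window + 1):
        counts, _ = np.histogram(x[i:i + window], bins=bins)
        p = counts[counts > 0] / window
        trajectory.append(-np.sum(p * np.log2(p)))
    return np.array(trajectory)

rng = np.random.default_rng(0)
# Toy signal: a quiet segment followed by a high-variance "event".
signal = np.concatenate([rng.normal(0, 1, 500), rng.normal(0, 5, 100)])
H = windowed_shannon_entropy(signal, window=100)
```

In contrast to this purely distributional measure, L evaluates how unusual each sample is with respect to a pre-trained learning model of the data-governing law.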
Using Property Graphs to Segment Time-Series Data

Digitization of industrial processes requires an ever-increasing amount of resources to store and process data. However, integration of the business process, including expert knowledge and (real-time) process data, remains a largely open challenge. Our study is a first step towards better integration of these aspects by means of knowledge graphs and machine learning. In particular, we describe the framework that we use to operate with both the conceptual representation of the business process and the sensor data measured in the process. Considering the existing limitations of graph data storage in processing large time-series data volumes, we suggest an approach that creates a bridge between a graph database, which models the processes as concepts, and a time-series database, which contains the sensor data. The main difficulty of this approach is the creation and maintenance of the vast number of links between these databases. We introduce a method of smart data segmentation that i) reduces the number of links between the databases, ii) minimizes data pre-processing overhead, and iii) integrates graph and time-series databases efficiently.

Aleksei Karetnikov, Tobias Rehberger, Christian Lettner, Johannes Himmelbauer, Ramin Nikzad-Langerodi, Günter Gsellmann, Susanne Nestelberger, Stefan Schützeneder
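
One way to realize the link-reduction idea is to have the graph database reference segment boundaries instead of individual samples. A toy sketch follows; gap-based splitting is our assumption about the segmentation criterion, not the paper's method:

```python
def segment_boundaries(timestamps, gap):
    """Split a sorted series of sample timestamps into contiguous segments
    wherever the spacing exceeds `gap`; the graph database then stores one
    link per segment instead of one link per data point."""
    segments = []
    start = prev = timestamps[0]
    for t in timestamps[1:]:
        if t - prev > gap:
            segments.append((start, prev))
            start = t
        prev = t
    segments.append((start, prev))
    return segments

ts = [0, 1, 2, 10, 11, 12, 30]
print(segment_boundaries(ts, gap=5))  # → [(0, 2), (10, 12), (30, 30)]
```

Seven per-sample links collapse into three segment links, while each segment's time range still resolves to raw data in the time-series database.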
A Synthetic Dataset for Anomaly Detection of Machine Behavior

Logs have been used in modern software solutions for development and maintenance purposes, as they represent a rich source of information for subsequent analysis. A line of research focuses on the application of artificial intelligence techniques to logs to predict system behavior and to perform anomaly detection. Successful industrial applications are rather sparse due to the lack of publicly available log datasets. To fill this gap, we developed a method to synthetically generate a log dataset that resembles a linear program execution log file. In this paper, the method is described and existing datasets are discussed. The generated dataset should provide researchers with a common base for new approaches.

Sabrina Luftensteiner, Patrick Praher
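
A minimal sketch of such a generator is shown below; the step names, anomaly rate, and the injection of error lines are our illustrative assumptions, and the paper's generator is more elaborate:

```python
import random

STEPS = ["init", "load", "heat", "process", "unload", "done"]

def generate_log(n_runs, anomaly_rate=0.1, seed=0):
    """Emit a synthetic log for a linear program: every run logs STEPS in
    order, and with probability anomaly_rate an unexpected error line is
    injected before a step, mimicking an incident in the execution."""
    rng = random.Random(seed)
    lines = []
    for run in range(n_runs):
        for step in STEPS:
            if rng.random() < anomaly_rate:
                lines.append(f"run={run} ERROR unexpected state before {step}")
            lines.append(f"run={run} step={step}")
    return lines

log = generate_log(3)
print(len(log), "log lines generated")
```

Because the injected anomalies are known by construction, the dataset comes with ground-truth labels for evaluating anomaly detectors.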
Backmatter
Metadaten
Titel
Database and Expert Systems Applications - DEXA 2022 Workshops
herausgegeben von
Prof. Dr. Gabriele Kotsis
A Min Tjoa
Ismail Khalil
Dr. Bernhard Moser
Prof. Dr. Alfred Taudes
Atif Mashkoor
Johannes Sametinger
Jorge Martinez-Gil
Florian Sobieczky
Lukas Fischer
Rudolf Ramler
Maqbool Khan
Gerald Czech
Copyright-Jahr
2022
Electronic ISBN
978-3-031-14343-4
Print ISBN
978-3-031-14342-7
DOI
https://doi.org/10.1007/978-3-031-14343-4
