
2018 | Book

On the Move to Meaningful Internet Systems. OTM 2018 Conferences

Confederated International Conferences: CoopIS, C&TC, and ODBASE 2018, Valletta, Malta, October 22-26, 2018, Proceedings, Part II

Editors: Hervé Panetto, Christophe Debruyne, Henderik A. Proper, Dr. Claudio Agostino Ardagna, Dumitru Roman, Robert Meersman

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This two-volume set, LNCS 11229 and 11230, constitutes the refereed proceedings of the Confederated International Conferences: Cooperative Information Systems (CoopIS 2018), Ontologies, Databases, and Applications of Semantics (ODBASE 2018), and Cloud and Trusted Computing (C&TC 2018), held as part of OTM 2018 in October 2018 in Valletta, Malta.
The 64 full papers presented together with 22 short papers were carefully reviewed and selected from 173 submissions. Each year, the OTM program covers data and Web semantics, distributed objects, Web services, databases, information systems, enterprise workflow and collaboration, ubiquity, interoperability, mobility, and grid and high-performance computing.

Table of Contents

Frontmatter
Correction to: ChIP: A Choreographic Integration Process

The chapter was inadvertently published with an error, which has since been corrected.

Saverio Giallorenzo, Ivan Lanese, Daniel Russo

International Conference on Cooperative Information Systems (CoopIS) 2018

Frontmatter
Optimized Container-Based Process Execution in the Cloud

A key challenge for elastic business processes is the resource-efficient scheduling of cloud resources in such a way that Quality-of-Service levels are met. So far, this has been difficult, since existing approaches use a coarse-granular resource allocation based on virtual machines. In this paper, we present a technique that provides fine-granular resource scheduling for elastic processes based on containers. In order to address the increased complexity of the respective scheduling problem, we develop a novel technique called GeCo based on genetic algorithms. Our evaluation demonstrates that, in comparison to a baseline that follows an ad hoc approach, GeCo achieves a cost saving between 32.90% and 47.45% while considering a high service level.

Philipp Waibel, Anton Yeshchenko, Stefan Schulte, Jan Mendling
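
The abstract above mentions a genetic algorithm (GeCo) for scheduling process steps onto containers. As a rough, hedged sketch of that general technique only, not of the authors' actual GeCo implementation, the following Python snippet evolves assignments of process steps to hypothetical container types, minimizing cost under a deadline penalty; all data, names, and the fitness function are illustrative assumptions.

import random

# Illustrative problem data (assumed, not from the paper): each step needs
# CPU-seconds of work; each container type has a cost per second and a speedup.
STEPS = [30, 45, 60, 20, 90]
CONTAINERS = [(0.05, 1.0), (0.09, 2.0), (0.16, 4.0)]   # (cost/s, speed factor)
DEADLINE = 120.0                                        # assumed QoS limit (s)

def fitness(assignment):
    """Lower is better: monetary cost, heavily penalized if the deadline is missed."""
    cost = duration = 0.0
    for work, idx in zip(STEPS, assignment):
        price, speed = CONTAINERS[idx]
        t = work / speed
        duration += t
        cost += t * price
    return cost + (1000.0 if duration > DEADLINE else 0.0)

def evolve(pop_size=40, generations=200, mutation_rate=0.1):
    pop = [[random.randrange(len(CONTAINERS)) for _ in STEPS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]                 # selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(STEPS))        # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation_rate:          # random mutation
                child[random.randrange(len(STEPS))] = random.randrange(len(CONTAINERS))
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
print("assignment:", best, "cost+penalty:", round(fitness(best), 2))
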
ChIP: A Choreographic Integration Process

Over the years, organizations have acquired disparate software systems, each answering one specific need. Currently, the desirable outcomes of integrating these systems (higher degrees of automation and better system consistency) are often outbalanced by the complexity of mitigating their discrepancies. These problems are magnified in the decentralized setting (e.g., cross-organizational cases), where the integration is usually handled with ad hoc “glue” connectors, each integrating two or more systems. Since the overall logic of the integration is spread among many glue connectors, these solutions are difficult to program correctly (making them prone to misbehaviors and system blocks), maintain, and evolve. In response to these problems, we propose ChIP, an integration process advocating choreographic programs as intermediate artifacts to refine high-level global specifications (e.g., UML Sequence Diagrams), defined by the domain experts of each partner, into concrete, distributed implementations. In ChIP, once the stakeholders agree upon a choreographic integration design, they can automatically generate the respective local connectors, which are guaranteed to faithfully implement the described distributed logic. In the paper, we illustrate ChIP with a pilot from the EU EIT Digital project SMAll, aimed at integrating pre-existing systems from government, university, and the transport industry.

Saverio Giallorenzo, Ivan Lanese, Daniel Russo
PerceptRank: A Real-Time Learning to Rank Recommender System for Online Interactive Platforms

In highly interactive platforms with continuous and frequent content creation and obsolescence, other factors besides relevance may alter users’ perceptions and choices. Moreover, making personalized recommendations in these application domains imposes new challenges when compared to classic recommendation use cases. In fact, the required recommendation approaches should be able to ingest and process continuous streams of data online, at scale and with low latency, while making context-dependent dynamic suggestions. In this work, we propose a generic approach to deal jointly with scalability, real-time and cold start problems in highly interactive online platforms. The approach is based on several consumer decision-making theories to infer users’ preferences. In addition, it tackles the recommendation problem as a learning-to-rank problem that exploits a heterogeneous information graph to estimate users’ perceived value towards items. Although the approach is addressed to streaming environments, it has been validated in both offline batch and online streaming scenarios. The first evaluation was carried out using the MovieLens dataset, and the latter targeted the news recommendation domain using a high-velocity stream of usage data collected by a marketing company from several large-scale online news portals. Experiments show that our proposition meets real-world production environment constraints while delivering accurate suggestions and outperforming several state-of-the-art approaches.

Hemza Ficel, Mohamed Ramzi Haddad, Hajer Baazaoui Zghal
Discovering Microservices in Enterprise Systems Using a Business Object Containment Heuristic

The growing impact of IoT and Blockchain platforms on business applications has increased interest in leveraging large enterprise systems as Cloud-enabled microservices. However, large and monolithic enterprise systems are unsuitable for flexible integration with such platforms. This paper presents a technique to support the re-engineering of an enterprise system based on the fundamental mechanisms for structuring its architecture, i.e., business objects managed by software functions and their relationships which influence business object interactions via the functions. The technique relies on a heuristic for deriving business object exclusive containment relationships based on analysis of source code and system logs. Furthermore, the paper provides an analysis of distributing enterprise systems based on the business object containment relationships using the NSGA II software clustering and optimization technique. The heuristics and the software clustering and optimization techniques have been validated against two open-source enterprise systems: SugarCRM and ChurchCRM. The experiments demonstrate that the proposed approach can identify microservice designs which support multiple desired microservice characteristics, such as high cohesion, low coupling, high scalability, high availability, and processing efficiency.

Adambarage Anuruddha Chathuranga De Alwis, Alistair Barros, Colin Fidge, Artem Polyvyanyy
Engineering a Highly Scalable Object-Aware Process Management Engine Using Distributed Microservices

Scalability of information systems has been a research topic for many years and is as relevant as ever with the dramatic increases in digitization of business processes and data. This also applies to process-aware information systems, most of which are currently incapable of scaling horizontally, i.e., over multiple servers. This paper presents the design science artifact that resulted from engineering a highly scalable process management system relying on the object-aware process management paradigm. The latter allows for distributed process execution by conceptually encapsulating process logic and data into multiple interacting objects that may be processed concurrently. These objects, in turn, are represented by individual microservices at run-time, which can be hosted transparently across entire server clusters. We present measurement data that evaluates the scalability of the artifact on a compute cluster, demonstrating that the current prototypical implementation of the run-time engine can handle very large numbers of users and process instances concurrently in single-case mechanism experiments with large amounts of simulated user input. Finally, the development of scalable process execution engines will further the continued maturation of the data-centric business process management field.

Kevin Andrews, Sebastian Steinau, Manfred Reichert
Applying Sequence Mining for Outlier Detection in Process Mining

One of the challenges in applying process mining algorithms on real event data is the presence of outlier behaviour. Such behaviour often leads to complex, incomprehensible, and sometimes even inaccurate process mining results. As a result, correct and/or important behaviour of the process may be concealed. In this paper, we exploit sequence mining techniques for the purpose of outlier detection in the process mining domain. Using the proposed approach, it is even possible to detect outliers in case of heavy parallelism and/or long-term dependencies between business process activities. Our method has been implemented in both the ProM and RapidProM frameworks. Using these implementations, we conducted a collection of experiments that show that we are able to detect and remove outlier behaviour in event data. Our evaluation clearly demonstrates that the proposed method accurately removes outlier behaviour and, indeed, improves process discovery results.

Mohammadreza Fani Sani, Sebastiaan J. van Zelst, Wil M. P. van der Aalst
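
The abstract above applies sequence mining to spot outlier traces in event logs. The toy Python sketch below illustrates only the general idea, not the authors' algorithm: it counts how often ordered activity pairs occur across the log and flags traces containing a pattern rarer than a threshold; the data and threshold are assumptions.

from collections import Counter
from itertools import combinations

def subsequences(trace, k=2):
    """All ordered k-element subsequences of a trace (activity pairs for k=2)."""
    return set(combinations(trace, k))

def find_outlier_traces(log, threshold=0.05):
    # Frequency of each subsequence across the whole event log.
    counts = Counter(sub for trace in log for sub in subsequences(trace))
    n = len(log)
    outliers = []
    for trace in log:
        # A trace is suspicious if its rarest pattern falls below the threshold.
        rarest = min(counts[sub] / n for sub in subsequences(trace))
        if rarest < threshold:
            outliers.append(trace)
    return outliers

log = [("a", "b", "c", "d")] * 95 + [("a", "c", "b", "d")] * 4 + [("a", "d", "b", "c")]
print(find_outlier_traces(log, threshold=0.02))   # flags only the single odd trace
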
Transparent Execution of Data Transformations in Data-Aware Service Choreographies

Due to recent advances in data science, IoT, and Big Data, the importance of data is steadily increasing in the domain of business process management. Service choreographies provide means to model complex conversations between collaborating parties from a global viewpoint. However, the involved parties often rely on their own data formats. To still enable the interaction between them within choreographies, the underlying business data has to be transformed between the different data formats. The state of the art in modeling such data transformations as additional tasks in choreography models is error-prone, time-consuming, and pollutes the models with functionality that is not relevant from a business perspective but technically required. As a first step to tackle these issues, we introduced in previous work a data transformation modeling extension for defining data transformations on the level of choreography models, independent of their control flow as well as of concrete technologies or tools. However, this modeling extension is not executable yet. Therefore, this paper presents an approach and a supporting integration middleware which enable providing and executing data transformation implementations based on various technologies or tools in a generic and technology-independent manner, realizing end-to-end support for modeling and executing data transformations in service choreographies.

Michael Hahn, Uwe Breitenbücher, Frank Leymann, Vladimir Yussupov
In the Search of Quality Influence on a Small Scale – Micro-influencers Discovery

Discovery and detection of different social behaviors, such as influence in online social networks, have drawn much focus in current research. While there are many methods tackling the issue of influence evaluation, most of them are based on the underlying assumption that a large audience is indispensable for an influencer to have much impact. However, in many cases, users with smaller but highly involved audiences are still highly impactful. In this work, we target a novel problem of finding micro-influencers – exactly those users that have much influence on others despite a limited number of followers. Therefore, we propose a new concept of micro-influencers in the context of Social Network Analysis, define the notion, and present a flexible method aiming to discover them. The approach is tested on two real-world datasets of Facebook [24] and Pinterest [31]. The established results are promising and demonstrate the usefulness of the micro-influencer-oriented approach for potential applications.

Monika Ewa Rakoczy, Amel Bouzeghoub, Alda Lopes Gancarski, Katarzyna Wegrzyn-Wolska
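
The abstract above does not spell out the discovery method; as a purely illustrative, hedged toy of the underlying intuition (influence normalized by audience size), the Python lines below rank users by interactions per follower while excluding large audiences. The statistics, the cutoff, and the score are assumptions, not the authors' measure.

# Assumed user statistics: follower count and total interactions received.
USERS = {
    "celebrity": {"followers": 2_000_000, "interactions": 40_000},
    "hobbyist":  {"followers": 1_500,     "interactions": 900},
    "shop":      {"followers": 30_000,    "interactions": 1_200},
}

def micro_influencer_score(stats, max_followers=50_000):
    """Engagement per follower, restricted to comparatively small audiences."""
    if stats["followers"] > max_followers:
        return 0.0
    return stats["interactions"] / stats["followers"]

for user in sorted(USERS, key=lambda u: micro_influencer_score(USERS[u]), reverse=True):
    print(user, round(micro_influencer_score(USERS[user]), 3))
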
Execution of Multi-perspective Declarative Process Models

A Process-Aware Information System is a system that executes processes involving people, applications, and data on the basis of process models. At least two process modeling paradigms can be distinguished: procedural models define exactly the execution order of process steps, whereas declarative process models allow flexible process executions that are restricted by constraints. Execution engines for declarative process models have been extensively investigated in research, with a strong focus on behavioral aspects. However, execution approaches for multi-perspective declarative models that involve constraints on data values and resource assignments are still missing. In this paper, we present an approach for the execution of multi-perspective declarative process models in order to close this gap. The approach builds on a classification strategy for different constraint types, evaluating their relevance in different execution contexts. For execution, all constraints are transformed into the specification language Alloy, which is used to solve satisfiability (SAT) problems. We implemented a modeling tool including the transformation functionality and the process execution engine itself. The approach has been evaluated in terms of expressiveness and efficiency.

Lars Ackermann, Stefan Schönig, Sebastian Petter, Nicolai Schützenmeier, Stefan Jablonski
Combining Model- and Example-Driven Classification to Detect Security Breaches in Activity-Unaware Logs

Current approaches to the security-oriented classification of process log traces can be split into two categories: (i) example-driven methods, which induce a classifier from annotated example traces; (ii) model-driven methods, based on checking the conformance of each test trace to security-breach models defined by experts. These categories are orthogonal and use separate information sources (i.e., annotated traces and a-priori breach models). However, as these sources often coexist in real applications, both kinds of methods could be exploited synergistically. Unfortunately, when the log traces consist of (low-level) events with no reference to the activities of the breach models, combining (i) and (ii) is not straightforward. In this setting, to complement the partial views of insecure process-execution patterns that example-driven and model-driven methods capture separately, we devise an abstract classification framework where the predictions provided by these methods are combined, according to a meta-classification scheme, into an overall prediction that benefits from all the background information available. The reasonability of this solution is backed by experiments performed on a case study, showing that the accuracy of the example-driven (resp., model-driven) classifier decreases appreciably when the given example data (resp., breach models) do not exhaustively describe insecure process behaviors.

Bettina Fazzinga, Francesco Folino, Filippo Furfaro, Luigi Pontieri
CMMI-DEV v1.3 Reference Model in ArchiMate

Reference models allow the verification of the existing concepts of a model and how these concepts relate to each other, giving an idea of how the model works. The purpose of this paper is to address the problem of the perceived complexity of Capability Maturity Model Integration (CMMI) by proposing a graphical reference model using ArchiMate as the chosen Enterprise Architecture (EA) modeling language. This paper focuses on the part of CMMI related to the development of both products and services, better known as CMMI-DEV, in version 1.3. With ArchiMate as the EA modeling language, we use the Design Science Research Methodology (DSRM) to develop the CMMI-DEV v1.3 reference model, reducing the perceived complexity of the framework by representing its concepts and relationships with graphical concepts of ArchiMate. In this paper, we demonstrate our proposed reference model (artifact) with the use of an EA of an organization and evaluate it with well-known techniques for evaluating design science artifacts. The paper concludes with some findings and future work on this topic.

Luís Valverde, Miguel Mira da Silva, Margarida Rolão Gonçalves
Data Analytics Challenges in Industry 4.0: A Case-Based Approach

Creating business value with data analytics is a continuous process that requires effectively considering the design and deployment of powerful analytics solutions. This requires a significant effort in understanding, customizing, assembling, and adapting these solutions to the specific environment, which might differ from one context to another. The objective of this paper is to discuss the use of data analytics in Industry 4.0 by harvesting some challenges and lessons learnt. A case-based approach is followed as a research methodology to explore and understand complex and common issues related to data analytics. Scalability, interoperability, and standardization are among the topics that are reviewed.

Manel Brichni, Wided Guedria
Speech Acts Featuring Decisions in Knowledge-Intensive Processes

A Knowledge-Intensive Process (KiP) is specified as a composition of a set of prospective activities (events) whose execution contributes to achieving a goal and whose control flow, at the instance level, typically presents a high degree of variability among its several past executions. Variability is a consequence of a combination of decision points and informal interactions among participants on collaborative and innovative activities. These interactions may occur through message exchange, so understanding the interplay of illocutionary acts within messages may bring insights into how participants make decisions. In this paper, we propose mechanisms that identify the speech acts, within the set of messages, that most often lead to decision points in a KiP, providing an understanding of conversational patterns. We empirically evaluate our proposal considering data from a company that provides IT services to several customers.

Tatiana Barboza, Pedro Richetti, Fernanda Baião, Flavia Maria Santoro, João Carlos Gonçalves, Kate Revoredo, Anton Yeshchenko

Cloud and Trusted Computing (C&TC) 2018

Frontmatter
Latency-Aware Placement Heuristic in Fog Computing Environment

With the rise in popularity of IoT applications, a new paradigm has emerged, so-called Fog Computing. To facilitate their deployment on fog nodes, IoT applications are decomposed into a set of modules. These modules interact with each other in order to achieve a global goal. Placing these modules without a prior strategy may affect the overall performance of the application. Moreover, the restricted capacity of the fog nodes vis-à-vis the modules’ requirements gives rise to the placement problem. In this paper, we focus on minimizing the overall latency of the application while placing modules on fog nodes. In order to address the module placement problem, we propose both exact and approximate solutions. Experiments were conducted using CPLEX and an iFogSim-simulated Fog environment, respectively. The results show the effectiveness of our final approach.

Amira Rayane Benamer, Hana Teyeb, Nejib Ben Hadj-Alouane
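
The abstract above contrasts an exact CPLEX formulation with an approximate solution for latency-aware module placement. As a hedged illustration of the approximate side only (not the authors' heuristic), the Python sketch below places modules greedily on the lowest-latency node with spare capacity; nodes, modules, and capacities are assumed values.

# Assumed fog nodes: (name, capacity in MIPS, latency to the data source in ms).
NODES = [("edge-1", 500, 5), ("edge-2", 800, 8), ("cloud", 10_000, 80)]
# Assumed application modules and their resource demand in MIPS.
MODULES = [("sensor-filter", 200), ("analytics", 600), ("alerting", 100)]

def place(modules, nodes):
    """Greedy heuristic: try low-latency nodes first, fall back to the cloud."""
    free = {name: cap for name, cap, _ in nodes}
    by_latency = sorted(nodes, key=lambda n: n[2])
    placement = {}
    for mod, demand in modules:
        for name, _, latency in by_latency:
            if free[name] >= demand:
                free[name] -= demand
                placement[mod] = (name, latency)
                break
        else:
            raise RuntimeError(f"no node can host {mod}")
    return placement

for mod, (node, lat) in place(MODULES, NODES).items():
    print(f"{mod:14s} -> {node} ({lat} ms)")
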
Requirements for Legally Compliant Software Based on the GDPR

We identify 74 generic, reusable technical requirements based on the GDPR that can be applied to software products which process personal data. The requirements can be traced to corresponding articles and recitals of the GDPR and fulfill the key principles of lawfulness and transparency. Therefore, we present an approach to requirements engineering with regard to developing legally compliant software that satisfies the principles of privacy by design, privacy by default as well as security by design.

Sandra Domenique Ringmann, Hanno Langweg, Marcel Waldvogel
A Review of Distributed Ledger Technologies

Recently, the race toward trusted distributed systems has attracted huge interest, mostly due to the advances in crypto-currency platforms such as Bitcoin. Currently, different Distributed Ledger Technologies (DLTs) are competing to demonstrate their capabilities and show how they can overcome the limitations faced by others. The common denominator among all distributed ledger technologies is their reliance on a distributed, decentralized peer-to-peer network and a set of modular mechanisms such as cryptographic hashes and consensus mechanisms. However, their implementations vary substantially in terms of the data structures, fault tolerance, and consensus approaches used. This divergence affects the nature of each instance of the DLT in terms of cost, security, latency, and performance. In this paper, we present a snapshot of four existing implementations of DLTs. The particularities of each technology and an initial comparison between them are discussed.

Nabil El Ioini, Claus Pahl
A Complete Evaluation of the TAM3 Model for Cloud Computing Technology Acceptance

In this work, we examine the technology acceptance of Cloud computing by using the third iteration of the technology acceptance model, from now on referred to as TAM3. TAM is a well-established methodology widely used for studying the acceptance of technology. Empirical data from 138 Cloud developers, IT professionals, and managers was analyzed using factor analysis. The results indicate that user acceptance of Cloud computing can be explained and predicted by variables concerning Perceived Usefulness, Perceived Ease of Use, Subjective Norm, Job Relevance, Image, Output Quality, Result Demonstrability, Experience, Computer Self-Efficacy, Perception of External Control, Cloud Anxiety, Perceived Enjoyment, Voluntariness, and Intention to Use. These results further advance the theory and add to the bases for further research targeted at enhancing our knowledge of technology adoption for Cloud computing. They also provide a first base for companies and governments on how to adopt and successfully integrate Cloud technologies, and specifically how users adopt Cloud technologies according to organization size, type, and employee job role.

Fotios Nikolopoulos, Spiridon Likothanassis
Towards the Blockchain Technology for Ensuring the Integrity of Data Storage and Transmission

Ensuring the security and integrity of data storage and transmission is a big challenge. One of the methods of its provision is the use of cryptographic techniques, which unfortunately require additional time for data encryption. The paper presents solutions that ensure data integrity using blockchain technology. It describes two cases: an end-to-end verifiable blockchain-based electronic voting system with the ability to follow and verify votes and election results, and a lightweight blockchain-based protocol for secure data transfer that ensures the integrity of the transferred data with minimal cryptographic overhead.

Michał Pawlak, Jakub Guziur, Aneta Poniszewska-Marańda
A Taxonomy of Security as a Service

With the evolving expansion of the threat landscape (i.e., internal and external) and the growing shortage of cybersecurity resources (i.e., tools and skills), Security as a Service (SecaaS) is gaining momentum to fill this pressing gap. In this paper, we propose a taxonomy of existing research work in SecaaS. The taxonomy explores the current state of the art in SecaaS to reason about SecaaS work with respect to three main dimensions: service operation, security solution, and threat. This taxonomy enables SecaaS consumers and researchers to better differentiate among existing approaches and assess whether they meet their security needs.

Marwa Elsayed, Mohammad Zulkernine

International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE) 2018

Frontmatter
Learning Structured Video Descriptions: Automated Video Knowledge Extraction for Video Understanding Tasks

Vision-to-language problems, such as video annotation or visual question answering, stand out from the perceptual video understanding tasks (e.g., classification) through their cognitive nature and their tight connection to the field of natural language processing. While most of the current solutions to vision-to-language problems are inspired by machine translation methods, aiming to directly map visual features to text, several recent results on image and video understanding have proven the importance of specifically and formally representing the semantic content of a visual scene before reasoning over it and mapping it to natural language. This paper proposes a deep learning solution to the problem of generating structured descriptions for videos, and evaluates it on a dataset of formally annotated videos, which has been automatically generated as part of this work. The recorded results confirm the potential of the solution, indicating that it manages to describe the semantic content of a video scene with an accuracy similar to that of state-of-the-art natural language captioning models.

Daniel Vasile, Thomas Lukasiewicz
Generating Executable Mappings from RDF Data Cube Data Structure Definitions

Data processing is increasingly the subject of various internal and external regulations, such as GDPR which has recently come into effect. Instead of assuming that such processes avail of data sources (such as files and relational databases), we approach the problem in a more abstract manner and view these processes as taking datasets as input. These datasets are then created by pulling data from various data sources. Taking a W3C Recommendation for prescribing the structure of and for describing datasets, we investigate an extension of that vocabulary for the generation of executable R2RML mappings. This results in a top-down approach where one prescribes the dataset to be used by a data process and where to find the data, and where that prescription is subsequently used to retrieve the data for the creation of the dataset “just in time”. We argue that this approach to the generation of an R2RML mapping from a dataset description is the first step towards policy-aware mappings, where the generation takes into account regulations to generate mappings that are compliant. In this paper, we describe how one can obtain an R2RML mapping from a data structure definition in a declarative manner using SPARQL CONSTRUCT queries, and demonstrate it using a running example. Some of the more technical aspects are also described.

Christophe Debruyne, Dave Lewis, Declan O’Sullivan
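
The abstract above derives R2RML mappings from RDF Data Cube data structure definitions using SPARQL CONSTRUCT queries. The Python/rdflib sketch below is a hedged, minimal stand-in for that idea, not the paper's vocabulary extension: it annotates DSD components with a hypothetical property (ex:sourceColumn) and uses one CONSTRUCT query to emit R2RML predicate-object maps; the table name and URI template are likewise illustrative assumptions.

from rdflib import Graph

# Toy data structure definition; ex:sourceColumn is an assumed annotation.
dsd = """
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/> .
ex:dsd a qb:DataStructureDefinition ;
    qb:component [ qb:dimension ex:refArea ; ex:sourceColumn "AREA" ] ,
                 [ qb:measure   ex:population ; ex:sourceColumn "POP" ] .
"""

# One CONSTRUCT query turns every annotated component into an R2RML
# predicate-object map attached to a single (illustrative) triples map.
query = """
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rr: <http://www.w3.org/ns/r2rml#>
PREFIX ex: <http://example.org/>
CONSTRUCT {
    ex:obsMap a rr:TriplesMap ;
        rr:logicalTable [ rr:tableName "OBSERVATIONS" ] ;
        rr:subjectMap   [ rr:template "http://example.org/obs/{ID}" ] ;
        rr:predicateObjectMap [ rr:predicate ?prop ;
                                rr:objectMap [ rr:column ?col ] ] .
}
WHERE {
    ?dsd a qb:DataStructureDefinition ;
         qb:component ?c .
    { ?c qb:dimension ?prop } UNION { ?c qb:measure ?prop }
    ?c ex:sourceColumn ?col .
}
"""

g = Graph()
g.parse(data=dsd, format="turtle")
for s, p, o in g.query(query):   # iterate the constructed mapping triples
    print(s, p, o)
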
Optimization of Queries Based on Foundational Ontologies

Using ontologies for enterprise data integration brings an opportunity to use them for efficient data query evaluation. The rationale is that ontologies represent common-sense structures that are reflected also in queries posed by data users and thus can be used for query optimization. This paper presents how proper foundational-ontology-based knowledge can be used to design a generic index, which can help in answering a wide range of queries compliant with the foundational ontology. We discuss several indexing techniques and evaluate our proposal for different UFO-based queries extracted from different query sets.

Jana Ahmad, Petr Křemen, Martin Ledvinka
Semantically Enhanced Interoperability in Health Emergency Management

Health Emergency Management is a domain which involves a number of stakeholders operating under different protocols, rules, and languages, forming a complex world where incident coordination and decision making is a vital requirement. The data used during a health emergency include heterogeneous datasets from various data sources. Thus, data harmonization techniques must be adopted against a common reference schema to assure data consistency. Moreover, the need for interoperability between the involved agencies at the national and international levels is strong. Currently, there is no reference schema which captures all the dimensions of the Health Emergency Management domain and is also aligned with the common incident management interoperability protocols. The HERMES Semantic Model consists of an ontological representation of the conceptual model of the Health Emergency Management domain and aims at: (a) providing an integral conceptual model of Health Emergency Management covering all the involved knowledge domains and, (b) addressing the aforementioned interoperability and integration issues. HERMES reuses existing ontologies and offers a new upper model, a set of vertical models, and a data facet. The model is used by a specific mechanism which imports data from the various resources in order to provide an integrated and homogenized view of the data. The final harmonized data may be used by various incident management platforms to assist in decision making during an emergency. Finally, the model and the data harmonization procedure are evaluated using open data from open data repositories. The results of the evaluation verify the correctness of the approach.

Danai Vergeti, Dimitrios Ntalaperas, Dimitrios Alexandrou
Explanation of Action Plans Through Ontologies

In recent years, more and more AI systems have been included in various aspects of human life, forming human-machine partnership and collaboration. The term Digital Companion refers to the embodiment of AI as a human’s co-worker. Explanations of why the AI arrived at specific decisions will be highly beneficial in enabling AI to operate more robustly, clarifying to the user why the AI made certain choices, and significantly increasing the trust between humans and AI. A number of symbolic planners exist, which use heuristic search methods to come up with a sequence of actions to reach a certain goal. So far, the explanations of why a planner follows a certain decision-making series are mostly embedded within the planner’s operating style, composing so-called glass-box explanations. The integration of AI Planning (using PDDL) and Ontologies (using OWL) gives the possibility to use reasoning and generate explanations of why, and subsequently why not, certain actions were considered by the AI planner, without relying on the planner’s functionality. An extended knowledge base helps construct more precise clarifications of the decision-making process of the planner. In this paper we present a general architecture for black-box plan explanations independent of the nature of the planner and illustrate the approach of integrating PDDL and OWL, as well as using justifications in ontologies to explain why a planner has taken certain actions.

Ivan Gocev, Stephan Grimm, Thomas A. Runkler
Rule Module Inheritance with Modification Restrictions

Adapting rule sets to different settings, yet avoiding uncontrolled proliferation of variations, is a key challenge of rule management. One fundamental concept to foster reuse and simplify adaptation is inheritance. Building on rule modules, i.e., rule sets with input and output schema, we formally define inheritance of rule modules by incremental modification in single inheritance hierarchies. To avoid uncontrolled proliferation of modifications, we introduce formal modification restrictions which flexibly regulate the degree to which a child module may be modified in comparison to its parent. As concrete rule language, we employ Datalog±, which can be regarded as a common logical core of many rule languages. We evaluate the approach by a proof-of-concept prototype.

Felix Burgstaller, Bernd Neumayr, Emanuel Sallinger, Michael Schrefl
Computing Exposition Coherence Across Learning Resources

With increasing numbers of open learning resources on the web that are created and published independently by different sources, stringing together coherent learning pathways is a challenging task. Coherence in this context means the semantic “smoothness” of transition from one learning resource to the next, i.e., the change in topic distribution and exposition styles between consecutive resources is minimal, and the overall sequence of resources together provides a good learning experience. Towards this end, we present a model to compute exposition coherence between a pair of learning resources, based on representing exposition styles in the form of a random walk. It is based on an underlying hypothesis about exposition styles modelled as a sequence of topical entailments. Evaluation of the presented model on the dataset of learning pathways curated by teachers on the educational platform Gooru.org shows promising results.

Chaitali Diwan, Srinath Srinivasa, Prasad Ram
STypeS: Nonrecursive Datalog Rewriter for Linear TGDs and Conjunctive Queries

We present STypeS, a system that rewrites ontology-mediated queries with linear tuple-generating dependencies and conjunctive queries to equivalent nonrecursive datalog (NDL) queries. The main feature of STypeS is that it produces polynomial-size rewritings whenever the treewidth of the input conjunctive queries and the size of the chases for the ontology atoms as well as their arity are bounded; moreover, the rewritings can be constructed and executed in LogCFL, indicating high parallelisability in theory. We show experimentally that Apache Flink on a cluster of machines with 20 virtual CPUs is indeed able to parallelise execution of a series of NDL-rewritings constructed by STypeS, with the time decreasing proportionally to the number of CPUs available.

Stanislav Kikot, Roman Kontchakov, Salvatore Rapisarda, Michael Zakharyaschev
Knowledge Authoring for Rule-Based Reasoning

Modern knowledge bases have matured to the extent of being capable of complex reasoning at scale. Unfortunately, wide deployment of this technology is still hindered by the fact that specifying the requisite knowledge requires skills that most domain experts do not have, and skilled knowledge engineers are in short supply. A way around this problem could be to acquire knowledge from text. However, the current knowledge acquisition technologies for information extraction are not up to the task because logic reasoning systems are extremely sensitive to errors in the acquired knowledge, and existing techniques lack the required accuracy by too large of a margin. Because of the enormous complexity of the problem, controlled natural languages (CNLs) were proposed in the past, but even they lack high enough accuracy. Instead of tackling the general problem of text understanding, our interest is in a related, but different, area of knowledge authoring—a technology designed to enable domain experts to manually create formalized knowledge using CNL. Our approach adopts and formalizes the FrameNet methodology for representing the meaning, enables incrementally-learnable and explainable semantic parsing, and harnesses rich knowledge graphs like BabelNet in the quest to obtain unique, disambiguated meaning of CNL sentences. Our experiments show that this approach is 95.6% accurate in standardizing the semantic relations extracted from CNL sentences—far superior to alternative systems.

Tiantian Gao, Paul Fodor, Michael Kifer
On Generating Stories from Semantically Annotated Tourism-Related Content

In online marketing communication, publication consistency and content diversity are two important factors for marketing success. Especially in the tourism industry, having a strong online presence through the dissemination of high-quality content is highly desired. A method to maintain these two factors is to collect and remix various user-generated contents available on the Web and present them more interestingly. This method, known as content curation, has been widely used in social media. Multiple social media content can be aggregated for further consumption, for instance by listing them in historical order or grouping them according to particular topics. While the amount of user-generated content available on the Web continuously increases, finding and selecting content to be mixed into a meaningful story is mainly performed manually by humans. These are challenging tasks due to the vast amount of accessible content on the Web, presented in various formats and available in distributed sources. In this paper, we propose a method to automatically generate stories in the tourism industry by leveraging a rule-based system over a collection of semantically annotated content. The method utilizes the data dynamics of annotations, detected through a rule-based system, to identify the relevant content to be selected and mixed. We evaluated our method with a collection of semantically annotated tourism-related content from the region of Tyrol, Austria, with promising results.

Zaenal Akbar, Anna Fensel, Dieter Fensel
A Big Linked Data Toolkit for Social Media Analysis and Visualization Based on W3C Web Components

Social media generates a massive amount of data at a very fast pace. Objective information such as news, and subjective content such as opinions and emotions, are intertwined and readily available. This data is very appealing from both a research and a commercial point of view, for applications such as public polling or marketing purposes. A complete understanding requires a combined view of information from different sources, which is usually enriched (e.g. sentiment analysis) and visualized in a dashboard. In this work, we present a toolkit that tackles these issues on different levels: (1) to extract heterogeneous information, it provides independent data extractors and web scrapers; (2) data processing is done with independent semantic analysis services that are easily deployed; (3) a configurable Big Data orchestrator controls the execution of extraction and processing tasks; (4) the end result is presented in a sensible and interactive format with a modular visualization framework based on Web Components that connects to different sources such as SPARQL and ElasticSearch endpoints. Data workflows can be defined by connecting different extractors and analysis services. The different elements of this toolkit interoperate through a linked data principled approach and a set of common ontologies. To illustrate the usefulness of this toolkit, this work describes several use cases in which the toolkit has been successfully applied.

J. Fernando Sánchez-Rada, Alberto Pascual, Enrique Conde, Carlos A. Iglesias
Modeling Industrial Business Processes for Querying and Retrieving Using OWL+SWRL

Process modeling forms a core activity in many organizations in which different entities and stakeholders interact for smooth operation and management of enterprises. There have been a few works on semantically labeling business processes using OWL-DL that formalize business process structure and query them. However, all these methods suffer from a few limitations, such as the lack of a modular approach to ontology design, no guarantee of consistent ontology development with TBox and ABox axioms, and no provision for combining the control-flow relations of the main process and its sub-processes. In this work, we propose an approach for labeling and specifying business processes by using hybrid programs, which offers modular ontology design, consistent ontology design of each module, and a unified control flow for a process and its sub-processes. This formalism of hybrid programs integrates ontologies specified in OWL-DL with SWRL (Semantic Web Rule Language) rules. Further, we report on our experimental effort on modeling industrial business processes with this hybrid formalism. We also present a case study of an industrial business process to illustrate our modeling approach, which can aid in business knowledge management.

Suman Roy, Gabriel Silvatici Dayan, V. Devaraja Holla
Understanding Information Professionals: A Survey on the Quality of Linked Data Sources for Digital Libraries

In this paper we provide an in-depth analysis of a survey related to Information Professionals’ (IPs) experiences with Linked Data quality. We discuss and highlight shortcomings in linked data sources based on the quality issues IPs report when using such sources for their daily tasks, such as metadata creation.

Jeremy Debattista, Lucy McKenna, Rob Brennan
Challenges in Value-Driven Data Governance

Data is quite popularly considered to be the new oil since it has become a valuable commodity. This has resulted in many entities and businesses that hoard data with the aim of exploiting it. Yet, the ‘simple’ exploitation of data results in entities who are not obtaining the highest benefits from the data, which as yet is not considered to be a fully-fledged enterprise asset. Such data can exist in a duplicated, fragmented, and isolated form, and the sheer volume of available data further complicates the situation. Issues such as the latter highlight the need for value-based data governance, where the management of data assets is based on the quantification of the data value. This paper has the purpose of creating awareness and further understanding of challenges that result in untapped data value. We identify niches in related work, and through our experience with businesses who use data assets, we here analyse four main context-independent challenges that hinder entities from achieving the full benefits of using their data. This will aid in the advancement of the field of value-driven data governance and therefore directly affect data asset exploitation.

Judie Attard, Rob Brennan
Automatic Extraction of Data Governance Knowledge from Slack Chat Channels

This paper describes a data governance knowledge extraction prototype for Slack channels based on an OWL ontology abstracted from the Collibra data governance operating model and the application of statistical techniques for named entity recognition. This addresses the need to convert unstructured information flows about data assets in an organisation into structured knowledge that can easily be queried for data governance. The abstract nature of the data governance entities to be detected and the informal language of the Slack channel increased the knowledge extraction challenge. In evaluation, the system identified entities in a Slack channel with precision but low recall. This has shown that it is possible to identify data assets and data management tasks in a Slack channel so this is a fruitful topic for further research.

Rob Brennan, Simon Quigley, Pieter De Leenheer, Alfredo Maldonado
Factors of Efficient Semantic Web Application Development

Creating domain-specific Linked Data applications is a complex endeavor as they need to work with ontological knowledge, consume/produce Linked Data and perform nontrivial business logic. In this work, we analyze several domain-specific Linked Data applications and introduce a set of features which influence the efficiency of development and maintenance of these applications. For each feature, we also list examples of software libraries supporting it.

Martin Ledvinka, Miroslav Blaško, Petr Křemen
Evaluating a Faceted Search Index for Graph Data

We discuss the problem of implementing real-time faceted search interfaces over graph data, specifically the “value suggestion problem” of presenting the user with options that make sense in the context of a partially constructed query. For queries that include many object properties, this task is computationally expensive. We show that good approximations to the value suggestion problem can be achieved by only looking at parts of queries, and we present an index structure that supports this approximation and is designed to scale gracefully to both very large datasets and complex queries. In a series of experiments, we show that the loss of accuracy is often minor, and additional accuracy can in many cases be achieved with a modest increase of index size.

Vidar Klungre, Martin Giese
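
The abstract above approximates value suggestions by looking only at parts of the query. The Python toy below sketches one way such an approximation can work, under assumed data and index layout rather than the paper's actual index: per facet pair, it precomputes which values co-occur, and suggests values for a target facet by intersecting only the pairwise sets instead of evaluating the full query.

from collections import defaultdict

# Assumed toy dataset: each record is a dict of facet -> value.
RECORDS = [
    {"type": "sensor",  "vendor": "acme",     "country": "NO"},
    {"type": "sensor",  "vendor": "acme",     "country": "DE"},
    {"type": "gateway", "vendor": "umbrella", "country": "NO"},
    {"type": "sensor",  "vendor": "umbrella", "country": "NO"},
]

# Pairwise index: (facet_a, value_a, facet_b) -> values of facet_b co-occurring
# with value_a. Considering only facet pairs is the approximation.
index = defaultdict(set)
for rec in RECORDS:
    for fa, va in rec.items():
        for fb, vb in rec.items():
            if fa != fb:
                index[(fa, va, fb)].add(vb)

def suggest(partial_query, target_facet):
    """Approximate value suggestions: intersect pairwise co-occurrence sets."""
    candidates = None
    for facet, value in partial_query.items():
        values = index[(facet, value, target_facet)]
        candidates = values if candidates is None else candidates & values
    return candidates or set()

# Which vendors still make sense once type=sensor and country=NO are selected?
print(suggest({"type": "sensor", "country": "NO"}, "vendor"))
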
Object-Relational Rules for Medical Devices: Classification and Conformity

This work focuses on formalizing the rules enacted by Regulation (EU) 2017/745, for risk-based classification and for class-based conformity assessment options regarding medical devices marketability, in Positional-Slotted Object-Applicative (PSOA) RuleML. The knowledge base represents knowledge by integrating F-logic-like frames with Prolog-like relationships for atoms used as facts and in the conditions and conclusions of rules. We tested this open-source knowledge base by querying it in the open-source PSOATransRun system which provided a feedback loop for refinement and extension. This can support the licensing process for stakeholders and the registration of medical devices with a CE conformity mark.

Sofia Almpani, Petros Stefaneas, Harold Boley, Theodoros Mitsikas, Panayiotis Frangos
Reducing the Cost of the Linear Growth Effect Using Adaptive Rules with Unlinking and Lazy Rule Evaluation

The match cost of Rete [8] increases significantly and approximately linearly with the number of rules [2]. A major part of that cost is the eager creation of cross products within the join nodes in an attempt to materialize rule instantiations. This paper builds on the idea of adaptive rules [1], using the unlinking of nodes, segments of nodes, and rules to delay join attempts, which helps mitigate some aspects of the linear growth effect. By delaying the evaluation of a rule until after it is linked and restricting the propagation to a single path, a lazy, goal-driven evaluation behaviour is introduced. The algorithm also preserves node sharing by organising the network into segments and paths of segments, with memory now at the node, segment, and path levels. This paper presents the design and implementation of this work within the popular Open Source Drools rule engine. Drools also provides a baseline Rete implementation, without these enhancements, against which this work can be benchmarked. The evaluation of the results shows positive improvements over Rete, with no downsides.

Mark Proctor, Mario Fusco, Davide Sottara, Tibor Zimányi
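
The abstract above describes delaying a rule's join evaluation until the rule is linked. The Python sketch below is a deliberately simplified, hedged illustration of that unlinking idea (Drools implements it inside its rule network, not like this): a rule only attempts its joins once every one of its input segments has received at least one fact, so rules with empty inputs cost nothing.

import itertools

class Rule:
    """A rule with several input segments; joins run only when all are linked."""
    def __init__(self, name, segments):
        self.name = name
        self.segments = segments                 # segment name -> list of facts
        self.linked = {s: False for s in segments}

    def insert(self, segment, fact):
        self.segments[segment].append(fact)
        self.linked[segment] = True              # segment links on its first fact
        if all(self.linked.values()):
            self.evaluate()                      # lazy: join attempted only now

    def evaluate(self):
        # A naive cross product stands in for the join network of a real engine,
        # which would propagate incrementally instead of replaying everything.
        for combo in itertools.product(*self.segments.values()):
            print(f"{self.name} fired with {combo}")

rule = Rule("order-discount", {"customers": [], "orders": []})
rule.insert("customers", "alice")                # rule stays unlinked: no join yet
rule.insert("orders", "order-1")                 # all segments linked: join runs
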
Backmatter
Metadata
Title
On the Move to Meaningful Internet Systems. OTM 2018 Conferences
Editors
Hervé Panetto
Christophe Debruyne
Henderik A. Proper
Dr. Claudio Agostino Ardagna
Dumitru Roman
Robert Meersman
Copyright Year
2018
Electronic ISBN
978-3-030-02671-4
Print ISBN
978-3-030-02670-7
DOI
https://doi.org/10.1007/978-3-030-02671-4
