
2017 | Book

On the Move to Meaningful Internet Systems. OTM 2017 Conferences

Confederated International Conferences: CoopIS, C&TC, and ODBASE 2017, Rhodes, Greece, October 23-27, 2017, Proceedings, Part II

Editors: Hervé Panetto, Christophe Debruyne, Walid Gaaloul, Mike Papazoglou, Prof. Dr. Adrian Paschke, Dr. Claudio Agostino Ardagna, Robert Meersman

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This two-volume set, LNCS 10573 and 10574, constitutes the refereed proceedings of the Confederated International Conferences: Cooperative Information Systems, CoopIS 2017, Ontologies, Databases, and Applications of Semantics, ODBASE 2017, and Cloud and Trusted Computing, C&TC, held as part of OTM 2017 in October 2017 in Rhodes, Greece.
The 61 full papers presented, together with 19 short papers, were carefully reviewed and selected from 180 submissions. Every year, the OTM program covers data and Web semantics, distributed objects, Web services, databases, information systems, enterprise workflow and collaboration, ubiquity, interoperability, mobility, and grid and high-performance computing.

Table of Contents

Frontmatter

Cloud and Trusted Computing (C&TC) 2017

Frontmatter
Property Preserving Encryption in NoSQL Wide Column Stores

Property preserving encryption (PPE) can enable database systems to process queries over encrypted data. While much of the research in this area focuses on SQL databases, NoSQL (Not only SQL) cloud databases are good candidates as well. On the one hand, they usually provide enough space to store the typically larger ciphertexts and special indexes of PPE schemes. On the other hand, in contrast to approaches for SQL systems, query expressiveness remains almost unaffected despite PPE. Thus, in this paper we investigate (i) how PPE can be used in the popular NoSQL sub-category of so-called wide column stores in order to protect sensitive data in the threat model of a persistent honest-but-curious database provider, (ii) which PPE schemes are suited for this task, and (iii) what performance levels can be expected.
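
As a rough illustration of the equality-preserving flavour of PPE, the sketch below (Python) derives deterministic search tokens with a keyed HMAC; the key handling and column names are invented, and real PPE schemes such as order-preserving encryption are considerably more involved.

# Equal plaintexts map to equal tokens, so the store can answer equality
# queries on an indexed token column without seeing the plaintext.
import hmac
import hashlib

SECRET_KEY = b"client-side-secret"  # illustrative only; never leaves the client

def equality_token(value: str) -> str:
    """Deterministic token for an equality-searchable column."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Client-side write: store the token; the payload itself would be stored
# under a separate, randomized encryption (omitted here).
row = {"ssn_token": equality_token("123-45-6789")}

# Client-side read: the query reuses the same token, so the honest-but-
# curious provider can match rows without learning the plaintext value.
query_predicate = {"ssn_token": equality_token("123-45-6789")}
assert row["ssn_token"] == query_predicate["ssn_token"]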

Tim Waage, Lena Wiese
Towards a JSON-Based Fast Policy Evaluation Framework
(Short Paper)

In this paper we experimentally evaluate the performance of JACPoL, a previously introduced JSON-based access control policy language. By testing generic families of policies expressed in both languages, we show that JACPoL requires much less processing time and memory space than XACML.
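
For intuition only, here is a hypothetical JSON policy and a naive evaluator in Python; this is not the actual JACPoL grammar, which the paper defines, but it hints at why JSON policies can be cheap to parse and evaluate compared to XML-based XACML.

import json

policy_doc = json.loads("""
{
  "id": "doc-read-policy",
  "effect": "permit",
  "conditions": {"role": "editor", "action": "read"}
}
""")

def evaluate(policy: dict, request: dict) -> str:
    """Return the policy effect if all conditions match, else 'deny'."""
    conditions = policy["conditions"]
    if all(request.get(k) == v for k, v in conditions.items()):
        return policy["effect"]
    return "deny"

print(evaluate(policy_doc, {"role": "editor", "action": "read"}))  # permit
print(evaluate(policy_doc, {"role": "viewer", "action": "read"}))  # deny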

Hao Jiang, Ahmed Bouabdallah
Gibbon: An Availability Evaluation Framework for Distributed Databases

Driven by new application domains, the database management system (DBMS) landscape has significantly evolved from single-node DBMSs to distributed database management systems (DDBMSs). In parallel, cloud computing has become the preferred solution for running distributed applications. Hence, modern DDBMSs are designed to run in the cloud. Yet, in distributed systems the probability of failure grows with the number of entities involved, and using cloud resources increases it even further. Therefore, DDBMSs apply data replication across multiple nodes to provide high availability. Yet, high availability limits consistency or partition tolerance, as stated by the CAP theorem. As the decision for two of the three attributes is not binary, the heterogeneous landscape of DDBMSs gets even more complex when it comes to their high-availability mechanisms. Hence, selecting a highly available DDBMS to run in the cloud becomes a very challenging task, as supportive evaluation frameworks are not yet available. In order to ease the selection and increase the trust in running DDBMSs in the cloud, we present the Gibbon framework, a novel availability evaluation framework for DDBMSs. Gibbon defines quantifiable availability metrics, a customisable evaluation methodology and a novel evaluation framework architecture. Gibbon is demonstrated through an availability evaluation of MongoDB, analysing takeover and recovery times.
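
A minimal probe in the spirit of such takeover measurements might look as follows in Python with pymongo (the connection string and probing rate are placeholders; Gibbon's own metrics and architecture are far richer than this sketch).

import time
from pymongo import MongoClient
from pymongo.errors import PyMongoError

# Placeholder connection string for a three-node replica set.
client = MongoClient("mongodb://db0,db1,db2/?replicaSet=rs0",
                     serverSelectionTimeoutMS=1000)

outage_start = None
for _ in range(600):                       # probe once per second
    try:
        client.admin.command("ping")       # fails while no node is selectable
        if outage_start is not None:       # service is back: outage over
            print(f"unavailability window: {time.time() - outage_start:.1f}s")
            outage_start = None
    except PyMongoError:
        if outage_start is None:           # first failed probe: outage begins
            outage_start = time.time()
    time.sleep(1)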

Daniel Seybold, Christopher B. Hauser, Simon Volpert, Jörg Domaschka
Locality-Aware GC Optimisations for Big Data Workloads

Many Big Data analytics and IoT scenarios rely on fast and non-relational (NoSQL) storage to help process massive amounts of data. In addition, managed runtimes (e.g. the JVM) are now widely used to support the execution of these NoSQL storage solutions, particularly when dealing with Big Data key-value store-driven applications. The benefits of such runtimes can however be limited by automatic memory management, i.e., Garbage Collection (GC), which does not consider object locality, resulting in objects that point to each other being dispersed in memory. In the long run this may break the service level of applications due to extra page faults and degraded locality in system-level memory caches. We propose LAG1 (short for Locality-Aware G1), an extension of modern heap layouts to promote locality between groups of related objects. This is done with no prior application profiling and in a way that is transparent to the programmer, without requiring changes to existing code. The heap layout and algorithmic extensions are implemented on top of the Garbage First (G1) garbage collector (the new default collector) of the HotSpot JVM. Using the YCSB benchmarking tool to benchmark HBase, a well-known and widely used Big Data application, we show negligible overhead in frequent operations such as the allocation of new objects, and significant improvements when accessing data, supported by higher hit rates in system-level memory structures.

Duarte Patrício, Rodrigo Bruno, José Simão, Paulo Ferreira, Luís Veiga
FairCloud: Truthful Cloud Scheduling with Continuous and Combinatorial Auctions

With Cloud Computing, access to computational resources has become much easier, and applications can offer improved scalability and availability. However, the datacenters that support this model have a huge energy consumption and a limited pricing model. One way of improving energy efficiency is to reduce the idle time of resources, i.e., periods when resources are active but serve little useful business purpose. This can be done by improving the scheduling across datacenters. We present FairCloud, a scalable Cloud-Auction system that facilitates allocation by allowing the adaptation of VM requests (through conversion to other VM types and/or resource capping, i.e., degradation), depending on the user profile. Additionally, the system implements an internal reputation mechanism to detect providers with low Quality of Service (QoS). FairCloud was implemented using CloudSim and the CloudAuctions extension, and was tested with the Google Cluster Data. We observed higher quality in served requests while maintaining CPU utilization, and our reputation mechanism proved effective by lowering the orders placed with providers of lower quality.

Artur Fonseca, José Simão, Luís Veiga
A Novel WebGIS-Based Situational Awareness Platform for Trustworthy Big Data Integration and Analytics in Mobility Context

The availability of large amounts of dynamic data from several sources in a mobility context, and their real-time integration, can deliver a picture for emergency management in urban and extra-urban areas. A WebGIS portal can support the perception of all elements in the current situation. In general, however, the decision maker's attention capacity during observation is insufficient due to information overload. A situational picture is necessary to go beyond the simple perception of the elements in the environment, supporting the overall comprehension of the current situation and providing predictions and decision support. In this paper we present MAGNIFIER, a WebGIS-based intelligent system for emergency management that fully supports real-time situational awareness. Starting from the current situation and using Bratman's practical reasoning model, MAGNIFIER is able to suggest the appropriate course of actions to be executed to meet the decision maker's goals.

Susanna Bonura, Giuseppe Cammarata, Rosolino Finazzo, Giuseppe Francaviglia, Vito Morreale
On the Verification of Software Vulnerabilities During Static Code Analysis Using Data Mining Techniques
(Short Paper)

During static code analysis, software assurance analysts deal with thousands of potential vulnerabilities, many of which could be false positives. Manual review of all such potential vulnerabilities is tedious, time-consuming, and frequently impractical. Several experiments were conducted on a production code base with the aid of a variety of static code analysis tools. A data mining process was created which employed different classifiers for comparison. Furthermore, a feature selection process identified the most important features, leading to significant improvements in accuracy, precision, and recall, as evidenced by the experimental data. This paper proposes machine learning algorithms to minimize false positives with a high degree of accuracy.
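
A minimal sketch of this triage idea in Python with scikit-learn, assuming a hypothetical CSV of labeled findings with numeric features (the paper's exact feature set and classifiers may differ):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical export: numeric features per finding plus a boolean label.
findings = pd.read_csv("labeled_findings.csv")
X = findings.drop(columns=["is_true_positive"])
y = findings["is_true_positive"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Feature selection followed by a classifier, mirroring the two-step process.
model = make_pipeline(SelectKBest(mutual_info_classif, k=10),
                      RandomForestClassifier(n_estimators=200))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))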

Foteini Cheirdari, George Karabatis

International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE) 2017

Frontmatter
Linked Data and Ontology Reference Model for Infectious Disease Reporting Systems

Linked data and ontologies are already in wide use in many fields. Systems based on medical data, in particular, can benefit greatly from enhancing their contents and meta-models semantically, using ontologies in their backbone. This semantic enhancement adds value to standard systems, enabling better overall data management and more intelligent data processing. In our work we focus on one such standard system, which we enhance semantically by transferring its classic relational models together with its data into a semantic model. This information system processes and analyzes data related to infectious disease reports in Germany. Data from reports on infectious diseases not only contains specific microbiological and medical information, but also a combination of various aspects of contextual knowledge that is needed in order to take measures preventing a wider spread and reducing further transmissions. In this paper we describe our practical approach for transferring the relational data models into ontologies, establishing an improved data standard for the current system in use. Moreover, we propose a semantic reference model based on different contexts, covering the requirements of semantified data from infectious disease reports.
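
A toy example of such relational-to-RDF lifting with rdflib in Python; the namespace, class and column names are invented, and the paper's actual reference model is far richer.

from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/disease-report/")  # hypothetical vocabulary

g = Graph()
g.bind("ex", EX)

# One row from a hypothetical relational reporting table.
row = {"report_id": "r42", "pathogen": "Influenza A", "cases": 17}

report = EX[row["report_id"]]
g.add((report, RDF.type, EX.InfectionReport))
g.add((report, EX.pathogenName, Literal(row["pathogen"])))
g.add((report, EX.caseCount, Literal(row["cases"], datatype=XSD.integer)))

print(g.serialize(format="turtle"))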

Olga Streibel, Felix Kybranz, Göran Kirchner
PFSgeo: Preference-Enriched Faceted Search for Geographical Data

In this paper we show how an exploratory search process, specifically the Preference-enriched Faceted Search (PFS) process, can be extended for exploring datasets that also contain geographic information. In the introduced extension, which we call PFSgeo, objects can have geographical coordinates, the interaction model is extended, and the web interface is enriched with a map which the user can use for inspecting and restricting their focus, as well as for expressing preferences. Preference inheritance is supported, as is an automatic scope-based resolution of conflicts. We detail the implementation of the interaction model, elaborate on performance, and report the positive results of a task-based evaluation with users. The value of PFSgeo is that it provides a generic and interactive method for aiding users in selecting the desired option(s) among a set of options that are described by several attributes, including geographical ones, and it is the first model that supports map-based preferences.

Panagiotis Lionakis, Yannis Tzitzikas
Speeding up Publication of Linked Data Using Data Chunking in LinkedPipes ETL

There is a multitude of tools for the preparation of Linked Data from data sources such as CSV and XML files. These tools usually perform as expected when processing examples or smaller real-world data. However, the majority of these tools become hard to use when faced with a larger dataset, such as a CSV file hundreds of megabytes in size. Tools which load the entire resulting RDF dataset into memory usually have memory requirements unsatisfiable by commodity hardware. This is the case for RDF-based ETL tools. Their limits can be avoided by running them on powerful and expensive hardware, which is, however, not an option for the majority of data publishers. Tools which process the data in a streamed way tend to have limited transformation options. This is the case for text-based transformations, such as XSLT, or per-item SPARQL transformations such as the streamed version of TARQL. In this paper, we show how the power and transformation options of RDF-based ETL tools can be combined with the ability to transform large datasets on common consumer hardware for so-called chunkable data, i.e., data which can be split in a certain way. We demonstrate our approach in our RDF-based ETL tool, LinkedPipes ETL. We include experiments on selected real-world datasets and a comparison of the performance and memory consumption of available tools.
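
The chunking idea in miniature, sketched in Python with pandas: stream a large CSV in fixed-size chunks and emit triples per chunk, so memory use is bounded by the chunk size rather than the dataset size. File and column names are made up; LinkedPipes ETL implements this inside its RDF-based pipeline components.

import pandas as pd

CHUNK_ROWS = 100_000

with open("output.nt", "w", encoding="utf-8") as out:
    for chunk in pd.read_csv("huge_input.csv", chunksize=CHUNK_ROWS):
        for _, row in chunk.iterrows():
            subject = f"<http://example.org/record/{row['id']}>"
            out.write(f'{subject} <http://example.org/name> "{row["name"]}" .\n')
        # each chunk goes out of scope here, keeping memory use flat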

Jakub Klímek, Petr Škoda
A Particle Swarm-Based Approach for Semantic Similarity Computation

Semantic similarity plays a vital role within a myriad of shared data applications, such as data and information integration. A first step towards building such applications is to determine which concepts are semantically similar to each other. One way to compute the similarity of two concepts is to assess their word similarity by exploiting different knowledge sources, e.g., ontologies, thesauri, domain corpora, etc. Over the last few years, several approaches to similarity assessment based on quantifying the information content of concepts have been proposed and have shown encouraging performance. For all these approaches, the Least Common Subsumer (LCS) of two concepts plays an important role in determining their similarity. In this paper, we investigate the influence of the choice of this node (or set of nodes) on the quality of the similarity assessment. In particular, we develop a particle swarm optimization approach that optimally discovers LCSs. An empirical evaluation, based on well-established biomedical benchmarks and ontologies, illustrates the accuracy of the proposed approach, and demonstrates that similarity estimations provided by our approach are significantly more correlated with human ratings of similarity than those obtained via related works.
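
To see the role of the LCS, consider Lin's information-content-based measure, sim(c1, c2) = 2 * IC(lcs) / (IC(c1) + IC(c2)): the choice of LCS directly scales the score. A small Python illustration with invented concept probabilities (the paper's contribution is searching for the best LCS node(s) with particle swarm optimization, not this formula):

import math

def information_content(prob: float) -> float:
    """IC(c) = -log p(c), where p(c) is the corpus probability of concept c."""
    return -math.log(prob)

def lin_similarity(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    return 2.0 * ic_lcs / (ic_c1 + ic_c2)

ic_heart_disease = information_content(0.001)   # invented probabilities
ic_arrhythmia = information_content(0.0004)
ic_lcs = information_content(0.01)               # e.g. a shared ancestor concept
print(lin_similarity(ic_heart_disease, ic_arrhythmia, ic_lcs))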

Samira Babalou, Alsayed Algergawy, Birgitta König-Ries
Agent-Based Assistance in Ambient Assisted Living Through Reinforcement Learning and Semantic Technologies
(Short Paper)

For impaired people, carrying out certain daily-life activities is problematic due to motor and cognitive handicaps. For that reason, assistive agents in ambient assisted environments provide services that aim at supporting elderly and impaired people. However, these agents act in complex, stochastic and indeterministic environments where the concrete effects of a performed action are usually unknown at design time. Furthermore, they have to perform varying tasks according to the user's context and needs, so an agent has to be flexible and able to recognize the capabilities required in a certain situation in order to provide adequate, unobtrusive assistance. Hence, an expressive representation framework is required that relates user-specific impairments to required agent capabilities. This work presents an approach which (a) describes and links user impairments and capabilities using the formal, model-theoretic semantics expressed in OWL2 DL ontologies, and (b) computes optimal policies through Reinforcement Learning and propagates these in an agent network. The presented approach improves the collaborative, personalized and adequate assistance of assistive agents and tailors the agent-based services to the user's missing capabilities.
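
For reference, a minimal tabular Q-learning update of the kind such approaches build on, in Python; the states and actions are placeholders standing in for the ontology-derived situations and agent capabilities.

from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9          # learning rate, discount factor
Q = defaultdict(float)           # Q[(state, action)] -> estimated value

ACTIONS = ["open_door", "fetch_item", "remind_user"]  # hypothetical capabilities

def update(state, action, reward, next_state):
    """Standard Q-learning backup: Q += alpha * (r + gamma*max_a' Q' - Q)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])

# One hypothetical experience tuple: assisting succeeded (+1 reward).
update("user_at_door", "open_door", 1.0, "user_inside")
print(Q[("user_at_door", "open_door")])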

Nicole Merkle, Stefan Zander
On the Need for Applications Aware Adaptive Middleware in Real-Time RDF Data Analysis
(Short Paper)

Nowadays, a handful of applications are designed to consume dynamic, real-time continuous stream data from the IoT, social networks, smart sensors and more. Several RDF Stream Processing (RSP) engines are available to query those data streams. Application designers have the freedom to select the best available RSP engine based on their application requirements. However, this selection needs to be made at design time, resulting in early-bound, rigid solutions that are unable to adapt to changing application requirements. In this paper, we evaluate the two most popular RSP engines to show that adaptivity is required to bridge the gap between RSP engines and application requirements. We then propose an adaptive middleware that adapts to dynamic application requirements at run-time. Moreover, the adaptive middleware includes input and output control and monitors the status of the underlying RSP engines; as a result, it is essential when single or multiple instances of the same engine type are available.

Zia Ush Shamszaman, Muhammad Intizar Ali
Learning Probabilistic Relational Models Using an Ontology of Transformation Processes

Probabilistic Relational Models (PRMs) extend Bayesian networks (BNs) with the notion of classes from relational databases. Because of their richness, learning them is a difficult task. In this paper, we propose a method that learns a PRM from data using the semantic knowledge of an ontology describing these data, in order to make the learning easier. To present our approach, we describe an implementation based on an ontology of transformation processes and compare its performance to that of a method that learns a PRM directly from data. We show that, even with small datasets, our approach of learning a PRM using an ontology is more efficient.

Melanie Munch, Pierre-Henri Wuillemin, Cristina Manfredotti, Juliette Dibie, Stephane Dervaux
ORDAIN: An Ontology for Trust Management in the Internet of Things
(Short Paper)

The Internet of Things is coming, and it has the potential to change our daily life. Yet such a large-scale environment needs a semantic background to achieve interoperability and knowledge diffusion. Furthermore, this open, distributed and heterogeneous environment raises important challenges, such as trustworthiness among the various types of devices and participants. Developing and sharing ontologies that support trust management models and applications would be an effective step towards achieving semantic interoperability on a large scale. Currently, most of the ontologies and semantic description frameworks in the Internet of Things are either context-based or at an early stage. This paper reports on identifying social and non-social parameters involved in the Internet of Things and incorporating them in a general-purpose ontology that will support trust management. This ontology will include, among others, data and semantics about trust principles, involved parties, characteristics of entities, rating parameters, rule-based mechanisms, and confidence and dishonesty in the environment. Defining an ontology and using semantic descriptions for data related to trustworthiness issues will provide an important instrument in developing distributed trust (reputation) models.

Kalliopi Kravari, Nick Bassiliades
APOPSIS: A Web-Based Platform for the Analysis of Structured Dialogues

A vast number of opinions are surfacing on the Web, but the lack of mechanisms for managing them leads to confusing and often chaotic dialogues. This creates the need for further semantic infrastructure and analysis of the views expressed in large-volume discussions. In this paper, we describe a web platform for modeling and analyzing argumentative discussions by offering different means of opinion analysis, allowing participants to obtain a complete picture of the validity, the justification strength and the acceptance of each individual opinion. The system applies a semantic representation for modeling the user-generated arguments and their relations, a formal framework for evaluating the strength value of each argument, and a collection of Machine Learning algorithms for the clustering of features and the extraction of association rules.

Elisjana Ymeralli, Giorgos Flouris, Theodore Patkos, Dimitris Plexousakis
Identifying Opinion Drivers on Social Media

Social media is increasingly playing a central role in commercial and political strategies, making it imperative to understand its dynamics. In our work, we propose a model of social media as a “marketplace of opinions.” Online social media is a participatory medium where several vested interests invest their opinions on disparate issues and actively seek to establish a narrative that yields them positive returns from the population. This paper focuses on the problem of identifying such potential “drivers” of opinions for a given topic on social media. The intention to drive opinions is characterized by the following observable parameters: (a) a significant level of proactive interest in the issue, and (b) a narrow focus in terms of the distribution of topics. We test this hypothesis by building a computational model over Twitter data. Since we are trying to detect an intentional entity (the intention to drive opinions), we resort to human judgment as the benchmark against which we compare the algorithm. Opinion drivers are also shown to reflect the topical distribution of the trend better than users with high activity or impact. Identifying opinion drivers helps us reduce a trending topic to its “signature”, comprising the set of its opinion drivers and the opinions driven by them.

Anish Bhanushali, Raksha Pavagada Subbanarasimha, Srinath Srinivasa
Representing Fashion Product Data with Schema.org: Approach and Use Cases

The last decade has seen a considerable increase in the number of online shops for fashion goods. Technological advancements, improvements in logistics, and changes in buyer behavior have led to a dissemination of apparel goods and the respective data on the Web. In numerous domains of knowledge management, ontologies have proven to be very useful for sharing meaning among organizations and individuals, and for inferencing. With schema.org, there already exists a collection of widely accepted Web vocabularies for fields such as gastronomy, accommodation, entertainment, sports, and products. Yet schema.org still lacks a dedicated fashion ontology, which would allow for greater interoperability, higher visibility, and better comparison of fashion products on the Web. In this paper, we design and evaluate a Web ontology for garments as a compatible extension of schema.org. For our proposal, we take into account current best practices of Web ontology engineering, formally evaluate our conceptual model, and present practical use cases. We further contextualize our work by comparing our approach with state-of-the-art vocabularies for the fashion industry.
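
For context, here is what a present-day schema.org Product annotation for a garment can look like, emitted as JSON-LD from Python; the generic PropertyValue pairs stand in for the dedicated fashion vocabulary the paper proposes, and all concrete values are invented.

import json

product = {
    "@context": "https://schema.org/",
    "@type": "Product",
    "name": "Classic Denim Jacket",
    "brand": {"@type": "Brand", "name": "ExampleBrand"},
    "offers": {"@type": "Offer", "price": "79.90", "priceCurrency": "EUR"},
    # Without a fashion ontology, attributes like size and color fall back
    # to untyped name/value pairs, limiting interoperability and comparison.
    "additionalProperty": [
        {"@type": "PropertyValue", "name": "size", "value": "M"},
        {"@type": "PropertyValue", "name": "color", "value": "indigo"},
    ],
}
print(json.dumps(product, indent=2))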

Alex Stolz, Martin Hepp, Aleksei Hemminger
Semantic Modeling and Inference with Episodic Organization for Managing Personal Digital Traces
(Short Paper)

Many individuals generate a flood of personal digital traces (e.g., emails, social media posts, web searches, calendars) as a byproduct of their daily activities. To facilitate querying, and to support natural retrospective and prospective memory of these traces, a key problem is to integrate them in some sensible manner. For this purpose, based on research in the cognitive sciences, we propose a conceptual modeling language whose novel features include (i) the super-properties “who, what, when, where, why, how” applied uniformly to both documents and autobiographic events; and (ii) the ability to describe prototypical plans (“scripts”) for common everyday events, which in fact generate personal digital documents as traces. The scripts and wh-questions support the hierarchical organization and abstraction of the original data, thus helping end-users query it. We illustrate the use of our language through examples, provide formal semantics, and present an algorithm to recognize script instances.

Varvara Kalokyri, Alexander Borgida, Amélie Marian, Daniela Vianna
Linked Open Data for Linguists: Publishing the Hartmann von Aue-Portal in RDF

The Hartmann von Aue-portal is a decade-long initiative to employ Web technology in support of the study of early German. It provides a comprehensive knowledge base on lexicographic and other aspects of the works of Hartmann von Aue, one of the key epic poets of Middle High German literature: lemmata, word forms, tagmemes, adverbs, and the like, including original contexts for entries. The portal is available to human users in the form of a Web application. Linked Open Data (LOD) is a recent approach in the evolution of Web technology that supports publishing information on the Web in a way suitable for intelligent consumption and processing of contents by computers, rather than by humans using Web browsers. In this paper, we study the use of modern LOD approaches for linguistics, describe the conversion of the complete Hartmann von Aue-portal into LOD, and show its usage for data-driven analyses via SPARQL queries and literate programming with Python.
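
A sketch of such literate, SPARQL-driven analysis with Python's SPARQLWrapper; the endpoint URL and property names below are placeholders, not the portal's actual vocabulary.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/hartmann/sparql")  # hypothetical
sparql.setQuery("""
    PREFIX ex: <http://example.org/hartmann/vocab#>
    SELECT ?lemma (COUNT(?form) AS ?forms)
    WHERE { ?form ex:hasLemma ?lemma . }
    GROUP BY ?lemma
    ORDER BY DESC(?forms)
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Print the ten lemmata with the most attested word forms.
for binding in sparql.query().convert()["results"]["bindings"]:
    print(binding["lemma"]["value"], binding["forms"]["value"])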

Alex Stolz, Martin Hepp, Roy A. Boggs
A Survey of Approaches to Representing SPARQL Variables in SQL Queries

RDF is a universal data model for publishing structured data on the Web. On the other hand, much structured data is stored in relational database systems. To support publishing data in the RDF model, it is essential to close the gap between the relational and RDF worlds. A virtual SPARQL endpoint over relational data is a promising approach to achieve that. To build a virtual SPARQL endpoint, we need to know how to translate SPARQL queries to corresponding SQL queries. There exist several approaches to such a transformation. Most of them focus on the processing of user-defined mappings, which give a user the ability to map stored relational data to almost any RDF representation. In this paper we focus on one of the core problems of the transformation: how to represent variables from a given SPARQL query in the corresponding SQL query. We survey variable representations from existing approaches and discuss how the selected representation affects the soundness and performance of the whole transformation approach.
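
To make the problem concrete, here is a toy translation under one common representation, in which each SPARQL variable becomes a SQL column; the mapping of a hypothetical person(id, name) table is invented for illustration, and the surveyed approaches differ precisely in how such columns carry IRIs, literals, and unbound values.

sparql = """
SELECT ?name WHERE { ?p <http://xmlns.com/foaf/0.1/name> ?name . }
"""

# With a mapping person(id, name) -> (<.../person/{id}>, foaf:name, name),
# the variables ?p and ?name map onto expressions over the table's columns:
sql = """
SELECT CONCAT('http://example.org/person/', id) AS p,
       name                                     AS name
FROM person
WHERE name IS NOT NULL   -- unbound values must not produce solutions
"""
print(sql)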

Miloš Chaloupka, Martin Nečaský
Semantic OLAP Patterns: Elements of Reusable Business Analytics

Online analytical processing (OLAP) allows domain experts to gain insights into a subject of analysis. Domain experts are often casual users who interact with OLAP systems using standardized reports covering most of the domain experts’ information needs. Analytical questions not answered by standardized reports must be posed as ad hoc queries. Casual users, however, are typically not familiar with OLAP data models and query languages, preferring to formulate questions in business terms. Experience from industrial research projects shows that ad hoc queries frequently follow certain patterns which can be leveraged to provide assistance to domain experts. For example, queries in many domains focus on the relationships between a set of interest and a set of comparison. This paper proposes a pattern definition framework which allows for a machine-readable representation of recurring, domain-independent patterns in OLAP. Semantic web technologies serve for the definition of OLAP patterns as well as the data models and business terms used to instantiate the patterns. Ad hoc query composition then amounts to selecting an appropriate pattern and instantiating that pattern by reference to semantic predicates that encode business terms. Pattern instances eventually translate into a target language, e.g., SQL.
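
A sketch of how such a pattern instance might translate to SQL, using the set-of-interest/set-of-comparison pattern mentioned above; the table and column names are invented, and the paper's framework performs this instantiation from semantic, machine-readable pattern definitions rather than string templates.

# A pattern with slots; a domain expert fills the slots with business terms.
PATTERN_SQL = """
SELECT {dim},
       SUM(CASE WHEN {interest}   THEN {measure} ELSE 0 END) AS interest_total,
       SUM(CASE WHEN {comparison} THEN {measure} ELSE 0 END) AS comparison_total
FROM {fact_table}
GROUP BY {dim}
"""

query = PATTERN_SQL.format(
    dim="region",
    measure="revenue",
    interest="product_line = 'premium'",      # set of interest
    comparison="product_line = 'standard'",   # set of comparison
    fact_table="sales_facts",
)
print(query)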

Christoph G. Schuetz, Simon Schausberger, Ilko Kovacic, Michael Schrefl
An Extensible Ontology Modeling Approach Using Post Coordinated Expressions for Semantic Provenance in Biomedical Research

Provenance metadata describing the source or origin of data is critical to verify and validate the results of scientific experiments. Indeed, the reproducibility of scientific studies is rapidly gaining significant attention in the research community, for example in biomedical and healthcare research. To address this challenge in the biomedical research domain, we have developed Provenance for Clinical and Healthcare Research (ProvCaRe) using the World Wide Web Consortium (W3C) PROV specifications, including the PROV Ontology (PROV-O). In the ProvCaRe project, we are extending PROV-O to create a formal model of the provenance information that is necessary for scientific reproducibility and replication in biomedical research. However, there are several challenges associated with the development of the ProvCaRe ontology, including: (1) Ontology engineering: modeling all biomedical provenance-related terms in an ontology has undefined scope and is not feasible before the release of the ontology; (2) Redundancy: there are a large number of existing biomedical ontologies that already model relevant biomedical terms; and (3) Ontology maintenance: adding or deleting terms in a large ontology is error-prone, making the ontology difficult to maintain over time. Therefore, in contrast to modeling all classes and properties in an ontology before deployment (also called precoordination), we propose the “ProvCaRe Compositional Grammar Syntax” to model ontology classes on demand (also called postcoordination). The compositional grammar syntax allows us to re-use existing biomedical ontology classes and compose provenance-specific terms that extend PROV-O classes and properties. We demonstrate the application of this approach in the ProvCaRe ontology and the use of the ontology in the development of the ProvCaRe knowledgebase, which consists of more than 38 million provenance triples automatically extracted from 384,802 published research articles using a text processing workflow.

Joshua Valdez, Michael Rueschman, Matthew Kim, Sara Arabyarmohammadi, Susan Redline, Satya S. Sahoo
Complete Semantics to Empower Touristic Service Providers

The tourism industry has a significant impact on the world's economy, contributing 10.2% of the world's gross domestic product in 2016. It has become a very competitive industry, where a strong online presence is an essential aspect of business success. To achieve this goal, the proper use of the latest Web technologies, particularly schema.org annotations, is crucial. In this paper, we present our effort to improve the online visibility of touristic service providers in the region of Tyrol, Austria, by creating and deploying a substantial amount of semantic annotations according to schema.org, a widely used vocabulary for structured data on the Web. We started our work with Tourismusverband (TVB) Mayrhofen-Hippach and all touristic service providers in the Mayrhofen-Hippach region, and applied the same approach to other TVBs and regions, as well as other use cases. The rationale for doing this is straightforward: schema.org annotations enable search engines to understand the content better and provide better results for end users, and they enable various intelligent applications to utilize them. As a direct consequence, the region of Tyrol and its touristic service providers increase their online visibility and decrease their dependency on intermediaries, i.e. Online Travel Agencies (OTAs).
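
A minimal example of the kind of schema.org annotation involved, generated from Python as JSON-LD of the sort embedded in a page via a script tag of type application/ld+json; the concrete business details are invented.

import json

annotation = {
    "@context": "https://schema.org/",
    "@type": "LodgingBusiness",
    "name": "Hotel Beispiel",              # invented provider
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Mayrhofen",
        "addressRegion": "Tyrol",
        "addressCountry": "AT",
    },
    "telephone": "+43 0000 000000",
    "url": "https://www.example.com",
}
print(json.dumps(annotation, indent=2))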

Zaenal Akbar, Elias Kärle, Oleksandra Panasiuk, Umutcan Şimşek, Ioan Toma, Dieter Fensel
Distributed Holistic Clustering on Linked Data

Link discovery is an active field of research to support data integration in the Web of Data. Due to the huge size and number of available data sources, efficient and effective link discovery is a very challenging task. Common pairwise link discovery approaches do not scale to many sources with very large entity sets. We propose a distributed holistic approach to link many data sources based on a clustering of entities that represent the same real-world object. Our approach provides a compact and fused representation of entities, and can identify errors in existing links as well as many new links. We support distributed execution, show scalability for large real-world data sets and evaluate our methods with respect to effectiveness and efficiency for two domains.
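
As a naive, non-distributed baseline for the clustering step, one can take connected components over pairwise links, sketched below in Python with networkx and hypothetical owl:sameAs links; the paper's approach additionally repairs erroneous links and distributes the computation across workers.

import networkx as nx

links = [                         # hypothetical pairwise sameAs links
    ("dbpedia:Berlin", "geonames:2950159"),
    ("geonames:2950159", "wikidata:Q64"),
    ("dbpedia:Paris", "geonames:2988507"),
]

graph = nx.Graph(links)
for cluster in nx.connected_components(graph):
    print(sorted(cluster))        # each component = one real-world entity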

Markus Nentwig, Anika Groß, Maximilian Möller, Erhard Rahm
Ontologies for Commitment-Based Smart Contracts

Smart contracts have gained rapid exposure since the inception of blockchain technology, yet there is no unified ontology for them. Being variously categorized as coded contracts or as substitutes for conventional legal contracts, smart contracts suffer from a conceptual ambiguity that needs to be reduced. We applied enterprise ontology and model-driven architectures to abstract smart contracts at the essential, infological and datalogical levels, in order to explain the system behind computation- and platform-independent smart contracts rather than their functional behavior. This conceptual paper introduces commitment-based smart contracts, in which a contract is viewed as a business exchange consisting of a set of reciprocal commitments. A smart contract ensures the automated execution of most of these commitments.

Joost de Kruijff, Hans Weigand
A Framework for User-Driven Mapping Discovery in Rich Spaces of Heterogeneous Data

Data analysis in rich spaces of heterogeneous data sources is an increasingly common activity. Examples include exploratory data analysis and personal information management. Mapping specification is one of the key issues in this data management setting, answering the need for unified search over the full spectrum of relevant knowledge. Indeed, while users in data analytics are engaged in an open-ended interaction between data discovery and data orchestration, most of the solutions for mapping specification available so far are intended for expert users. This paper proposes a general framework for a novel paradigm of user-driven mapping discovery, where mapping specification is interactively driven by the information-seeking activities of users and the exclusive role of mappings is to contribute to user satisfaction. The underlying key idea is that data semantics is in the eye of the consumer. Thus, we start from user queries, which we try to satisfy in the dataspace. In this process, we often need to discover new mappings and expose the user to the data thereby discovered for their feedback, possibly continuing until the user is satisfied. The framework is made up of (a) a theoretical foundation, where we formally introduce the notion of candidate mapping sets for a user query, and (b) an interactive and incremental algorithm that, given a user query, finds a candidate mapping set that satisfies the user. The algorithm incrementally builds the candidate mapping set by searching the dataspace for data samples and deriving mapping lattices that are explored to deliver mappings for user feedback. With the aim of fitting the user's information need in a limited number of interactions, the algorithm provides a multi-criteria selection strategy for candidate mapping sets. Finally, a proof of the correctness of the algorithm is provided in the paper.

Federica Mandreoli
Ontologies and Human Users: A Systematic Analysis of the Influence of Ontology Documentation on Community Agreement About Type Membership

In this paper, we study the impact of the human-readable documentation of Web ontologies on the ability of human users to agree on the membership of instances according to a given ontology. We first introduce a model of the problem and then present a user study, in which we measured the impact of documentation features in schema.org on the quality of annotations with n = 73 study participants. The paper concludes with a discussion of implications for ontology design in the context of the Semantic Web.

Francesca Zarl, Martin Hepp, Alex Stolz, Walter Gerbino
DLUBM: A Benchmark for Distributed Linked Data Knowledge Base Systems

Linked Data is becoming a stable technology alternative and is no longer only an innovation trend. More and more companies are looking into adopting Linked Data as part of the new data economy. Driven by the growing availability of data sources, solutions are constantly being newly developed or improved in order to support the necessity for data exchange in both web and enterprise settings. Unfortunately, the choice of whether to use Linked Data is currently more an educated guess than a fact-based decision. Therefore, the provisioning of open benchmarking tools and reports, which allow developers to assess the fitness of existing solutions, is key to pushing the development of better Linked Data-based approaches and solutions. To this end we introduce a novel Linked Data benchmark, Distributed LUBM, which enables the reproducible creation and deployment of distributed interlinked LUBM datasets. We provide a system architecture for distributed Linked Data benchmark environments, accompanied by guiding design requirements. We instantiate the architecture with the actual DLUBM implementation and evaluate a Linked Data query engine via DLUBM.

Felix Leif Keppmann, Maria Maleshkova, Andreas Harth
Norwegian State of Estate Report as Linked Open Data

This paper presents the Norwegian State of Estate (SoE) dataset containing data about real estates owned by the central government in Norway. The dataset is produced by integrating cross-domain government datasets including data from sources such as the Norwegian business entity register, cadastral system, building accessibility register and the previous SoE report. The dataset is made available as Linked Data. The Linked Data generation process includes data acquisition, cleaning, transformation, annotation, publishing, augmentation and interlinking the annotated data as well as quality assessment of the interlinked datasets. The dataset is published under the Norwegian License for Open Government Data (NLOD) and serves as a reference point for applications using data on central government real estates, such as generation of the SoE report, searching properties suitable for asylum reception centres, risk assessment for state-owned buildings or a public building application for visitors.

Ling Shi, Dina Sukhobok, Nikolay Nikolov, Dumitru Roman
The InfraRisk Ontology: Enabling Semantic Interoperability for Critical Infrastructures at Risk from Natural Hazards

Earthquakes, landslides, and other natural hazard events have severe negative socio-economic impacts. Among other consequences, those events can cause damage to infrastructure networks such as roads and railways. Novel methodologies and tools are needed to analyse the potential impacts of extreme natural hazard events and to aid the decision-making process regarding the protection of existing critical road and rail infrastructure as well as the development of new infrastructure. Enabling uniform, integrated, and reliable access to data on historical failures of critical transport infrastructure can help infrastructure managers and scientists from various related areas to better understand, prevent, and mitigate the impact of natural hazards on critical infrastructures. This paper describes the construction of the InfraRisk ontology for representing relevant information about natural hazard events and their impact on infrastructure components. Furthermore, we present a software prototype that visualizes data published using the proposed ontology.

Dumitru Roman, Dina Sukhobok, Nikolay Nikolov, Brian Elvesæter, Antoine Pultier
Usability of Visual Data Profiling in Data Cleaning and Transformation

This paper proposes an approach for using visual data profiling in tabular data cleaning and transformation processes. Visual data profiling is the statistical assessment of datasets to identify and visualize potential quality issues. The proposed approach was implemented in a software prototype and empirically validated in a usability study to determine to what extent visual data profiling is useful and how easy it is to use by data scientists. The study involved 24 users in a comparative usability test and 4 expert reviewers in cognitive walkthroughs. The evaluation results show that users find visual data profiling capabilities to be useful and easy to use in the process of data cleaning and transformation.

Bjørn Marius von Zernichow, Dumitru Roman
Semantic-Based Approach for Low-Effort Engineering of Automation Systems

Industry 4.0, also referred to as the fourth industrial revolution, aims at mass-customized production with lower costs and shorter production times. The Automation Systems (ASs) used in manufacturing processes should be flexible enough to meet the constantly changing needs of mass-customized production. Low-effort engineering of an Automation System (AS) is an important requirement towards this goal. In addition, transparency and interoperability of ASs across different domains open up a new class of applications. In order to address these challenges, we propose a low-effort approach to engineer, configure and re-engineer an AS by employing Web of Things and Semantic Web technologies. The approach allows for creating a semantic specification for a new functionality or application, and it automatically checks whether a target AS can run the new functionality. We developed an engineering tool with a graphical user interface for our approach that enables an engineer to easily interact with an AS when discovering its functionality and when engineering, configuring and deploying new functionality on it.

Aparna Saisree Thuluva, Kirill Dorofeev, Monika Wenger, Darko Anicic, Sebastian Rudolph
Backmatter
Metadata
Title
On the Move to Meaningful Internet Systems. OTM 2017 Conferences
Editors
Hervé Panetto
Christophe Debruyne
Walid Gaaloul
Mike Papazoglou
Prof. Dr. Adrian Paschke
Dr. Claudio Agostino Ardagna
Robert Meersman
Copyright Year
2017
Electronic ISBN
978-3-319-69459-7
Print ISBN
978-3-319-69458-0
DOI
https://doi.org/10.1007/978-3-319-69459-7
