
2017 | Book

Knowledge Engineering and Semantic Web

8th International Conference, KESW 2017, Szczecin, Poland, November 8-10, 2017, Proceedings


About this book

This book constitutes the refereed proceedings of the 8th International Conference on Knowledge Engineering and the Semantic Web, KESW 2017, held in Szczecin, Poland, in November 2017.

The 16 full papers presented were carefully reviewed and selected from 58 submissions.

The papers are organized in topical sections on natural language processing; knowledge representation and reasoning; ontologies and controlled vocabularies; scalable data access and storage solutions; Semantic Web and education; linked data; and semantic technologies in manufacturing and business.

Table of Contents

Frontmatter

Natural Language Processing

Frontmatter
Reducing the Degradation of Sentiment Analysis for Text Collections Spread over a Period of Time
Abstract
This paper presents approaches to improve sentiment classification in dynamically updated natural language text collections. Because social networks are constantly updated by users, it is essential to take new jargon and currently discussed topics into account when solving the classification task. Therefore, two fundamentally different methods for solving this problem are proposed: a supervised and an unsupervised machine learning method for sentiment analysis. The methods are compared, and it is shown which method is most applicable in which cases. Experiments comparing the methods on sufficiently representative text collections are described.
Yuliya Rubtsova
Searching for the Most Negative Opinions
Abstract
Studies in sentiment analysis and opinion mining have focused on several aspects of opinions, such as their automatic extraction, the identification of their polarity (positive, negative or neutral), and the entities or facets involved. However, to the best of our knowledge, no sentiment analysis approach has addressed the automatic identification and extraction of the most negative opinions, despite their significant impact in many fields such as industry, trade, and political and social issues.
In this article, we use diversified linguistic features and supervised machine learning algorithms to examine their effectiveness in searching for the most negative opinions.
Sattam Almatarneh, Pablo Gamallo
Diversified Semantic Query Reformulation
Abstract
One main challenge for search engines is retrieving the user’s intended results. Diversification techniques are employed to cover as many aspects of the query as possible through a tradeoff between the relevance of the results and the diversity in the result set. Most diversification techniques reorder the final result set. However, these diversification techniques could be inadequate for search scenarios with small candidate set sizes, or those for which response time is a critical issue. This paper presents a diversification technique for such scenarios. Instead of reordering the result set, the query is reformulated, thus taking advantage of the knowledge available in Linked Data Knowledge Bases. The query is annotated with semantic data and then expanded to related resources. An adapted Maximal Marginal Relevance technique is applied to select resources from this expanded set whose properties form the expanded query. Experiments conducted on federated and non-federated scenarios show that this method has superior diversification capacity and shorter response times than algorithms based on result set reordering.
Rubén Manrique, Olga Mariño
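The selection step builds on Maximal Marginal Relevance (MMR). A minimal sketch of the classic greedy MMR loop, assuming precomputed relevance scores and pairwise similarities between candidate resources; the function name `mmr_select`, its inputs and the trade-off weight `lam` are illustrative and do not reproduce the authors' adapted variant:

```python
from typing import Dict, List, Tuple


def mmr_select(
    relevance: Dict[str, float],
    similarity: Dict[Tuple[str, str], float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy MMR: score(c) = lam * rel(c) - (1 - lam) * max_{s in selected} sim(c, s)."""

    def sim(a: str, b: str) -> float:
        # Similarity is treated as symmetric; missing pairs count as 0.
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    selected: List[str] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:

        def score(c: str) -> float:
            # Penalise candidates that are redundant with what is already selected.
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy

        # Iterate in sorted order so ties are broken deterministically by name.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    rel = {"a": 0.9, "b": 0.85, "c": 0.5}
    sim = {("a", "b"): 0.95, ("a", "c"): 0.1, ("b", "c"): 0.2}
    print(mmr_select(rel, sim, k=2))  # ['a', 'c']
```

With `lam=0.5`, the sketch skips the highly relevant but near-duplicate candidate `b` in favour of the less similar `c`; with `lam=1.0` it degenerates to ranking by relevance alone.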
RuThes Cloud: Towards a Multilevel Linguistic Linked Open Data Resource for Russian
Abstract
In this paper we present a new multi-level Linguistic Linked Open Data resource for Russian. It covers four linguistic levels: semantic, lexical, morphological and syntactic. The resource has been constructed on the basis of the well-known RuThes thesaurus and the original, hitherto unpublished Extended Zaliznyak grammatical dictionary. The resource is represented in terms of the SKOS, Lemon, and LexInfo ontologies and a new custom ontology. In building the resource, we automatically completed the following tasks: merging the source resources on common lexical entries, decomposing complex lexical entries, and publishing the constructed resource as an LLOD-compatible dataset. We demonstrate a use case in which the developed resource is exploited in an IR task. We hope that our work can serve as a crystallization point for the LLOD cloud in Russian.
Alexander Kirillovich, Olga Nevzorova, Emil Gimadiev, Natalia Loukachevitch
The Algorithm of Modelling and Analysis of Latent Semantic Relations: Linear Algebra vs. Probabilistic Topic Models
Abstract
This paper presents an algorithm for modelling and analysing Latent Semantic Relations within a collection of argumentative documents. The novelty of the algorithm lies in its systematic approach: it combines the probabilistic Latent Dirichlet Allocation (LDA) method with the Linear Algebra based Latent Semantic Analysis (LSA) method, and it considers each document as a complex of topics, defined on the basis of a separate analysis of its individual paragraphs. The algorithm consists of the following stages: modelling and analysis of Latent Semantic Relations, consistently at the LDA- and LSA-based levels; and rules-based adjustment of the results of the two levels of analysis. The proposed algorithm was verified on corpora of subjectively positive and negative Polish-language film reviews. The recall and precision achieved in the case study allowed conclusions to be drawn about the effectiveness of the proposed algorithm.
Nina Rizun, Yurii Taranenko, Wojciech Waloszek
Discovering Relational Phrases for Qualia Roles Through Open Information Extraction
Abstract
In Generative Lexicon [17], Pustejovsky defined the Qualia Structure, which organizes the semantic meaning carried by nouns through four roles: formal, telic, agentive and constitutive. Despite their expressive power, to the best of our knowledge no existing NLP system uses qualia structures, possibly because of the large effort needed to construct such knowledge bases. Some researchers have tried to circumvent this obstacle using lexico-syntactic patterns based on Hearst's idea [11]. In this paper, we propose an Open Information Extraction method to automatically acquire a set of relational phrases from a large corpus, starting with a small set of nouns and their qualia elements. Our idea is that the relational phrases unveil the relations between the nouns and their qualia elements. We compared our method with Reverb [10], Ollie [18] and ClausIE [9] in terms of pattern quality and the extraction of the corresponding qualia elements.
Giovanni Siragusa, Valentina Leone, Luigi Di Caro, Claudio Schifanella
Probabilistic Topic Modelling for Controlled Snowball Sampling in Citation Network Collection
Abstract
The paper presents an application of a probabilistic topic model (PTM) to citation network collection. The snowball sampling method is moderated by selecting the most relevant papers by means of the PTM. The PTM used in the paper is modified to handle collections of short texts. It is constructed from the titles of the seed paper collection combined with the papers obtained through unrestricted snowball sampling. The objective of the research is to propose and experimentally verify an approach that applies a PTM of short text documents to improve citation network collection. The preliminary analysis has shown that the method is robust: variations in the seed paper collection do not affect the subset of most influential papers in the collected citation network.
Hennadii Dobrovolskyi, Nataliya Keberle, Olga Todoriko
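The controlled expansion can be sketched as a breadth-first snowball that only keeps, and further expands, papers whose relevance score reaches a threshold. Here the relevance function is a hypothetical stand-in for the PTM-based score, and `controlled_snowball`, `threshold` and `max_depth` are illustrative names, not taken from the paper:

```python
from collections import deque
from typing import Callable, Dict, List, Set


def controlled_snowball(
    seeds: List[str],
    citations: Dict[str, List[str]],
    relevance: Callable[[str], float],
    threshold: float,
    max_depth: int,
) -> Set[str]:
    """Breadth-first snowball sampling over a citation graph.

    Seeds are always kept. A newly reached paper is kept and expanded only if
    relevance(paper) >= threshold (a hypothetical stand-in for a PTM score).
    """
    collected: Set[str] = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        paper, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in citations.get(paper, []):
            if neighbour in collected:
                continue
            if relevance(neighbour) >= threshold:
                collected.add(neighbour)
                queue.append((neighbour, depth + 1))
    return collected


if __name__ == "__main__":
    graph = {"s": ["a", "b"], "a": ["c"], "b": ["d"], "c": ["e"]}
    scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.95, "e": 0.8}
    kept = controlled_snowball(["s"], graph, lambda p: scores.get(p, 0.0), 0.5, 2)
    print(sorted(kept))  # ['a', 'c', 's']
```

In this toy graph, `d` is never reached because the irrelevant paper `b` is not expanded, which is how the filter keeps the sample focused compared with unrestricted snowball sampling.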
Russian Tagging and Dependency Parsing Models for Stanford CoreNLP Natural Language Toolkit
Abstract
The paper concerns implementing a maximum entropy tagging model and a neural network dependency parsing model for Russian in the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. Russian is a morphologically rich language and demands full morphological analysis, including annotating input texts with POS tags, features and lemmas (unlike languages that are insensitive to case, person, etc., where stemming and POS tagging give enough information about the grammatical behavior of a word form). Rich morphology is accompanied by free word order in Russian, which adds indeterminacy to head-finding rules in parsing procedures. In the paper we describe the training data, the linguistic features used to learn the classifiers, and the training and evaluation of the tagging and parsing models.
Liubov Kovriguina, Ivan Shilin, Alexander Shipilo, Alina Putintseva
Investigating the Relationship Between Tweeting Style and Popularity: The Case of US Presidential Election 2016
Abstract
Predicting popularity from social media has been explored for about a decade. As the number of social media users soars, understanding the relationship between popularity and social media is valuable, because it can be mapped to the real popularity of an entity. Popularity in social media, for instance on Twitter, is interpreted by drawing a relationship between a social media account and its followers. Therefore, in this paper, to understand the social media popularity of the candidates in the 2016 US election, we verify this association on Twitter by analyzing the candidates’ tweets. More specifically, our aim is to assess whether candidates made efforts to improve their tweeting style over time to be more favorable to their followers. We show that Mr. Trump was able to exploit Twitter wisely to attract more people by tweeting in a well-organized and desirable manner, and that his tweeting style increased his popularity in social media.
Farideh Tavazoee, Claudio Conversano, Francesco Mola

Knowledge Representation and Reasoning

Frontmatter
Temporal Reasoning with Non-convex Intervals
Abstract
The time dimension is fundamental to Semantic Web applications, for example for supporting commercial transactions or allowing the retrieval of resources contextualised in time. In this work, we propose an extension of the OWL-Time ontology that defines non-convex intervals. We also present temporal operators for non-convex intervals and show how to reason with them in our approach. We present a temporal reasoner that is able to reason both with temporal entities encoded with the OWL-Time ontology and with non-convex temporal entities.
Miguel Bento Alves, Carlos Viegas Damásio, Nuno Correia
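A non-convex interval can be viewed as a set of disjoint convex intervals, for example a shop's opening hours over a week. A minimal sketch, assuming numeric time points and closed intervals; the list-of-pairs representation and the `normalize` and `intersect` operators are hypothetical illustrations, not the ontology terms or temporal operators defined in the paper:

```python
from typing import List, Tuple

# A closed convex interval [start, end] with start <= end (hypothetical representation).
Interval = Tuple[float, float]


def normalize(parts: List[Interval]) -> List[Interval]:
    """Sort the convex parts and merge overlapping or touching ones."""
    merged: List[Interval] = []
    for start, end in sorted(parts):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two non-convex intervals, each given as convex parts."""
    a, b = normalize(a), normalize(b)
    i = j = 0
    result: List[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            result.append((start, end))
        # Advance whichever convex part finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


if __name__ == "__main__":
    # Hours counted from Monday 00:00: open Monday 9-17 and Tuesday 9-17.
    shop = [(9, 17), (33, 41)]
    visit = [(15, 35)]
    print(intersect(shop, visit))  # [(15, 17), (33, 35)]
```

The result of intersecting a non-convex interval with a convex one is again non-convex, which is why such operators need a representation beyond single OWL-Time intervals.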
More on the Data Complexity of Answering Ontology-Mediated Queries with a Covering Axiom
Abstract
We report on our recent results in the ongoing attempts to classify conjunctive queries (CQs) \(\boldsymbol{q}\) according to the data complexity of answering ontology-mediated queries of the form \((\{A \sqsubseteq F \sqcup T\}, \boldsymbol{q})\). In particular, we present new families of path CQs for which this problem is NL-, P- or coNP-complete.
Olga Gerasimova, Stanislav Kikot, Vladimir Podolskii, Michael Zakharyaschev
QSMat: Query-Based Materialization for Efficient RDF Stream Processing
Abstract
This paper presents a novel approach, QSMat, for efficient RDF data stream querying with flexible query-based materialization. Previous work accelerates either the maintenance of a stream window materialization or the evaluation of a query over the stream. QSMat exploits knowledge of a given query and entailment rule-set to accelerate window materialization by avoiding inferences that provably do not affect the evaluation of the query. We prove that stream querying over the resulting partial window materializations with QSMat is sound and complete with regard to the query. A comparative experimental performance evaluation based on the Berlin SPARQL benchmark and with selected representative systems for stream reasoning shows that QSMat can significantly reduce window materialization size, reasoning overhead, and thus stream query evaluation time.
Christian Mathieu, Matthias Klusch, Birte Glimm
Employing Link Differentiation in Linked Data Semantic Distance
Abstract
The use of Linked Open Data (LOD) has been explored in recommender systems in different ways, primarily through its graph representation. The graph structure of LOD is used to measure inter-resource relatedness via the semantic distance between resources in the graph. The intuition behind this approach is that the more connected resources are to each other, the more related they are. The drawback of this approach is that it treats all inter-resource connections identically rather than prioritizing links that may be more important in semantic relatedness calculations. In this paper, we show that different properties of inter-resource links hold different values for relatedness calculations between resources, and we exploit this observation to introduce improved semantic relatedness measures for resources, Weighted Linked Data Semantic Distance (WLDSD) and Weighted Resource Similarity (WResim), which are more accurate than current state-of-the-art approaches. For these weighted approaches, we also present two different ways to calculate link weights: Resource-Specific Link Awareness Weights (RSLAW) and Information Theoretic Weights (ITW). To validate the effectiveness of our approaches, we conducted an experiment to identify the relatedness between musical artists in DBpedia, which demonstrated that approaches that prioritize link properties produce more accurate recommendation results.
Sultan Alfarhood, Susan Gauch, Kevin Labille
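The idea that some link properties matter more than others can be illustrated with a deliberately simplified, hypothetical distance that only counts direct links between two resources, each weighted by its property. This is not WLDSD, WResim, RSLAW or ITW as defined in the paper; the `ex:` resources and the hand-picked weights are illustrative assumptions:

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def weighted_direct_distance(
    a: str,
    b: str,
    triples: List[Triple],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> float:
    """Hypothetical distance in (0, 1]: 1 / (1 + sum of weights of direct links a<->b).

    More links, or links via more heavily weighted properties, give a smaller
    distance. With all weights equal to 1 this is a plain link count.
    """
    total = 0.0
    for s, p, o in triples:
        if (s, o) in ((a, b), (b, a)):  # direct link in either direction
            total += weights.get(p, default_weight)
    return 1.0 / (1.0 + total)


if __name__ == "__main__":
    data = [
        ("ex:ArtistA", "ex:associatedBand", "ex:ArtistB"),
        ("ex:ArtistB", "ex:influencedBy", "ex:ArtistA"),
        ("ex:ArtistA", "ex:birthPlace", "ex:CityX"),
    ]
    print(weighted_direct_distance("ex:ArtistA", "ex:ArtistB", data, {}))
    print(weighted_direct_distance(
        "ex:ArtistA", "ex:ArtistB", data,
        {"ex:associatedBand": 2.0, "ex:influencedBy": 0.5},
    ))
```

Unweighted, the two artists are at distance 1/3; with the hypothetical weights the band link counts double and the distance shrinks to 1/3.5, showing how property weights change the ranking that a recommender would produce.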

Ontologies and Controlled Vocabularies

Frontmatter
Ontology for Representing Human Needs
Abstract
Need satisfaction plays a fundamental role in human well-being. Hence understanding citizens’ needs is crucial for developing a successful social and economic policy. This notwithstanding, the concept of need has not yet found its place in information systems and online tools. Furthermore, assessing needs itself remains a labor-intensive, mostly offline activity, where only a limited support by computational tools is available. In this paper, we take a first step towards employing need management in the design of information systems supporting participation and participatory innovation by proposing OpeNeeD, a family of ontologies for representing human needs data. As a proof of concept, OpeNeeD has been used to represent, enrich and query the results of a needs assessment study in a local citizen community in one of the Vienna districts. The proposed ontology will facilitate such studies and enable the representation of citizens’ needs as Linked Data, fostering its co-creation and incentivizing the use of Open Data and services based on it.
Soheil Human, Florian Fahrenbach, Florian Kragulj, Vadim Savenkov
Knowledge Graph: Semantic Representation and Assessment of Innovation Ecosystems
Abstract
Innovative capacity is highly dependent upon knowledge, and the possession of unique competences can be an important source of enduring strategic advantage. Hence, being able to identify, locate, measure, and assess competence holders can be a decisive competitive edge. In this work, we introduce a framework that assists with performing such tasks. To achieve this, NLP, rule-based, and machine learning techniques are employed to process raw data such as academic publications or patents. The framework derives normalized person and organization profiles and compiles identified entities (such as persons, organizations, or locations) into dedicated objects, disambiguating and unifying them where needed. The objects are then mapped to conceptual systems and stored, along with identified semantic relations, in a Knowledge Graph constituted by RDF triples. An OWL reasoner makes it possible to answer complex business queries and, in particular, to analyze and evaluate competences on multiple aggregation levels (i.e., single vs. collective) and dimensions (e.g., region, technological field of interest, time). To demonstrate the general applicability of the framework and to illustrate how to solve concrete business cases from the automotive domain, it is evaluated with different datasets.
Klaus Ulmschneider, Birte Glimm

Scalable Data Access and Storage Solutions

Frontmatter
RDF Updates with Constraints
Abstract
This paper deals with the problem of updating an RDF database, expected to satisfy user-defined constraints as well as RDF intrinsic semantic constraints. As updates may violate these constraints, side-effects are generated in order to preserve consistency. We investigate the use of nulls (blank nodes) as placeholders for unknown required data as a technique to provide this consistency and to reduce the number of side-effects. Experimental results validate our goals.
Mirian Halfeld-Ferrari, Carmem S. Hara, Flavio R. Uber
Ephedra: Efficiently Combining RDF Data and Services Using SPARQL Federation
Abstract
Knowledge graph management use cases often require addressing hybrid information needs that involve a multitude of data sources, a multitude of data modalities (e.g., structured, keyword, geospatial search), and the availability of computation services (e.g., machine learning and graph analytics algorithms). Although SPARQL queries provide a convenient way of expressing data requests over RDF knowledge graphs, the level of support for hybrid information needs is limited: existing query engines usually focus on retrieving RDF data and only support a set of hard-coded built-in services. In this paper we describe representative use cases of metaphacts in the cultural heritage and pharmacy domains and the hybrid information needs arising in them. To address these needs, we present Ephedra: a SPARQL federation engine aimed at processing hybrid queries. Ephedra provides a flexible declarative mechanism for including hybrid services into a SPARQL federation and implements a number of static and runtime query optimization techniques for improving the performance of hybrid SPARQL queries. We validate Ephedra in the use case scenarios and discuss practical implications of hybrid query processing.
Andriy Nikolov, Peter Haase, Johannes Trame, Artem Kozlov
Managing Lifecycle of Big Data Applications
Abstract
The growing digitization and networking within our society have a large influence on all aspects of everyday life. Large amounts of data are produced continuously, and when these are analyzed and interlinked they have the potential to create new knowledge and intelligent solutions for the economy and society. To process this data, we developed the Big Data Integrator (BDI) Platform with various Big Data components available out of the box. Integrating the components inside the BDI Platform requires homogenizing them, which leads to a standardized development process. To support these activities we created the BDI Stack Lifecycle (SL), which consists of development, packaging, composition, enhancement, deployment and monitoring steps. In this paper, we show how we support the BDI SL with the enhancement applications developed in the BDE project. As an evaluation, we demonstrate the applicability of the BDI SL on three pilots in the domains of transport, social sciences and security.
Ivan Ermilov, Axel-Cyrille Ngonga Ngomo, Aad Versteden, Hajira Jabeen, Gezim Sejdiu, Giorgos Argyriou, Luigi Selmi, Jürgen Jakobitsch, Jens Lehmann

Semantic Web and Education

Frontmatter
Ontology-Based Representation of Learner Profiles for Accessible OpenCourseWare Systems
Abstract
The development of accessible web applications has gained significant attention over the past couple of years due to the widespread use of the Internet and the equality laws enforced by governments. Particularly in e-learning contexts, web accessibility plays an important role, as e-learning often needs to be inclusive, addressing all types of learners, including those with disabilities. However, there is still no comprehensive formal representation of learners with disabilities and their particular accessibility needs in e-learning contexts. We propose the use of ontologies to represent accessibility needs and preferences of learners in order to structure the knowledge and to access the information for recommendations and adaptations in e-learning contexts. In particular, we reused the concepts of the ACCESSIBLE ontology and extended them with concepts defined by the IMS Global Learning Consortium. We show how OpenCourseWare systems can be adapted based on this ontology to improve accessibility.
Mirette Elias, Steffen Lohmann, Sören Auer
Towards the Semantic MOOC: Extracting, Enriching and Interlinking E-Learning Data in Open edX Platform
Abstract
In recent years, the educational technology market has been growing rapidly. This growth is explained by the increasing number of Massive Open Online Courses (MOOCs), which give learners an opportunity to study 24/7 at the world's top universities. Information contained in such courses can be better structured, linked, and enriched by means of semantic technologies and linked data principles. Given semantic annotations, discovery of and matching among learners, teachers, and learning resources can be made much more efficient. In this paper, we describe a method of extracting metadata from Open edX online courses for subsequent processing. We addressed the problem of representing a course at the formal and semantic levels, so that, following the ontology development, both computers and humans can process and use the course. We also used NLP and RAKE (Rapid Automatic Keyword Extraction) to integrate automatic concept extraction from course lectures. Triples are imported into an RDF storage system, allowing users to execute SPARQL queries through the SPARQL endpoint. Moreover, the plugin supports enriching and interlinking courses, allowing users to learn the educational content of the courses along an individual trajectory. In summary, the considered data set is mapped at a satisfactorily high level, and the collected data can be useful for analyzing the relevance and quality of the course structure.
Dmitry Volchek, Aleksei Romanov, Dmitry Mouromtsev
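RAKE scores candidate phrases, delimited by stop words and punctuation, by summing the degree-to-frequency ratio deg(w)/freq(w) of their words. A minimal sketch of the generic algorithm, assuming a tiny illustrative stop-word list and ASCII-only tokenization; it is not the plugin's actual configuration:

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Tiny illustrative stop-word list (an assumption); real setups use a larger one.
STOPWORDS: Set[str] = {
    "a", "an", "and", "the", "of", "in", "on", "for",
    "to", "is", "are", "with", "by", "from",
}


def candidate_phrases(text: str, stopwords: Set[str] = STOPWORDS) -> List[List[str]]:
    """Split text into candidate phrases at punctuation and stop words."""
    phrases: List[List[str]] = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current: List[str] = []
        for word in re.findall(r"[a-z0-9]+", fragment):  # ASCII-only tokens
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text: str, top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank candidate phrases by the sum of their word scores deg(w) / freq(w)."""
    phrases = candidate_phrases(text)
    freq: Dict[str, int] = defaultdict(int)
    degree: Dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Co-occurrence degree within the phrase, including the word itself.
            degree[word] += len(phrase)
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    lecture = ("Linked data principles support semantic annotation of online courses. "
               "Semantic annotation of course lectures uses keyword extraction.")
    for phrase, score in rake(lecture, top_n=4):
        print(f"{score:5.1f}  {phrase}")
```

The example also shows a known trait of RAKE: long candidate phrases accumulate high scores, so a practical pipeline usually caps phrase length or uses a richer stop-word list.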

Linked Data

Frontmatter
DBpedia Entity Type Detection Using Entity Embeddings and N-Gram Models
Abstract
This paper presents and evaluates a method for the detection of DBpedia entity types (classes) that can be used to assess DBpedia’s quality and to complete missing types for un-typed resources. The method compares entity embeddings with traditional N-gram models coupled with clustering and classification. We evaluate the results for 358 typical DBpedia classes. Our results show that entity embeddings outperform N-gram models for type detection and can contribute to the improvement of DBpedia’s quality, maintenance, and evolution. This is a step toward improving the quality of Linked Open Data in general.
Hanqing Zhou, Amal Zouaq, Diana Inkpen
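One simple way to use entity embeddings for type detection is nearest-centroid classification: average the embeddings of entities already typed with each class and assign an untyped entity to the most similar centroid by cosine similarity. This is a swapped-in, illustrative classifier with toy two-dimensional vectors, not the clustering and classification setup evaluated in the paper:

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_type(embedding: Sequence[float], typed: Dict[str, List[List[float]]]) -> str:
    """Return the class whose centroid is most similar to the embedding."""
    centroids = {cls: centroid(vs) for cls, vs in typed.items()}
    # Sorted iteration makes tie-breaking deterministic.
    return max(sorted(centroids), key=lambda cls: cosine(embedding, centroids[cls]))


if __name__ == "__main__":
    # Toy embeddings of already typed entities (hypothetical values).
    known = {
        "dbo:Person": [[0.9, 0.1], [0.8, 0.2]],
        "dbo:Place": [[0.1, 0.9], [0.2, 0.8]],
    }
    print(predict_type([0.7, 0.3], known))
```

Real entity embeddings have hundreds of dimensions, but the decision rule stays the same.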
Alignment: A Collaborative, System Aided, Interactive Ontology Matching Platform
Abstract
Ontology matching is a crucial problem in the world of the Semantic Web and other distributed, open-world applications. Differences in tools, knowledge, habits, language, interests and, usually, level of detail can lead to heterogeneity. Thus, many automated applications have been developed, implementing a large variety of matching techniques and similarity measures, with impressive results. However, there are situations where this is not enough and a human decision is needed to create a link. In this paper we present Alignment, a collaborative, system-aided, interactive ontology matching platform. Alignment offers a simple GUI environment for matching two ontologies with the aid of configurable similarity algorithms.
Sotirios Karampatakis, Charalampos Bratsas, Ondřej Zamazal, Panagiotis Marios Filippidis, Ioannis Antoniou
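A configurable similarity algorithm in such a platform might compare entity labels and propose candidate correspondences above a threshold for a human to confirm or reject. A minimal sketch, assuming normalized Levenshtein similarity on labels; the function names, the threshold and the `ex1:`/`ex2:` IRIs are hypothetical and not taken from Alignment:

```python
from typing import Callable, Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]


def label_similarity(a: str, b: str) -> float:
    """1 - normalized edit distance on lower-cased labels, in [0, 1]."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


def suggest_matches(
    source: Dict[str, str],  # entity IRI -> label
    target: Dict[str, str],
    similarity: Callable[[str, str], float] = label_similarity,
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """Candidate correspondences for human review, best first."""
    candidates = [
        (s, t, similarity(ls, lt))
        for s, ls in source.items()
        for t, lt in target.items()
    ]
    return sorted(
        (c for c in candidates if c[2] >= threshold),
        key=lambda c: (-c[2], c[0], c[1]),
    )


if __name__ == "__main__":
    onto1 = {"ex1:Author": "Author", "ex1:Paper": "Paper"}
    onto2 = {"ex2:Writer": "Writer", "ex2:Papers": "Papers", "ex2:Author": "author"}
    for s, t, score in suggest_matches(onto1, onto2):
        print(f"{s} ~ {t}: {score:.2f}")
```

Note that the purely lexical measure misses the synonym pair Author/Writer; such gaps are exactly where the human decision described above comes in, and why the similarity function is a parameter.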

Semantic Technologies in Manufacturing and Business

Frontmatter
ODERU: Optimisation of Semantic Service-Based Processes in Manufacturing
Abstract
A new requirement for manufacturing companies in Industry 4.0 is to be flexible with respect to changes in demand, which requires them to adapt their production capacities rapidly and efficiently. Combined with the established Service-Oriented Architectures (SOA), this induces a need for agile collaboration among supply chain partners, but also between different divisions or branches of the same company. To this end, we propose ODERU, a novel pragmatic approach for automatically implementing service-based manufacturing processes at design time and run time. It provides an optimal plan for a business process model, relying on a set of semantic annotations and on solving a configurable QoS-based constraint optimisation problem (COP). The additional information encoding the optimal process service plan, produced by means of pattern-based semantic composition and optimisation of non-functional aspects, is mapped back to the BPMN 2.0 standard formalism through extension elements, generating an enactable optimal plan. This paper presents the approach and the technical architecture, and sketches two initial real-world industrial applications in the manufacturing domains of metal press maintenance and automotive exhaust production.
Luca Mazzola, Patrick Kapahnke, Matthias Klusch
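The QoS-based optimisation can be illustrated with a toy constraint problem: choose one service per process task so that total cost is minimal while total latency stays within a bound. This sketch uses exhaustive search and assumes sequential tasks with additive cost and latency; it is not ODERU's COP solver, and all task names, services and figures are hypothetical:

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Service = Tuple[str, float, float]  # (name, cost, latency), hypothetical QoS attributes


def optimal_plan(
    tasks: Dict[str, List[Service]],
    max_latency: float,
) -> Optional[Tuple[Dict[str, str], float]]:
    """Pick one service per task, minimising total cost subject to a total
    latency bound (tasks assumed to run in sequence). Exhaustive search,
    so only suitable for small instances; returns None if infeasible."""
    names = list(tasks)
    best: Optional[Tuple[Dict[str, str], float]] = None
    for combo in product(*(tasks[t] for t in names)):
        latency = sum(s[2] for s in combo)
        if latency > max_latency:
            continue  # violates the QoS constraint
        cost = sum(s[1] for s in combo)
        if best is None or cost < best[1]:
            best = ({t: s[0] for t, s in zip(names, combo)}, cost)
    return best


if __name__ == "__main__":
    process = {
        "inspect": [("camA", 5.0, 2.0), ("camB", 2.5, 6.0)],
        "repair": [("robot1", 10.0, 3.0), ("robot2", 7.0, 5.0)],
    }
    print(optimal_plan(process, max_latency=9.0))
```

The cheapest combination overall (camB with robot2) is rejected because it exceeds the latency bound; a production solver would replace the brute-force loop with a dedicated constraint solver.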
Why Enriching Business Transactions with Linked Open Data May Be Problematic in Classification Tasks
Abstract
Linked Open Data has proven useful in disambiguation and query extension tasks, but its incomplete and inconsistent nature may make it less useful for analyzing brief, low-level business transactions. In this paper, we investigate the effect of using Wikidata and DBpedia to aid in the classification of real bank transactions. The experiments indicate that Linked Open Data may have the potential to supplement transaction classification systems effectively. However, given the nature of the transaction data used in this research and the current state of Wikidata and DBpedia, the extracted data in fact has a negative impact on the accuracy of the classification model compared with the baseline approach. The baseline approach achieves an accuracy of 88.60%, whereas the Wikidata, DBpedia and combined approaches yield accuracies of 84.99%, 86.65% and 83.48%, respectively.
Eirik Folkestad, Erlend Vollset, Marius Rise Gallala, Jon Atle Gulla
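The reported accuracies translate into drops relative to the baseline, in percentage points. A small worked computation using the figures from the abstract; the dictionary keys are descriptive labels only:

```python
BASELINE = 88.60  # baseline accuracy in %, from the abstract
ENRICHED = {"Wikidata": 84.99, "DBpedia": 86.65, "Wikidata + DBpedia": 83.48}


def accuracy_drops(baseline: float, results: dict) -> dict:
    """Percentage-point drop of each enriched variant versus the baseline."""
    return {name: round(baseline - acc, 2) for name, acc in results.items()}


if __name__ == "__main__":
    for name, drop in accuracy_drops(BASELINE, ENRICHED).items():
        print(f"{name}: -{drop} pp")
```

The drops are 3.61, 1.95 and 5.12 points respectively, so DBpedia enrichment hurts least and the combined enrichment hurts most.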
Backmatter
Metadata
Title
Knowledge Engineering and Semantic Web
Edited by
Dr. Przemysław Różewski
Christoph Lange
Copyright year
2017
Electronic ISBN
978-3-319-69548-8
Print ISBN
978-3-319-69547-1
DOI
https://doi.org/10.1007/978-3-319-69548-8
