
2013 | Book

Knowledge Engineering and the Semantic Web

4th International Conference, KESW 2013, St. Petersburg, Russia, October 7-9, 2013. Proceedings


About this book

This book constitutes the refereed proceedings of the 4th Conference on Knowledge Engineering and the Semantic Web, KESW 2013, held in St. Petersburg, Russia, in October 2013. The 18 revised full papers presented together with 7 short system descriptions were carefully reviewed and selected from 52 submissions. The papers address research issues related to knowledge representation, semantic web, and linked data.

Table of Contents

Frontmatter

Research and Industry Publications

Experiments on Using LOD Cloud Datasets to Enrich the Content of a Scientific Knowledge Base
Abstract
This paper describes the issues arising when employing the Linked Open Data (LOD) cloud datasets to enrich the content of a scientific knowledge base as well as the approaches to solving them. The experiments are carried out with the help of a toolkit intended to simplify the analysis and integration of data from different datasets. The toolkit comprises several tools for application-specific visualization. The dataset of the Open Archive of the Russian Academy of Sciences and several bibliographic datasets are used as test examples.
Zinaida Apanovich, Alexander Marchuk
Ontology-Based Content Trust Support of Expert Information Resources in Quantitative Spectroscopy
Abstract
An approach to assessing the content trust of information resources based on a publishing criterion has been developed and applied to several tens of spectroscopic expert datasets. The results, represented as an OWL ontology, are shown to be accessible to programmable agents. The assessments make it possible to determine the amount of measured and calculated trusted and distrusted data for spectroscopic quantities, and the ranges of their variation, in expert datasets. Building knowledge bases of this kind at virtual data centers intended for data-intensive science will enable the automatic selection of spectroscopic information resources exhibiting a high degree of trust.
Alexander Fazliev, Alexey Privezentsev, Dmitry Tsarkov, Jonathan Tennyson
Measuring Psychological Impact on Group Ontology Design and Development: An Empirical Approach
Abstract
This paper describes the interdisciplinary problems of group ontology design. It highlights the importance of studying individual features of cognitive style and their influence on the specifics of collaborative group ontology design and development. The paper describes the preliminary results of a research project focused on working out a new paradigm for structuring data and knowledge with respect to individual cognitive styles, using recent advances in knowledge engineering and conceptual structuring, and aimed at creating consistent and structurally holistic knowledge bases for various domains. The results of this research effort can be applied to organizing group ontology design (especially for learning purposes), data structuring, and other collaborative analytical work.
Tatiana Gavrilova, Ekaterina Bolotnikova, Irina Leshcheva, Evgeny Blagov, Anna Yanson
Applied Semantics for Integration and Analytics
Abstract
There are two major trends in the industrial application of semantic technologies: integration and analytics. The integration potential of semantics was the first to reach industrial implementation, particularly due to the adoption of a semantic platform by the ISO 15926 standard. But there are certain problems in the practical use of this standard, so integration is often built without it. We show an example implementation of a semantic Enterprise Service Bus in the RosEnergoTrans company, and discuss whether it would have been better to rely on ISO 15926 ontologies for this project.
Adding analytic features to a semantic solution gives it much more business value than integration alone can offer. We discuss semantic solutions from this point of view, and compare the analytical potential of ISO 15926 data models with “simple” semantics.
Sergey Gorshkov
Technology of Ontology Visualization Based on Cognitive Frames for Graphical User Interface
Abstract
This paper is dedicated to the visualization of OWL ontologies as an aid to their comprehension. In previous work we showed a process for simplifying OWL ontologies by transforming them into a form called the User Presentation Ontology (UPO). In this paper we discuss some aspects of the visual representation of ontologies for better understanding by human users. We present an approach that combines some UPO elements with special fragments called cognitive frames. We expect that showing cognitive frames during visualization, instead of simply showing all terms linked to the chosen term, will be more useful for ontology understanding. We determine some requirements for cognitive frames, define their types, and consider formal algorithms for constructing the frames.
Pavel Lomov, Maxim Shishaev
SRelation: Fast RDF Graph Traversal
Abstract
Linked Data in the RDF format can be viewed as a set of interlinked data on the web. Typical tasks computed over such data include text-based search for entities and relations, as well as the execution of queries in languages such as SPARQL. Interlinked data of this kind can be interpreted as a graph with edges and vertices. The current SPARQL 1.1 standard, as proposed and announced by the SPARQL working group, supports graph traversal through property paths. In terms of performance, however, property-path evaluation is a weak point of current solutions. This paper describes an innovative and time-efficient method for the graph traversal task, called SRelation. We first discuss current approaches to SPARQL 1.1 graph traversal. Regarding data access, we cover local and distributed solutions and disk-based, mixed, and whole-in-memory storage designs. We weigh the pros and cons of the various approaches and position our new method, SRelation, within the in-memory category. To support this, we present experiments on selected Linked Data datasets.
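The kind of traversal this abstract discusses can be illustrated with a minimal in-memory sketch. The graph and predicate names below are invented, and this is not the authors' implementation: reachability over one predicate corresponds to the semantics of the SPARQL 1.1 property path `predicate+`.

```python
from collections import deque

# Toy RDF graph as triples (subject, predicate, object),
# indexed by (subject, predicate) for O(1) edge lookup.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("alice", "worksFor", "acme"),
]
index = {}
for s, p, o in triples:
    index.setdefault((s, p), []).append(o)

def reachable(start, predicate):
    """All nodes reachable from `start` via one or more `predicate`
    edges, i.e. what the SPARQL 1.1 path `predicate+` would match."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in index.get((node, predicate), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

With the toy data, `reachable("alice", "knows")` yields the transitive closure {"bob", "carol", "dave"}; an in-memory index like this is what makes such traversals fast compared to repeated disk-backed joins.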
Ján Mojžiš, Michal Laclavík
An Approach to Improving the Classification of the New York Times Annotated Corpus
Abstract
The New York Times Annotated Corpus contains over 1.5 million manually tagged articles. It could become a useful resource for evaluating document clustering algorithms. Since the documents were labeled over a period of twenty years, it is argued that the classification may contain errors, due to possible disagreement between experts and the need to add tags over time. This paper presents an approach to improving classification quality that uses the assigned tags as a starting point.
It is assumed that tags can be described by a set of features. These features are selected based on the mutual information between a tag and the stems of the documents carrying it. An algorithm is presented for reassigning tags when a document does not contain the features of its labels. Experiments were performed on about ninety thousand articles published by the New York Times in 2005, and the results of applying the algorithm to the collection are discussed.
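The feature-selection step the abstract describes can be sketched as follows. The mini-corpus, tag names, and the exact MI estimate are invented for illustration; the paper's actual algorithm is not reproduced here.

```python
import math

# Hypothetical mini-corpus: each document is (set of stems, set of tags).
docs = [
    ({"market", "stock", "trade"}, {"finance"}),
    ({"stock", "bond"},            {"finance"}),
    ({"match", "goal", "team"},    {"sports"}),
    ({"team", "coach"},            {"sports"}),
]

def mutual_information(stem, tag):
    """Association between stem occurrence and tag assignment,
    estimated from document counts (zero when they never co-occur)."""
    n = len(docs)
    n_s = sum(1 for stems, _ in docs if stem in stems)
    n_t = sum(1 for _, tags in docs if tag in tags)
    n_st = sum(1 for stems, tags in docs if stem in stems and tag in tags)
    if n_st == 0:
        return 0.0
    return math.log((n_st * n) / (n_s * n_t))

def features(tag, vocabulary, k=2):
    """Top-k stems by mutual information: a candidate feature set
    describing the tag; a document lacking all of them is a candidate
    for tag reassignment."""
    return sorted(vocabulary, key=lambda s: -mutual_information(s, tag))[:k]
```

Here "stock" scores higher for "finance" than "team" does, so it would be selected as a feature of that tag.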
Elena Mozzherina
Applying the Latent Semantic Analysis to the Issue of Automatic Extraction of Collocations from the Domain Texts
Abstract
The aim of this paper is to study the possibilities of latent semantic analysis for the automatic extraction of word-pair collocations from domain texts. The basic idea of this work is to search for collocations among pairs of words with strong (stable) relations, since collocations are nothing other than stable combinations of words. Results of experiments on a corpus of texts from a Russian online newspaper demonstrate that applying latent semantic analysis to collocation extraction significantly decreases information noise and strengthens word associations. The proposed method will be used for automatically building a thesaurus for a domain.
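As a rough illustration of scoring "stable" word pairs (the paper's LSA-based method is more involved and is not reproduced here), the following baseline ranks adjacent word pairs by pointwise mutual information; the token stream is invented.

```python
import math
from collections import Counter

def pmi_collocations(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information:
    pairs that co-occur more often than their unigram frequencies
    predict are candidate collocations."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:          # ignore rare, unreliable pairs
            continue
        scores[(w1, w2)] = math.log(c * n / (unigrams[w1] * unigrams[w2]))
    return sorted(scores, key=scores.get, reverse=True)
```

On a toy stream such as `"hot dog and hot dog or hot coffee"`, only ("hot", "dog") recurs and survives the count threshold. LSA, as the paper argues, goes further by smoothing such association scores through a low-rank decomposition of the co-occurrence matrix.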
Aliya Nugumanova, Igor Bessmertny
Exploring Automated Reasoning in First-Order Logic: Tools, Techniques and Application Areas
Abstract
This paper describes state-of-the-art automated reasoning techniques and applications. We mainly explore first-order logic theorem proving, though our discussion also covers other domains: higher-order logic and propositional logic. We review applications of theorem proving systems in mathematics, formal verification, and other areas.
Resolution is known to be the most popular reasoning technique in first-order logic. Nevertheless, studying other algorithms could facilitate the development of automated reasoning. This article presents a survey of some promising methods. In particular, we give a detailed description of Maslov’s inverse method, which is applicable to a broad range of theories and can also be used as a decision procedure. This method is, however, not well studied, and its real potential is yet to be realized. We describe the architecture of the reasoning system we use to compare the inverse method with the resolution method in practice, and present some results of our experiments.
Vladimir Pavlov, Alexander Schukin, Tanzilia Cherkasova
OntoQuad: Native High-Speed RDF DBMS for Semantic Web
Abstract
In recent years, native RDF stores have made enormous progress in closing the performance gap with RDBMSs. This gap, albeit smaller, still prevents the adoption of RDF stores in scenarios with high requirements on responsiveness. We try to bridge the gap and present a native RDF store, “OntoQuad”, along with its fundamental design principles. Building on previous research, we develop a vector database schema for quadruples, its realization on index data structures, and ways to efficiently join two or more data sets simultaneously. We also offer approaches to optimizing the SPARQL query execution plan based on heuristic transformations. Query performance is checked and confirmed on BSBM tests. The results can be taken into consideration in the development of RDF DBMSs suitable for storing large volumes of Semantic Web data, as well as in the creation of large-scale repositories of semantic data.
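One common design idea behind quad stores, keeping quadruples under several sort orders so that prefix scans return sorted runs suitable for merge joins, can be sketched as follows. This is an illustrative toy, not OntoQuad's actual schema; the two orderings and the data are assumptions.

```python
import bisect

class QuadIndex:
    """Toy quad store: one sorted list per key ordering (here only
    SPOG and POSG), so a prefix lookup returns a sorted run that can
    feed a merge join. Real stores use more orderings and compression."""
    ORDERS = {"spog": (0, 1, 2, 3), "posg": (1, 2, 0, 3)}

    def __init__(self, quads):
        self.idx = {
            name: sorted(tuple(q[i] for i in perm) for q in quads)
            for name, perm in self.ORDERS.items()
        }

    def scan(self, order, prefix):
        """All entries in `order` starting with `prefix`, in sorted
        order, found by binary search over the sorted list."""
        rows = self.idx[order]
        lo = bisect.bisect_left(rows, prefix)
        hi = bisect.bisect_left(rows, prefix[:-1] + (prefix[-1] + "\uffff",))
        return rows[lo:hi]

quads = [
    ("s1", "p1", "o1", "g1"),
    ("s1", "p2", "o2", "g1"),
    ("s2", "p1", "o1", "g2"),
]
qi = QuadIndex(quads)
```

For example, `qi.scan("spog", ("s1",))` returns both quads with subject `s1`, and `qi.scan("posg", ("p1",))` returns the two quads with predicate `p1`, already sorted by object and subject.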
Alexander Potocki, Anton Polukhin, Grigory Drobyazko, Daniel Hladky, Victor Klintsov, Jörg Unbehauen
A Comparison of Federation over SPARQL Endpoints Frameworks
Abstract
The increasing amount of Linked Data and its inherently distributed nature have, in recent years, attracted significant attention throughout the research community and amongst practitioners searching this data. Inspired by research results from traditional distributed databases, different approaches for managing federation over SPARQL endpoints have been introduced. SPARQL is the standardised query language for RDF, the default data model used in Linked Data deployments, and SPARQL endpoints are a popular access mechanism provided by many Linked Open Data (LOD) repositories. In this paper, we first give an overview of the federation framework infrastructure and then proceed with a comparison of existing SPARQL federation frameworks. Finally, we highlight shortcomings in existing frameworks, which we hope will help spawn new research directions.
Nur Aini Rakhmawati, Jürgen Umbrich, Marcel Karnstedt, Ali Hasnain, Michael Hausenblas
Knowledge-Based Support for Complex Systems Exploration in Distributed Problem Solving Environments
Abstract
This work is aimed at developing approaches to the intelligent support of knowledge usage and generation within simulation-based research. As contemporary e-Science tasks often require the acquisition, integration, and usage of complex knowledge belonging to different domains, a concept and technology for the semantic integration and processing of knowledge used within complex systems simulation tasks were developed. Within the proposed approach, three main classes of knowledge are considered: domain-specific, IT, and general system-level knowledge. All these classes need to be integrated and coordinated to support the simulation process. An ontology-based technology is described as the core technique for unified multi-domain knowledge formalization and automatic or semi-automatic interconnection. The Virtual Simulation Objects (VSO) concept and technology are described as the basic approach for developing domain-specific solutions that support the whole simulation-based research process, including model development, simulation runs, and the presentation of results.
Pavel A. Smirnov, Sergey V. Kovalchuk, Alexander V. Boukhanovsky
Efficient Module Extraction for Large Ontologies
Abstract
Modularity of ontologies has gained importance due to its applications in ontology reasoning, ontology reuse, and other areas of ontology engineering. One technique for extracting modules uses Atomic Decomposition (AD). This paper uses MGS-Labels (Minimal Globalising Signatures) to improve on the state-of-the-art approach, which uses MSS-Labels (Minimal Seed Signatures), in terms of pre-processing time and memory requirements. It also improves module extraction time by reducing the number of containment checks in the worst case. We further improve the algorithm by introducing the notion of MGS-Space. We propose uniqueness properties of the MGS-Space that help us build indices and extract modules using simple operations on integers.
Venkata Krishna Chaitanya Turlapati, Sreenivasa Kumar Puligundla
Representation of Historical Sources on the Semantic Web by Means of Attempto Controlled English
Abstract
The paper discusses some promising approaches to representing the meta-information and the meaning of historical sources on the Semantic Web, in order to provide researchers with appropriate tools for data capture, the semantic linkage of information, and automatic logical inference. It is desirable that the meaning of the sources be represented fully, so as to aggregate their internal information within a definite semantic network and to link this information with external data provided by ontologies. The authors propose using controlled natural languages, namely Attempto Controlled English (ACE), to represent the meaning of historical sources on the Semantic Web. The paper examines both the actual and the potential capabilities of ACE in representing the contents of Old Russian charters; special attention is paid to ACE tools, especially the ACE Reasoner (RACE).
Aleksey Varfolomeyev, Aleksandrs Ivanovs
Ontology Merging in the Context of a Semantic Web Expert System
Abstract
The purpose of this paper is to describe the process of OWL ontology merging and of member function assignment for ontology elements. Both are necessary for constructing the knowledge base of a Semantic Web Expert System (SWES). We use the acronym SWES to refer to expert systems capable of processing OWL ontologies from the Web in order to supplement, or even develop, their knowledge base. To our knowledge, the tasks of ontology merging and member function assignment for an SWES have not yet been investigated.
Olegs Verhodubs, Janis Grundspenkis
Web 2.0/3.0 Technology in Lexical Ontology Development: English-Russian WordNet 2.0
Abstract
This paper reports on the current results of the development of the English-Russian WordNet 2.0. It describes the usage of English-Russian lexical language resources and software to process English-Russian WordNet 2.0. Aspects of enhancing English-Russian WordNet 2.0 with Linked Open Data information are discussed.
Sergey Yablonsky
Topic Crawler for Social Networks Monitoring
Abstract
This paper describes a focused crawler for monitoring social networks, used for information extraction and content analysis. The crawler implements the MapReduce model for distributed computation and is oriented toward big text data. The focused crawler looks for pages classified as relevant to a specified topic. The classifier is built using a knowledge database that defines words, their classes, and the rules for joining words into phrases. Based on the weights of words and phrases, a text weight indicating relevance to the topic is obtained. The system was used to detect a drug community in the Russian segment of the Livejournal social network; official and slang drug terminology was used to develop the knowledge database. Different aspects of the knowledge database and the classifier are studied. A non-homogeneous Poisson process is used to model blog changes, since it permits building a monitoring policy that accounts for blog update frequency and a time-of-day effect. Evaluation on real data shows a 25% increase in the detection of new posts.
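The monitoring idea based on a non-homogeneous Poisson process can be sketched as follows: the expected number of new posts in an interval is the integral of the rate function lambda(t), so checks can be scheduled where that expectation is highest. The day-time rate profile below is a hypothetical assumption, not the paper's fitted model.

```python
import math

def expected_posts(rate, t0, t1, step=0.1):
    """Expected number of new posts in [t0, t1] for a non-homogeneous
    Poisson process: the integral of the rate function lambda(t),
    approximated here by a midpoint Riemann sum."""
    n = max(1, int((t1 - t0) / step))
    width = (t1 - t0) / n
    return sum(rate(t0 + (i + 0.5) * width) for i in range(n)) * width

def daytime_rate(t, base=1.0):
    """Hypothetical time-of-day effect: posting activity peaks around
    mid-day (t in hours, period 24)."""
    return base * (1.0 + math.sin(math.pi * (t % 24) / 24.0))
```

Under this rate, the expected post count for the mid-day window 12:00-14:00 exceeds that for 00:00-02:00, so a policy that allocates crawls proportionally to expected counts would check blogs more often around noon. With a constant rate the sum reduces to rate times interval length, which is a quick sanity check on the approximation.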
Andrei V. Yakushev, Alexander V. Boukhanovsky, Peter M. A. Sloot
Conceptual Framework, Models, and Methods of Knowledge Acquisition and Management for Competency Management in Various Areas
Abstract
An approach to organizing knowledge acquisition and management (KAM), together with an appropriate conceptual framework, is described for the broad sphere of competency management and other traditional humanities subject domains (SD). The main purpose of the approach is to enable SD experts to work not only with the knowledge itself but also with the knowledge metamodel, without the participation of knowledge engineers. The framework combines methods of knowledge elicitation and description well known in the humanities with formal methods of knowledge modeling. The framework includes a set of principles, a model of a multidimensional space for visualizing continuing education (LifeLong Learning Space, LLLS), a normalized competency metamodel (NCMM), and a process for tailoring the NCMM. The LLLS is used as a domain-specific model of KAM goals and as a tool for intuitively obvious goal visualization. Well-order and partial-order relationships defined on the LLLS dimensions are used for evaluating and comparing competency profiles. The NCMM facilitates the structuring and control of the KAM process. Rules for supporting and performing the KAM process are described, and the types of completed projects for more specific domains are mentioned.
Evgeny Z. Zinder, Irina G. Yunatova

System Description and Demo Publications

Linked Open Data Statistics: Collection and Exploitation
Abstract
This demo presents LODStats, a web application for the collection and exploration of Linked Open Data statistics. LODStats consists of two parts: the core collects statistics about the LOD cloud and publishes them on the LODStats web portal, a front-end for exploring dataset statistics. Statistics are published in both human-readable and machine-readable formats, allowing consumption of the data through the web front-end by users as well as through an API by services and applications. As an example of the latter, we showcase how to visualize the statistical data with the CubeViz application.
Ivan Ermilov, Michael Martin, Jens Lehmann, Sören Auer
ORG-Master: Combining Classifications, Matrices and Diagrams in the Enterprise Architecture Modeling Tool
Abstract
Enterprise architecture management is the basis of systemic enterprise transformations and of information technology architecture development. Nowadays, enterprise architecting is almost synonymous with diagramming. Diagrams are effective for knowledge elicitation, structuring, and dissemination. But as the number of diagrams and diagram types grows, and as they overlap and evolve, it becomes hard to maintain a collection of interrelated diagrams, even with the help of a common repository. Besides, the very nature of enterprise architecting requires many classifications (e.g. process architecture/classification) and matrices (goals - processes, processes - organizational roles, processes - applications…). The ORG-Master tool combines classifications and matrices with traditional diagram-based technologies.
Lev Grigoriev, Dmitry Kudryavtsev
User Interface for a Template Based Question Answering System
Abstract
As an increasing amount of RDF data is published as Linked Data, intuitive ways of accessing this data become more and more important. Natural language question answering approaches have been proposed as a good compromise between intuitiveness and expressiveness. We present a user interface for a template-based question answering system that covers the full question answering pipeline and answers factual questions with a list of RDF resources. Users can ask full-sentence English factual questions and get a list of resources, which are then visualized using the properties expected to carry the most important information for the user. The available knowledge bases are (1) DBpedia for general-domain question answering and (2) Oxford real estate for housing searches. However, the system is easily extensible to other knowledge bases.
Konrad Höffner, Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Phillip Cimiano
TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data
Abstract
Linked Open Data (LOD) comprises an unprecedented volume of structured datasets on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowdsourced and even extracted data of relatively low quality. We present a methodology for assessing the quality of linked data resources that comprises a manual and a semi-automatic process. In this paper we focus on the manual process, whose first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. The second phase comprises the evaluation of a large number of individual resources, according to the quality problem taxonomy, via crowdsourcing. This process is implemented by the tool TripleCheckMate, in which a user assesses an individual resource and evaluates each fact for correctness. This paper focuses on describing the methodology, the quality taxonomy, and the tool's system architecture, user perspective, and extensibility.
Dimitris Kontokostas, Amrapali Zaveri, Sören Auer, Jens Lehmann
Development of an Ontology-Based E-Learning System
Abstract
The paper describes experience in using ontologies and other semantic technologies in e-learning and distance education. The main goal of the project is the development of an ontology-based e-learning system to rectify a range of problems that currently exist in Russian education, including the weak structuring of educational resources and the lack of connections between their individual components. The paper presents the ontology-based model and the Information Workbench platform used in the system.
Dmitry Mouromtsev, Fedor Kozlov, Olga Parkhimovich, Maria Zelenina
Sci-Search: Academic Search and Analysis System Based on Keyphrases
Abstract
Structured data representation saves considerable time during relevant-information search and gives a useful view of a domain. It allows researchers to find relevant publications faster, to gain insight into the tendencies and dynamics of a particular scientific domain, and to find emerging topics. The sorted result lists provided by popular search engines are not suitable for such a task. In this paper we demonstrate a demo version of a search engine that works with the abstracts of scientific articles and provides a structured representation of information to the user. Keyphrases are used as the basis for the processing algorithms and the representation. Some algorithm details are described in the paper, and a number of test requests and their results are discussed.
Svetlana Popova, Ivan Khodyrev, Artem Egorov, Stepan Logvin, Sergey Gulyaev, Maria Karpova, Dmitry Mouromtsev
Semantically-Enabled Environmental Data Discovery and Integration: Demonstration Using the Iceland Volcano Use Case
Abstract
We present a framework for semantically-enabled data discovery and integration across multiple Earth science disciplines. We leverage well-known vocabularies and ontologies to build semantic models for both metadata and data harmonization. Built upon standard guidelines, our metadata model extends them with richer semantics. To harmonize data, we implement an observation-centric data model based on the RDF Data Cube vocabulary. Previous works define Data Cube extensions relevant to particular Earth science disciplines; to provide a generic and domain-independent solution, we propose an upper-level vocabulary that allows us to express domain-specific information at a higher level of abstraction.
From a human viewpoint, we provide an interactive Web-based user interface for data discovery and integration across multiple research infrastructures. We demonstrate the system on a use case of the Iceland volcano's eruption on April 10, 2010.
Tatiana Tarasova, Massimo Argenti, Maarten Marx
Backmatter
Metadata
Title
Knowledge Engineering and the Semantic Web
Editors
Pavel Klinov
Dmitry Mouromtsev
Copyright Year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-41360-5
Print ISBN
978-3-642-41359-9
DOI
https://doi.org/10.1007/978-3-642-41360-5