
2010 | Book

Semantic Web Information Management

A Model-Based Perspective

Editors: Roberto de Virgilio, Fausto Giunchiglia, Letizia Tanca

Publisher: Springer Berlin Heidelberg


About this book

Databases have been designed to store large volumes of data and to provide efficient query interfaces. Semantic Web formats are geared towards capturing domain knowledge, interlinking annotations, and offering a high-level, machine-processable view of information. However, the gigantic amount of such useful information makes efficient management of it increasingly difficult, undermining the possibility of transforming it into useful knowledge.

The research presented by De Virgilio, Giunchiglia and Tanca tries to bridge the two worlds in order to leverage the efficiency and scalability of database-oriented technologies to support an ontological high-level view of data and metadata. The contributions present and analyze techniques for semantic information management, by taking advantage of the synergies between the logical basis of the Semantic Web and the logical foundations of data management. The book’s leitmotif is to propose models and methods especially tailored to represent and manage data that is appropriately structured for easier machine processing on the Web.

After two introductory chapters on data management and the Semantic Web in general, the remaining contributions are grouped into five parts on Semantic Web Data Storage, Reasoning in the Semantic Web, Semantic Web Data Querying, Semantic Web Applications, and Engineering Semantic Web Systems. The handbook-like presentation makes this volume an important reference on current work and a source of inspiration for future development, targeting academic and industrial researchers as well as graduate students in Semantic Web technologies or database design.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
In recent years, the Semantic Web has become a very promising research field, which aims to automate and support the sharing and reuse of the data and metadata representing the growing amount of digital information available to our society. The underlying idea of describing the data on the Web in such a way that machines can employ it for automation, integration and reuse across various applications has been exploited in several research fields. However, the gigantic amount of such useful information makes its efficient management increasingly difficult, undermining the possibility of transforming it into useful knowledge.
Roberto De Virgilio, Fausto Giunchiglia, Letizia Tanca
Chapter 2. Data and Metadata Management
Abstract
In this chapter, we illustrate fundamental notions of Data Management. We start from the basic concepts of database schema and instance, illustrating how they are handled within dictionaries in relational databases. We then address, from a general point of view, the notion of data model as a means for the representation of information at different levels of detail, and show how dictionaries can be used to manage schemas of different data models. A further level of abstraction, called metamodel, is then introduced with the aim of supporting interoperability between model-based applications. Finally, we discuss a methodology for schema translation that makes use of the metamodel as an intermediate level.
Paolo Atzeni, Riccardo Torlone
Chapter 3. The Semantic Web Languages
Abstract
The Semantic Web is basically an extension of the Web and of the Web-enabling database and Internet technology, and, as a consequence, the Semantic Web methodologies, representation mechanisms and logics strongly rely on those developed in databases. This is the motivation for many attempts to merge the two worlds, more or less loosely, such as the various proposals to use relational technology for storing web data or the use of ontologies for data integration. This chapter follows the one on data management in order to first complete the picture with a description of the languages that can be used to represent information on the Semantic Web, and then to highlight a few fundamental differences that make the database and Semantic Web paradigms complementary but somewhat difficult to integrate.
Fausto Giunchiglia, Feroz Farazi, Letizia Tanca, Roberto De Virgilio

Semantic Web Data Storage

Frontmatter
Chapter 4. Relational Technologies, Metadata and RDF
Abstract
Metadata plays an important role in successfully understanding and querying data on the web. A number of metadata management solutions have already been developed, but each is tailored to specific kinds of metadata. The Resource Description Framework (RDF) is a generic, flexible and powerful model which is becoming the de facto standard for metadata representation on the Web. Its adoption has led to exponential growth in the amount of available RDF data, calling for efficient management solutions. Instead of designing such solutions from scratch, it is possible to invest in existing relational technologies by exploiting their long presence and maturity. Relational technologies can offer efficient storage and high-performance querying at relatively low cost. Unfortunately, the principles of the relational model are fundamentally different from those of RDF. This difference means that specialized storage and querying schemes need to be put in place in order to use relational technologies for RDF data. In this work, we provide a comprehensive description of these relational RDF storage schemes and discuss their advantages and limitations. We believe that through carefully designed schemes, it is possible to achieve sophisticated high-performance systems that support the full power of RDF and bring the materialization of the Semantic Web vision one step closer.
Yannis Velegrakis
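The abstract notes that RDF and the relational model are built on different principles. As a hedged illustration of the simplest relational layout, often called a "triple table", the sketch below stores every statement as one row of a three-column SQLite table and translates a two-hop graph pattern into self-joins. The `ex:` data, the query, and the chosen indexes are assumptions for illustration only; the chapter surveys several schemes, and alternatives such as property tables or vertical partitioning reduce the number of self-joins at the cost of a less uniform schema.

```python
import sqlite3

# Hypothetical example data: each RDF statement is a (subject, predicate, object) row.
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:carol", "ex:name", '"Carol"'),
]


def build_store(triples):
    """Create an in-memory triple table with two illustrative access paths."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
    conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def friends_of_friends_names(conn, person):
    """Pattern { person knows ?x . ?x knows ?y . ?y name ?n } as a chain of self-joins."""
    rows = conn.execute(
        """
        SELECT t3.o
        FROM triples t1
        JOIN triples t2 ON t2.s = t1.o AND t2.p = 'ex:knows'
        JOIN triples t3 ON t3.s = t2.o AND t3.p = 'ex:name'
        WHERE t1.s = ? AND t1.p = 'ex:knows'
        """,
        (person,),
    )
    return [row[0] for row in rows]


conn = build_store(TRIPLES)
print(friends_of_friends_names(conn, "ex:alice"))  # ['"Carol"']
```

Each additional triple pattern adds another self-join on the same table, which is one reason more specialized relational layouts for RDF exist.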
Chapter 5. A Metamodel Approach to Semantic Web Data Management
Abstract
The Semantic Web is attracting increasing interest as a way to fulfill the need for sharing, retrieving and reusing information. In this context, the Resource Description Framework (RDF) has been conceived to provide an easy way to represent any kind of data and metadata, according to a lightweight model and syntaxes for serialization (RDF/XML, N3, etc.). Although RDF has the advantage of being general and simple, it cannot be used as a storage model as it is, since it can easily be shown that even simple management operations involve serious performance limitations. In this paper, we present a novel approach for storing, managing and processing RDF data in an effective and efficient way. The approach is based on the notion of construct, which represents a concept of the domain of interest. This makes the approach easily extensible and independent of the specific knowledge representation language. We refer to real-world scenarios in which we consider complex data management operations, which go beyond simple selections or projections and involve the navigation of huge portions of RDF data sources.
Roberto De Virgilio, Pierluigi Del Nostro, Giorgio Gianforme, Stefano Paolozzi
Chapter 6. Managing Terabytes of Web Semantics Data
Abstract
A large amount of semi-structured data is now made available on the Web in the form of RDF, RDFa and Microformats. In this chapter, we discuss a general model for the Web of Data and, based on our experience in Sindice.com, we discuss how this is reflected in the architecture and components of a large-scale infrastructure. Aspects such as data collection, processing, indexing and ranking are covered, and we give an extended example of an application built on top of this infrastructure.
Michele Catasta, Renaud Delbru, Nickolai Toupikov, Giovanni Tummarello

Reasoning in the Semantic Web

Frontmatter
Chapter 7. Reasoning in Semantic Web-based Systems
Abstract
In this chapter, we introduce the basics of formal languages and reasoning in a Web context. Denoting information by means of a logical formalism makes it possible to employ established techniques from the field of automated reasoning. However, reasoning in the context of Web-based systems has a distinct set of requirements in terms of quality and quantity of the information it has to cope with. In turn, this chapter focuses, in a very foundational way, on reasoning on Semantic Web-oriented data. For this purpose, we briefly identify and describe the basic paradigms forming the background for knowledge representation in Web-based Systems. We then examine how these paradigms are reflected in current standards and trends on the Web and what kinds of reasoning they typically facilitate. Based on this, we proceed to focus on concrete reasoning techniques and their particular properties, including optimizations and various other possibilities, e.g., parallelization and approximation, to meet the scalability requirements in Web-based systems.
Florian Fischer, Gulay Unel
Chapter 8. Modular Knowledge Representation and Reasoning in the Semantic Web
Abstract
The construction of modular ontologies by combining different modules is becoming a necessity in ontology engineering in order to cope with the increasing complexity of ontologies and the domains they represent. The modular ontology approach takes inspiration from software engineering, where modularization is a widely acknowledged feature. Distributed reasoning is the other side of the coin of modular ontologies: given an ontology comprising a set of modules, it is desirable to perform reasoning by combining multiple reasoning processes performed locally on each of the modules. In the last ten years, a number of approaches for combining logics have been developed in order to formalize modular ontologies. In this chapter, we survey and compare the main formalisms for modular ontologies and distributed reasoning in the Semantic Web. We select four formalisms built on the formal logical grounds of Description Logics: Distributed Description Logics, ℰ-connections, Package-based Description Logics and Integrated Distributed Description Logics. We concentrate on the expressivity and distinctive modeling features of each framework. We also discuss the reasoning capabilities of each framework.
Luciano Serafini, Martin Homola
Chapter 9. Semantic Matching with S-Match
Abstract
We view matching as an operation that takes two graph-like structures (e.g., lightweight ontologies) and produces an alignment between the nodes of these graphs that correspond semantically to each other. Semantic matching is based on two ideas: (i) we discover an alignment by computing semantic relations (e.g., equivalence, more general); (ii) we determine semantic relations by analyzing the meaning (concepts, not labels) which is codified in the entities and the structures of ontologies. In this chapter, we first overview the state of the art in the ontology matching field. Then we present basic and optimized algorithms for semantic matching as well as their implementation within the S-Match system. Finally, we evaluate S-Match against state of the art systems, thereby justifying empirically the strength of the approach.
Pavel Shvaiko, Fausto Giunchiglia, Mikalai Yatskevich
Chapter 10. Preserving Semantics in Automatically Created Ontology Alignments
Abstract
In an open world such as the Internet, one of the most challenging tasks is ontology alignment, which is the process of finding relationships among the elements of different ontologies. Performing this work in an automated fashion is, however, subject to errors, because of the different semantics carried by the same concept in different application domains or because of different ontology design styles, which often produce incompatible ontology structures. In this chapter, we survey the most important approaches to ontology mapping revision and propose a revision technique that aims at preserving the semantics of the original ontologies.
Giorgio Orsi, Letizia Tanca
Chapter 11. tOWL: Integrating Time in OWL
Abstract
The Web Ontology Language (OWL) is the most expressive standard language for modeling ontologies on the Semantic Web. In this chapter, we present the temporal OWL (tOWL) language: a temporal extension of the OWL DL language. tOWL is based on three layers added on top of OWL DL. The first layer is the Concrete Domains layer, which allows the representation of restrictions using concrete domain binary predicates. The second layer is the Time Representation layer, which adds time points, intervals, and Allen’s 13 interval relations. The third layer is the Change Representation layer which supports a perdurantist view on the world, and allows the representation of complex temporal axioms, such as state transitions. A Leveraged Buyout process is used to exemplify the different tOWL constructs and show the tOWL applicability in a business context.
Flavius Frasincar, Viorel Milea, Uzay Kaymak
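The Time Representation layer of tOWL is described as adding time points, intervals and Allen's 13 interval relations. The sketch below is not tOWL syntax; it is a plain Python classifier, assuming intervals are numeric `(start, end)` pairs with `start < end`, that returns which of the 13 relations holds between two intervals.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b."""
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching intervals.
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    # From here on the intervals share interior points.
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    if a1 < b1:
        return "overlaps"       # a1 < b1 < a2 < b2
    return "overlapped-by"      # b1 < a1 < b2 < a2


# Hypothetical phases of a business process, as (start, end) day numbers.
negotiation = (0, 30)
due_diligence = (10, 40)
print(allen_relation(negotiation, due_diligence))  # overlaps
```

The checks are ordered so that the disjoint and boundary-sharing cases are settled first, which leaves only the strict-containment and partial-overlap cases at the end.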

Semantic Web Data Querying

Frontmatter
Chapter 12. Datalog Extensions for Tractable Query Answering over Ontologies
Abstract
We survey a recently introduced family of expressive extensions of Datalog, called Datalog±, which is a new framework for representing ontologies in the form of integrity constraints, and for query answering under such constraints. Datalog± is derived from Datalog by allowing existentially quantified variables in rule heads, and by enforcing suitable properties in rule bodies, to ensure decidable and efficient query answering. We first present different languages in the Datalog± family, providing tight complexity bounds for nearly all cases. We then show that such languages are general enough to capture the most common tractable ontology languages. In particular, Datalog± can express the DL-Lite family of description logics and F-Logic Lite. Datalog± is a natural and very general framework that can be employed in different contexts such as data integration and exchange.
Andrea Calì, Georg Gottlob, Thomas Lukasiewicz
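Datalog± allows existentially quantified variables in rule heads. A standard way to reason with such rules is the chase, which invents labelled nulls for existential variables; certain answers are then the query answers that contain no nulls. The sketch below is a naive restricted chase, not the chapter's algorithm: the rule encoding (variables as strings starting with `?`), the `max_rounds` safety cap, and the `Emp`/`worksFor` example are all assumptions. Datalog± languages obtain decidability through syntactic restrictions on rules, not through a round cap.

```python
from itertools import count


def is_var(term):
    # Assumed encoding: variables are strings starting with "?", e.g. "?x".
    return isinstance(term, str) and term.startswith("?")


def match(atom, fact, subst):
    """Extend subst so that atom maps onto fact; return None on a clash."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    s = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if s.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return s


def homomorphisms(atoms, facts, subst=None):
    """Yield every substitution that maps all atoms into the fact set."""
    subst = {} if subst is None else subst
    if not atoms:
        yield subst
        return
    for fact in facts:
        s = match(atoms[0], fact, subst)
        if s is not None:
            yield from homomorphisms(atoms[1:], facts, s)


def chase(facts, rules, max_rounds=10):
    """Restricted chase: fire a rule only if its head is not yet satisfied."""
    facts = set(facts)
    fresh = count()
    for _ in range(max_rounds):
        new = set()
        for body, head in rules:
            for h in list(homomorphisms(body, facts)):
                if next(homomorphisms(head, facts | new, h), None) is not None:
                    continue
                s = dict(h)
                for atom in head:
                    for term in atom[1:]:
                        if is_var(term) and term not in s:
                            s[term] = f"_:n{next(fresh)}"  # labelled null
                new |= {tuple(s.get(t, t) for t in atom) for atom in head}
        if not new:
            break
        facts |= new
    return facts


def certain_answers(query, answer_vars, facts):
    """Keep only answer tuples that contain no labelled nulls."""
    answers = set()
    for h in homomorphisms(query, facts):
        row = tuple(h[v] for v in answer_vars)
        if not any(str(x).startswith("_:") for x in row):
            answers.add(row)
    return answers


# Hypothetical rules: Emp(x) -> exists y. worksFor(x, y);  worksFor(x, y) -> Emp(x)
RULES = [
    ([("Emp", "?x")], [("worksFor", "?x", "?y")]),
    ([("worksFor", "?x", "?y")], [("Emp", "?x")]),
]
facts = chase({("Emp", "ann"), ("worksFor", "bob", "sales")}, RULES)
print(sorted(certain_answers([("worksFor", "?x", "?y")], ["?x"], facts)))
```

Asking who works somewhere returns both `ann` and `bob`, even though `ann`'s employer is only a labelled null; asking for employee and employer pairs returns only `("bob", "sales")`.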
Chapter 13. On the Semantics of SPARQL
Abstract
The Resource Description Framework (RDF) is the standard data model for representing information about World Wide Web resources. In January 2008, the W3C released its recommendation for querying RDF data, a query language called SPARQL. In this chapter, we give a detailed description of the semantics of this language. We start by focusing on the definition of a formal semantics for the core part of SPARQL, and then move to the definition for the entire language, including all the features in the specification of SPARQL by the W3C such as blank nodes in graph patterns and bag semantics for solutions.
Marcelo Arenas, Claudio Gutierrez, Jorge Pérez
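A common way to formalize the core of SPARQL evaluates patterns to sets of mappings (partial functions from variables to RDF terms): two mappings are compatible when they agree on their shared variables, AND is the join of compatible mappings, and OPTIONAL is a left outer join. The sketch below illustrates these operators under set semantics only; blank nodes, FILTER, and the bag semantics discussed in the chapter are omitted, and the example graph is hypothetical.

```python
def is_var(term):
    # Assumed encoding: query variables are strings starting with "?".
    return term.startswith("?")


def eval_triple(pattern, graph):
    """Evaluate one triple pattern to a duplicate-free list of mappings."""
    out = []
    for triple in graph:
        mu, ok = {}, True
        for p, v in zip(pattern, triple):
            if is_var(p):
                if mu.setdefault(p, v) != v:
                    ok = False
                    break
            elif p != v:
                ok = False
                break
        if ok and mu not in out:
            out.append(mu)
    return out


def compatible(m1, m2):
    """Mappings are compatible if they agree on every shared variable."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)


def join(o1, o2):
    """AND: union of every compatible pair of mappings."""
    out = []
    for m1 in o1:
        for m2 in o2:
            if compatible(m1, m2):
                m = {**m1, **m2}
                if m not in out:
                    out.append(m)
    return out


def minus(o1, o2):
    """Mappings of o1 that are compatible with no mapping of o2."""
    return [m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)]


def left_join(o1, o2):
    """OPTIONAL: join, plus the o1 mappings that found no partner."""
    out = join(o1, o2)
    for m in minus(o1, o2):
        if m not in out:
            out.append(m)
    return out


GRAPH = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:bob", "ex:name", "Bob"),
]
names = eval_triple(("?p", "ex:name", "?n"), GRAPH)
emails = eval_triple(("?p", "ex:email", "?e"), GRAPH)
result = left_join(names, emails)  # (?p name ?n) OPTIONAL (?p email ?e)
print(result)
```

`ex:bob` has no email, so it survives the OPTIONAL with `?e` left unbound rather than being dropped as it would be under a plain join.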
Chapter 14. Labeling RDF Graphs for Linear Time and Space Querying
Abstract
Indices and data structures for web querying have mostly considered tree-shaped data, reflecting the view of XML documents as tree-shaped. However, for RDF (and when querying ID/IDREF constraints in XML), data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph data, with a focus on support for efficient adjacency and reachability queries. For XML, labeling schemes are an important part of the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes: the constant-time (and constant per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.
Tim Furche, Antonius Weinzierl, François Bry
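The constant-time child and descendant tests that the abstract attributes to XML labeling schemes can be illustrated with the classic interval (start/end) labeling of a tree. This is only the tree case that the chapter takes as its point of departure, not the chapter's own labeling scheme for a larger class of graphs; the example tree is hypothetical.

```python
def label_tree(children, root):
    """Assign (start, end, depth) labels to every node by depth-first traversal."""
    labels = {}
    clock = 0

    def visit(node, depth):
        nonlocal clock
        start = clock
        clock += 1
        for child in children.get(node, []):
            visit(child, depth + 1)
        end = clock
        clock += 1
        labels[node] = (start, end, depth)

    visit(root, 0)
    return labels


def is_descendant(labels, u, v):
    """O(1): v is a proper descendant of u iff v's interval nests inside u's."""
    su, eu, _ = labels[u]
    sv, ev, _ = labels[v]
    return su < sv and ev < eu


def is_child(labels, u, v):
    """O(1): a child is a descendant exactly one level deeper."""
    return is_descendant(labels, u, v) and labels[v][2] == labels[u][2] + 1


TREE = {"article": ["title", "section"], "section": ["para"]}
labels = label_tree(TREE, "article")
print(is_descendant(labels, "article", "para"), is_child(labels, "article", "para"))
```

Each label takes constant space per node, and both tests compare a fixed number of integers, which is the property the chapter seeks to preserve beyond trees.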
Chapter 15. SPARQLog: SPARQL with Rules and Quantification
Abstract
SPARQL has become the gold standard for RDF query languages. Nevertheless, we believe there is further room for improving RDF query languages. In this chapter, we investigate the addition of rules and quantifier alternation to SPARQL. That extension, called SPARQLog, extends previous RDF query languages with arbitrary quantifier alternation: blank nodes may occur in the scope of all, some, or none of the universal variables of a rule. In addition, SPARQLog is aware of important RDF features such as the distinction between blank nodes, literals and IRIs or the RDFS vocabulary. The semantics of SPARQLog is closed (every answer is an RDF graph), but lifts RDF’s restrictions on literal and blank node occurrences for intermediary data. We show how to define a sound and complete operational semantics that can be implemented using existing logic programming techniques. While SPARQLog is Turing complete, we identify a decidable (in fact, polynomial time) fragment, SwARQLog, ensuring polynomial data complexity, inspired by the notion of super-weak acyclicity in data exchange. Furthermore, we prove that SPARQLog with no universal quantifiers in the scope of existential ones ( fragment) is equivalent to full SPARQLog in the presence of graph projection. Thus, the convenience of arbitrary quantifier alternation comes, in fact, for free. These results, though presented here in the context of RDF querying, apply similarly in the more general setting of data exchange.
François Bry, Tim Furche, Bruno Marnette, Clemens Ley, Benedikt Linse, Olga Poppe
Chapter 16. SP2Bench: A SPARQL Performance Benchmark
Abstract
A meaningful analysis and comparison of both existing storage schemes for RDF data and evaluation approaches for SPARQL queries necessitates a comprehensive and universal benchmark platform. We present SP2Bench, a publicly available, language-specific performance benchmark for the SPARQL query language. SP2Bench is set in the DBLP scenario and comprises a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. In this chapter, we discuss requirements and desiderata for SPARQL benchmarks and present the SP2Bench framework, including its data generator, benchmark queries and performance metrics.
Michael Schmidt, Thomas Hornung, Michael Meier, Christoph Pinkel, Georg Lausen

Semantic Web Applications

Frontmatter
Chapter 17. Using OWL in Data Integration
Abstract
One of the outcomes of the research work carried out on data integration in recent years is a clear architecture, comprising a global schema, the source schema and the mapping between the source and the global schema. In this chapter, we study data integration under this framework when the global schema is specified in OWL, the standard language for the Semantic Web, and discuss the impact of this choice on the computational complexity of query answering under different instantiations of the framework in terms of query language and the form and interpretation of the mapping. We show that query answering in the resulting setting is computationally too complex, and discuss in detail the various sources of complexity. Then, we show how to limit the expressive power of the various components of the framework in order to achieve efficient query answering, in principle as efficient as query processing in relational DBMSs. In particular, we adopt OWL 2 QL as the ontology language used to express the global schema. OWL 2 QL is one of the tractable profiles of OWL 2, and essentially corresponds to a member of the DL-Lite family, a family of Description Logics designed to have a good trade-off between the expressive power of the language and the computational complexity of reasoning.
Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, Marco Ruzzi
Chapter 18. Service Knowledge Spaces for Semantic Collaboration in Web-based Systems
Abstract
Semantic Web technologies have been applied to enable collaboration in open distributed systems, where interoperability issues arise due to the absence of a global view of the shared resources. Adoption of service-oriented technologies has improved interoperability at the application level by exporting system functionalities as Web services. In fact, Service Oriented Architecture (SOA) constitutes an appropriate platform-independent approach to implementing collaboration activities by means of automatic service discovery and composition. Recently, service discovery has been applied to collaborative environments such as P2P networks, where independent partners need to cooperate through resource sharing without a stable network configuration and while adopting different semantic models. Model-based techniques relying on the Semantic Web need to be defined to generate semantic service descriptions, allowing collaborative partners to export their functionalities in a semantic way. Semantic-based service matchmaking techniques are in charge of effectively and efficiently evaluating the similarity between service requests and service offers in a huge, dynamic distributed environment. The result is an evolving service knowledge space where collaborative partners that provide similar services are semantically related and constitute synergic service centres in a given domain. Specific modeling requirements related to Semantic Web, service-oriented and P2P technologies must be considered.
Devis Bianchini, Valeria De Antonellis, Michele Melchiori
Chapter 19. Informative Top-k Retrieval for Advanced Skill Management
Abstract
The paper presents a knowledge-based framework for skills and talent management based on advanced matchmaking between candidate profiles and available job positions. Notably, the informative content of top-k retrieval is enriched through semantic capabilities. The proposed approach makes it possible to: (1) express a requested profile in terms of both hard and soft constraints; (2) provide a ranking function based also on qualitative attributes of a profile; (3) explain the resulting outcomes (given a job request, a motivation for the score obtained by each selected profile is provided). Top-k retrieval selects the most promising candidates according to an ontology formalizing the domain knowledge. This knowledge is further exploited to provide a semantic-based explanation of missing or conflicting features in retrieved profiles. The explanations also indicate additional profile characteristics emerging from the retrieval procedure, for further refinement of the request. A concrete case study followed by an exhaustive experimental campaign is reported to demonstrate the effectiveness of the approach.
Simona Colucci, Tommaso Di Noia, Azzurra Ragone, Michele Ruta, Umberto Straccia, Eufemia Tinelli
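To make the distinction between hard and soft constraints concrete, the sketch below filters profiles by hard constraints, scores the survivors by the weights of the soft constraints they satisfy, and keeps the top k with `heapq.nlargest`; the labels of unmet soft constraints serve as a crude explanation. This deliberately omits the ontology-based reasoning and semantic explanations the chapter describes; the `Profile` fields, weights and candidates are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical candidate profile; the chapter's profiles are ontology-based.
    name: str
    skills: set
    years: int


def top_k(profiles, hard, soft, k):
    """hard: list of predicates; soft: list of (weight, predicate, label)."""
    scored = []
    for p in profiles:
        if not all(check(p) for check in hard):
            continue  # a violated hard constraint excludes the profile
        score, missing = 0, []
        for weight, check, label in soft:
            if check(p):
                score += weight
            else:
                missing.append(label)  # unmet preference, reported as explanation
        scored.append((score, p.name, missing))
    return heapq.nlargest(k, scored, key=lambda r: r[0])


candidates = [
    Profile("Ana", {"java", "sparql", "owl"}, 5),
    Profile("Ben", {"java"}, 2),
    Profile("Cy", {"sparql", "owl"}, 8),
]
hard = [lambda p: "sparql" in p.skills]
soft = [
    (5, lambda p: p.years >= 5, "5+ years"),
    (3, lambda p: "owl" in p.skills, "OWL"),
    (2, lambda p: "java" in p.skills, "Java"),
]
for score, name, missing in top_k(candidates, hard, soft, k=2):
    print(name, score, missing)
```

Here Ben is excluded by the hard constraint, while Cy is ranked below Ana and the missing `Java` preference is reported alongside Cy's score.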

Engineering Semantic Web Systems

Frontmatter
Chapter 20. MIDST: Interoperability for Semantic Annotations
Abstract
In recent years, interoperability of ontologies and databases has received a lot of attention. However, most of the work has concentrated on specific problems (such as storing an ontology in a database or making database data available to ontologies) and referred to specific models for each of the two. Here, we propose an approach that aims at being more general and model-independent. In fact, it works for different dialects for ontologies and for various data models for databases. Also, it supports translations in both directions (ontologies to databases and vice versa) and it allows for flexibility in the translations, so that customization is possible. The proposal extends recent work for schema and data translation (the MIDST project, which implements the ModelGen operator proposed in model management), which relies on a metamodel approach, where data models and variations thereof are described in a common framework and translations are built as compositions of elementary ones.
Paolo Atzeni, Pierluigi Del Nostro, Stefano Paolozzi
Chapter 21. Virtuoso: RDF Support in a Native RDBMS
Abstract
RDF (Resource Description Framework) is seeing rapidly increasing adoption, for example, in the context of the Linked Open Data (LOD) movement and diverse life sciences data publishing and integration projects. This paper discusses how we have adapted OpenLink Virtuoso, a general-purpose RDBMS, for this new type of workload. We discuss adapting Virtuoso’s relational engine for native RDF support with dedicated data types, bitmap indexing and SQL optimizer techniques. We further discuss scaling out by running on a cluster of commodity servers, each with local memory and disk. We look at how this impacts query planning and execution and how we achieve high parallel utilization of multiple CPU cores on multiple servers. We present comparisons with other RDF storage models as well as other approaches to scaling out on server clusters. We present conclusions and metrics as well as a number of use cases, from DBpedia to bioinformatics and collaborative web applications.
Orri Erling, Ivan Mikhailov
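The abstract lists bitmap indexing among the techniques used for native RDF support. The sketch below shows only the generic idea and makes no claim about Virtuoso's actual index layout: each hypothetical `(predicate, object)` key maps to a bitmap of subject ids, stored as a Python integer, so a conjunctive lookup such as "subjects that are genes and belong to human" becomes a bitwise AND.

```python
from collections import defaultdict


class BitmapIndex:
    """Generic bitmap index: (predicate, object) -> bitmap of subject ids."""

    def __init__(self):
        self.ids = {}        # subject -> integer id
        self.names = []      # integer id -> subject
        self.bitmaps = defaultdict(int)

    def _id(self, subject):
        if subject not in self.ids:
            self.ids[subject] = len(self.names)
            self.names.append(subject)
        return self.ids[subject]

    def add(self, s, p, o):
        # Set the bit for subject s in the bitmap of key (p, o).
        self.bitmaps[(p, o)] |= 1 << self._id(s)

    def subjects_with_all(self, *keys):
        """Subjects having every given (predicate, object) pair."""
        if not keys:
            return []
        acc = self.bitmaps.get(keys[0], 0)
        for key in keys[1:]:
            acc &= self.bitmaps.get(key, 0)  # intersection is a bitwise AND
        return [self.names[i] for i in range(acc.bit_length()) if (acc >> i) & 1]


idx = BitmapIndex()
for gene, organism in [("ex:gene1", "ex:human"), ("ex:gene2", "ex:mouse"), ("ex:gene3", "ex:human")]:
    idx.add(gene, "rdf:type", "ex:Gene")
    idx.add(gene, "ex:organism", organism)
print(idx.subjects_with_all(("rdf:type", "ex:Gene"), ("ex:organism", "ex:human")))
```

Production systems typically compress such bitmaps; plain integers are used here only to keep the intersection step visible.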
Chapter 22. Hera: Engineering Web Applications Using Semantic Web-based Models
Abstract
In this chapter, we consider the contribution of models and model-driven approaches based on the Semantic Web to the development of Web applications. The model-driven web engineering approach, which separates concerns at different abstraction levels in the application design process, allows for more robust and structured design of web applications. This is illustrated by the use of Hera, an approach from the class of Web engineering methods that relies on models expressed using RDF(S) and an RDF(S) query language. It illustrates how models, and in particular models that fit with the ideas and concepts from the Semantic Web, make it possible to approach the design and engineering of modern, open and heterogeneous Web-based systems. In the presented approach, adaptation and personalization are a main aspect, and it is illustrated how they are expressed using semantic data models and languages. Specific features of Hera are also discussed, such as interoperability between applications in user modeling, aspect orientation in Web design and graphical tool support for Web application design.
Kees van der Sluijs, Geert-Jan Houben, Erwin Leonardi, Jan Hidders
Backmatter
Metadata
Title
Semantic Web Information Management
Editors
Roberto de Virgilio
Fausto Giunchiglia
Letizia Tanca
Copyright Year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-04329-1
Print ISBN
978-3-642-04328-4
DOI
https://doi.org/10.1007/978-3-642-04329-1
