
Open Access 2014 | Book


Linked Open Data -- Creating Knowledge Out of Interlinked Data

Results of the LOD2 Project


About this book

Linked Open Data (LOD) is a pragmatic approach for realizing the Semantic Web vision of making the Web a global, distributed, semantics-based information system. This book presents an overview of the results of the research project “LOD2 -- Creating Knowledge out of Interlinked Data”. LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. Commencing in September 2010, this four-year project comprised leading Linked Open Data research groups, companies, and service providers from 11 European countries and South Korea. Its aim was to advance the state of the art in research and development in four key areas relevant for Linked Data: 1. RDF data management; 2. the extraction, creation, and enrichment of structured RDF data; 3. the interlinking and fusion of Linked Data from different sources; and 4. the authoring, exploration, and visualization of Linked Data.

Table of Contents

Frontmatter

Open Access

Introduction to LOD2
Abstract
In this introductory chapter we give a brief overview of the Linked Data concept, the Linked Data life cycle, and the LOD2 Stack – an integrated distribution of aligned tools supporting the whole life cycle of Linked Data, from extraction and authoring/creation via enrichment, interlinking, and fusing through to maintenance. The stack is designed to be versatile; for all functionality we define clear interfaces, which enable plugging in alternative third-party implementations. The architecture of the LOD2 Stack is based on three pillars: (1) software integration and deployment using the Debian packaging system; (2) use of a central SPARQL endpoint and standardized vocabularies for knowledge base access and integration between the different tools of the LOD2 Stack; (3) integration of the LOD2 Stack user interfaces based on REST-enabled web applications. These three pillars comprise the methodological and technological framework for integrating the very heterogeneous LOD2 Stack components into a consistent whole.
Sören Auer
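The central SPARQL endpoint mentioned above is the integration point for all stack tools: each component reads and writes its knowledge through the same SPARQL protocol. As a minimal sketch (the endpoint URL and query are illustrative assumptions, not part of the book), building such a protocol request in Python looks like this:

```python
from urllib.parse import urlencode

# Hypothetical central SPARQL endpoint of a local stack installation.
ENDPOINT = "http://localhost:8890/sparql"

query = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?s ?title WHERE { ?s dcterms:title ?title } LIMIT 10
"""

# Per the SPARQL protocol, a read query can be sent as an HTTP GET
# with the query text passed as a URL parameter.
params = urlencode({
    "query": query,
    "format": "application/sparql-results+json",
})
request_url = f"{ENDPOINT}?{params}"
print(request_url)
```

Because every tool speaks this one protocol against one endpoint, components can be swapped without changing how the rest of the stack accesses the knowledge base.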

Technology

Frontmatter

Open Access

Advances in Large-Scale RDF Data Management
Abstract
One of the prime goals of the LOD2 project is improving the performance and scalability of RDF storage solutions so that the increasing amount of Linked Open Data (LOD) can be efficiently managed. Virtuoso has been chosen as the basic RDF store for the LOD2 project, and during the project it has been significantly improved by incorporating advanced relational database techniques from MonetDB and Vectorwise, turning it into a compressed column store with vectored execution. This has reduced the performance gap (“RDF tax”) between Virtuoso’s SQL and SPARQL query performance in a way that still respects the “schema-last” nature of RDF. However, lacking schema information, RDF database systems such as Virtuoso still cannot use advanced relational storage optimizations such as table partitioning or clustered indexes, and have to execute SPARQL queries with many self-joins over a triple table, which leads to more join effort than needed in SQL systems. In this chapter, we first discuss the new column store techniques applied to Virtuoso and the enhancements in its cluster parallel version, and show its performance using the popular BSBM benchmark at the unsurpassed scale of 150 billion triples. We finally describe ongoing work in deriving an “emergent” relational schema from RDF data, which can help to close the performance gap between relational-based and RDF-based storage solutions.
Peter Boncz, Orri Erling, Minh-Duc Pham
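The self-join cost described above can be made concrete with a toy example (the data and prefixes are illustrative): in a triple table every fact is one row, so fetching two properties of the same subject requires joining the table with itself once per triple pattern, where a relational table with one row per person would need a single scan.

```python
# A triple table: every fact is one (subject, predicate, object) row.
triples = [
    ("ex:p1", "foaf:name", "Alice"),
    ("ex:p1", "foaf:mbox", "alice@example.org"),
    ("ex:p2", "foaf:name", "Bob"),
]

# SPARQL pattern { ?s foaf:name ?n . ?s foaf:mbox ?m } becomes a
# self-join: the triple table joined with itself on the subject column.
names = {s: o for s, p, o in triples if p == "foaf:name"}
mboxes = {s: o for s, p, o in triples if p == "foaf:mbox"}
result = [(names[s], mboxes[s]) for s in names if s in mboxes]
print(result)  # [('Alice', 'alice@example.org')]
```

An "emergent" relational schema, as sketched in the chapter, would recover the person-shaped table automatically and so avoid one join per additional property.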

Open Access

Knowledge Base Creation, Enrichment and Repair
Abstract
This chapter focuses on data transformation to RDF and Linked Data, and furthermore on the improvement of existing or extracted data, especially with respect to schema enrichment and ontology repair. Tasks concerning the triplification of data are mainly grounded in existing and well-proven techniques, and were refined during the lifetime of the LOD2 project and integrated into the LOD2 Stack. Triplification of legacy data, i.e. data not yet in RDF, represents the entry point for legacy systems to participate in the LOD cloud. While existing systems are often very useful and successful, there are notable differences between the ways knowledge bases and wikis or databases are created and used. One of the key differences in content is the importance and use of schematic information in knowledge bases. This information is usually absent in the source system and therefore also in many LOD knowledge bases. However, schema information is needed for consistency checking and for finding modelling problems. We present a combination of enrichment and repair steps to tackle this problem, based on previous research in machine learning and knowledge representation. Overall, the chapter describes how to enable tool-supported creation and publishing of RDF as Linked Data (Sect. 1) and how to increase the quality and value of such large knowledge bases when published on the Web (Sect. 2).
Sebastian Hellmann, Volha Bryl, Lorenz Bühmann, Milan Dojchinovski, Dimitris Kontokostas, Jens Lehmann, Uroš Milošević, Petar Petrovski, Vojtěch Svátek, Mladen Stanojević, Ondřej Zamazal
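The essence of triplification is a mapping from legacy records to RDF statements. A minimal sketch, assuming a hypothetical CSV export and made-up base and vocabulary URIs, emitting N-Triples:

```python
import csv
import io

# Hypothetical legacy CSV export; triplification maps each row to RDF
# triples about one resource, emitted here as N-Triples lines.
legacy = io.StringIO("id,name,city\n42,ACME,Leipzig\n")

BASE = "http://example.org/company/"   # assumed base URI for row resources
VOCAB = "http://example.org/vocab/"    # assumed vocabulary namespace

ntriples = []
for row in csv.DictReader(legacy):
    subject = f"<{BASE}{row['id']}>"   # row id becomes the resource URI
    for col in ("name", "city"):       # each column becomes a property
        ntriples.append(f'{subject} <{VOCAB}{col}> "{row[col]}" .')

print("\n".join(ntriples))
```

Real mappings in the stack would additionally attach schema information (classes, datatypes) so that the enrichment and repair steps described above have something to check consistency against.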

Open Access

Interlinking and Knowledge Fusion
Abstract
The central assumption of Linked Data is that data providers ease the integration of Web data by setting RDF links between data sources. In addition to linking entities, Web data integration also requires the alignment of the different vocabularies that are used to describe entities, as well as the resolution of data conflicts between data sources. In this chapter, we present the methods and open source tools that have been developed in the LOD2 project for supporting data publishers in setting RDF links between data sources. We also introduce the tools that have been developed for translating data between different vocabularies, for assessing the quality of Web data, and for resolving data conflicts by fusing data from multiple data sources.
Volha Bryl, Christian Bizer, Robert Isele, Mateja Verlic, Soon Gill Hong, Sammy Jang, Mun Yong Yi, Key-Sun Choi
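Setting RDF links between data sources boils down to finding pairs of URIs that denote the same entity and asserting a link (commonly owl:sameAs) between them. A deliberately simplified sketch with made-up URIs and a naive label matcher; real link discovery frameworks support configurable similarity metrics and thresholds:

```python
# Two datasets describing overlapping entities under different URIs.
source_a = {"http://dbpedia.org/resource/Leipzig": "Leipzig"}
source_b = {"http://example.org/city/leipzig": " LEIPZIG "}

def normalize(label: str) -> str:
    # Toy matching rule: case-insensitive comparison of trimmed labels.
    return label.strip().lower()

# Emit one owl:sameAs triple per matching pair, in N-Triples syntax.
links = [
    f"<{a}> <http://www.w3.org/2002/07/owl#sameAs> <{b}> ."
    for a, label_a in source_a.items()
    for b, label_b in source_b.items()
    if normalize(label_a) == normalize(label_b)
]
print(links[0])
```

Fusion then works in the opposite direction: given such links, conflicting property values from the linked sources are merged into one consolidated description.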

Open Access

Facilitating the Exploration and Visualization of Linked Data
Abstract
The creation and improvement of tools covering exploration and visualization tasks for Linked Data was one of the major goals of the LOD2 project. Tools that support these tasks are regarded as essential for the Web of Data, since they can act as a user-oriented starting point for data consumers. During the project, several development efforts were made whose results either facilitate exploration and visualization directly (such as OntoWiki or the Pivot Browser) or can be used to support such tasks. In this chapter we present three selected solutions: rsine, CubeViz and Facete.
Christian Mader, Michael Martin, Claus Stadler
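A visualization tool for statistical Linked Data typically starts by regrouping observation triples into records a chart component can consume. A minimal sketch of that preparation step, with made-up resources and example values in the style of the RDF Data Cube vocabulary (qb:Observation):

```python
# Statistical triples, Data Cube style: each observation resource
# carries a dimension value (year) and a measure value (population).
triples = [
    ("ex:o1", "rdf:type", "qb:Observation"),
    ("ex:o1", "ex:refYear", "2012"),
    ("ex:o1", "ex:population", "520838"),
    ("ex:o2", "rdf:type", "qb:Observation"),
    ("ex:o2", "ex:refYear", "2013"),
    ("ex:o2", "ex:population", "531562"),
]

# Group each observation's properties into one record.
observations = {}
for s, p, o in triples:
    observations.setdefault(s, {})[p] = o

# Keep only observations, pivoted into (dimension, measure) rows.
rows = [
    (obs["ex:refYear"], int(obs["ex:population"]))
    for obs in observations.values()
    if obs.get("rdf:type") == "qb:Observation"
]
print(sorted(rows))
```

From rows of this shape, rendering a bar chart or a faceted table is a purely presentational step, which is where tools like CubeViz and Facete take over.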

Open Access

Supporting the Linked Data Life Cycle Using an Integrated Tool Stack
Abstract
The core of a Linked Data application is the processing of the knowledge expressed as Linked Data. Therefore, the creation, management, curation and publication of Linked Data are critical aspects for an application’s success. The LOD2 project provides components for all of these aspects, collected and placed under one distribution umbrella: the LOD2 Stack. In this chapter we introduce this component stack. We show how to get access to it, which component covers which aspect of the Linked Data life cycle, and how using the stack eases access to Linked Data management tools. Furthermore, we elaborate on how the stack can be used to support a knowledge domain, illustrated here with statistical data.
Bert Van Nuffelen, Valentina Janev, Michael Martin, Vuk Mijovic, Sebastian Tramp

Use Cases

Frontmatter

Open Access

LOD2 for Media and Publishing
Abstract
Dealing with content, data and information is the core business of the information industry, including traditional publishers and media agencies. Therefore, the development and adaptation of Linked Data and Linked Open Data technologies for this industry is a perfect fit. As a concrete example, the processing of legal information at Wolters Kluwer, a global legal publisher, through the whole data life cycle is introduced. Further requirements, especially in the fields of governance, maintenance and licensing of data, are developed in detail. The partial implementation of this technology in the operational systems of Wolters Kluwer shows its relevance and usefulness.
Christian Dirschl, Tassilo Pellegrini, Helmut Nagy, Katja Eck, Bert Van Nuffelen, Ivan Ermilov

Open Access

Building Enterprise Ready Applications Using Linked Open Data
Abstract
Exploiting open data on the Web is an established movement that has been growing in recent years, with government public data probably its most common and visible part. What about companies and business data? Even if the kickoff was slow, forward-thinking companies are embracing semantic technologies to manage their corporate information. The availability of various data sources, be they internal or external, the maturity of semantic standards and frameworks, and the emergence of big data technologies for managing huge volumes of data have encouraged companies to migrate their internal information systems from traditional silos of corporate data into semantic business data hubs. In other words, the shift from conventional enterprise information management to a Linked Open Data-compliant paradigm is a strong trend in enterprise roadmaps. This chapter discusses a set of guidelines and best practices that ease this migration within the context of a corporate application.
Amar-Djalil Mezaour, Bert Van Nuffelen, Christian Blaschke

Open Access

Lifting Open Data Portals to the Data Web
Abstract
Recently, a large number of open data repositories, catalogs and portals have been emerging in the scientific and government realms. In this chapter, we characterise this newly emerging class of information systems. We describe the key functionality of open data portals, present a conceptual model and showcase the pan-European data portal PublicData.eu as a prominent example. Using examples from Serbia and Poland, we present an approach for lifting the often semantically shallow datasets registered at such data portals to Linked Data in order to make data portals the backbone of a distributed global data warehouse for our information society on the Web.
Sander van der Waal, Krzysztof Węcel, Ivan Ermilov, Valentina Janev, Uroš Milošević, Mark Wainwright
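"Lifting" a portal's shallow catalog entries to Linked Data means mapping the portal's metadata records onto Web vocabularies such as DCAT. A simplified sketch, assuming a hypothetical CKAN-style record and a made-up URI namespace (a full mapping would introduce a separate dcat:Distribution node for each resource rather than attaching the download URL directly to the dataset):

```python
# Hypothetical catalog record as returned by a data portal API.
record = {
    "name": "budget-2014",
    "title": "National Budget 2014",
    "resources": [{"url": "http://example.org/budget-2014.csv"}],
}

BASE = "http://example.org/dataset/"   # assumed URI namespace
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Map the record onto DCAT/Dublin Core terms, emitted as N-Triples.
s = f"<{BASE}{record['name']}>"
triples = [
    f'{s} <{DCT}title> "{record["title"]}" .',
    *[f"{s} <{DCAT}downloadURL> <{r['url']}> ." for r in record["resources"]],
]
print("\n".join(triples))
```

Once every portal's records are expressed in the same vocabulary, the registered datasets become queryable together, which is what makes the "distributed global data warehouse" reading of data portals possible.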

Open Access

Linked Open Data for Public Procurement
Abstract
Public procurement is an area that could benefit greatly from linked open data technology. The respective use case of the LOD2 project covered several aspects of applying linked data to public contracts: ontological modeling of relevant concepts (the Public Contracts Ontology), data extraction from existing semi-structured and structured sources, support for matchmaking demand and supply on the procurement market, and aggregate analytics. The last two, end-user-oriented functionalities are framed by a specifically designed prototype web application.
Vojtěch Svátek, Jindřich Mynarz, Krzysztof Węcel, Jakub Klímek, Tomáš Knap, Martin Nečaský
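The matchmaking functionality mentioned above can be sketched in miniature: if contracts and supplier profiles are both annotated with codes from a shared classification such as the EU's Common Procurement Vocabulary (CPV), matching reduces to comparing code sets. The URIs and code assignments below are illustrative, not taken from the chapter:

```python
# Hypothetical contracts, each annotated with a main CPV code.
contracts = {
    "ex:contract1": {"cpv": "45000000"},  # construction work
    "ex:contract2": {"cpv": "72000000"},  # IT services
}

# Hypothetical supplier profiles: the CPV codes each supplier covers.
suppliers = {
    "ex:supplierA": {"45000000", "45210000"},
    "ex:supplierB": {"72000000"},
}

# A match is a supplier whose profile contains the contract's code.
matches = sorted(
    (c, s)
    for c, data in contracts.items()
    for s, codes in suppliers.items()
    if data["cpv"] in codes
)
print(matches)
```

In a linked data setting the same comparison runs as a SPARQL join over the published contract and supplier descriptions, and the code hierarchy can be exploited so that a supplier covering a broader CPV category also matches its narrower subcategories.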
Backmatter
Metadata
Title
Linked Open Data -- Creating Knowledge Out of Interlinked Data
edited by
Sören Auer
Volha Bryl
Sebastian Tramp
Copyright Year
2014
Electronic ISBN
978-3-319-09846-3
Print ISBN
978-3-319-09845-6
DOI
https://doi.org/10.1007/978-3-319-09846-3