About this book

This book’s main goals are to bring together in a concise way all the methodologies, standards and recommendations related to Data, Queries, Links, Semantics, Validation and other issues concerning machine-readable data on the Web, to describe them in detail, to provide examples of their use, and to discuss how they contribute to – and how they have been used thus far on – the “Web of Data”. As the content of the Web becomes increasingly machine readable, increasingly complex tasks can be automated, yielding more and more powerful Web applications that are capable of discovering, cross-referencing, filtering, and organizing data from numerous websites in a matter of seconds.

The book is divided into nine chapters, the first of which introduces the topic by discussing the shortcomings of the current Web and illustrating the need for a Web of Data. Next, “Web of Data” provides an overview of the fundamental concepts involved, and discusses some current use-cases on the Web where such concepts are already being employed. “Resource Description Framework (RDF)” describes the graph-structured data model proposed by the Semantic Web community as a common data model for the Web. The chapter on “RDF Schema (RDFS) and Semantics” presents a lightweight ontology language used to define an initial semantics for terms used in RDF graphs. In turn, the chapter “Web Ontology Language (OWL)” elaborates on a more expressive ontology language built upon RDFS that offers much more powerful ontological features. In “SPARQL Query Language”, a language for querying and updating RDF graphs is described, with examples of the features it supports, supplemented by a detailed definition of its semantics. “Shape Constraints and Expressions (SHACL/ShEx)” introduces two languages for describing the expected structure of, and expressing constraints on, RDF graphs for the purposes of validation. “Linked Data” discusses the principles and best practices proposed by the Linked Data community for publishing interlinked (RDF) data on the Web, and how these techniques have been adopted. The final chapter highlights open problems and rounds out the coverage with a more general discussion on the future of the Web of Data.

The book is intended for students, researchers and advanced practitioners interested in learning more about the Web of Data, and about closely related topics such as the Semantic Web, Knowledge Graphs, Linked Data, Graph Databases, Ontologies, etc. Offering a range of accessible examples and exercises, it can be used as a textbook for students and other newcomers to the field. It can also serve as a reference handbook for researchers and developers, as it offers up-to-date details on key standards (RDF, RDFS, OWL, SPARQL, SHACL, ShEx, RDB2RDF, LDP), along with formal definitions and references to further literature. The associated website webofdatabook.org offers a wealth of complementary material, including solutions to the exercises, slides for classes, raw data for examples, and a section for comments and questions.

Table of Contents

Frontmatter

Chapter 1. Introduction

Abstract
In this chapter, we first introduce the foundations upon which the current Web is built: HTTP, URLs and HTML. We discuss how the current Web was designed with human readability, rather than machine readability, in mind. In particular, much of the information presented in HTML webpages is encoded in natural language, which is poorly machine readable. We present concrete examples to illustrate why it is difficult to automate tasks over the content of the current Web. We argue that in order to enable greater levels of automation on the Web, we need content to be made available in formats that are more machine readable, motivating the need for a Web of Data.
Aidan Hogan

Chapter 2. Web of Data

Abstract
This chapter discusses the abstract concepts necessary to realise a Web of Data. We discuss how the content on the Web can be represented as graph-structured data in order to increase machine readability. We show how queries can be structured in a similar fashion to the data in order to automate their evaluation. We motivate the need for formal semantics, which makes explicit what the terms used in the data mean in relation to each other. We illustrate the need for constraints in order to automatically validate data. We further describe how links can be used to connect and discover data on the Web. In order to showcase the practical benefits of adopting these concepts, we look at four prominent scenarios in which they are currently being used on the Web: Wikidata, Knowledge Graphs, Schema.org, and Linking Open Data.
Aidan Hogan

Chapter 3. Resource Description Framework

Abstract
This chapter provides a detailed primer for the Resource Description Framework (RDF 1.1) standard, proposed as a common data model for publishing and exchanging structured data on the Web. We first motivate the need for a data model like RDF. We then describe the types of terms used in RDF: the basic building blocks of the framework. We discuss how these terms can be combined to make coherent statements in the form of RDF triples, and how triples form graphs and datasets. Thereafter we discuss the RDF vocabulary: a built-in set of terms used for modelling more complex data, such as complex relations and ordered lists. Finally, we give an overview of the different syntaxes by which RDF can be serialised and communicated.
Aidan Hogan
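As a rough illustration of how terms combine into triples, and triples into graphs, the Python sketch below models an RDF graph as a set of (subject, predicate, object) tuples and encodes an ordered list with the built-in rdf:first, rdf:rest and rdf:nil terms. It is a minimal sketch under simplifying assumptions: the http://example.org/ namespace and the helpers build_list and read_list are hypothetical, blank nodes are plain strings prefixed with "_:", and literals are not distinguished from IRIs.

```python
from typing import List, NamedTuple, Set

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace used only for illustration


class Triple(NamedTuple):
    s: str  # subject: an IRI or a blank node label ("_:...")
    p: str  # predicate: always an IRI
    o: str  # object: IRI, blank node or literal (not distinguished in this sketch)


def build_list(graph: Set[Triple], items: List[str], prefix: str = "_:l") -> str:
    """Encode items as an RDF collection (rdf:first/rdf:rest/rdf:nil); return its head node."""
    if not items:
        return RDF + "nil"  # the empty list is rdf:nil itself
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    for i, (node, item) in enumerate(zip(nodes, items)):
        graph.add(Triple(node, RDF + "first", item))
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        graph.add(Triple(node, RDF + "rest", rest))
    return nodes[0]


def read_list(graph: Set[Triple], head: str) -> List[str]:
    """Follow rdf:rest links from head until rdf:nil, collecting the rdf:first values."""
    items = []
    while head != RDF + "nil":
        items.append(next(t.o for t in graph if t.s == head and t.p == RDF + "first"))
        head = next(t.o for t in graph if t.s == head and t.p == RDF + "rest")
    return items


# Usage: a tiny graph with one typed resource and an ordered list of values.
g: Set[Triple] = set()
g.add(Triple(EX + "Santiago", RDF + "type", EX + "City"))
head = build_list(g, [EX + "Mon", EX + "Tue", EX + "Wed"])
g.add(Triple(EX + "Market", EX + "openDays", head))
print(read_list(g, head))
```

Because a graph is a set of triples, order is not inherent in RDF; the list order is recovered only by following the rdf:rest chain.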

Chapter 4. RDF Schema and Semantics

Abstract
This chapter presents an in-depth primer for the RDF Schema (RDFS 1.1) standard, which is primarily used to define a lightweight semantics for the classes and properties used in RDF graphs. After an initial motivation and overview, we discuss the RDFS vocabulary, and how it can be used to define sub-classes, sub-properties, domains and ranges, amongst other types of definitions. We then describe in detail how the semantics of RDF(S) can be formalised in a model-theoretic way, discussing key concepts such as interpretations, models, satisfiability and entailment. We introduce various semantics for RDF(S), including the simple semantics, D semantics, RDF semantics, and the RDFS semantics. We conclude the chapter by discussing how rules can be used to support entailment under such semantics.
Aidan Hogan
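To give a feel for how rules can support entailment, the sketch below applies six of the RDFS entailment rules (rdfs2, rdfs3, rdfs5, rdfs7, rdfs9 and rdfs11) by naive forward chaining until a fixpoint is reached. It is an illustrative assumption, not the book's procedure: prefixed names such as "ex:City" stand in for full IRIs, the axiomatic triples and datatype rules are omitted, and every pass compares all pairs of triples.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Prefixed names stand in for full IRIs in this sketch.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(graph: Set[Triple]) -> Set[Triple]:
    """Naively apply a subset of the RDFS entailment rules until no new triples appear."""
    closure = set(graph)
    while True:
        new: Set[Triple] = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p == DOMAIN and p2 == s:                       # rdfs2: domain
                    new.add((s2, RDF_TYPE, o))
                if p == RANGE and p2 == s:                        # rdfs3: range
                    new.add((o2, RDF_TYPE, o))
                if p == SUBPROP and p2 == SUBPROP and o == s2:    # rdfs5: sub-property transitivity
                    new.add((s, SUBPROP, o2))
                if p == SUBPROP and p2 == s:                      # rdfs7: sub-property inheritance
                    new.add((s2, o, o2))
                if p == SUBCLASS and p2 == RDF_TYPE and o2 == s:  # rdfs9: instances of sub-classes
                    new.add((s2, RDF_TYPE, o))
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:  # rdfs11: sub-class transitivity
                    new.add((s, SUBCLASS, o2))
        if new <= closure:  # fixpoint reached: nothing new was derived
            return closure
        closure |= new


# Usage: domain, range and sub-class definitions yield new rdf:type triples.
g = {
    ("ex:Santiago", "ex:capitalOf", "ex:Chile"),
    ("ex:capitalOf", DOMAIN, "ex:City"),
    ("ex:capitalOf", RANGE, "ex:Country"),
    ("ex:City", SUBCLASS, "ex:Place"),
}
inferred = rdfs_closure(g) - g
for t in sorted(inferred):
    print(t)
```

Note that ex:Santiago being an ex:Place only follows after a second pass, once the domain rule has first derived that it is an ex:City; this is why the rules are applied repeatedly until nothing new is produced.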

Chapter 5. Web Ontology Language

Abstract
In this chapter, we provide a detailed primer on the second version of the Web Ontology Language (OWL 2) standard. We first motivate the need for such a standard, discussing the role and importance of ontologies on the Web. We then describe how ontology languages, which themselves can be formally defined through model theory, can subsequently be used to formally define ontologies. Thereafter we discuss the OWL vocabulary used to define the semantics of classes, properties, individuals, and datatypes within ontologies. We cover some of the main reasoning tasks for ontologies and the applications in which they are used. We discuss how these core reasoning tasks are undecidable for the full OWL (2) language and outline the sub-languages (also known as profiles) proposed by the standard that allow for more efficient reasoning procedures. We conclude by reflecting on the importance of having expressive ontologies on the Web of Data, and discuss open challenges.
Aidan Hogan

Chapter 6. SPARQL Query Language

Abstract
This chapter provides a detailed introduction to the SPARQL Protocol and RDF Query Language (SPARQL 1.1): the standard query language for RDF. After some initial motivation, we delve into the features of the query language, illustrated with concrete examples. We then formally define the semantics of these query features. We next discuss how federated queries can be used to evaluate queries over multiple remote sources on the Web. We detail the SPARQL Update language, which allows for modifying the data indexed by a SPARQL query service. We introduce SPARQL Entailment Profiles, which allow for query results to consider entailments, including support for RDF, RDFS and OWL semantics. We further discuss the HTTP-based protocol by which requests can be issued to a SPARQL service over the Web, as well as the SPARQL Service Description vocabulary, which can be used to describe and advertise the features supported by such services. We conclude by discussing the importance of SPARQL for the Web of Data, the key research directions that are currently being explored, as well as open challenges.
Aidan Hogan
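The core of SPARQL query evaluation is matching a basic graph pattern (a set of triple patterns with variables) against an RDF graph. The Python sketch below does this with a simple nested-loop join, one triple pattern at a time, producing solution mappings from variables to terms. It is a minimal sketch under stated assumptions: variables are strings starting with "?", prefixed names stand in for IRIs, and features such as FILTER, OPTIONAL, blank nodes in queries and duplicate (bag) handling are not covered; the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]  # a solution mapping from variables to RDF terms


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Mapping) -> Optional[Mapping]:
    """Extend binding so that pattern maps onto triple, or return None if incompatible."""
    extended = dict(binding)
    for pt, term in zip(pattern, triple):
        if is_var(pt):
            if pt in extended and extended[pt] != term:
                return None  # variable already bound to a different term
            extended[pt] = term
        elif pt != term:
            return None  # constant in the pattern does not match
    return extended


def eval_bgp(bgp: List[Triple], graph: Set[Triple]) -> List[Mapping]:
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    solutions: List[Mapping] = [{}]
    for pattern in bgp:
        solutions = [
            m
            for b in solutions
            for t in graph
            if (m := match_triple(pattern, t, b)) is not None
        ]
    return solutions


# Usage: roughly SELECT * WHERE { ?city ex:capitalOf ?c1 . ?c1 ex:borders ?c2 . ?cap2 ex:capitalOf ?c2 }
g = {
    ("ex:Santiago", "ex:capitalOf", "ex:Chile"),
    ("ex:Lima", "ex:capitalOf", "ex:Peru"),
    ("ex:Chile", "ex:borders", "ex:Peru"),
}
query = [
    ("?city", "ex:capitalOf", "?c1"),
    ("?c1", "ex:borders", "?c2"),
    ("?cap2", "ex:capitalOf", "?c2"),
]
print(eval_bgp(query, g))
```

Real SPARQL engines reorder triple patterns and use indexes rather than scanning the whole graph for each pattern; the naive join here only shows what a solution is.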

Chapter 7. Shape Constraints and Expressions

Abstract
In this chapter, we introduce two languages for describing shapes and constraints for RDF graphs, namely the Shapes Constraint Language (SHACL) and the Shape Expressions Language (ShEx 2.1). Both languages allow for defining constraints over RDF graphs in terms of what data are expected, what data are obligatory, what data are allowed, and what data are disallowed. This in turn allows RDF graphs to be validated with respect to the specified constraints. We first look at SHACL, describing the SHACL-Core fragment and the constraints it allows. We then discuss how SHACL-SPARQL allows for further constraints to be expressed using SPARQL query syntax. Turning to ShEx, we describe its syntaxes, and how it differs from SHACL. We outline and provide a semantics for an abstract shapes syntax that generalises SHACL and ShEx. We conclude with a general discussion of the role of shapes languages on the Web of Data, as well as open challenges.
Aidan Hogan
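To make the idea of validating RDF graphs against shapes more concrete, the sketch below checks a simplified node shape: focus nodes are selected by class, each listed property has minimum and maximum cardinalities, and an optional "closed" flag disallows unlisted properties. This uses neither SHACL nor ShEx syntax; it is a hypothetical Python model that only loosely mirrors a few SHACL-Core constraints (sh:targetClass, sh:minCount, sh:maxCount, sh:closed), with no sub-class reasoning and with rdf:type always allowed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class PropertyConstraint:
    path: str                        # a single predicate (SHACL and ShEx allow richer paths)
    min_count: int = 0               # loosely mirrors sh:minCount
    max_count: Optional[int] = None  # loosely mirrors sh:maxCount; None means unbounded


@dataclass
class NodeShape:
    target_class: str                # loosely mirrors sh:targetClass (no sub-class reasoning)
    properties: List[PropertyConstraint] = field(default_factory=list)
    closed: bool = False             # if True, unlisted predicates (other than rdf:type) are disallowed


def validate(graph: Set[Triple], shape: NodeShape) -> List[Tuple[str, str, str]]:
    """Return (focus node, predicate, violated constraint) for each violation found."""
    violations = []
    focus_nodes = {s for (s, p, o) in graph if p == "rdf:type" and o == shape.target_class}
    allowed = {pc.path for pc in shape.properties} | {"rdf:type"}
    for node in sorted(focus_nodes):
        for pc in shape.properties:
            count = sum(1 for (s, p, _) in graph if s == node and p == pc.path)
            if count < pc.min_count:
                violations.append((node, pc.path, "minCount"))
            if pc.max_count is not None and count > pc.max_count:
                violations.append((node, pc.path, "maxCount"))
        if shape.closed:
            for (s, p, _) in sorted(graph):
                if s == node and p not in allowed:
                    violations.append((node, p, "closed"))
    return violations


# Usage: every city needs exactly one name, at most one population, and nothing else.
city_shape = NodeShape(
    "ex:City",
    [PropertyConstraint("ex:name", min_count=1, max_count=1),
     PropertyConstraint("ex:population", max_count=1)],
    closed=True,
)
g = {
    ("ex:Santiago", "rdf:type", "ex:City"),
    ("ex:Santiago", "ex:name", '"Santiago"'),
    ("ex:Lima", "rdf:type", "ex:City"),
    ("ex:Lima", "ex:mayor", "ex:SomePerson"),
}
for v in validate(g, city_shape):
    print(v)
```

The three kinds of check correspond roughly to data that are obligatory (minimum counts), allowed (maximum counts) and disallowed (closed shapes).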

Chapter 8. Linked Data

Abstract
This chapter motivates, introduces and describes Linked Data, which centres around a concise set of principles by which data can be published and interlinked on the Web, and by which a Web of Data can ultimately be formed. We first discuss the core Linked Data principles, which espouse the use of HTTP IRIs to identify the entities described in data, returning a machine-readable description of the entity (typically RDF) when its corresponding IRI is looked up on the Web. We then discuss some further best practices for publishing data conformant with the Linked Data principles in a way that enhances interoperability. We discuss the Linking Open Data (LOD) project, founded on the idea of publishing Open Data on the Web in a standard, machine-readable fashion using Linked Data; we describe the most prominent datasets and vocabularies that have resulted from this initiative. We then discuss tools and techniques for converting legacy data to RDF, discovering links, and hosting Linked Data. We subsequently discuss the Linked Data Platform: a standard that outlines the protocols and resources needed to build a new generation of read–write Linked Data applications. We conclude the chapter with a discussion of open challenges yet to be addressed in the context of Linked Data.
Aidan Hogan

Chapter 9. Conclusions

Abstract
We begin the conclusions chapter with some high-level remarks about the current state of the Web of Data in terms of the milestones reached and the success stories that have emerged thus far. We then discuss a selection of research trends that are of key importance for furthering the vision of the Web of Data, with each trend involving a mix of data, semantics, querying, and links. Specifically, we discuss trends relating to data quality, link discovery, context, legacy formats, graph analytics, inductive semantics, ontology engineering, ontology-based data access, linked vocabularies, Web reasoning and querying, query interfaces, usage control, read–write Linked Data, as well as knowledge graphs. In our final remarks, we reflect again on the overall goals of the Web of Data, and discuss the abstract challenges that lie ahead.
Aidan Hogan

Backmatter
