About this book

Christian Fürber investigates how semantic technologies can be usefully applied to data quality management. Based on a literature analysis of typical data quality problems and typical activities of data quality management processes, he develops the Semantic Data Quality Management (SDQM) framework as the major contribution of this thesis. The SDQM framework consists of three components that are evaluated in two different use cases. Moreover, this thesis compares the framework to conventional data quality software. Besides the framework, this thesis delivers important theoretical findings, namely a comprehensive typology of data quality problems, ten generic data requirement types, a requirement-centric data quality management process, and an analysis of related work.



1. Introduction

In this chapter, we briefly introduce the thesis topic, clarify our understanding of the term “data” and its relation to business processes and decisions, and discuss the economic relevance of systematic data quality management. We also give a short overview of the thesis structure.

Christian Fürber

2. Research Design

In this chapter, we first define the terms “semantic technologies” and “ontologies” to establish a basic understanding for the following chapters. We then define the research goals and research questions. The chapter concludes with the research methodology applied to answer the research questions and achieve the research goals.

Christian Fürber

3. Data Quality

Data quality is a multidimensional concept (Batini & Scannapieco, 2006, p. 19ff.; Eppler, 2006; Redman, 1996, p. 245ff.; Wand & Wang, 1996, p. 87; Wang & Strong, 1996, p. 22f.) that can be defined from several different perspectives (cf. Ge & Helfert, 2007, p. 1; Kahn et al., 2002, p. 185). For example, data consumers, data producers, data providers, and data custodians may all have different perspectives on the definition of data quality (cf. Kahn et al., 2002, p. 184). From the consumer viewpoint, data quality can be defined as “data that are fit for use by data consumers” (Wang & Strong, 1996, p. 6) in analogy to the popular quality definition related to products and services by Juran (Juran, 1988, p. 2.2).

Christian Fürber

4. Semantic Technologies

As discussed in section 2.1 of this thesis, we regard semantic technologies “as technical approaches that facilitate or make use of the interpretation of meaning by machines”. Ontologies are one of the core elements of semantic solutions. In the following, we review the definition of ontologies and briefly describe their general characteristics. Moreover, we discuss important concepts for ontology and knowledge representation within the Semantic Web. After that, we explain ways to process knowledge representations, such as reasoning, inferencing, and querying. Given the focus of this thesis, we finally describe how relational databases and ontologies are related.

Christian Fürber

5. Data Quality in the Semantic Web

The Semantic Web is an initiative of the World Wide Web Consortium (W3C) with the vision to evolve the traditional Web, which is essentially a graph of interlinked documents, into a “Web of Data” (Berners-Lee et al., 2001; cf. W3C, 2013). One of the major goals of the Semantic Web is the supply of machine-interpretable data at Web scale to gain a higher degree of automation and to facilitate more complete processing of information (cf. Berners-Lee et al., 2001). For example, if the prices of all consumer products were published in a machine-readable format and structure throughout the whole Web, then more complete price comparisons at global scale would be possible with minimal manual effort.

Christian Fürber

6. Specification of Initial Requirements

This chapter specifies the requirements for an ontology-based data quality management framework, called the Semantic Data Quality Management Framework (SDQM), which is to be developed to support data quality management activities through the use of ontologies. We thereby apply the Design Science Research Methodology (DSRM, cf. Peffers et al., 2008) process as explained in section 2.4. We start by describing the required artifacts by means of a motivating scenario that illustrates the needs related to data quality management. Based on the motivating scenario, we derive initial requirements for the framework.

Christian Fürber

7. Architecture of the Semantic Data Quality Management Framework (SDQM)

In this chapter, we define the objectives and justify the design decisions of the Semantic Data Quality Management framework (SDQM). We describe each component of SDQM’s architecture as illustrated in figure 26, namely (1) the data acquisition layer, (2) the data storage layer, (3) the data quality management vocabulary (DQM Vocabulary), (4) the data requirements editor, and (5) the reporting layer. The design of the architecture is based on the requirements identified in the previous chapter. The following sections are organized according to these major components of the SDQM.

Christian Fürber

8. Application Procedure of SDQM

In this chapter, we explain how to use the SDQM architecture from the perspective of business users who want to create data requirements, identify data requirement violations, and evaluate the quality state based on their data requirements.
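
The three user activities named above (creating data requirements, identifying violations, and evaluating the quality state) can be illustrated with a minimal sketch. It is not SDQM's implementation, which builds on ontologies; the record fields, the two requirements, and the helper `find_violations` are hypothetical and chosen only for illustration.

```python
# Hypothetical sample records; field names are assumptions for illustration.
records = [
    {"id": 1, "material": "M-100", "unit": "kg"},
    {"id": 2, "material": "", "unit": "kg"},
    {"id": 3, "material": "M-300", "unit": "stone"},
]

# Step 1: express data requirements as (description, check) pairs.
# Both requirements are illustrative assumptions, not taken from the thesis.
requirements = [
    ("material must not be empty", lambda r: bool(r["material"])),
    ("unit must be a legal value", lambda r: r["unit"] in {"kg", "g", "pcs"}),
]


def find_violations(records, requirements):
    """Step 2: return (record id, requirement description) for each failed check."""
    return [
        (record["id"], description)
        for record in records
        for description, check in requirements
        if not check(record)
    ]


violations = find_violations(records, requirements)

# Step 3: a simple quality state, here the share of records without any violation.
violating_ids = {record_id for record_id, _ in violations}
quality_score = 1 - len(violating_ids) / len(records)

for record_id, description in violations:
    print(f"record {record_id}: {description}")
print(f"share of records without violations: {quality_score:.2f}")
```

Keeping the requirements as data, separate from the checking loop, mirrors the idea of capturing requirements once and reusing them for both violation reports and quality scores.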

Christian Fürber

9. Evaluation of the Semantic Data Quality Management Framework (SDQM)

In this chapter, we evaluate the proposed SDQM approach. The evaluation methodology of SDQM is separated into three parts. The first part is concerned with the evaluation of precision and recall of SDQM’s data quality monitoring and assessment algorithms. The second part evaluates the practical applicability of SDQM by applying the framework to three different use cases, namely one business use case on material master data of a large organization, one Semantic Web use case with data from DBpedia, and one use case that examines the capability of SDQM to automatically identify inconsistent data requirements. In the third part of the evaluation, SDQM is compared to a conventional data quality tool.
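
As a reminder of what the first part of the evaluation measures, precision and recall of a violation detector can be computed as in the following sketch. The record identifiers and the function name `precision_recall` are assumptions made for illustration; the numbers are not results from the thesis.

```python
def precision_recall(flagged, actual):
    """Compute precision and recall for a set of flagged items.

    flagged: items the algorithm reported as data quality problems
    actual:  items that truly are data quality problems (gold standard)
    """
    flagged, actual = set(flagged), set(actual)
    true_positives = len(flagged & actual)
    # Precision: share of reported problems that are real problems.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: share of real problems that were reported.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example: four records flagged, three truly faulty, two in common.
flagged = {"r1", "r2", "r3", "r4"}
actual = {"r2", "r3", "r5"}
precision, recall = precision_recall(flagged, actual)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```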

Christian Fürber

10. Related Work

This chapter summarizes research approaches in the area of ontology-based data quality management and compares the SDQM framework with such related work. Ontology-based data quality management frameworks are understood here as artifacts that make use of ontologies to support data quality management activities. In the following, we provide a high-level classification of the field, which is then used to organize the presentation of related work in this chapter.

Christian Fürber

11. Synopsis and Future Work

The research goal of this thesis was to investigate the usefulness of ontologies for data quality management. In this thesis project, we created an ontology, called the Data Quality Management vocabulary (DQM vocabulary), to collect and store data requirements in a structured and linkable format. Moreover, we configured a wiki, called the data requirements wiki, which contains standard forms to capture data requirements and to store them based on the elements of our ontology, the DQM vocabulary.

Christian Fürber

