
About this book

Requiring heterogeneous information systems to cooperate and communicate has now become crucial, especially in application areas like e-business, Web-based mash-ups and the life sciences. Such cooperating systems have to automatically and efficiently match, exchange, transform and integrate large data sets from different sources and of different structure in order to enable seamless data exchange and transformation.

The book edited by Bellahsene, Bonifati and Rahm provides an overview of the ways in which the schema and ontology matching and mapping tools have addressed the above requirements and points to the open technical challenges. The contributions from leading experts are structured into three parts: large-scale and knowledge-driven schema matching, quality-driven schema mapping and evolution, and evaluation and tuning of matching tasks. The authors describe the state of the art by discussing the latest achievements such as more effective methods for matching data, mapping transformation verification, adaptation to the context and size of the matching and mapping tasks, mapping-driven schema evolution and merging, and mapping evaluation and tuning. The overall result is a coherent, comprehensive picture of the field.

With this book, the editors introduce graduate students and advanced professionals to this exciting field. For researchers, they provide an up-to-date source of reference about schema and ontology matching, schema and ontology evolution, and schema merging.



Large-Scale and Knowledge-Driven Schema Matching


Chapter 1. Towards Large-Scale Schema and Ontology Matching

The purely manual specification of semantic correspondences between schemas is almost infeasible for very large schemas or when many different schemas have to be matched. Hence, solving such large-scale match tasks calls for automatic or semiautomatic schema matching approaches. Large-scale matching especially needs to be supported for XML schemas and different kinds of ontologies due to their increasing use and size, e.g., in e-business and web and life science applications. Unfortunately, correctly and efficiently matching large schemas and ontologies is very challenging, and most previous match systems have only addressed small match tasks. We provide an overview of recently proposed approaches to achieve high match quality and/or high efficiency for large-scale matching. In addition to describing some recent matchers utilizing instance and usage data, we cover approaches on early pruning of the search space, divide and conquer strategies, parallel matching, tuning matcher combinations, the reuse of previous match results, and holistic schema matching. We also provide a brief comparison of selected match tools.
Erhard Rahm

Chapter 2. Interactive Techniques to Support Ontology Matching

There are many automatic approaches for generating matches between ontologies and schemas. However, these techniques are far from perfect and when the use case requires an accurate matching, humans must be involved in the process. Yet, involving the users in creating matchings presents its own problems. Users have trouble understanding the relationships between large ontologies and schemas and their concepts, remembering what they have looked at and executed, understanding output from the automatic algorithm, remembering why they performed an operation, reversing their decisions, and gathering evidence to support their decisions. Recently, researchers have been investigating these issues and developing tools to help users overcome these difficulties. In this chapter, we present some of the latest work related to human-guided ontology matching. Specifically, we discuss the cognitive difficulties users face with creating ontology matchings, the latest visual tools for assisting users with matching tasks, Web 2.0 approaches, common themes, challenges, and the next steps.
Sean M. Falconer, Natalya F. Noy

Chapter 3. Enhancing the Capabilities of Attribute Correspondences

In the process of schema matching, attribute correspondence is the association of attributes in different schemas. The increased importance of attribute correspondences has led to new research devoted to improving them by extending their capabilities. In this chapter, we describe recent advances in the schema matching literature that attempt to enhance the capabilities of attribute correspondences. We discuss contextual schema matching as a method for introducing conditional correspondences, based on context. Semantic matching is proposed to extend attribute correspondences so that they result in an ontological relationship. Finally, probabilistic schema matching generates multiple possible models, modeling uncertainty about which one is correct by using probability theory.
Avigdor Gal

Chapter 4. Uncertainty in Data Integration and Dataspace Support Platforms

Data integration has been an important area of research for several years. However, such systems suffer from one of the main drawbacks of database systems: the need to invest significant modeling effort upfront. Dataspace support platforms (DSSP) envision a system that offers useful services on its data without any setup effort and that improves with time in a pay-as-you-go fashion. We argue that to support DSSPs, the system needs to model uncertainty at its core. We describe the concepts of probabilistic mediated schemas and probabilistic mappings as enabling concepts for DSSPs.
Anish Das Sarma, Xin Luna Dong, Alon Y. Halevy
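
As a rough illustration of what a probabilistic mapping might look like in practice, the sketch below attaches a probability to each candidate attribute mapping and sums, for every answer value, the probabilities of the mappings that produce it. The function name `answer_by_table`, the sample attributes, and the probabilities are all hypothetical, and the chapter's own definitions and semantics may differ.

```python
from collections import defaultdict

def answer_by_table(rows, p_mapping, target_attr):
    """Return {value: probability} for a projection on target_attr.

    rows        -- list of dicts, one per source tuple
    p_mapping   -- list of (attribute_map, probability) pairs whose
                   probabilities are assumed to sum to 1
    target_attr -- attribute of the (hypothetical) mediated schema
    """
    probs = defaultdict(float)
    for attr_map, p in p_mapping:
        source_attr = attr_map.get(target_attr)
        if source_attr is None:
            continue  # this candidate mapping does not cover the attribute
        # Each distinct value is counted once per candidate mapping.
        for value in {row[source_attr] for row in rows}:
            probs[value] += p
    return dict(probs)

# Hypothetical source data and two competing mappings for "email".
rows = [{"email_addr": "a@x.org", "contact": "555-0100"}]
p_mapping = [
    ({"email": "email_addr"}, 0.7),
    ({"email": "contact"}, 0.3),
]
answers = answer_by_table(rows, p_mapping, "email")
```

Here every answer carries the total probability of the mappings that support it, so an application can rank uncertain answers instead of committing to a single mapping upfront.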

Quality-Driven Schema Mapping and Evolution


Chapter 5. Discovery and Correctness of Schema Mapping Transformations

Schema mapping is becoming pervasive in all data transformation, exchange, and integration tasks. It brings to the surface the problem of differences and mismatches between the heterogeneous formats and models used in the source and target databases that are to be mapped to one another. In this chapter, we start by describing the problem of schema mapping, its background, and technical implications. Then, we outline the early schema mapping systems, along with the new generation of schema mapping tools. Moving from the former to the latter entailed a dramatic change in the performance of mapping generation algorithms. Finally, we conclude the chapter by revisiting the query answering techniques allowed by the mappings, and by discussing useful applications and current and future developments of schema mapping tools.
Angela Bonifati, Giansalvatore Mecca, Paolo Papotti, Yannis Velegrakis

Chapter 6. Recent Advances in Schema and Ontology Evolution

Schema evolution is the increasingly important ability to adapt deployed schemas to changing requirements. Effective support for schema evolution is challenging since schema changes may have to be propagated, correctly and efficiently, to instance data and dependent schemas, mappings, or applications. We introduce the major requirements for effective schema and ontology evolution, including support for a rich set of change operations, simplicity of change specification, evolution transparency (e.g., by providing and maintaining views or schema versions), automated generation of evolution mappings, and predictable instance migration that minimizes data loss and manual intervention. We then give an overview of the current state of the art and recent research results for the evolution of relational schemas, XML schemas, and ontologies. For numerous approaches, we outline how and to what degree they meet the introduced requirements.
Michael Hartung, James Terwilliger, Erhard Rahm

Chapter 7. Schema Mapping Evolution Through Composition and Inversion

Mappings between different representations of data are the essential building blocks for many information integration tasks. A schema mapping is a high-level specification of the relationship between two schemas, and represents a useful abstraction that specifies how the data from a source format can be transformed into a target format. The development of schema mappings is laborious and time consuming, even in the presence of tools that facilitate this development. At the same time, schema evolution inevitably causes the invalidation of the existing schema mappings (since their schemas change). Providing tools and methods that can facilitate the adaptation and reuse of the existing schema mappings in the context of the new schemas is an important research problem. In this chapter, we show how two fundamental operators on schema mappings, namely composition and inversion, can be used to address the mapping adaptation problem in the context of schema evolution. We illustrate the applicability of the two operators in various concrete schema evolution scenarios, and we survey the most important developments on the semantics, algorithms, and implementation of composition and inversion. We also discuss the main research questions that still remain to be addressed.
Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, Wang-Chiew Tan

Chapter 8. Mapping-Based Merging of Schemas

Merging schemas or other structured data occurs in many different data models and applications, including merging ontologies, view integration, data integration, and computer-supported collaborative work. This paper describes some of the key works in merging schemas and discusses some of the commonalities and differences.
Rachel Pottinger

Evaluating and Tuning of Matching Tasks


Chapter 9. On Evaluating Schema Matching and Mapping

The increasing demand for matching and mapping tasks in modern integration scenarios has led to a plethora of tools for facilitating these tasks. While this plethora has made such tools available to a broader audience, it has also led to some confusion regarding the exact nature, goals, core functionalities, expected features, and basic capabilities of these tools. Above all, it has made measuring the performance of these systems and distinguishing between them a difficult task. The need for design and development of comparison standards that will allow the evaluation of these tools is becoming apparent. These standards are particularly important to mapping and matching system users, since they allow them to evaluate the relative merits of the systems and make the right business decisions. They are also important to mapping system developers, since they offer a way of comparing the system against competitors, and motivating improvements and further development. Finally, they are important to researchers as they serve as illustrations of the existing system limitations, triggering further research in the area. In this work, we provide a generic overview of the existing efforts on benchmarking schema matching and mapping tasks. We offer a comprehensive description of the problem, list the basic comparison criteria and techniques, and provide a description of the main functionalities and characteristics of existing systems.
Zohra Bellahsene, Angela Bonifati, Fabien Duchateau, Yannis Velegrakis

Chapter 10. Tuning for Schema Matching

Schema matching has long been heading towards complete automation. However, the difficulty arising from heterogeneity in the data sources, domain specificity or structure complexity has led to a plethora of semi-automatic matching tools. Besides, giving users the possibility to tune a tool also provides more flexibility, for instance to increase the matching quality. In recent years, much work has been carried out to support users in the tuning process, specifically at higher levels. Indeed, tuning occurs at every step of the matching process. At the lowest level, similarity measures include internal parameters which directly impact computed similarity values. Furthermore, thresholds applied to these values are a common filter for presenting mappings to users. At a mid-level, users can adopt one or more strategies according to the matching tool that they use. These strategies aim at combining similarity measures in an efficient way. Several tools support the users in this task, mainly by providing state-of-the-art graphical user interfaces. Automatically tuning a matching tool at this level is also possible, but this is limited to a few matching tools. The highest level deals with the choice of the matching tool. Due to the proliferation of these approaches, the first issue for the user is to find the one which would best satisfy his/her criteria. Although benchmarking available matching tools with datasets can be useful, we show that several approaches have been recently designed to solve this problem.
Zohra Bellahsene, Fabien Duchateau
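
The two lower tuning levels described above can be pictured with a minimal sketch: two simple similarity measures are combined with tunable weights, and a tunable threshold filters the resulting correspondences. The measures, the function names, the default weights, and the default threshold are illustrative assumptions, not taken from any tool covered in the chapter.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Character-based similarity of two attribute names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_similarity(a, b):
    """Jaccard similarity over underscore-separated name tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)

def match(source, target, weights=(0.5, 0.5), threshold=0.6):
    """Return (source_attr, target_attr, score) triples above the threshold.

    weights   -- mid-level tuning: how the two measures are combined
    threshold -- low-level tuning: which scores are shown to the user
    """
    w_name, w_tok = weights
    result = []
    for s in source:
        for t in target:
            score = w_name * name_similarity(s, t) + w_tok * token_similarity(s, t)
            if score >= threshold:
                result.append((s, t, round(score, 3)))
    return result

# Hypothetical attribute names from two schemas.
src = ["cust_name", "cust_phone"]
tgt = ["customer_name", "phone_number", "cust_name"]
strict = match(src, tgt, threshold=0.9)
loose = match(src, tgt, threshold=0.3)
```

Raising the threshold can only shrink the candidate set, which shows why this parameter trades off the precision and recall of the correspondences presented to the user.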

