Skip to main content

2014 | Buch

Big Data Integration Theory

Theory and Methods of Database Mappings, Programming Languages, and Semantics

insite
SUCHEN

Über dieses Buch

This book presents a novel approach to database concepts, describing a categorical logic for database schema mapping based on views, within a framework for database integration/exchange and peer-to-peer. Database mappings, database programming languages, and denotational and operational semantics are discussed in depth. An analysis method is also developed that combines techniques from second order logic, data modeling, co-algebras and functorial categorial semantics. Features: provides an introduction to logics, co-algebras, databases, schema mappings and category theory; describes the core concepts of big data integration theory, with examples; examines the properties of the DB category; defines the categorial RDB machine; presents full operational semantics for database mappings; discusses matching and merging operators for databases, universal algebra considerations and algebraic lattices of the databases; explores the relationship of the database weak monoidal topos w.r.t. intuitionistic logic.

Inhaltsverzeichnis

Frontmatter
1. Introduction and Technical Preliminaries
Abstract
Chapter 1 is a formal and short introduction to different topics and concepts: logics, (co)algebras, databases, schema mappings and category theory, in order to render this monograph more self-contained. It is important also due to the fact that usually the database experts do a lot with logics and relational algebras, but much less with programming languages (their denotational and operational semantics) and still much less with categorial semantics. For the experts in programming languages and category theory, having more information on FOL and its extensions used for the database theory will be useful.
Zoran Majkić
2. Composition of Schema Mappings: Syntax and Semantics
Abstract
In Chap. 2, the formal logical framework for the schema mappings is defined, based on the second-order tuple generating dependencies (SOtgds), with existentially quantified functional symbols. Each tgd is a material implication from the conjunctive formula (with relational symbols of a source schema, preceded with negation as well) into a particular relational symbol of the target schema. It provides a number of algorithms which transform these logical formulae into the algebraic structure based on the theory of R-operads. The schema database integrity constraints are transformed in a similar way so that both the schema mappings and schema integrity-constraints are formally represented by R-operads. Then the compositional properties are explored, in order to represent a database mapping system as a graph where the nodes are the database schemas and the arrows are the schema mappings or the integrity-constraints for schemas. This representation is used to define the database mapping sketches (small categories), based on the fact that each schema has an identity arrow (mapping) and that the mapping-arrows satisfy the associative low for the composition of them.
Zoran Majkić
3. Definition of DB Category
Abstract
Chapter 3 provides the basic results of this theory, including the definition of the DB category as a denotational semantics for the schema database mappings. The objects of this category are the instance-databases (composed of the relational tables and an empty relation ⊥) and every arrow is just a set of functions (mapping-interpretations defined in Chap. 2) from the set of relations of the source object (a source database) into a particular relation of the target object (a target database).
The power-view endofunctor T: DBDB is an extension of the power-view operation for a database to morphisms as well. For a given database instance A, TA is the set of all views (which can be obtained by Select–Project–Rename–Join–Union (SPRJU) statements) of this database A. The Data Federation and Data Separation operators for the databases and a partial ordering and the strong (behavioral) and weak equivalences for the databases are introduced.
Zoran Majkić
4. Functorial Semantics for Database Schema Mappings
Abstract
In Chap. 4, the categorial functorial semantics of database mappings is defined also for the database integrity constraints. In Sect. 4.2, we present the applications of this theory to data integration/exchange systems with an example for query-rewriting in GAV data integration system with (foreign) key integrity constraints, based on a coalgebra semantics. In the final section, a fixpoint operator for an infinite canonical solution in data integration/exchange systems is defined.
With this chapter we conclude the first part of this book. It is intended for all readers because it contains the principal results of the data integration theory, with a minimal introduction of necessary concepts in schema mappings based on SOtgds, their algebraization resulting in the DB category and the categorical semantics based on functors. A number of applications are given in order to obtain a clear view of these introduced concepts, especially for database experts who have not worked with categorical semantics.
Zoran Majkić
5. Extensions of Relational Codd’s Algebra and DB Category
Abstract
In Chap. 5, we consider the extensions of Codd’s SPRJU relational algebra Σ R and their relationships with the internal algebra of the category DB. Then we show that the computational power of the DB category (used as denotational semantics for database-mapping programs) is equivalent to the Σ RE relational algebra, which extends the Σ R algebra with all update operations for relations, and which is implemented as SQL statements in the software programming. We introduce an “action” category RA where each tree-term of the Σ RE relational algebra is equivalently represented by a single path term (an arrow in this category) which, applied to its source object, returns its target object. The arrows of this action category will be represented as the Application Plans in the abstract categorical RDB machines, in Chap. 5, during the executions of the embedded SQL statements.
Zoran Majkić
6. Categorial RDB Machines
Abstract
Chapter 6 is a continuation of Chap. 5 and is dedicated to computation systems and categorial RDB machines able to support all computations in the DB category (by translations of the arrows of the action category RA, represented by the Application Plans of the RDB machine, into the morphisms of the database category DB). The embedding of SQL into general purpose programs, synchronization process for execution of SQL statements as morphisms in the DB category, and transaction recovery are presented in a unifying categorial framework. In particular, we consider the concurrent categorial RDB machines able to support the time-shared “parallel” execution of several user programs.
Zoran Majkić
7. Operational Semantics for Database Mappings
Abstract
Chapter 7 provides a complete framework for the operational semantics for database-mapping programs, based on final coalgebraic semantics (dual of the initial algebraic semantics introduced in Chap. 5 for the syntax monads (programming languages), and completed in this chapter) of the database-mapping programs. We introduce an observational comonad for the final coalgebra operational semantics and explain the duality for the database mapping programs: specification versus solution. The relationship between initial algebras (denotational semantics) and final coalgebras (operational semantics) and their semantic adequateness is then presented in the last section. Thus, Chaps. 5, 6 and 7 present second part of this book, dedicated to the syntax (specification) and semantics (solutions) of database-mapping programs.
Zoran Majkić
8. The Properties of DB Category
Abstract
Chapter 8 analyzes advanced features of DB category: matching and merging operators (tensors) for databases, and presents universal algebra considerations and algebraic lattice of the databases. It is demonstrated that the DB category is not a Cartesian Closed Category (CCC) and hence it is not an elementary topos, so that its computational capabilities are strictly inferior to those of typed λ-calculus (as more precisely demonstrated in Chap. 5). It is demonstrated that DB is a V-category enriched over itself.
Finally, we present the inductive principle for objects and the coinductive principle for arrows in the DB category, and demonstrate that its “computation” Kleisly category is embedded into the database DB category by a faithful forgetful functor.
Zoran Majkić
9. Weak Monoidal DB Topos
Abstract
Chapter 9 considers the topological properties of the DB category: the database metric space, its Subobject classifier, and we demonstrate that DB is a weak monoidal topos. It is shown that DB is monoidal biclosed, finitely complete and cocomplete, locally small and locally finitely presentable category with hom-objects (“exponentiations”) and a subobject classifier. It is well known that the intuitionistic logic is a logic of an elementary (standard) topos. However, we obtain that DB is not an elementary, but a weak monoidal topos. We obtain that in the specific case when the universe of database values is a finite set this logic corresponds to the standard propositional logic. This is the case when the database-mapping system is completely specified by the FOL. However, in the case when we deal with incomplete information and hence we obtain the SOtgds with existentially quantified Skolem functions and our universe must include the infinite set of distinct Skolem constants (for recursive schema-mapping or schema integrity constraints), our logic is then an intermediate or superintuitionistic logic in which the weak excluded middle formula ¬ϕ∨¬¬ϕ is valid. Thus, this weak monoidal topos of DB has more theorems than intuitionistic logic but less than the standard propositional logic.
Zoran Majkić
Backmatter
Metadaten
Titel
Big Data Integration Theory
verfasst von
Zoran Majkić
Copyright-Jahr
2014
Electronic ISBN
978-3-319-04156-8
Print ISBN
978-3-319-04155-1
DOI
https://doi.org/10.1007/978-3-319-04156-8