2000 | Book

Fundamentals of Data Warehouses

Authors: Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, Panos Vassiliadis

Publisher: Springer Berlin Heidelberg

About this book

Data warehouses have captured the attention of practitioners and researchers alike. But the design and optimization of data warehouses remains an art rather than a science. This book presents the first comparative review of the state of the art and best current practice of data warehouses. It covers source and data integration, multidimensional aggregation, query optimization, update propagation, metadata management, quality assessment, and design optimization. Also, based on results of the European Data Warehouse Quality project, it offers a conceptual framework by which the architecture and quality of data warehouse efforts can be assessed and improved using enriched metadata management combined with advanced techniques from databases, business modeling, and artificial intelligence. For researchers and database professionals in academia and industry, the book offers an excellent introduction to the issues of quality and metadata usage in the context of data warehouses.

Table of contents

Frontmatter
1. Data Warehouse Practice: An Overview
Abstract
Since the beginning of data warehousing in the early 1990s, an informal consensus has been reached concerning the major terms and components involved in data warehousing. In this chapter, we first explain the main terms and components. Data warehouse vendors are pursuing different strategies in supporting this basic framework. We review a few of the major product families; the next chapter then gives a brief survey of the basic problem areas that data warehouse practice and research face today. These issues are treated in more depth in the remainder of this book.
Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, Panos Vassiliadis
2. Data Warehouse Research: Issues and Projects
Abstract
In the previous chapter, we gave a broad-brush overview of the state of the practice in data warehousing. In this chapter, we look at more or less the same issues again, focusing, however, on problems rather than solutions. Each of the topics we address is covered in the following chapters. In Section 2.6, we briefly review some larger research projects that address more than one of the issues and will therefore be cited in several places throughout the book. Finally, Section 2.7 takes a critical overall look at this work and introduces the DWQ conceptual framework, which takes into account the business perspective of data warehousing as well as the technical aspects that have dominated so far.
Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, Panos Vassiliadis
3. Source Integration
Abstract
According to [Inmo96], integration is the most important aspect of a data warehouse. When data passes from the application-oriented operational environment to the data warehouse, possible inconsistencies and redundancies should be resolved, so that the warehouse is able to provide an integrated and reconciled view of data of the organization.
Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, Panos Vassiliadis
4. Data Warehouse Refreshment
Abstract
The central problem addressed in this chapter is the refreshment of a data warehouse in order to reflect the changes that have occurred in the sources from which the data warehouse is defined. The possibility of having “fresh data” in a warehouse is a key factor for success in business applications. In many activities, such as retailing, business applications rely on the proper refreshment of their warehouses. For instance, [Jahn96] mentions the case of WalMart, the world’s most successful retailer. Many of WalMart’s large-volume suppliers, such as Procter & Gamble, have direct access to the WalMart data warehouse, so they deliver goods to specific stores as needed. WalMart pays such companies for their products only when they are sold. Procter & Gamble ships 40% of its items in this way, eliminating paperwork and sales calls on both sides. It is essential for the supplier to use fresh data in order to establish accurate shipment plans and to know how much money is due from the retailer. Another example is Casino Supermarche, in France, which recouped several million dollars when it noticed that Coca-Cola was often out of stock in many of its stores. Freshness of data does not necessarily mean the highest possible currency, but rather the currency required by the users. Clearly, applications have different requirements with respect to the freshness of data.
Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, Panos Vassiliadis
5. Multidimensional Data Models and Aggregation
Abstract
This chapter is devoted to the modeling of multidimensional information in the context of data warehousing and knowledge representation, with a particular emphasis on the operation of aggregation.
Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, Panos Vassiliadis
6. Query Processing and Optimization
Abstract
The ultimate purpose of a data warehouse is to support queries by end users who want to analyze the available information for an organization. However, from a more abstract point of view, queries are not only processed at the data warehouse back end.
Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, Panos Vassiliadis
7. Metadata and Data Warehouse Quality
Abstract
In the traditional view, data warehouses provide large-scale caches of historic data. They sit between information sources gained externally or through online transaction processing systems (OLTP), and decision support or data mining queries following the vision of online analytic processing (OLAP). Three main arguments have been put forward in favor of this caching approach:
1. Performance and safety considerations: The concurrency control methods of most DBMS do not react well to a mix of short update transactions (as in OLTP) and OLAP queries that typically search a large portion of the database. Moreover, the OLTP systems are often critical for the operation of the organization and must not be in danger of corruption by other applications.
2. Logical interpretability problems: Inspired by the success of spreadsheet techniques, OLAP users tend to think in terms of highly structured multidimensional data models, whereas information sources offer at best relational, often just semi-structured data models.
3. Temporal and granularity mismatch: OLTP systems focus on current operational support in great detail, whereas OLAP often considers historical developments at a somewhat less detailed granularity.
Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, Panos Vassiliadis
Backmatter
Metadata
Title
Fundamentals of Data Warehouses
Authors
Matthias Jarke
Maurizio Lenzerini
Yannis Vassiliou
Panos Vassiliadis
Copyright year
2000
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-662-04138-3
Print ISBN
978-3-662-04140-6
DOI
https://doi.org/10.1007/978-3-662-04138-3