Abstract
Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes gigabytes of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., finding duplicates, resolving data inconsistencies, and adding unique keys). If the load fails, one option is to "redo" the entire load; a better one is to resume the incomplete load from the point of interruption. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation or rely on the specifics of the transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on high-level properties of the transformations. Experiments with commercial software show that DR can yield a ten-fold reduction in resumption time.
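As a rough illustration of the idea, not the paper's actual DR algorithm: if a transformation has the high-level property that each input record maps to at most one output record whose key is derivable from the input alone, a resumed load can skip input records whose keys were already committed before the failure, instead of redoing the whole load. The names `transform`, `resume_load`, and the record layout below are hypothetical.

```python
# Hedged sketch of property-based resumption (assumed map-to-one transform).
# Each input record yields at most one output row with a key derivable from
# the input alone, so already-committed keys can simply be filtered out.

def transform(record):
    """Hypothetical user-defined transformation: input record -> (key, row)."""
    key = record["id"]  # key derivable from the input (assumption)
    return key, {"id": key, "value": record["value"].strip().upper()}

def resume_load(source_records, committed_keys, sink):
    """Re-run the load, skipping work committed before the interruption."""
    loaded = skipped = 0
    for record in source_records:
        key, row = transform(record)
        if key in committed_keys:   # already in the warehouse: skip
            skipped += 1
            continue
        sink.append(row)            # only the remaining work is performed
        loaded += 1
    return loaded, skipped

# Usage: 2 of 4 source records were committed before the crash.
source = [{"id": i, "value": f" v{i} "} for i in range(4)]
sink = []
loaded, skipped = resume_load(source, committed_keys={0, 1}, sink=sink)
# loaded == 2, skipped == 2
```

A full resumption algorithm must also handle transformations without such properties (e.g., aggregations over the whole input), which is where redoing part of the load becomes unavoidable; the filter above only captures the favorable case.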
Index Terms
- Efficient resumption of interrupted warehouse loads
Published in SIGMOD '00: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.