Efficient resumption of interrupted warehouse loads

Published: 16 May 2000

Abstract

Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes GBs of data, takes hours or even days to execute, and involves many complex and user-defined transformations of the data (e.g., find duplicates, resolve data inconsistencies, and add unique keys). If the load fails, a possible approach is to “redo” the entire load. A better approach is to resume the incomplete load from where it was interrupted. Unfortunately, traditional algorithms for resuming the load either impose unacceptable overhead during normal operation, or rely on the specifics of transformations. We develop a resumption algorithm called DR that imposes no overhead and relies only on the high-level properties of the transformations. We show that DR can lead to a ten-fold reduction in resumption time by performing experiments using commercial software.

Published in

ACM SIGMOD Record, Volume 29, Issue 2, June 2000, 609 pages. ISSN: 0163-5808. DOI: 10.1145/335191

SIGMOD '00: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 2000, 604 pages. ISBN: 1581132174. DOI: 10.1145/342009

Copyright © 2000 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States
