Tracing the lineage of view data in a warehousing environment

Published: 01 June 2000

Abstract

We consider the view data lineage problem in a warehousing environment: for a given data item in a materialized warehouse view, we want to identify the set of source data items that produced the view item. We formally define the lineage problem, develop lineage tracing algorithms for relational views with aggregation, and propose mechanisms for performing consistent lineage tracing in a multisource data warehousing environment. Our results can form the basis of a tool that allows analysts to browse warehouse data, select view tuples of interest, and then “drill-through” to examine the exact source tuples that produced the view tuples of interest.
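To make the problem statement concrete, the following is a minimal Python sketch of what "the set of source data items that produced the view item" could mean for a simple group-by aggregate view. The `SALES` table, its columns, and the functions `total_by_store` and `lineage_of` are hypothetical names chosen for illustration. They are not taken from the paper, and the sketch does not reproduce the tracing algorithms the paper develops.

```python
from collections import defaultdict

# Hypothetical source table of (store, item, amount) rows.
# The data and schema are assumptions made for this illustration only.
SALES = [
    ("s1", "pen", 3),
    ("s1", "ink", 5),
    ("s2", "pen", 2),
    ("s2", "pen", 4),
]


def total_by_store(rows):
    """Materialize a toy aggregate view: total amount per store."""
    totals = defaultdict(int)
    for store, _item, amount in rows:
        totals[store] += amount
    return dict(totals)


def lineage_of(view_key, rows):
    """Return the source rows that contributed to one view tuple.

    For this group-by view, the contributing rows are exactly those
    whose grouping value (the store) equals the view tuple's key.
    """
    return [row for row in rows if row[0] == view_key]


view = total_by_store(SALES)      # the materialized view
lineage = lineage_of("s2", SALES) # "drill-through" for the view tuple of store s2
```

Here `lineage` holds the two source rows for store `s2`, and re-aggregating only those rows yields the same total that the view stores for `s2`. Tracing through joins, multi-level views, or multiple sources needs more machinery than this single filter.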

Published in

ACM Transactions on Database Systems, Volume 25, Issue 2
June 2000
140 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/357775

Copyright © 2000 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
