Abstract
We consider the view data lineage problem in a warehousing environment: for a given data item in a materialized warehouse view, we want to identify the set of source data items that produced the view item. We formally define the lineage problem, develop lineage tracing algorithms for relational views with aggregation, and propose mechanisms for performing consistent lineage tracing in a multisource data warehousing environment. Our results can form the basis of a tool that allows analysts to browse warehouse data, select view tuples of interest, and then “drill-through” to examine the exact source tuples that produced the view tuples of interest.
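As a rough illustration of the lineage problem stated above, the following minimal Python sketch materializes a single aggregate view over one in-memory source table and then traces one view tuple back to the source rows that produced it. It is not the paper's algorithm: the table `SALES`, the view definition, and the function names `build_view` and `trace_lineage` are all assumptions made for this example, and the consistency issues of a multisource setting are ignored.

```python
from collections import defaultdict

# Hypothetical source table with rows of the form (store, item, amount).
SALES = [
    ("s1", "pen", 3),
    ("s1", "ink", 5),
    ("s2", "pen", 2),
    ("s2", "pad", 0),
]


def build_view(rows):
    """Materialize an assumed view:
    SELECT store, SUM(amount) FROM sales WHERE amount > 0 GROUP BY store.
    """
    totals = defaultdict(int)
    for store, _item, amount in rows:
        if amount > 0:  # selection predicate of the assumed view
            totals[store] += amount
    return dict(totals)


def trace_lineage(rows, store):
    """Return the source rows that contributed to the view tuple for `store`.

    For this select-then-aggregate view, those are the rows that pass the
    selection predicate and fall into the same group as the view tuple.
    """
    return [row for row in rows if row[0] == store and row[2] > 0]


view = build_view(SALES)
lineage_s2 = trace_lineage(SALES, "s2")
```

Here the row `("s2", "pad", 0)` is excluded from the lineage of the `s2` view tuple because the selection predicate filters it out before aggregation, which matches the intuition of "drilling through" from a view tuple to exactly the source tuples that produced it.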
Index Terms
- Tracing the lineage of view data in a warehousing environment