skip to main content
article

ROLAP implementations of the data cube

Published:02 November 2007Publication History
Skip Abstract Section

Abstract

Implementation of the data cube is an important and scientifically interesting issue in On-Line Analytical Processing (OLAP) and has been the subject of a plethora of related publications. Naive implementation methods that compute each node separately and store the result are impractical, since they have exponential time and space complexity with respect to the cube dimensionality. To overcome this drawback, a wide range of methods that provide efficient cube implementation (with respect to both computation and storage) have been proposed, which make use of relational, multidimensional, or graph-based data structures. Furthermore, there are several other methods that compute and store approximate descriptions of data cubes, sacrificing accuracy for condensation. In this article, we focus on Relational-OLAP (ROLAP), following the majority of the efforts so far. We review existing ROLAP methods that implement the data cube and identify six orthogonal parameters/dimensions that characterize them. We place the existing techniques at the appropriate points within the problem space defined by these parameters and identify several clusters that the techniques form with various interesting properties. A careful study of these properties leads to the identification of particularly effective values for the space parameters and indicates the potential for devising new algorithms with better overall performance.

References

  1. Acharya, S., Gibbons, P. B., and Poosala, V. 2000. Congressional samples for approximate answering of group-by queries. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 487--498. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Agarwal, S., Agrawal, R., Deshpande, P., Gupta, A., Naughton, J. F., Ramakrishnan, R., and Sarawagi, S. 1996. On the computation of multidimensional aggregates. In Proceedings of Very Large Data Bases (VLDB). 506--521. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Agrawal, R., Mehta, M., Shafer, J. C., Srikant, R., Arning, A., and Bollinger, T. 1996. The quest data mining system. In Proceedings of International Conference on Knowledge Discovery and Data Mining (KDD). 244--249.Google ScholarGoogle Scholar
  4. Baralis, E., Paraboschi, S., and Teniente, E. 1997. Materialized views selection in a multidimensional database. In Proceedings of Very Large Data Bases (VLDB). 156--165. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Barbará, D. and Sullivan, M. 1997. Quasi-cubes: Exploiting approximations in multidimensional databases. SIGMOD Record 26, 3, 12--17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Barbará, D. and Sullivan, M. 1998. A space-efficient way to support approximate multidimensional databases. In Tech. Rep., ISSE-TR-98-03, George Mason University.Google ScholarGoogle Scholar
  7. Beyer, K. S. and Ramakrishnan, R. 1999. Bottom-up computation of sparse and iceberg cubes. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 359--370. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Chaudhuri, S. and Dayal, U. 1997. An overview of data warehousing and OLAP technology. SIGMOD Record 26, 1, 65--74. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Codd, E. F. 1970. A relational model of data for large shared data banks. Comm. ACM 13, 6, 377--387. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Feng, Y., Agrawal, D., Abbadi, A. E., and Metwally, A. 2004. Range cube: Efficient cube computation by exploiting data correlation. In Proceedings of International Conference on Data Engineering (ICDE). 658--670. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Fu, L. and Hammer, J. 2000. Cubist: A new algorithm for improving the performance of ad-hoc olap queries. In ACM International Workshop on Data Warehousing and OLAP (DOLAP). 72--79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Gibbons, P. B. and Matias, Y. 1998. New sampling-based summary statistics for improving approximate query answers. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 331--342. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Gray, J., Bosworth, A., Layman, A., and Pirahesh, H. 1996. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-total. In Proceedings of International Conference on Data Engineering (ICDE). 152--159. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Gunopulos, D., Kollios, G., Tsotras, V. J., and Domeniconi, C. 2000. Approximating multidimensional aggregate range queries over real attributes. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 463--474. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Gupta, H. 1997. Selection of views to materialize in a data warehouse. In Proceedings of International Conference on Database Theory (ICDT). 98--112. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Gupta, H., Harinarayan, V., Rajaraman, A., and Ullman, J. D. 1997. Index selection for olap. In Proceedings of International Conference on Data Engineering (ICDE). 208--219. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Gupta, H. and Mumick, I. S. 1999. Selection of views to materialize under a maintenance cost constraint. In Proceedings of International Conference on Database Theory (ICDT). 453--470. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Haas, P. J., Naughton, J. F., Seshadri, S., and Stokes, L. 1995. Sampling-based estimation of the number of distinct values of an attribute. In Proceedings of Very Large Data Bases (VLDB). 311--322. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Han, J., Pei, J., Dong, G., and Wang, K. 2001. Efficient computation of iceberg cubes with complex measures. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Harinarayan, V., Rajaraman, A., and Ullman, J. D. 1996. Implementing data cubes efficiently. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 205--216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ioannidis, Y. E. and Kang, Y. C. 1990. Randomized algorithms for optimizing large join queries. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 312--321. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Johnson, T. and Shasha, D. 1997. Some approaches to index design for cube forest. IEEE Data Eng. Bull. 20, 1, 27--35.Google ScholarGoogle Scholar
  23. Kalnis, P., Mamoulis, N., and Papadias, D. 2002. View selection using randomized search. Data Knowl. Eng. 42, 1, 89--111. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Karayannidis, N., Sellis, T. K., and Kouvaras, Y. 2004. Cube file: A file structure for hierarchically clustered olap cubes. In Proceedings of International Conference on Extending Database Technology (EDBT). 621--638.Google ScholarGoogle Scholar
  25. Kotidis, Y. and Roussopoulos, N. 1998. An alternative storage organization for rolap aggregate views based on cubetrees. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 249--258. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Kotidis, Y. and Roussopoulos, N. 1999. Dynamat: A dynamic view management system for data warehouses. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 371--382. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Kotsis, N. 2000. Multidimensional aggregation in olap systems. PhD thesis. Department of Computer Science, University of Strathclyde.Google ScholarGoogle Scholar
  28. Kotsis, N. and McGregor, D. R. 2000. Elimination of redundant views in multidimensional aggregates. In Proceedings of Data Warehousing and Knowledge Discovery (DaWaK). 146--161. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Lakshmanan, L. V. S., Pei, J., and Han, J. 2002. Quotient cube: How to summarize the semantics of a data cube. In Proceedings of Very Large Data Bases (VLDB). 778--789. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Lakshmanan, L. V. S., Pei, J., and Zhao, Y. 2003. Qc-trees: An efficient summary structure for semantic olap. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 64--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Lawrence, M. 2006. Multiobjective genetic algorithms for materialized view selection in olap data warehouses. In Proceedings of Genetic and Evolutionary Computation Congress (GECCO). 699--706. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Lee, M. and Hammer, J. 1999. Speeding up warehouse physical design using a randomized algorithm. In Proceedings of Design and Management of Data Warehouses (DMDW). 3.Google ScholarGoogle Scholar
  33. Liang, W., Wang, H., and Orlowska, M. E. 2001. Materialized view selection under the maintenance time constraint. Data Knowl. Eng. 37, 2, 203--216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Morfonios, K. and Ioannidis, Y. 2006a. Cure for cubes: Cubing using a rolap engine. In Proceedings of Very Large Data Bases (VLDB). 379--390. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Morfonios, K. and Ioannidis, Y. 2006b. Supporting the data cube lifecycle: The power of ROLAP. Int. J. VLDB. http//dx.doi.org/10.1007/s00778-006-0036-8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Nadeau, T. P. and Teorey, T. J. 2002. Achieving scalability in olap materialized view selection. In Proceedings of ACM International Workshop on Data Warehousing and OLAP (DOLAP). 28--34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Poosala, V. and Ganti, V. 1999. Fast approximate answers to aggregate queries on a data cube. In Proceedings of International Conference on Statistical and Scientific Database Management (SSDBM). 24--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Ross, K. A. and Srivastava, D. 1997. Fast computation of sparse datacubes. In Proceedings of Very Large Data Bases (VLDB). 116--125. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Roussopoulos, N., Kotidis, Y., and Roussopoulos, M. 1997. Cubetree: Organization of and bulk updates on the data cube. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 89--99. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Shanmugasundaram, J., Fayyad, U. M., and Bradley, P. S. 1999. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proceedings of International Conference on Knowledge Discovery and Data Mining (KDD). 223--232. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Shao, Z., Han, J., and Xin, D. 2004. Mm-cubing: Computing iceberg cubes by factorizing the lattice space. In Proceedings of International Conference on Scientific and Statistical Database Management (SSDBM). 213--222. Google ScholarGoogle ScholarCross RefCross Ref
  42. Shukla, A., Deshpande, P., and Naughton, J. F. 1998. Materialized view selection for multidimensional datasets. In Proceedings of Very Large Data Bases (VLDB). 488--499. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Shukla, A., Deshpande, P., Naughton, J. F., and Ramasamy, K. 1996. Storage estimation for multidimensional aggregates in the presence of hierarchies. In Proceedings of Very Large Data Bases (VLDB). 522--531. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Sismanis, Y., Deligiannakis, A., Roussopoulos, N., and Kotidis, Y. 2002. Dwarf: shrinking the petacube. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 464--475. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Sismanis, Y. and Roussopoulos, N. 2004. The complexity of fully materialized coalesced cubes. In Proceedings of Very Large Data Bases (VLDB). 540--551. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Theodoratos, D., Dalamagas, T., Simitsis, A., and Stavropoulos, M. 2001. A randomized approach for the incremental design of an evolving data warehouse. In Proceedings of the 20th International Conference on Conceptual Modeling (ER'01). 325--338. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Theodoratos, D. and Sellis, T. K. 1997. Data warehouse configuration. In Proceedings of Very Large Data Bases (VLDB). 126--135. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Uchiyama, H., Runapongsa, K., and Teorey, T. J. 1999. A progressive view materialization algorithm. In Proceedings of ACM International Workshop on Data Warehousing and OLAP (DOLAP). 36--41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Vitter, J. S. and Wang, M. 1999. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 193--204. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Vitter, J. S., Wang, M., and Iyer, B. R. 1998. Data cube approximation and histograms via wavelets. In Proceedings of International Conference on Information and Knowledge Management (CIKM). 96--104. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Wang, W., Feng, J., Lu, H., and Yu, J. X. 2002. Condensed cube: An efficient approach to reducing data cube size. In Proceedings of International Conference on Data Engineering (ICDE). 155--165. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Xin, D., Han, J., Li, X., and Wah, B. W. 2003. Star-cubing: Computing iceberg cubes by top-down and bottom-up integration. In Proceedings of Very Large Data Bases (VLDB). 476--487. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Yu, J. X., Yao, X., Choi, C.-H., and Gou, G. 2003. Materialized view selection as constrained evolutionary optimization. IEEE Trans. Syst., Man, Cybernetics, Part C 33, 4, 458--467.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Zhao, Y., Deshpande, P., and Naughton, J. F. 1997. An array-based algorithm for simultaneous multidimensional aggregates. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 159--170. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. ROLAP implementations of the data cube

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Computing Surveys
      ACM Computing Surveys  Volume 39, Issue 4
      2007
      134 pages
      ISSN:0360-0300
      EISSN:1557-7341
      DOI:10.1145/1287620
      Issue’s Table of Contents

      Copyright © 2007 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 2 November 2007
      Published in csur Volume 39, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader