Abstract
Implementation of the data cube is an important and scientifically interesting issue in On-Line Analytical Processing (OLAP) and has been the subject of a plethora of related publications. Naive implementation methods that compute each node separately and store the result are impractical, since they have exponential time and space complexity with respect to the cube dimensionality. To overcome this drawback, a wide range of methods that provide efficient cube implementation (with respect to both computation and storage) have been proposed, which make use of relational, multidimensional, or graph-based data structures. Furthermore, there are several other methods that compute and store approximate descriptions of data cubes, sacrificing accuracy for condensation. In this article, we focus on Relational-OLAP (ROLAP), following the majority of the efforts so far. We review existing ROLAP methods that implement the data cube and identify six orthogonal parameters/dimensions that characterize them. We place the existing techniques at the appropriate points within the problem space defined by these parameters and identify several clusters that the techniques form with various interesting properties. A careful study of these properties leads to the identification of particularly effective values for the space parameters and indicates the potential for devising new algorithms with better overall performance.
- Acharya, S., Gibbons, P. B., and Poosala, V. 2000. Congressional samples for approximate answering of group-by queries. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 487--498. Google ScholarDigital Library
- Agarwal, S., Agrawal, R., Deshpande, P., Gupta, A., Naughton, J. F., Ramakrishnan, R., and Sarawagi, S. 1996. On the computation of multidimensional aggregates. In Proceedings of Very Large Data Bases (VLDB). 506--521. Google ScholarDigital Library
- Agrawal, R., Mehta, M., Shafer, J. C., Srikant, R., Arning, A., and Bollinger, T. 1996. The quest data mining system. In Proceedings of International Conference on Knowledge Discovery and Data Mining (KDD). 244--249.Google Scholar
- Baralis, E., Paraboschi, S., and Teniente, E. 1997. Materialized views selection in a multidimensional database. In Proceedings of Very Large Data Bases (VLDB). 156--165. Google ScholarDigital Library
- Barbará, D. and Sullivan, M. 1997. Quasi-cubes: Exploiting approximations in multidimensional databases. SIGMOD Record 26, 3, 12--17. Google ScholarDigital Library
- Barbará, D. and Sullivan, M. 1998. A space-efficient way to support approximate multidimensional databases. In Tech. Rep., ISSE-TR-98-03, George Mason University.Google Scholar
- Beyer, K. S. and Ramakrishnan, R. 1999. Bottom-up computation of sparse and iceberg cubes. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 359--370. Google ScholarDigital Library
- Chaudhuri, S. and Dayal, U. 1997. An overview of data warehousing and OLAP technology. SIGMOD Record 26, 1, 65--74. Google ScholarDigital Library
- Codd, E. F. 1970. A relational model of data for large shared data banks. Comm. ACM 13, 6, 377--387. Google ScholarDigital Library
- Feng, Y., Agrawal, D., Abbadi, A. E., and Metwally, A. 2004. Range cube: Efficient cube computation by exploiting data correlation. In Proceedings of International Conference on Data Engineering (ICDE). 658--670. Google ScholarDigital Library
- Fu, L. and Hammer, J. 2000. Cubist: A new algorithm for improving the performance of ad-hoc olap queries. In ACM International Workshop on Data Warehousing and OLAP (DOLAP). 72--79. Google ScholarDigital Library
- Gibbons, P. B. and Matias, Y. 1998. New sampling-based summary statistics for improving approximate query answers. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 331--342. Google ScholarDigital Library
- Gray, J., Bosworth, A., Layman, A., and Pirahesh, H. 1996. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-total. In Proceedings of International Conference on Data Engineering (ICDE). 152--159. Google ScholarDigital Library
- Gunopulos, D., Kollios, G., Tsotras, V. J., and Domeniconi, C. 2000. Approximating multidimensional aggregate range queries over real attributes. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 463--474. Google ScholarDigital Library
- Gupta, H. 1997. Selection of views to materialize in a data warehouse. In Proceedings of International Conference on Database Theory (ICDT). 98--112. Google ScholarDigital Library
- Gupta, H., Harinarayan, V., Rajaraman, A., and Ullman, J. D. 1997. Index selection for olap. In Proceedings of International Conference on Data Engineering (ICDE). 208--219. Google ScholarDigital Library
- Gupta, H. and Mumick, I. S. 1999. Selection of views to materialize under a maintenance cost constraint. In Proceedings of International Conference on Database Theory (ICDT). 453--470. Google ScholarDigital Library
- Haas, P. J., Naughton, J. F., Seshadri, S., and Stokes, L. 1995. Sampling-based estimation of the number of distinct values of an attribute. In Proceedings of Very Large Data Bases (VLDB). 311--322. Google ScholarDigital Library
- Han, J., Pei, J., Dong, G., and Wang, K. 2001. Efficient computation of iceberg cubes with complex measures. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 1--12. Google ScholarDigital Library
- Harinarayan, V., Rajaraman, A., and Ullman, J. D. 1996. Implementing data cubes efficiently. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 205--216. Google ScholarDigital Library
- Ioannidis, Y. E. and Kang, Y. C. 1990. Randomized algorithms for optimizing large join queries. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 312--321. Google ScholarDigital Library
- Johnson, T. and Shasha, D. 1997. Some approaches to index design for cube forest. IEEE Data Eng. Bull. 20, 1, 27--35.Google Scholar
- Kalnis, P., Mamoulis, N., and Papadias, D. 2002. View selection using randomized search. Data Knowl. Eng. 42, 1, 89--111. Google ScholarDigital Library
- Karayannidis, N., Sellis, T. K., and Kouvaras, Y. 2004. Cube file: A file structure for hierarchically clustered olap cubes. In Proceedings of International Conference on Extending Database Technology (EDBT). 621--638.Google Scholar
- Kotidis, Y. and Roussopoulos, N. 1998. An alternative storage organization for rolap aggregate views based on cubetrees. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 249--258. Google ScholarDigital Library
- Kotidis, Y. and Roussopoulos, N. 1999. Dynamat: A dynamic view management system for data warehouses. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 371--382. Google ScholarDigital Library
- Kotsis, N. 2000. Multidimensional aggregation in olap systems. PhD thesis. Department of Computer Science, University of Strathclyde.Google Scholar
- Kotsis, N. and McGregor, D. R. 2000. Elimination of redundant views in multidimensional aggregates. In Proceedings of Data Warehousing and Knowledge Discovery (DaWaK). 146--161. Google ScholarDigital Library
- Lakshmanan, L. V. S., Pei, J., and Han, J. 2002. Quotient cube: How to summarize the semantics of a data cube. In Proceedings of Very Large Data Bases (VLDB). 778--789. Google ScholarDigital Library
- Lakshmanan, L. V. S., Pei, J., and Zhao, Y. 2003. Qc-trees: An efficient summary structure for semantic olap. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 64--75. Google ScholarDigital Library
- Lawrence, M. 2006. Multiobjective genetic algorithms for materialized view selection in olap data warehouses. In Proceedings of Genetic and Evolutionary Computation Congress (GECCO). 699--706. Google ScholarDigital Library
- Lee, M. and Hammer, J. 1999. Speeding up warehouse physical design using a randomized algorithm. In Proceedings of Design and Management of Data Warehouses (DMDW). 3.Google Scholar
- Liang, W., Wang, H., and Orlowska, M. E. 2001. Materialized view selection under the maintenance time constraint. Data Knowl. Eng. 37, 2, 203--216. Google ScholarDigital Library
- Morfonios, K. and Ioannidis, Y. 2006a. Cure for cubes: Cubing using a rolap engine. In Proceedings of Very Large Data Bases (VLDB). 379--390. Google ScholarDigital Library
- Morfonios, K. and Ioannidis, Y. 2006b. Supporting the data cube lifecycle: The power of ROLAP. Int. J. VLDB. http//dx.doi.org/10.1007/s00778-006-0036-8. Google ScholarDigital Library
- Nadeau, T. P. and Teorey, T. J. 2002. Achieving scalability in olap materialized view selection. In Proceedings of ACM International Workshop on Data Warehousing and OLAP (DOLAP). 28--34. Google ScholarDigital Library
- Poosala, V. and Ganti, V. 1999. Fast approximate answers to aggregate queries on a data cube. In Proceedings of International Conference on Statistical and Scientific Database Management (SSDBM). 24--33. Google ScholarDigital Library
- Ross, K. A. and Srivastava, D. 1997. Fast computation of sparse datacubes. In Proceedings of Very Large Data Bases (VLDB). 116--125. Google ScholarDigital Library
- Roussopoulos, N., Kotidis, Y., and Roussopoulos, M. 1997. Cubetree: Organization of and bulk updates on the data cube. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 89--99. Google ScholarDigital Library
- Shanmugasundaram, J., Fayyad, U. M., and Bradley, P. S. 1999. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proceedings of International Conference on Knowledge Discovery and Data Mining (KDD). 223--232. Google ScholarDigital Library
- Shao, Z., Han, J., and Xin, D. 2004. Mm-cubing: Computing iceberg cubes by factorizing the lattice space. In Proceedings of International Conference on Scientific and Statistical Database Management (SSDBM). 213--222. Google ScholarCross Ref
- Shukla, A., Deshpande, P., and Naughton, J. F. 1998. Materialized view selection for multidimensional datasets. In Proceedings of Very Large Data Bases (VLDB). 488--499. Google ScholarDigital Library
- Shukla, A., Deshpande, P., Naughton, J. F., and Ramasamy, K. 1996. Storage estimation for multidimensional aggregates in the presence of hierarchies. In Proceedings of Very Large Data Bases (VLDB). 522--531. Google ScholarDigital Library
- Sismanis, Y., Deligiannakis, A., Roussopoulos, N., and Kotidis, Y. 2002. Dwarf: shrinking the petacube. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 464--475. Google ScholarDigital Library
- Sismanis, Y. and Roussopoulos, N. 2004. The complexity of fully materialized coalesced cubes. In Proceedings of Very Large Data Bases (VLDB). 540--551. Google ScholarDigital Library
- Theodoratos, D., Dalamagas, T., Simitsis, A., and Stavropoulos, M. 2001. A randomized approach for the incremental design of an evolving data warehouse. In Proceedings of the 20th International Conference on Conceptual Modeling (ER'01). 325--338. Google ScholarDigital Library
- Theodoratos, D. and Sellis, T. K. 1997. Data warehouse configuration. In Proceedings of Very Large Data Bases (VLDB). 126--135. Google ScholarDigital Library
- Uchiyama, H., Runapongsa, K., and Teorey, T. J. 1999. A progressive view materialization algorithm. In Proceedings of ACM International Workshop on Data Warehousing and OLAP (DOLAP). 36--41. Google ScholarDigital Library
- Vitter, J. S. and Wang, M. 1999. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 193--204. Google ScholarDigital Library
- Vitter, J. S., Wang, M., and Iyer, B. R. 1998. Data cube approximation and histograms via wavelets. In Proceedings of International Conference on Information and Knowledge Management (CIKM). 96--104. Google ScholarDigital Library
- Wang, W., Feng, J., Lu, H., and Yu, J. X. 2002. Condensed cube: An efficient approach to reducing data cube size. In Proceedings of International Conference on Data Engineering (ICDE). 155--165. Google ScholarDigital Library
- Xin, D., Han, J., Li, X., and Wah, B. W. 2003. Star-cubing: Computing iceberg cubes by top-down and bottom-up integration. In Proceedings of Very Large Data Bases (VLDB). 476--487. Google ScholarDigital Library
- Yu, J. X., Yao, X., Choi, C.-H., and Gou, G. 2003. Materialized view selection as constrained evolutionary optimization. IEEE Trans. Syst., Man, Cybernetics, Part C 33, 4, 458--467.Google ScholarDigital Library
- Zhao, Y., Deshpande, P., and Naughton, J. F. 1997. An array-based algorithm for simultaneous multidimensional aggregates. In Proceedings of ACM Special Interest Group on Management of Data (SIGMOD). 159--170. Google ScholarDigital Library
Index Terms
- ROLAP implementations of the data cube
Recommendations
Graph cube: on warehousing and OLAP multidimensional networks
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of dataWe consider extending decision support facilities toward large sophisticated networks, upon which multidimensional attributes are associated with network entities, thereby forming the so-called multidimensional networks. Data warehouses and OLAP (Online ...
An efficient method for maintaining data cubes incrementally
The data cube operator computes group-bys for all possible combinations of a set of dimension attributes. Since computing a data cube typically incurs a considerable cost, the data cube is often precomputed and stored as materialized views in data ...
Multidimensional models: constructing data CUBE
CompSysTech '04: Proceedings of the 5th international conference on Computer systems and technologiesThe goal of this article is to depict algorithms and new approach of creating data cube efficiently and to set tasks for future work. In short, the paper has described it as a data abstraction that allows one to view aggregated data from a number of ...
Comments