Skip to main content
Erschienen in: Journal of Intelligent Information Systems 2/2009

01.10.2009

The Multi-Tree Cubing algorithm for computing iceberg cubes

verfasst von: Xing Li, Howard J. Hamilton, Kamran Karimi, Liqiang Geng

Erschienen in: Journal of Intelligent Information Systems | Ausgabe 2/2009

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The computation of data cubes is one of the most expensive operations in on-line analytical processing (OLAP). To improve efficiency, an iceberg cube represents only the cells whose aggregate values are above a given threshold (minimum support). Top-down and bottom-up approaches are used to compute the iceberg cube for a data set, but both have performance limitations. In this paper, a new algorithm, called Multi-Tree Cubing (MTC), is proposed for computing an iceberg cube. The Multi-Tree Cubing algorithm is an integrated top-down and bottom-up approach. Overall control is handled in a top-down manner, so MTC features shared computation. By processing the orderings in the opposite order from the Top-Down Computation algorithm, the MTC algorithm is able to prune attributes. The Bottom Up Computation (BUC) algorithm and its variations also perform pruning by relying on the processing of intermediate partitions. The MTC algorithm, however, prunes without processing such partitions. The MTC algorithm is based on a specialized type of prefix tree data structure, called an Attribute–Partition tree (AP-tree), consisting of attribute and partition nodes. The AP-tree facilitates fast, in-memory sorting and APRIORI-like pruning. We report on five series of experiments, which confirm that MTC is consistently as fast or faster than BUC, while finding the same iceberg cubes.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Agarwal, S., Agrawal, R., Deshpande, P. M., Gupta, A., Naughton, J. F., Ramakrishnan, R., et al. (1996). On the computation of multidimensional aggregates. In Proceedings of the 22nd VLDB conference (pp. 506–521). Bombay, India. Agarwal, S., Agrawal, R., Deshpande, P. M., Gupta, A., Naughton, J. F., Ramakrishnan, R., et al. (1996). On the computation of multidimensional aggregates. In Proceedings of the 22nd VLDB conference (pp. 506–521). Bombay, India.
Zurück zum Zitat Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th international conference on very large databases (pp. 487–499). Santiago, Chile. Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th international conference on very large databases (pp. 487–499). Santiago, Chile.
Zurück zum Zitat Berry, M. J. A., & Linoff, G. (1997). Data mining techniques for marketing, sales, and customer support. Hoboken: Wiley. Berry, M. J. A., & Linoff, G. (1997). Data mining techniques for marketing, sales, and customer support. Hoboken: Wiley.
Zurück zum Zitat Beyer, K., & Ramakrishnan, R. (1999). Bottom-up computation of sparse and iceberg cubes. In Proceedings of the 1999 ACM SIGMOD international conference on management of data (pp. 359–370). Philadelphia Beyer, K., & Ramakrishnan, R. (1999). Bottom-up computation of sparse and iceberg cubes. In Proceedings of the 1999 ACM SIGMOD international conference on management of data (pp. 359–370). Philadelphia
Zurück zum Zitat Chen, Y., Dehne, F., Eavis, T., & Rau-Chaplin, A. (2005). PnP: Parallel and external memory iceberg cubes. In Proceedings of the 21st international conference on data engineering (pp. 576–577). Tokyo, Japan. Chen, Y., Dehne, F., Eavis, T., & Rau-Chaplin, A. (2005). PnP: Parallel and external memory iceberg cubes. In Proceedings of the 21st international conference on data engineering (pp. 576–577). Tokyo, Japan.
Zurück zum Zitat Cho, M., Pei, J., & Cheung, D. (2005). Cross table cubing: Mining iceberg cubes from data warehouses. In Proceedings of the 5th SIAM international data mining conference (pp. 461–465). Newport Beach, California, USA. Cho, M., Pei, J., & Cheung, D. (2005). Cross table cubing: Mining iceberg cubes from data warehouses. In Proceedings of the 5th SIAM international data mining conference (pp. 461–465). Newport Beach, California, USA.
Zurück zum Zitat Chou, P. L., & Zhang, X. (2003). Efficiently computing the top N averages in iceberg cubes. In Proceedings of the twenty-sixth Australasian computer science conference (ACSC2003) (pp. 101–109). Adelaide, Australia. Chou, P. L., & Zhang, X. (2003). Efficiently computing the top N averages in iceberg cubes. In Proceedings of the twenty-sixth Australasian computer science conference (ACSC2003) (pp. 101–109). Adelaide, Australia.
Zurück zum Zitat Codd, E. F. (1993). Providing OLAP (On-line Analytical Processing) to user-analysts: An IT mandate. Hoboken: E. F. Codd and Associates. Codd, E. F. (1993). Providing OLAP (On-line Analytical Processing) to user-analysts: An IT mandate. Hoboken: E. F. Codd and Associates.
Zurück zum Zitat Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (Eds.) (1996). Advances in knowledge discovery and data mining. Cambridge: AAAI Press/The MIT Press. Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (Eds.) (1996). Advances in knowledge discovery and data mining. Cambridge: AAAI Press/The MIT Press.
Zurück zum Zitat Findlater, L., & Hamilton, H. J. (2003). Iceberg-cube algorithms: An empirical evaluation on synthetic and real data. Intelligent Data Analysis, 7(2), 77–97.MATH Findlater, L., & Hamilton, H. J. (2003). Iceberg-cube algorithms: An empirical evaluation on synthetic and real data. Intelligent Data Analysis, 7(2), 77–97.MATH
Zurück zum Zitat Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., et al. (1997). Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1(1), 29–54.CrossRef Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., et al. (1997). Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1(1), 29–54.CrossRef
Zurück zum Zitat Han, J., Pei, J., Dong, G., & Wang, K. (2001). Efficient computation of iceberg cubes with complex measures. In Proceedings of the 2001 ACM SIGMOD international conference on management of data (pp. 1–12). Santa Barbara, California. Han, J., Pei, J., Dong, G., & Wang, K. (2001). Efficient computation of iceberg cubes with complex measures. In Proceedings of the 2001 ACM SIGMOD international conference on management of data (pp. 1–12). Santa Barbara, California.
Zurück zum Zitat Li, X. (2005). The Multi-Tree Cubing algorithm for computing iceberg cubes. M. Sc. Thesis, Department of Computer Science, University of Regina, Regina, SK, Canada, June. Li, X. (2005). The Multi-Tree Cubing algorithm for computing iceberg cubes. M. Sc. Thesis, Department of Computer Science, University of Regina, Regina, SK, Canada, June.
Zurück zum Zitat Poosala, V. (1995). Zipf’s Law. Technical report, Computer Science, University of Wisconsin, Madison, Wisconsin, USA. Poosala, V. (1995). Zipf’s Law. Technical report, Computer Science, University of Wisconsin, Madison, Wisconsin, USA.
Zurück zum Zitat Ross, K. A., & Srivastava, D. (1997). Fast computation of sparse data cubes. In Proceedings of the 23rd international conference on very large databases (pp. 116–125). Athens, Greece. Ross, K. A., & Srivastava, D. (1997). Fast computation of sparse data cubes. In Proceedings of the 23rd international conference on very large databases (pp. 116–125). Athens, Greece.
Zurück zum Zitat Shao, Z., Han, J., & Xin, D. (2004). MM-Cubing: Computing iceberg cubes by factorizing the lattice space. In Proceedings of the 16th international conference on scientific and statistical database management (SSDBM 2004), June (pp. 213–222). Santorini Island, Greece. Shao, Z., Han, J., & Xin, D. (2004). MM-Cubing: Computing iceberg cubes by factorizing the lattice space. In Proceedings of the 16th international conference on scientific and statistical database management (SSDBM 2004), June (pp. 213–222). Santorini Island, Greece.
Zurück zum Zitat Wang, K., Jiang, Y., Yu, J. X., Dong, G., & Han, J. (2005). Divide-and-approximate: A novel constraint push strategy for iceberg cube mining. IEEE Transactions on Knowledge and Data Engineering, 17(3), 354–368.CrossRef Wang, K., Jiang, Y., Yu, J. X., Dong, G., & Han, J. (2005). Divide-and-approximate: A novel constraint push strategy for iceberg cube mining. IEEE Transactions on Knowledge and Data Engineering, 17(3), 354–368.CrossRef
Zurück zum Zitat Xin, D., Han, J., Li, X., & Wah, B. W. (2003). Star-Cubing: Computing iceberg cubes by top-down and bottom-up integration. In Proceedings of the 26th international conference on very large databases (VLDB’03) (pp. 476–487). Berlin, Germany. Xin, D., Han, J., Li, X., & Wah, B. W. (2003). Star-Cubing: Computing iceberg cubes by top-down and bottom-up integration. In Proceedings of the 26th international conference on very large databases (VLDB’03) (pp. 476–487). Berlin, Germany.
Zurück zum Zitat Zhao, Y., Deshpande, P., & Naughton, J. F. (1997). An array-based algorithm for simultaneous multidimensional aggregates. In Proceedings of the 1997 ACM SIGMOD international conference on management of data (pp. 159–170) Tucson, Arizona. Zhao, Y., Deshpande, P., & Naughton, J. F. (1997). An array-based algorithm for simultaneous multidimensional aggregates. In Proceedings of the 1997 ACM SIGMOD international conference on management of data (pp. 159–170) Tucson, Arizona.
Metadaten
Titel
The Multi-Tree Cubing algorithm for computing iceberg cubes
verfasst von
Xing Li
Howard J. Hamilton
Kamran Karimi
Liqiang Geng
Publikationsdatum
01.10.2009
Verlag
Springer US
Erschienen in
Journal of Intelligent Information Systems / Ausgabe 2/2009
Print ISSN: 0925-9902
Elektronische ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-008-0074-3

Premium Partner