Published in: The Journal of Supercomputing, Issue 1/2020

3 November 2019

The curse of indecomposable aggregates for big data exploratory analysis with a case for frequent pattern cubes

Authors: Hamid Fadishei, Azadeh Soltani


Abstract

Exploratory big data analytics requires interaction delays to be kept to a minimum. Although data cubes serve this goal by pre-calculating the measures of interest, some aggregations are not decomposable and require runtime scans through the cube data, which causes the response time to exceed real-time interaction limits. One such costly aggregation is the calculation of frequent patterns over data cube partitions. The existing merge-and-count approach to this problem is inefficient and not feasible in the world of big data. In this paper, an efficient approach is proposed for mining frequent patterns from cube data, accompanied by a formal overview of decomposable and indecomposable data aggregates. A new concept of semi-decomposable aggregates is introduced that sits between these two extremes. Using the frequent pattern mining problem as a case, we show that indecomposable aggregates are sometimes in fact semi-decomposable, so that exploratory data analysis can still be realized for them. The proposed FPCubes algorithm shows promising experimental results for aggregating frequent patterns, which can help exploratory frequent itemset analysis on real-world multidimensional big datasets.
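
As a rough illustration of the distinction drawn in the abstract, the Python sketch below contrasts a decomposable aggregate (a transaction count, merged from per-partition partial counts) with frequent itemset mining, where the per-partition results alone do not determine the global result. The two partitions, the `min_support` threshold, and all function names are hypothetical and do not come from the paper. This is not the FPCubes algorithm, only a brute-force sketch of why naive merging of frequent itemsets fails.

```python
from collections import Counter
from itertools import combinations

def itemset_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items (brute force, illustration only)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return counts

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return the itemsets whose relative support is at least min_support."""
    n = len(transactions)
    counts = itemset_counts(transactions, max_size)
    return {s for s, c in counts.items() if c / n >= min_support}

# Two hypothetical cube partitions (assumed data, not from the paper).
part_a = [["x", "y"], ["x", "y"], ["x"], ["z"]]
part_b = [["x", "y"], ["z"], ["z"], ["z"], ["z"], ["z"]]
min_support = 0.5  # assumed threshold

# Decomposable aggregate: the global count is the sum of the partial counts.
total = len(part_a) + len(part_b)

# Frequent itemsets computed independently per partition.
local_a = frequent_itemsets(part_a, min_support)
local_b = frequent_itemsets(part_b, min_support)

# Ground truth obtained by rescanning the merged data.
exact = frequent_itemsets(part_a + part_b, min_support)

print("union of local results:", sorted(local_a | local_b))
print("intersection of local results:", sorted(local_a & local_b))
print("exact global result:", sorted(exact))

# Full count tables, by contrast, do merge by addition, but they must keep
# counts for every itemset, not only the locally frequent ones.
merged_counts = itemset_counts(part_a) + itemset_counts(part_b)
```

In this toy data, neither the union nor the intersection of the per-partition results equals the exact global result, whereas the full count tables merge exactly at the cost of storing counts for all itemsets.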

Metadata
Title
The curse of indecomposable aggregates for big data exploratory analysis with a case for frequent pattern cubes
Authors
Hamid Fadishei
Azadeh Soltani
Publication date
3 November 2019
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 1/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-019-03053-8
