
2018 | Original Paper | Book Chapter

Scalable Vertical Mining for Big Data Analytics of Frequent Itemsets

Authors: Carson K. Leung, Hao Zhang, Joglas Souza, Wookey Lee

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing


Abstract

Advances in technology and the growing popularity of the Internet of Things (IoT) across many applications have produced huge volumes of data at high velocity. These big data may come in a wide variety of forms and differ in veracity. Embedded in them is useful information and valuable knowledge. This has given rise to data science, which applies big data analytics to mine implicit, previously unknown and potentially useful information from big data. Frequent itemset mining is a popular data analytic task that discovers sets of items frequently co-occurring in the big data. It has drawn attention in both academia and industry, partly because of its practical value in many real-life applications. Most existing approaches use serial, distributed or parallel algorithms that mine the data horizontally (i.e., on a transaction basis). In this paper, we present an alternative big data analytic approach. Specifically, our scalable algorithm uses the MapReduce programming model, running in a Spark environment, to mine the data vertically (i.e., on an item basis). Evaluation results show the effectiveness of our algorithm in big data analytics of frequent itemsets.
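The difference between horizontal and vertical mining is easiest to see in code. The sketch below is an illustration only. It is not the authors' algorithm, and it uses no Spark. It runs on a single machine in plain Python and makes three assumptions:

- The map phase emits `(item, transaction_id)` pairs, and a group-by-key "reduce" phase turns the horizontal transaction list into a vertical item → tidset layout.
- Frequent itemsets are then found with an Eclat-style depth-first search (the technique of Zaki's vertical mining work, swapped in here for illustration). The support of an itemset is the size of the intersection of its items' tidsets.
- The function names `map_phase`, `to_vertical` and `mine_vertical`, and the absolute-count `minsup` parameter, are hypothetical.

```python
from collections import defaultdict


def map_phase(transactions):
    """Map step (hypothetical helper): emit one (item, tid) pair for every
    occurrence of an item in a transaction."""
    for tid, items in transactions.items():
        for item in items:
            yield item, tid


def to_vertical(transactions):
    """Group-by-key 'reduce' step: collect, for each item, the set of
    transaction IDs (its tidset) in which it occurs."""
    vertical = defaultdict(set)
    for item, tid in map_phase(transactions):
        vertical[item].add(tid)
    return dict(vertical)


def mine_vertical(vertical, minsup):
    """Eclat-style depth-first mining on the vertical layout.

    The support of an itemset is |intersection of its items' tidsets|.
    An itemset is extended only if it is frequent (anti-monotonicity),
    and only with items that sort after its last item, so that each
    itemset is generated exactly once.
    """
    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            # A 1-itemset keeps its own tidset; longer ones intersect.
            new_tids = (prefix_tids & tids) if prefix else tids
            if len(new_tids) >= minsup:
                itemset = prefix + (item,)
                frequent[itemset] = len(new_tids)
                extend(itemset, new_tids, candidates[i + 1:])

    extend((), set(), sorted(vertical.items()))
    return frequent


if __name__ == "__main__":
    # Toy horizontal database: transaction ID -> set of items.
    db = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"a", "c"},
          4: {"b", "c"}, 5: {"a", "b", "c"}}
    print(mine_vertical(to_vertical(db), minsup=3))
```

On the toy database, `{a, b, c}` occurs only in transactions 1 and 5, so it falls below `minsup = 3`. Every single item and every pair remains frequent.

The vertical layout lets support be computed by set intersection, without rescanning the transactions. In a Spark setting, the two phases would roughly correspond to an RDD `flatMap` followed by `groupByKey`. How the paper actually partitions and distributes the tidsets is not described in this abstract.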


Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-98809-2_1
