Published in: Cluster Computing 3/2019

30-01-2018

Improved algorithm for parallel mining collaborative frequent itemsets in multiple data streams

Authors: Fang’ai Liu, Qianqian Wang, Xin Wang


Abstract

With the rapid development of World Wide Web technology, complex and diverse data are growing explosively, so frequent itemset mining plays an essential role. Because the limited computing power of a single processor makes it hard to mine frequent itemsets in multiple data streams, an improved algorithm for Parallel Mining of Collaborative frequent itemsets in multiple data streams (PMCMD-Stream) is proposed. First, the algorithm compresses the potential and frequent itemsets into a CP-Tree with a single scan and applies an incremental method to insert or delete the related branches of the CP-Tree. As a result, the databases do not need to be scanned repeatedly to generate many candidate frequent itemsets, which saves running time. Second, the algorithm is parallelized to run in the MapReduce programming environment: because each candidate frequent itemset is independent, different candidate frequent itemsets can be processed concurrently by multiple computing machines. Finally, the valuable frequent itemsets, namely the global collaborative frequent itemsets, are obtained. The experimental results show that PMCMD-Stream not only improves mining efficiency but also scales much better than the existing algorithms, making it suitable for discovering collaborative frequent itemsets from large-scale data streams.
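The abstract does not specify CP-Tree's node layout, the stream window model, or the exact definition of a "collaborative" itemset, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes a count-based sliding window per stream, uses a plain prefix tree (in the spirit of an FP-tree) as a stand-in for CP-Tree with incremental insertion and deletion of branches, and takes "collaborative" to mean "frequent in every stream". The names `WindowPrefixTree`, `map_stream`, and `reduce_collaborative` are hypothetical, and the map and reduce steps are simulated with ordinary function calls rather than run on Hadoop MapReduce.

```python
from collections import Counter, deque
from itertools import combinations


class Node:
    """One prefix-tree node: an item plus the number of transactions passing through it."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class WindowPrefixTree:
    """Hypothetical stand-in for CP-Tree: a prefix tree over a sliding window
    that is updated incrementally instead of being rebuilt for each new transaction."""

    def __init__(self, window_size):
        self.root = Node()
        self.window = deque()
        self.window_size = window_size

    def _insert(self, items):
        node = self.root
        for item in items:
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    def _delete(self, items):
        node = self.root
        for item in items:
            child = node.children[item]
            child.count -= 1
            if child.count == 0:
                # No remaining transaction uses this branch, so drop the whole subtree.
                del node.children[item]
                return
            node = child

    def add(self, transaction):
        """Insert a new transaction; expire the oldest one if the window overflows."""
        items = tuple(sorted(set(transaction)))
        self.window.append(items)
        self._insert(items)
        if len(self.window) > self.window_size:
            self._delete(self.window.popleft())

    def supports(self, max_len=3):
        """Count itemsets (up to max_len items) by walking root-to-node paths."""
        counts = Counter()

        def walk(node, path):
            # Transactions that end exactly at this node share this whole path.
            ending = node.count - sum(c.count for c in node.children.values())
            if ending > 0:
                for k in range(1, min(max_len, len(path)) + 1):
                    for itemset in combinations(path, k):
                        counts[itemset] += ending
            for child in node.children.values():
                walk(child, path + (child.item,))

        for child in self.root.children.values():
            walk(child, (child.item,))
        return counts


def map_stream(transactions, window_size, max_len=3):
    """Map step (simulated): one worker summarises one stream's current window."""
    tree = WindowPrefixTree(window_size)
    for t in transactions:
        tree.add(t)
    return tree.supports(max_len), len(tree.window)


def reduce_collaborative(partials, min_sup):
    """Reduce step (simulated): keep itemsets whose relative support reaches
    min_sup in every stream (an assumed definition of 'collaborative')."""
    result = None
    for counts, n in partials:
        frequent = {s for s, c in counts.items() if n and c / n >= min_sup}
        result = frequent if result is None else result & frequent
    return sorted(result or set())


if __name__ == "__main__":
    streams = [
        [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "d"]],
        [["a", "b"], ["a", "b", "d"], ["c"], ["a", "b"]],
    ]
    partials = [map_stream(s, window_size=3) for s in streams]
    print(reduce_collaborative(partials, min_sup=0.6))  # → [('a',), ('b',)]
```

In this toy run each stream keeps only its last three transactions, so the oldest `["a", "b"]` is removed from both trees by decrementing its branch rather than rescanning the window. Because each stream's summary is computed independently, the per-stream calls could be distributed across machines, and only the small itemset counts need to be merged in the reduce step.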


Metadata
Title
Improved algorithm for parallel mining collaborative frequent itemsets in multiple data streams
Authors
Fang’ai Liu
Qianqian Wang
Xin Wang
Publication date
30-01-2018
Publisher
Springer US
Published in
Cluster Computing, Special Issue 3/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-018-1859-y
