Skip to main content
Top
Published in: Cluster Computing 2/2018

14-10-2017

A novel Bit Vector Product algorithm for mining frequent itemsets from large datasets using MapReduce framework

Authors: Sumalatha Saleti, R. B. V. Subramanyam

Published in: Cluster Computing | Issue 2/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Frequent itemset mining (FIM) is an interesting sub-area of research in the field of Data Mining. With the increase in the size of datasets, conventional FIM algorithms are not suitable and efforts are made to migrate to the Big Data Frameworks for designing algorithms using MapReduce like computing paradigms. We too interested in designing MapReduce based algorithm. Initially, our Parallel Compression algorithm makes data simpler to handle. A novel bit vector data structure is proposed to maintain compressed transactions and it is formed by scanning the dataset only once. Our Bit Vector Product algorithm follows the MapReduce approach and effectively searches for frequent itemsets from a given list of transactions. The experimental results are present to prove the efficacy of our approach over some of the recent works.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Agrawal, R., Shafer, J.C.: Parallel mining of association rules. IEEE Trans. Knowl. Data Eng. 8(6), 962–969 (1996)CrossRef Agrawal, R., Shafer, J.C.: Parallel mining of association rules. IEEE Trans. Knowl. Data Eng. 8(6), 962–969 (1996)CrossRef
2.
go back to reference Agrawal, R., Imieliski, T., Swami, A.: Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 22(2), 207–216 (1993)CrossRef Agrawal, R., Imieliski, T., Swami, A.: Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 22(2), 207–216 (1993)CrossRef
4.
go back to reference Bechini, A., Marcelloni, F., Segatori, A.: A MapReduce solution for associative classification of big data. Inf. Sci. 332, 33–55 (2016)CrossRef Bechini, A., Marcelloni, F., Segatori, A.: A MapReduce solution for associative classification of big data. Inf. Sci. 332, 33–55 (2016)CrossRef
5.
go back to reference Chambi, S., Lemire, D., Kaser, O., Godin, R.: Better bitmap performance with Roaring bitmaps. Softw. Pract. Exp. 46(5), 709–719 (2016)CrossRef Chambi, S., Lemire, D., Kaser, O., Godin, R.: Better bitmap performance with Roaring bitmaps. Softw. Pract. Exp. 46(5), 709–719 (2016)CrossRef
6.
go back to reference Colantonio, A., Pietro, R.D.: Concise: compressed ’n’ composable integer set. Inf. Process. Lett. 110(16), 644–650 (2010)CrossRefMATH Colantonio, A., Pietro, R.D.: Concise: compressed ’n’ composable integer set. Inf. Process. Lett. 110(16), 644–650 (2010)CrossRefMATH
7.
go back to reference Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef
8.
go back to reference Dong, J., Han, M.: BitTableFI: an efficient mining frequent itemsets algorithm. Knowl. Based Syst. 20(4), 329–335 (2007)CrossRef Dong, J., Han, M.: BitTableFI: an efficient mining frequent itemsets algorithm. Knowl. Based Syst. 20(4), 329–335 (2007)CrossRef
9.
10.
go back to reference Fournier-Viger, P., Lin, J.C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H.T.: The SPMF open-source data mining library version 2. In: Proceedings of 19th European Conference on Principles of Data Mining and Knowledge Discovery 9853, pp. 36–40 (2016) Fournier-Viger, P., Lin, J.C.W., Gomariz, A., Gueniche, T., Soltani, A., Deng, Z., Lam, H.T.: The SPMF open-source data mining library version 2. In: Proceedings of 19th European Conference on Principles of Data Mining and Knowledge Discovery 9853, pp. 36–40 (2016)
11.
go back to reference Han, E.H., Karypis, G., Kumar, V.: Scalable parallel data mining for association rules. IEEE Trans. Knowl. Data Eng. 12(3), 337–352 (2000)CrossRef Han, E.H., Karypis, G., Kumar, V.: Scalable parallel data mining for association rules. IEEE Trans. Knowl. Data Eng. 12(3), 337–352 (2000)CrossRef
12.
go back to reference Han, J., Yin, J.P.Y., Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min. Knowl. Disc. 8(1), 53–87 (2004)MathSciNetCrossRef Han, J., Yin, J.P.Y., Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min. Knowl. Disc. 8(1), 53–87 (2004)MathSciNetCrossRef
13.
go back to reference Hong, S., Huaxuan, Z., Shiping, C., Chunyan, H.: The study of improved FP-Growth Algorithm in MapReduce. In: Proceedings of 1st International Workshop on Cloud Computing and Information Security, pp. 250–253 (2013) Hong, S., Huaxuan, Z., Shiping, C., Chunyan, H.: The study of improved FP-Growth Algorithm in MapReduce. In: Proceedings of 1st International Workshop on Cloud Computing and Information Security, pp. 250–253 (2013)
14.
go back to reference Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: PFP: Parallel FP-growth for query recommendation. In: Proceedings of the 2008 ACM Conference on Recommender systems, pp. 107–114 (2008) Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: PFP: Parallel FP-growth for query recommendation. In: Proceedings of the 2008 ACM Conference on Recommender systems, pp. 107–114 (2008)
15.
go back to reference Li, L., Zhang, M.: The strategy of mining association rule based on cloud computing. In: Proceedings of IEEE International Conference on Business Computing and Global Informatization, pp. 475–478 (2011) Li, L., Zhang, M.: The strategy of mining association rule based on cloud computing. In: Proceedings of IEEE International Conference on Business Computing and Global Informatization, pp. 475–478 (2011)
16.
go back to reference Lin, M.Y., Lee, P.Y., Hsueh, S.C.: Apriori-based frequent itemset mining algorithms on MapReduce. In: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication (2012) Lin, M.Y., Lee, P.Y., Hsueh, S.C.: Apriori-based frequent itemset mining algorithms on MapReduce. In: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication (2012)
17.
go back to reference Meng, S., Dou, W., Zhang, X., Chen, J.: KASR: a keyword-aware service recommendation method on MapReduce for big data applications. IEEE Trans. Parallel Distrib. Syst. 25(12), 3221–3231 (2014) Meng, S., Dou, W., Zhang, X., Chen, J.: KASR: a keyword-aware service recommendation method on MapReduce for big data applications. IEEE Trans. Parallel Distrib. Syst. 25(12), 3221–3231 (2014)
18.
go back to reference Moens, S., Aksehirli, E., Goethals, B.: Frequent itemset mining for big data. In: Proceedings of IEEE International Conference on Big data, pp. 111–118 (2013) Moens, S., Aksehirli, E., Goethals, B.: Frequent itemset mining for big data. In: Proceedings of IEEE International Conference on Big data, pp. 111–118 (2013)
19.
go back to reference Mohamed, M.H., Darwieesh, M.M.: Efficient mining frequent itemsets algorithms. Int. J. Mach. Learn. Cybern. 5(6), 823–833 (2013)CrossRef Mohamed, M.H., Darwieesh, M.M.: Efficient mining frequent itemsets algorithms. Int. J. Mach. Learn. Cybern. 5(6), 823–833 (2013)CrossRef
20.
go back to reference Sakr, N.A., ELdesouky, A., Arafat, H.: An efficient fast-response content-based image retrieval framework for big data. Comput. Electr. Eng. 54, 522–538 (2016) Sakr, N.A., ELdesouky, A., Arafat, H.: An efficient fast-response content-based image retrieval framework for big data. Comput. Electr. Eng. 54, 522–538 (2016)
21.
go back to reference Sandhu, R., Sood, S.K.: Scheduling of big data applications on distributed cloud based on QoS parameters. Clust. Comput. 18(2), 817–828 (2015) Sandhu, R., Sood, S.K.: Scheduling of big data applications on distributed cloud based on QoS parameters. Clust. Comput. 18(2), 817–828 (2015)
22.
go back to reference Song, W., Yang, B., Xu, Z.: Index-BitTableFI: an improved algorithm for mining frequent itemsets. Knowl. Based Syst. 21(6), 507–513 (2008)CrossRef Song, W., Yang, B., Xu, Z.: Index-BitTableFI: an improved algorithm for mining frequent itemsets. Knowl. Based Syst. 21(6), 507–513 (2008)CrossRef
23.
go back to reference Spiegler, I., Maayan, R.: Storage and retrieval considerations of binary data bases. Inf. Process. Manag. 21(3), 233–254 (1985)CrossRef Spiegler, I., Maayan, R.: Storage and retrieval considerations of binary data bases. Inf. Process. Manag. 21(3), 233–254 (1985)CrossRef
24.
go back to reference Sun, D., Lee, V.C., Burstein, F., Haghighi, P.D.: An efficient vertical-Apriori Mapreduce algorithm for frequent item-set mining. In: Proceedings of IEEE Industrial Electronics and Applications, pp. 108–112 (2015) Sun, D., Lee, V.C., Burstein, F., Haghighi, P.D.: An efficient vertical-Apriori Mapreduce algorithm for frequent item-set mining. In: Proceedings of IEEE Industrial Electronics and Applications, pp. 108–112 (2015)
25.
go back to reference Tsay, Y.J., Hsu, T.J., Yu, J.R.: FIUT: a new method for mining frequent itemsets. Inf. Sci. 179(11), 1724–1737 (2009)CrossRef Tsay, Y.J., Hsu, T.J., Yu, J.R.: FIUT: a new method for mining frequent itemsets. Inf. Sci. 179(11), 1724–1737 (2009)CrossRef
26.
go back to reference Vennila, V., Kannan, A.R.: Symmetric Matrix-based Predictive Classifier for Big Data computation and information sharing in Cloud. Comput. Electr. Eng. 56, 831–841 (2016)CrossRef Vennila, V., Kannan, A.R.: Symmetric Matrix-based Predictive Classifier for Big Data computation and information sharing in Cloud. Comput. Electr. Eng. 56, 831–841 (2016)CrossRef
27.
go back to reference Wang, L., Feng, L., Zhang, J., Liao, P.: An efficient algorithm of frequent itemsets mining based on MapReduce. J. Inf. Comput. Sci. 11(8), 2809–2816 (2014)CrossRef Wang, L., Feng, L., Zhang, J., Liao, P.: An efficient algorithm of frequent itemsets mining based on MapReduce. J. Inf. Comput. Sci. 11(8), 2809–2816 (2014)CrossRef
28.
go back to reference Wu, K., Stockinger, K., Shoshani, A.: Breaking the curse of cardinality on bitmap indexes. In: Proceedings of the 20th International Conference on Scientific and Statistical Database Management 5069, pp. 348–365 (2008) Wu, K., Stockinger, K., Shoshani, A.: Breaking the curse of cardinality on bitmap indexes. In: Proceedings of the 20th International Conference on Scientific and Statistical Database Management 5069, pp. 348–365 (2008)
29.
go back to reference Wu, X., Zhu, X., Wu, G.Q., Ding, W.: Data mining with big data. IEEE Trans. Knowl. Data Eng. 26(1), 97–107 (2014)CrossRef Wu, X., Zhu, X., Wu, G.Q., Ding, W.: Data mining with big data. IEEE Trans. Knowl. Data Eng. 26(1), 97–107 (2014)CrossRef
30.
go back to reference Xia, D., Zhou, Y., Rong, Z., Zhang, Z.: IPFP: An improved parallel FP-growth algorithm for frequent itemsets mining. In Proceedings of 59th ISI World Statistics Congress, pp. 4034–4039 (2013) Xia, D., Zhou, Y., Rong, Z., Zhang, Z.: IPFP: An improved parallel FP-growth algorithm for frequent itemsets mining. In Proceedings of 59th ISI World Statistics Congress, pp. 4034–4039 (2013)
31.
go back to reference Xun, Y., Zhang, J., Qin, X.: FiDoop: parallel mining of frequent itemsets using MapReduce. IEEE Trans. Syst. Man Cybern. Syst. 46(3), 313–325 (2016)CrossRef Xun, Y., Zhang, J., Qin, X.: FiDoop: parallel mining of frequent itemsets using MapReduce. IEEE Trans. Syst. Man Cybern. Syst. 46(3), 313–325 (2016)CrossRef
32.
go back to reference Yu, K.M., Zhou, J.: Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst. Appl. 37(3), 2486–2494 (2010)MathSciNetCrossRef Yu, K.M., Zhou, J.: Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system. Expert Syst. Appl. 37(3), 2486–2494 (2010)MathSciNetCrossRef
33.
go back to reference Yuan, Y., Huang, T.: A matrix algorithm for mining association rules. Proc. Int. Conf. Intell. Comput. 3644, 370–379 (2005) Yuan, Y., Huang, T.: A matrix algorithm for mining association rules. Proc. Int. Conf. Intell. Comput. 3644, 370–379 (2005)
34.
go back to reference Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 326–335 (2003) Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 326–335 (2003)
35.
go back to reference Zhang, F., Liu, M., Gui, F., Shen, W., Shami, A., Ma, Y.: A distributed frequent itemset mining algorithm using Spark for Big Data analytics. Clust. Comput. 18(4), 1493–1501 (2015)CrossRef Zhang, F., Liu, M., Gui, F., Shen, W., Shami, A., Ma, Y.: A distributed frequent itemset mining algorithm using Spark for Big Data analytics. Clust. Comput. 18(4), 1493–1501 (2015)CrossRef
36.
go back to reference Zhou, L., Wang, X.: Research of the FP-growth algorithm based on cloud environments. J. Softw. 9(3), 676–683 (2014) Zhou, L., Wang, X.: Research of the FP-growth algorithm based on cloud environments. J. Softw. 9(3), 676–683 (2014)
37.
go back to reference Zhou, L., Zhong, Z., Chang, J., Li, J., Huang, J.Z., Feng, S.: Balanced parallel FP-growth with MapReduce. In: Proceedings of IEEE Youth Conference Information Computing and Telecommunications, pp. 243–246 (2010) Zhou, L., Zhong, Z., Chang, J., Li, J., Huang, J.Z., Feng, S.: Balanced parallel FP-growth with MapReduce. In: Proceedings of IEEE Youth Conference Information Computing and Telecommunications, pp. 243–246 (2010)
Metadata
Title
A novel Bit Vector Product algorithm for mining frequent itemsets from large datasets using MapReduce framework
Authors
Sumalatha Saleti
R. B. V. Subramanyam
Publication date
14-10-2017
Publisher
Springer US
Published in
Cluster Computing / Issue 2/2018
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1249-x

Other articles of this Issue 2/2018

Cluster Computing 2/2018 Go to the issue

Premium Partner