Skip to main content
Erschienen in: Cluster Computing 2/2015

01.06.2015

Mining association rules in big data with NGEP

verfasst von: Yunliang Chen, Fangyuan Li, Junqing Fan

Erschienen in: Cluster Computing | Ausgabe 2/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Analyses and applications of big data require special technologies to efficiently process large number of data. Mining association rules focus on obtaining relations between data. When mining association rules in big data, conventional methods encounter severe problems incurred by the tremendous cost of computing and inefficiency to achieve the goal. This study proposes an evolutionary algorithm to address these problems, namely Niche-Aided Gene Expression Programming (NGEP). The NGEP algorithm (1) divides individuals to several niches to evolve separately and fuses selected niches according to the similarities of the best individuals to ensure the dispersibility of chromosomes, and (2) adjusts the fitness function to adapt to the needs of the underlying applications. A number of experiments have been performed to compare NGEP with the FP-Growth and Apriori algorithms to evaluate the NGEP’s performance in mining association rules with a dataset of measurement for environment pressure (Iris dataset) and an Artificial Simulation Database (ASD). Experimental results indicate that NGEP can efficiently achieve more association rules (36 vs. 33 vs. 25 in Iris dataset experiments and 57 vs. 44 vs. 44 in ASD experiments) with a higher accuracy rate (74.8 vs. 53.2 vs. 50.6 % in Iris dataset experiments and 95.8 vs. 77.4 vs. 80.3 % in ASD experiments) and the time of computing is also much less than the other two methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Lizhe, W., Ke, L., Peng, L., et al.: IK-SVD: dictionary learning for spatial big data via incremental atom update. Comput. Sci. Eng. 16(4), 41–52 (2014)CrossRef Lizhe, W., Ke, L., Peng, L., et al.: IK-SVD: dictionary learning for spatial big data via incremental atom update. Comput. Sci. Eng. 16(4), 41–52 (2014)CrossRef
2.
Zurück zum Zitat Barnes, J.: Data, data, everywhere. ITS Int. 20(1), 44–49 (2014) Barnes, J.: Data, data, everywhere. ITS Int. 20(1), 44–49 (2014)
3.
Zurück zum Zitat Deng, Z., Wu, X., Wang, L., et al.: Parallel processing of dynamic continuous queries over streaming data flows. IEEE Trans. Parallel Distrib. Syst. (2014). doi:10.1109/TPDS.2014.2311811 Deng, Z., Wu, X., Wang, L., et al.: Parallel processing of dynamic continuous queries over streaming data flows. IEEE Trans. Parallel Distrib. Syst. (2014). doi:10.​1109/​TPDS.​2014.​2311811
4.
Zurück zum Zitat Chen, D., Wang, L., Wu, X., et al.: Hybrid modeling and simulation of huge crowd over a hierarchical grid architecture. Future Gener. Comput. Syst. 29(5), 1309–1317 (2013)CrossRefMathSciNet Chen, D., Wang, L., Wu, X., et al.: Hybrid modeling and simulation of huge crowd over a hierarchical grid architecture. Future Gener. Comput. Syst. 29(5), 1309–1317 (2013)CrossRefMathSciNet
5.
Zurück zum Zitat Chen, D., Wang, L., Zomaya, A., et al.: Parallel simulation of complex evacuation scenarios with adaptive agent models. IEEE Trans. Parallel Distrib. Syst. (2014). doi:10.1109/TPDS.2014.2311805 Chen, D., Wang, L., Zomaya, A., et al.: Parallel simulation of complex evacuation scenarios with adaptive agent models. IEEE Trans. Parallel Distrib. Syst. (2014). doi:10.​1109/​TPDS.​2014.​2311805
6.
Zurück zum Zitat Xue, W., Yang, C., Fu, H. et al.: Enabling and scaling a global shallow-water atmospheric model on Tianhe-2. In: Proceedings of the 28th International Parallel and Distributed Processing Symposium (2014). IEEE Xue, W., Yang, C., Fu, H. et al.: Enabling and scaling a global shallow-water atmospheric model on Tianhe-2. In: Proceedings of the 28th International Parallel and Distributed Processing Symposium (2014). IEEE
7.
Zurück zum Zitat Zhao, J., Wang, L., Tao, J., et al.: A security framework in G-Hadoop for big data computing across distributed cloud data centres. J. Comput. Syst. Sci. 80(5), 994–1007 (2014)CrossRefMATHMathSciNet Zhao, J., Wang, L., Tao, J., et al.: A security framework in G-Hadoop for big data computing across distributed cloud data centres. J. Comput. Syst. Sci. 80(5), 994–1007 (2014)CrossRefMATHMathSciNet
8.
Zurück zum Zitat Chen, D., Turner, S.J., Cai, W., et al.: Synchronization in federation community networks. J. Parallel Distrib. Comput. 70(2), 144–159 (2010)CrossRefMATH Chen, D., Turner, S.J., Cai, W., et al.: Synchronization in federation community networks. J. Parallel Distrib. Comput. 70(2), 144–159 (2010)CrossRefMATH
9.
Zurück zum Zitat Ma, Y., Wang, L., Liu, D., et al.: Distributed data structure templates for data-intensive remote sensing applications. Concurr. Comput. Prac. Exper. 25(12), 1784–1797 (2013)CrossRef Ma, Y., Wang, L., Liu, D., et al.: Distributed data structure templates for data-intensive remote sensing applications. Concurr. Comput. Prac. Exper. 25(12), 1784–1797 (2013)CrossRef
10.
Zurück zum Zitat Ma, Y., Wang, L., Zomaya, A., et al.: Task-tree based large-scale Mosaicking for remote sensed imageries with dynamic DAG scheduling. IEEE Trans. Parallel Distrib. Syst. 25(8), 2126–2137 (2013)CrossRef Ma, Y., Wang, L., Zomaya, A., et al.: Task-tree based large-scale Mosaicking for remote sensed imageries with dynamic DAG scheduling. IEEE Trans. Parallel Distrib. Syst. 25(8), 2126–2137 (2013)CrossRef
11.
Zurück zum Zitat Wang, L., von Laszewski, G., Younge, A., et al.: Cloud computing: a perspective study. New Gener. Comput. 28(2), 137–146 (2010)CrossRefMATH Wang, L., von Laszewski, G., Younge, A., et al.: Cloud computing: a perspective study. New Gener. Comput. 28(2), 137–146 (2010)CrossRefMATH
12.
Zurück zum Zitat Piatetsky-Shapiro, G.: Discovery, analysis and presentation of strong rules. In: Piatetsky-Shapiro, G., Frawley, W.J. (eds.) Knowledge Discovery in Databases, pp. 229–248. AAAI Press (1991) Piatetsky-Shapiro, G.: Discovery, analysis and presentation of strong rules. In: Piatetsky-Shapiro, G., Frawley, W.J. (eds.) Knowledge Discovery in Databases, pp. 229–248. AAAI Press (1991)
13.
Zurück zum Zitat Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD Record (1993) Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD Record (1993)
14.
Zurück zum Zitat Li, L., Xue, W., Ranjan, R., et al.: A scalable Helmholtz solver in GRAPES over large-scale multicore cluster. Concurr. Comput. Prac. Exper. 25(12), 1722–1737 (2013)CrossRef Li, L., Xue, W., Ranjan, R., et al.: A scalable Helmholtz solver in GRAPES over large-scale multicore cluster. Concurr. Comput. Prac. Exper. 25(12), 1722–1737 (2013)CrossRef
15.
Zurück zum Zitat Chen, D., Li, X., Cui, D., Wang, L., Lu, D.: Global synchronization measurement of multivariate neural signals with massively parallel nonlinear interdependence analysis. IEEE Trans. Neural Syst. Rehabil. Eng. 22(1), 33–43 (2014) Chen, D., Li, X., Cui, D., Wang, L., Lu, D.: Global synchronization measurement of multivariate neural signals with massively parallel nonlinear interdependence analysis. IEEE Trans. Neural Syst. Rehabil. Eng. 22(1), 33–43 (2014)
16.
Zurück zum Zitat Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB (1994) Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB (1994)
17.
Zurück zum Zitat Duru, N.: An application of apriori algorithm on a diabetic database. In Knowledge-Based Intelligent Information and Engineering Systems, pp. 398–404. Springer, Berlin (2005) Duru, N.: An application of apriori algorithm on a diabetic database. In Knowledge-Based Intelligent Information and Engineering Systems, pp. 398–404. Springer, Berlin (2005)
18.
Zurück zum Zitat Aflori, C., Craus, M.: Grid implementation of the Apriori algorithm. Adv. Eng. Softw. 38(5), 295–300 (2007)CrossRef Aflori, C., Craus, M.: Grid implementation of the Apriori algorithm. Adv. Eng. Softw. 38(5), 295–300 (2007)CrossRef
19.
20.
Zurück zum Zitat Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington (2005) Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington (2005)
21.
Zurück zum Zitat Shaheen, M., Shahbaz, M., Guergachi, A.: Context based positive and negative spatio-temporal association rule mining. Knowledge-Based Syst. 37, 261–273 (2013)CrossRef Shaheen, M., Shahbaz, M., Guergachi, A.: Context based positive and negative spatio-temporal association rule mining. Knowledge-Based Syst. 37, 261–273 (2013)CrossRef
22.
Zurück zum Zitat Deng, Z.-H., Lv, S.-L.: Fast mining frequent itemsets using Nodesets. Exper. Syst. Appl. 41(10), 4505–4512 (2014)CrossRef Deng, Z.-H., Lv, S.-L.: Fast mining frequent itemsets using Nodesets. Exper. Syst. Appl. 41(10), 4505–4512 (2014)CrossRef
23.
Zurück zum Zitat Deng, Z., Wang, Z., Jiang, J.: A new algorithm for fast mining frequent itemsets using N-lists. Sci. China Inform. Sci. 55(9), 2008–2030 (2012)CrossRefMATHMathSciNet Deng, Z., Wang, Z., Jiang, J.: A new algorithm for fast mining frequent itemsets using N-lists. Sci. China Inform. Sci. 55(9), 2008–2030 (2012)CrossRefMATHMathSciNet
24.
Zurück zum Zitat Deng, Z., Wang, Z.: A new fast vertical method for mining frequent patterns. Int. J. Comput. Intell. Syst. 3(6), 733–744 (2010)CrossRefMathSciNet Deng, Z., Wang, Z.: A new fast vertical method for mining frequent patterns. Int. J. Comput. Intell. Syst. 3(6), 733–744 (2010)CrossRefMathSciNet
25.
Zurück zum Zitat Romão, W., Freitas, A.A., Gimenes, I.M.D.S.: Discovering interesting knowledge from a science and technology database with a genetic algorithm. Appl. Soft Comput. 4(2), 121–137 (2004) Romão, W., Freitas, A.A., Gimenes, I.M.D.S.: Discovering interesting knowledge from a science and technology database with a genetic algorithm. Appl. Soft Comput. 4(2), 121–137 (2004)
26.
Zurück zum Zitat Kołodziej, J., González-Vélez, H., Wang, L.: Advances in data-intensive modelling and simulation. Future Gener. Comput. Syst. 37, 282–283 (2014)CrossRef Kołodziej, J., González-Vélez, H., Wang, L.: Advances in data-intensive modelling and simulation. Future Gener. Comput. Syst. 37, 282–283 (2014)CrossRef
27.
Zurück zum Zitat Chen, D., Li, D., Xiong, M., et al.: GPGPU-aided ensemble empirical-mode decomposition for EEG analysis during anesthesia. IEEE Trans. Inform. Technol. Biomed. 14(6), 1417–1427 (2010)CrossRef Chen, D., Li, D., Xiong, M., et al.: GPGPU-aided ensemble empirical-mode decomposition for EEG analysis during anesthesia. IEEE Trans. Inform. Technol. Biomed. 14(6), 1417–1427 (2010)CrossRef
29.
Zurück zum Zitat Chen, Y., Chen, D., Khan, S.U., et al.: Solving symbolic regression problems with uniform design-aided gene expression programming. J. Supercomput. 66(3), 1553–1575 (2013)CrossRefMathSciNet Chen, Y., Chen, D., Khan, S.U., et al.: Solving symbolic regression problems with uniform design-aided gene expression programming. J. Supercomput. 66(3), 1553–1575 (2013)CrossRefMathSciNet
30.
Zurück zum Zitat Wei, W., Wang, Q., Wang, H., et al.: The feature extraction of nonparametric curves based on niche genetic algorithms and multi-population competition. Pattern Recognit. Lett. 26(10), 1483–1497 (2005)CrossRef Wei, W., Wang, Q., Wang, H., et al.: The feature extraction of nonparametric curves based on niche genetic algorithms and multi-population competition. Pattern Recognit. Lett. 26(10), 1483–1497 (2005)CrossRef
31.
Zurück zum Zitat Ferreira, C.: Mutation, transposition, and recombination: an analysis of the evolutionary dynamics. In: 4th International Workshop on Frontiers in Evolutionary Algorithms (2002) Ferreira, C.: Mutation, transposition, and recombination: an analysis of the evolutionary dynamics. In: 4th International Workshop on Frontiers in Evolutionary Algorithms (2002)
32.
Zurück zum Zitat Wang, L., Chen, D., Hu, Y., et al.: Towards enabling cyberinfrastructure as a service in clouds. Comput. Electr. Eng. 39(1), 3–14 (2013)CrossRef Wang, L., Chen, D., Hu, Y., et al.: Towards enabling cyberinfrastructure as a service in clouds. Comput. Electr. Eng. 39(1), 3–14 (2013)CrossRef
33.
Zurück zum Zitat Freitas, A.A.: A survey of evolutionary algorithms for data mining and knowledge discovery. In Advances in Evolutionary Computing, pp. 819–845. Springer, Berlin (2003) Freitas, A.A.: A survey of evolutionary algorithms for data mining and knowledge discovery. In Advances in Evolutionary Computing, pp. 819–845. Springer, Berlin (2003)
34.
Zurück zum Zitat Noda, E., Freitas, A.A., Lopes, H.S.: Discovering interesting prediction rules with a genetic algorithm. In: Proceedings of the 1999 Congress on Evolutionary Computation (1999) Noda, E., Freitas, A.A., Lopes, H.S.: Discovering interesting prediction rules with a genetic algorithm. In: Proceedings of the 1999 Congress on Evolutionary Computation (1999)
35.
Zurück zum Zitat Lopes, H.S., Weinert, W.R.: EGIPSYS: an enhanced gene expression programming approach for symbolic regression problems. Int. J. Appl. Math. Comput. Sci. 14(3), 375–384 (2004)MATHMathSciNet Lopes, H.S., Weinert, W.R.: EGIPSYS: an enhanced gene expression programming approach for symbolic regression problems. Int. J. Appl. Math. Comput. Sci. 14(3), 375–384 (2004)MATHMathSciNet
36.
Zurück zum Zitat Ferreira, C.: Function finding and the creation of numerical constants in gene expression programming. In Advances in Soft Computing: Engineering Design and Manufacturing, p. 265 (2003) Ferreira, C.: Function finding and the creation of numerical constants in gene expression programming. In Advances in Soft Computing: Engineering Design and Manufacturing, p. 265 (2003)
37.
Zurück zum Zitat Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-wesley, Boston (1989) Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-wesley, Boston (1989)
38.
Zurück zum Zitat Koza, J.R.: Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press , Cambridge (1994) Koza, J.R.: Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press , Cambridge (1994)
39.
Zurück zum Zitat Zhang, J., Huang, D.-S., Lok, T.-M., et al.: A novel adaptive sequential niche technique for multimodal function optimization. Neurocomputing 69(16), 2396–2401 (2006)CrossRef Zhang, J., Huang, D.-S., Lok, T.-M., et al.: A novel adaptive sequential niche technique for multimodal function optimization. Neurocomputing 69(16), 2396–2401 (2006)CrossRef
40.
Zurück zum Zitat Ferreira, C.: Genetic representation and genetic neutrality in gene expression programming. Adv. Complex Syst. 5(04), 389–408 (2002)CrossRefMATH Ferreira, C.: Genetic representation and genetic neutrality in gene expression programming. Adv. Complex Syst. 5(04), 389–408 (2002)CrossRefMATH
41.
Zurück zum Zitat Siwei, J., Zhihua, C., Dang, Z.: Parallel gene expression programming algorithm based on simulated annealing method. ACTA Electr. Sinica 33, 2017–2021 (2005) Siwei, J., Zhihua, C., Dang, Z.: Parallel gene expression programming algorithm based on simulated annealing method. ACTA Electr. Sinica 33, 2017–2021 (2005)
42.
Zurück zum Zitat Zuo, J., Tang, C., Zhang, T.: Mining predicate association rule by gene expression programming. In Advances in Web-Age Information Management, pp. 281–294. Springer, Berlin (2002) Zuo, J., Tang, C., Zhang, T.: Mining predicate association rule by gene expression programming. In Advances in Web-Age Information Management, pp. 281–294. Springer, Berlin (2002)
43.
Zurück zum Zitat Kuok, C.M., Fu, A., Wong, M.H.: Mining fuzzy association rules in databases. ACM Sigmod Rec. 27(1), 41–46 (1998)CrossRef Kuok, C.M., Fu, A., Wong, M.H.: Mining fuzzy association rules in databases. ACM Sigmod Rec. 27(1), 41–46 (1998)CrossRef
44.
Zurück zum Zitat Chen, D., Li, X., Wang, L., Khan, S., Wang, J., Zeng, K., Cai, C.: Fast and scalable multi-way analysis of massive neural data. IEEE Trans. Comput. (2014). doi:10.1109/TC.2013.2295806 Chen, D., Li, X., Wang, L., Khan, S., Wang, J., Zeng, K., Cai, C.: Fast and scalable multi-way analysis of massive neural data. IEEE Trans. Comput. (2014). doi:10.​1109/​TC.​2013.​2295806
Metadaten
Titel
Mining association rules in big data with NGEP
verfasst von
Yunliang Chen
Fangyuan Li
Junqing Fan
Publikationsdatum
01.06.2015
Verlag
Springer US
Erschienen in
Cluster Computing / Ausgabe 2/2015
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-014-0419-3

Weitere Artikel der Ausgabe 2/2015

Cluster Computing 2/2015 Zur Ausgabe