Published in: Cognitive Computation 3/2019

16 January 2019

A Grammar-Guided Genetic Programing Algorithm for Associative Classification in Big Data

Authors: F. Padillo, J. M. Luna, S. Ventura


Abstract

The state of the art in associative classification includes interesting approaches for building accurate and interpretable classifiers. These approaches generally work in four phases (data discretization, pattern mining, rule mining, and classifier building), some of which are computationally expensive. The aim of this work is to propose a novel evolutionary algorithm for efficiently building associative classifiers in Big Data. The proposed model works in only two phases, each of which runs a grammar-guided genetic programming framework: (1) mining reliable association rules; (2) building an accurate classifier by ranking and combining the previously mined rules. The proposal has been implemented on different architectures (multi-thread, Apache Spark and Apache Flink) to take advantage of distributed computing. The experimental results were obtained on 40 well-known datasets and analyzed through non-parametric tests. Results were compared to multiple approaches in the field and analyzed in three ways: quality of the predictions, level of interpretability, and efficiency. The proposed method obtained accurate and interpretable classifiers efficiently, even on high-dimensional data, outperforming the state-of-the-art algorithms on all three criteria.
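To make the idea of "ranking and combining previously mined rules" concrete, the following is a minimal Python sketch of a generic associative classifier. It is not the authors' grammar-guided genetic programming method: the candidate rules are written by hand rather than evolved, the ranking criterion (confidence, then support) and the first-match prediction strategy are assumptions borrowed from classical associative classification, and the toy dataset and all names (`Rule`, `rank_rules`, `predict`, `DATA`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Dict[str, str]


@dataclass
class Rule:
    """A class association rule: IF antecedent THEN class = consequent."""
    antecedent: Dict[str, str]
    consequent: str

    def covers(self, record: Record) -> bool:
        # A rule covers a record when every antecedent condition matches.
        return all(record.get(a) == v for a, v in self.antecedent.items())


def support_confidence(rule: Rule, data: List[Record], class_attr: str) -> Tuple[float, float]:
    """Return (support, confidence) of a rule on the given records."""
    covered = [r for r in data if rule.covers(r)]
    correct = sum(1 for r in covered if r[class_attr] == rule.consequent)
    support = correct / len(data) if data else 0.0
    confidence = correct / len(covered) if covered else 0.0
    return support, confidence


def rank_rules(rules: List[Rule], data: List[Record], class_attr: str,
               min_confidence: float = 0.7) -> List[Rule]:
    """Keep rules above a confidence threshold, best first (assumed criterion)."""
    scored = []
    for rule in rules:
        sup, conf = support_confidence(rule, data, class_attr)
        if conf >= min_confidence:
            scored.append((conf, sup, rule))
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [rule for _, _, rule in scored]


def predict(ranked: List[Rule], record: Record, default: str) -> str:
    """Classify with the first (highest-ranked) rule that covers the record."""
    for rule in ranked:
        if rule.covers(record):
            return rule.consequent
    return default


# Hypothetical toy data; "play" is the class attribute.
DATA: List[Record] = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]

# Hand-written candidates stand in for the rule-mining phase.
CANDIDATES = [
    Rule({"outlook": "sunny"}, "yes"),
    Rule({"windy": "no"}, "yes"),
    Rule({"windy": "yes"}, "no"),
]

CLASSIFIER = rank_rules(CANDIDATES, DATA, class_attr="play")
```

Calling `predict(CLASSIFIER, {"outlook": "sunny", "windy": "yes"}, default="yes")` walks the ranked list and returns the class of the first covering rule. The ordered list of readable rules is what makes this family of classifiers interpretable.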


Metadata
Title
A Grammar-Guided Genetic Programing Algorithm for Associative Classification in Big Data
Authors
F. Padillo
J. M. Luna
S. Ventura
Publication date
16 January 2019
Publisher
Springer US
Published in
Cognitive Computation / Issue 3/2019
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-018-9617-2
