
Chaotic dragonfly algorithm: an improved metaheuristic algorithm for feature selection


Abstract

Selecting the most discriminative features is a challenging problem in many applications. Bio-inspired optimization algorithms have been widely applied to solve many optimization problems, including the feature selection problem. In this paper, the most discriminative features are selected by a new Chaotic Dragonfly Algorithm (CDA), in which chaotic maps are embedded within the search iterations of the Dragonfly Algorithm (DA). Ten chaotic maps were employed to adjust the main parameters of the dragonflies' movements throughout the optimization process, accelerating the convergence rate and improving the efficiency of DA. The proposed algorithm is employed to select features from a dataset extracted from the DrugBank database, which contains 6712 drugs; the 553 drugs that are biotransformed in the liver are used in this paper. The data cover four toxic effects, namely, the irritant, mutagenic, reproductive, and tumorigenic effects, and each drug is represented by 31 chemical descriptors. The proposed model comprises three phases: data pre-processing, feature selection, and classification. In the data pre-processing phase, the Synthetic Minority Over-sampling Technique (SMOTE) is used to address the imbalanced dataset. In the feature selection phase, the most discriminative features are selected using CDA. Finally, the features selected by CDA are fed to a Support Vector Machine (SVM) classifier in the classification phase. Experimental results prove the capability of CDA to find the optimal feature subset, maximizing classification performance while minimizing the number of selected features compared with DA and other meta-heuristic optimization algorithms. Moreover, the experiments show that the Gauss chaotic map is the most appropriate map for significantly boosting the performance of DA. Additionally, the high values obtained for accuracy (81.82–96.08%), recall (80.84–96.11%), precision (81.45–96.08%), and F-score (81.14–96.1%) across all toxic effects prove the robustness of the proposed model.
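The abstract describes two mechanisms that can be illustrated concretely: a chaotic map (here the Gauss/mouse map) supplying the values that would otherwise be drawn at random when updating the dragonflies' movement coefficients, and the evaluation pipeline of SMOTE-balanced data, a candidate feature subset, and an SVM. The following Python sketch is not the authors' implementation; all function names, the fitness weighting alpha, and the placeholder data are illustrative assumptions, since the paper's exact update rule and fitness weights are not given in the abstract.

```python
# Minimal sketch (not the authors' code) of the two ideas in the abstract:
# (1) a Gauss/mouse chaotic map that can replace uniform random numbers when
#     adjusting the dragonflies' movement coefficients, and
# (2) the evaluation pipeline: SMOTE-balanced data -> candidate feature subset -> SVM.
# Names (gauss_map, evaluate_subset, alpha) are illustrative assumptions.

import numpy as np
from imblearn.over_sampling import SMOTE            # pip install imbalanced-learn
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def gauss_map(x):
    """Gauss/mouse chaotic map: 0 if x == 0, else (1/x) mod 1."""
    return 0.0 if x == 0 else (1.0 / x) % 1.0


def chaotic_sequence(n, x0=0.7):
    """Generate n chaotic values in [0, 1) to modulate the DA coefficients."""
    seq, x = [], x0
    for _ in range(n):
        x = gauss_map(x)
        seq.append(x)
    return np.array(seq)


def evaluate_subset(mask, X, y, alpha=0.99):
    """Fitness of a binary feature mask: weighted classification error plus a
    feature-count penalty (a common wrapper formulation; assumed here)."""
    if mask.sum() == 0:
        return 1.0                                   # empty subset is worst
    acc = cross_val_score(SVC(kernel="rbf"),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    return alpha * (1 - acc) + (1 - alpha) * mask.sum() / mask.size


# Usage sketch: balance the data with SMOTE, then score one candidate mask.
# In the paper the input would be the 553 x 31 descriptor matrix with one
# toxic-effect label; random placeholder data is used here.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 31))
y_raw = rng.integers(0, 2, size=200)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_raw, y_raw)

mask = (chaotic_sequence(31) > 0.5).astype(int)      # chaotic values seed a subset
print("fitness:", evaluate_subset(mask, X_bal, y_bal))
```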



Acknowledgment

We would like to thank Dr. Yasmine S. Momen of the Clinical Pathology Department, National Liver Institute, for providing the database used in this work and for her great effort in helping us understand the dataset.

Author information


Corresponding author

Correspondence to Alaa Tharwat.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.


About this article


Cite this article

Sayed, G.I., Tharwat, A. & Hassanien, A.E. Chaotic dragonfly algorithm: an improved metaheuristic algorithm for feature selection. Appl Intell 49, 188–205 (2019). https://doi.org/10.1007/s10489-018-1261-8
