Published in: Soft Computing 8/2020

11-03-2019 | Focus

WOA + BRNN: An imbalanced big data classification framework using Whale optimization and deep neural network

Authors: Eslam. M. Hassib, Ali. I. El-Desouky, Labib. M. Labib, El-Sayed M. El-kenawy


Abstract

Nowadays, big data plays a substantial part in information and knowledge analysis, manipulation, and forecasting. Analyzing and extracting knowledge from such big datasets is a very challenging task because of their imbalanced class distributions, which can lead to biased classification results and wrong decisions. Standard classifiers are not capable of handling such datasets, so a new technique for dealing with them is required. This paper proposes a novel classification framework for big data that consists of three phases. The first phase is feature selection, which uses the Whale optimization algorithm (WOA) to find the best set of features. The second phase is preprocessing, which uses the SMOTE algorithm and the LSH-SMOTE algorithm to address the class imbalance problem. The third phase is the WOA + BRNN algorithm, which, for the first time, uses the Whale optimization algorithm to train a deep learning model, the bidirectional recurrent neural network (BRNN). We first tested WOA + BRNN on nine highly imbalanced datasets, one of which is a big dataset, measuring the area under the curve (AUC) and comparing it with four of the most commonly used machine learning algorithms (Naïve Bayes, AdaBoostM1, decision table, and random tree) and with GWO-MLP (a multilayer perceptron trained with the Grey Wolf Optimizer). We then tested it on four well-known datasets in terms of classification accuracy against GWO-MLP and against multilayer perceptrons trained with particle swarm optimization (PSO-MLP), a genetic algorithm (GA-MLP), ant colony optimization (ACO-MLP), evolution strategy (ES-MLP), and population-based incremental learning (PBIL-MLP). Experimental results showed that WOA + BRNN achieved promising accuracy and strong local optima avoidance, and outperformed both the four common machine learning algorithms and GWO-MLP in terms of AUC.
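The framework uses WOA both for feature selection and for training the BRNN, but the abstract does not state its update rules. The sketch below is a minimal, continuous-domain version of the standard WOA of Mirjalili and Lewis (2016), written only for illustration. The function name `woa_minimize`, the population size, iteration count, seed, and clamping to bounds are assumptions, not the authors' settings; using it for feature selection would additionally require mapping each position to a binary feature mask (for example, keeping features whose coordinate exceeds 0.5), which is also an assumption.

```python
import math
import random


def woa_minimize(f, dim, lower, upper, n_whales=20, max_iter=200, seed=0):
    """Minimise f over the box [lower, upper]^dim with a basic Whale Optimization Algorithm.

    Hypothetical helper for illustration; parameters are not taken from the paper.
    """
    rng = random.Random(seed)
    whales = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_whales)]
    best = list(min(whales, key=f))  # copy of the leader (best position found so far)
    best_val = f(best)
    b = 1.0  # constant defining the logarithmic spiral
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter  # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            p = rng.random()
            l = rng.uniform(-1.0, 1.0)  # spiral parameter in [-1, 1]
            if p < 0.5:
                # Encircling: |A| < 1 moves relative to the leader (exploitation),
                # |A| >= 1 moves relative to a random whale (exploration).
                ref = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - xj) for r, xj in zip(ref, x)]
            else:
                # Bubble-net attack: logarithmic spiral around the leader.
                new = [abs(bj - xj) * math.exp(b * l) * math.cos(2 * math.pi * l) + bj
                       for bj, xj in zip(best, x)]
            whales[i] = [min(max(v, lower), upper) for v in new]  # clamp to the bounds
            val = f(whales[i])
            if val < best_val:
                best, best_val = list(whales[i]), val
    return best, best_val
```

For example, `woa_minimize(lambda x: sum(v * v for v in x), dim=5, lower=-10.0, upper=10.0)` searches for the minimum of the sphere function. The coefficient `a` shrinking from 2 to 0 is what shifts the search from exploration (|A| ≥ 1, moving relative to a random whale) to exploitation (|A| < 1, moving relative to the best whale).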
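SMOTE, used in the preprocessing phase, creates synthetic minority samples on line segments between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal pure-Python sketch of that idea; it picks the base sample at random rather than iterating over every minority sample a fixed number of times, and the name `smote` and the default `k=5` are illustrative assumptions. LSH-SMOTE, the authors' modified variant, is not reproduced here because the abstract does not describe how it differs from SMOTE.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE sketch)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Pre-compute the k nearest neighbours (Euclidean) of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies instead of duplicating existing records.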
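Training a BRNN with WOA means treating all of the network's weights as one real-valued vector that the optimizer moves, instead of computing gradients. The sketch below shows one way to do that for a single-layer bidirectional RNN with tanh units and a sigmoid output. It assumes each record's selected features are fed to the network as a sequence of scalars and that a candidate weight vector is scored by mean squared error; both are assumptions, since the abstract specifies neither the input encoding nor the fitness function, and `n_params`, `brnn_predict`, and `mse_fitness` are hypothetical names.

```python
import math


def n_params(hidden):
    """Weights in a single-layer BRNN with scalar inputs and one sigmoid output."""
    per_direction = hidden + hidden * hidden + hidden  # input weights, recurrent weights, biases
    return 2 * per_direction + 2 * hidden + 1          # two directions + output weights + bias


def brnn_predict(theta, sequence, hidden):
    """Forward pass of a bidirectional RNN whose weights are read from the flat vector theta."""
    if len(theta) != n_params(hidden):
        raise ValueError("theta has the wrong length for this hidden size")
    pos = 0

    def take(n):
        nonlocal pos
        chunk = theta[pos:pos + n]
        pos += n
        return chunk

    def direction(seq):
        # Each direction has its own input weights, recurrent weights, and biases.
        w_in, w_rec, bias = take(hidden), take(hidden * hidden), take(hidden)
        h = [0.0] * hidden
        for x in seq:
            h = [math.tanh(w_in[j] * x
                           + sum(w_rec[j * hidden + k] * h[k] for k in range(hidden))
                           + bias[j])
                 for j in range(hidden)]
        return h

    h_fwd = direction(sequence)                  # reads the sequence left to right
    h_bwd = direction(list(reversed(sequence)))  # reads it right to left
    w_out, b_out = take(2 * hidden), take(1)[0]
    z = sum(w * h for w, h in zip(w_out, h_fwd + h_bwd)) + b_out
    # Numerically stable sigmoid: probability of the positive (minority) class.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mse_fitness(theta, data, hidden):
    """Hypothetical fitness for a metaheuristic: mean squared error over (sequence, label) pairs."""
    return sum((brnn_predict(theta, seq, hidden) - y) ** 2 for seq, y in data) / len(data)
```

A vector of length `n_params(hidden)` can then be handed to any population-based optimizer as a position, scored with something like `lambda theta: mse_fitness(theta, data, hidden)`. Concatenating the final state of both directions lets the output see the sequence from each end.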
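Accuracy can look high on an imbalanced dataset even when a classifier ignores the minority class (predicting the majority class for every record of a 99:1 dataset gives 99% accuracy), which is why the first set of experiments is scored by AUC. AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, and it can be computed from ranks via the Mann–Whitney U statistic in O(n log n) time. A minimal sketch, assuming binary labels with 1 marking the minority (positive) class and any other value counted as negative; the name `auc` is illustrative:

```python
def auc(labels, scores):
    """Area under the ROC curve from the rank-sum (Mann-Whitney U) statistic.

    Tied scores receive their average rank, so each tied positive/negative pair counts 1/2.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of equal scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns `0.75`: three of the four positive/negative pairs are ordered correctly.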

Metadata
Publisher: Springer Berlin Heidelberg
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-019-03901-y
