29-01-2021 | Original Article

Deep neural networks architecture driven by problem-specific information

Authors: Daniel Urda, Francisco J. Veredas, Javier González-Enrique, Juan J. Ruiz-Aguilar, Jose M. Jerez, Ignacio J. Turias

Published in: Neural Computing and Applications | Issue 15/2021

Abstract

Deep learning provides a variety of neural network-based models, known as deep neural networks (DNNs), which are being successfully used in several domains to build highly accurate predictors. A key factor that usually allows DNNs to outperform traditional machine learning models is the amount of data that is nowadays accessible and available. Nevertheless, other factors linked to DNN topologies may also influence the predictive performance of DNN models. In particular, fully connected deep neural networks (fc-DNNs) typically struggle to achieve good performance rates when applied to small datasets, due to the high number of parameters that must be learned when training this kind of model, which makes it prone to over-fitting. In this paper, the authors propose using problem-specific information to impose constraints on the network architecture, so that an fc-DNN is transformed into a partially connected DNN (pc-DNN) whose topology is driven by prior knowledge. This work compares two baseline models, the elastic net and fc-DNNs, against pc-DNNs on three synthetic datasets with different numbers of samples. The synthetic data were generated to estimate the benefit of using problem-specific information to drive network architectures. Furthermore, a similar analysis is performed on a real-world dataset to show the advantages of pc-DNN models in terms of predictive performance. The results of the analysis show that pc-DNNs with built-in problem-specific information clearly outperformed the elastic net and fc-DNNs on most of the datasets used, both synthetic and real-world. The pc-DNN turned out to be a useful model, especially when applied to small- or medium-size datasets, on which it significantly outperformed the baseline models considered in this study. Specifically, the pc-DNNs achieved AUC and MSE improvement rates of (\(8.21\%\), \(19.79\%\)) and (\(6.65\%\), \(20.54\%\)) on small- and medium-size datasets for the two case studies analyzed, the synthetic and the real-world problem, respectively.
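The core of the approach, pruning an fc-DNN's connectivity with prior knowledge before training, can be illustrated with a short sketch. What follows is a minimal, hypothetical PyTorch example, not the authors' implementation: it assumes the problem-specific information is already encoded as a fixed binary connectivity matrix (here, input features grouped into blocks that each feed a single hidden unit), and the class and variable names (MaskedLinear, mask) are illustrative only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskedLinear(nn.Module):
        """Dense layer whose weights are element-wise masked by a fixed
        0/1 connectivity matrix, so only the links allowed by prior
        knowledge carry signal (and receive gradient updates)."""
        def __init__(self, in_features, out_features, mask):
            super().__init__()
            self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
            self.bias = nn.Parameter(torch.zeros(out_features))
            self.register_buffer("mask", mask.float())  # fixed, not trained

        def forward(self, x):
            # Masked-out weights are multiplied by zero, so their
            # gradients are zero as well and the constraint persists.
            return F.linear(x, self.weight * self.mask, self.bias)

    # Hypothetical prior: 6 input features split into 2 groups of 3,
    # each group wired to exactly one hidden unit.
    mask = torch.zeros(2, 6)
    mask[0, :3] = 1.0
    mask[1, 3:] = 1.0

    layer = MaskedLinear(6, 2, mask)
    out = layer(torch.randn(4, 6))  # batch of 4 samples -> output shape (4, 2)

Because the mask zeroes both the forward contribution and the gradient of every forbidden connection, training only ever adjusts the permitted weights; this is one common and simple way to emulate partial connectivity on top of a standard dense layer.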

Metadata

Title: Deep neural networks architecture driven by problem-specific information
Authors: Daniel Urda, Francisco J. Veredas, Javier González-Enrique, Juan J. Ruiz-Aguilar, Jose M. Jerez, Ignacio J. Turias
Publication date: 29-01-2021
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 15/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-021-05702-7
