29-01-2021 | Original Article

Deep neural networks architecture driven by problem-specific information

Authors: Daniel Urda, Francisco J. Veredas, Javier González-Enrique, Juan J. Ruiz-Aguilar, Jose M. Jerez, Ignacio J. Turias

Published in: Neural Computing and Applications | Issue 15/2021

Abstract

Deep learning provides a variety of neural network-based models, known as deep neural networks (DNNs), which are being successfully used in several domains to build highly accurate predictors. A key factor that usually allows DNNs to outperform traditional machine learning models is the amount of data that is nowadays accessible and available. Nevertheless, other factors linked to DNN topologies may also influence the predictive performance of DNN models. In particular, fully connected deep neural networks (fc-DNNs) typically struggle to achieve good performance rates when applied to small datasets, due to the high number of parameters that must be learned when training this kind of model, which makes it prone to over-fitting. In this paper, the authors propose using problem-specific information to impose constraints on the network architecture, so that an fc-DNN is transformed into a partially connected DNN (pc-DNN) whose topology is driven by prior knowledge. This work compares two baseline models, the elastic net and fc-DNNs, against pc-DNNs on three synthetic datasets with different numbers of samples. The synthetic data were generated to estimate the benefit of using problem-specific information to drive network architectures. Furthermore, a similar analysis is performed on a real-world dataset to show the advantages of pc-DNN models in terms of predictive performance. The results of the analysis show that pc-DNNs with built-in problem-specific information clearly outperformed the elastic net and fc-DNNs on most of the datasets used, both synthetic and real-world. The pc-DNN turned out to be a useful model, especially when applied to small- or medium-size datasets, on which it significantly outperformed the baseline models considered in this study. Specifically, the pc-DNNs achieved AUC and MSE improvement rates of (\(8.21\%\), \(19.79\%\)) and (\(6.65\%\), \(20.54\%\)) on small- and medium-size datasets for the two case studies analyzed, the synthetic and the real-world problem, respectively.
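The core of the approach, pruning an fc-DNN's connectivity with prior knowledge before training, can be illustrated with a short sketch. What follows is a minimal, hypothetical PyTorch example, not the authors' implementation: it assumes the problem-specific information is already encoded as a fixed binary connectivity matrix (here, input features grouped into blocks that each feed a single hidden unit), and the class and variable names (MaskedLinear, mask) are illustrative only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskedLinear(nn.Module):
        """Dense layer whose weights are element-wise masked by a fixed
        0/1 connectivity matrix, so only the links allowed by prior
        knowledge carry signal (and receive gradient updates)."""
        def __init__(self, in_features, out_features, mask):
            super().__init__()
            self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
            self.bias = nn.Parameter(torch.zeros(out_features))
            self.register_buffer("mask", mask.float())  # fixed, not trained

        def forward(self, x):
            # Masked-out weights are multiplied by zero, so their
            # gradients are zero as well and the constraint persists.
            return F.linear(x, self.weight * self.mask, self.bias)

    # Hypothetical prior: 6 input features split into 2 groups of 3,
    # each group wired to exactly one hidden unit.
    mask = torch.zeros(2, 6)
    mask[0, :3] = 1.0
    mask[1, 3:] = 1.0

    layer = MaskedLinear(6, 2, mask)
    out = layer(torch.randn(4, 6))  # batch of 4 samples -> output shape (4, 2)

Because the mask zeroes both the forward contribution and the gradient of every forbidden connection, training only ever adjusts the permitted weights; this is one common and simple way to emulate partial connectivity on top of a standard dense layer.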

Metadata

Title: Deep neural networks architecture driven by problem-specific information
Authors: Daniel Urda, Francisco J. Veredas, Javier González-Enrique, Juan J. Ruiz-Aguilar, Jose M. Jerez, Ignacio J. Turias
Publication date: 29-01-2021
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 15/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-021-05702-7
