A novel weight pruning method for MLP classifiers based on the MAXCORE principle

01-01-2013 | Cont. Dev. of Neural Compt. & Appln.

Published in: Neural Computing and Applications | Issue 1/2013


Abstract

We introduce a novel weight pruning methodology for MLP classifiers that can be used for model and/or feature selection purposes. The main concept underlying the proposed method is the MAXCORE principle, which is based on the observation that relevant synaptic weights tend to generate higher correlations between error signals associated with the neurons of a given layer and the error signals propagated back to the previous layer. Nonrelevant (i.e. prunable) weights tend to generate smaller correlations. Using the MAXCORE as a guiding principle, we perform a cross-correlation analysis of the error signals at successive layers. Weights for which the cross-correlations are smaller than a user-defined error tolerance are gradually discarded. Computer simulations using synthetic and real-world data sets show that the proposed method performs consistently better than standard pruning techniques, with much lower computational costs.
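
The abstract describes the pruning rule only at a high level (the full algorithm is not reproduced on this page). As a rough, hypothetical sketch of a correlation-based criterion in this spirit, the NumPy snippet below scores each output-layer weight of a single-hidden-layer MLP by the cross-correlation between the output error signals and the error signals propagated back to the hidden layer, then discards weights whose normalized score falls below a user-defined tolerance. The function name, the logistic-derivative term, and the normalization are our own assumptions, not the paper's definitions.

```python
import numpy as np

def maxcore_prune_sketch(W_out, delta_out, y_hidden, tol=0.05):
    """Illustrative MAXCORE-style pruning of output-layer weights.

    W_out     : (K, H) weights from H hidden units to K output units
    delta_out : (N, K) output-layer error signals over N training patterns
    y_hidden  : (N, H) hidden activations (logistic units assumed)
    tol       : tolerance on the normalized cross-correlation
    """
    N = delta_out.shape[0]
    # Error signals propagated back to the hidden layer (backprop deltas)
    dphi = y_hidden * (1.0 - y_hidden)             # logistic derivative
    delta_hidden = (delta_out @ W_out) * dphi      # (N, H)

    # Cross-correlate each output error signal with each hidden error signal;
    # entry (k, i) scores the weight from hidden unit i to output unit k.
    corr = np.abs(delta_out.T @ delta_hidden) / N  # (K, H)
    corr = corr / (corr.max() + 1e-12)             # normalize to [0, 1]

    keep = corr >= tol                             # drop low-correlation weights
    return W_out * keep, keep
```

In practice one would prune gradually (remove the weakest weights, retrain briefly, re-score) rather than in a single pass, which matches the "gradually discarded" wording of the abstract.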

Footnotes
1. The AIC has the following structure: \(AIC=-2\ln(\varepsilon_{\rm train})+2N_c\) [23]. (A small numeric illustration follows these footnotes.)
 
2. Since the proposed approach is dependent on the classifier model, it belongs to the class of wrappers for feature subset selection [16].
 
3. Recall that the task now is feature selection, not pattern classification. Thus, we can train the network with all the available pattern vectors.
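
The information criterion in footnote 1 is easy to illustrate numerically. A minimal sketch, assuming \(\varepsilon_{\rm train}\) is the training error and \(N_c\) the number of remaining connections (our reading of the symbols, not spelled out on this page):

```python
import math

# AIC as given in footnote 1: AIC = -2*ln(eps_train) + 2*N_c.
# The candidate with the smaller AIC is preferred, so the 2*N_c term
# penalizes architectures that keep many connections.
def aic(eps_train, n_connections):
    return -2.0 * math.log(eps_train) + 2 * n_connections

print(aic(0.050, 120))  # hypothetical full network   -> ~245.99
print(aic(0.055, 40))   # hypothetical pruned network -> ~85.80
```

Under this form the parameter penalty dominates once pruning removes a substantial fraction of the weights, so a pruned network with a slightly worse training error can still score better.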
 
Literature
1. Aran O, Yildiz OT, Alpaydin E (2009) An incremental framework based on cross-validation for estimating the architecture of a multilayer perceptron. Int J Pattern Recogn Artif Intell 23(2):159–190
2. Benardos PG, Vosniakos GC (2007) Optimizing feedforward artificial neural network architecture. Eng Appl Artif Intell 20(3):365–382
3. Berthonnaud E, Dimnet J, Roussouly P, Labelle H (2005) Analysis of the sagittal balance of the spine and pelvis using shape and orientation parameters. J Spinal Disorders Tech 18(1):40–47
4. Bishop CM (1992) Exact calculation of the Hessian matrix for the multi-layer perceptron. Neural Comput 4(4):494–501
5. Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
6. Castellano G, Fanelli AM, Pelillo M (1997) An iterative pruning algorithm for feedforward neural networks. IEEE Trans Neural Netw 8(3):519–531
7. Cataltepe Z, Abu-Mostafa YS, Magdon-Ismail M (1999) No free lunch for early stopping. Neural Comput 11(4):995–1009
9. Dandurand F, Berthiaume V, Shultz TR (2007) A systematic comparison of flat and standard cascade-correlation using a student-teacher network approximation task. Connect Sci 19(3):223–244
10. Delogu R, Fanni A, Montisci A (2008) Geometrical synthesis of MLP neural networks. Neurocomputing 71:919–930
11. Engelbrecht AP (2001) A new pruning heuristic based on variance analysis of sensitivity information. IEEE Trans Neural Netw 12(6):1386–1399
12. Fahlman SE, Lebiere C (1990) The cascade-correlation learning architecture. In: Touretzky DS (ed) Advances in neural information processing systems, vol 2. Morgan Kaufmann, San Mateo, pp 524–532
13. Gómez I, Franco L, Jerez JM (2009) Neural network architecture selection: can function complexity help? Neural Process Lett 30:71–87
14. Hammer B, Micheli A, Sperduti A (2006) Universal approximation capability of cascade correlation for structures. Neural Comput 17(5):1109–1159
15. Hassibi B, Stork DG (1993) Second order derivatives for network pruning: optimal brain surgeon. In: Hanson SJ, Cowan JD, Giles CL (eds) Advances in neural information processing systems, vol 5. Morgan Kaufmann, San Mateo, pp 164–171
16. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
17. Littmann E, Ritter H (1996) Learning and generalization in cascade network architectures. Neural Comput 8(7):1521–1539
18. Luukka P (2011) Feature selection using fuzzy entropy measures with similarity classifier. Exp Syst Appl 38(4):4600–4607
19. Moustakidis S, Theocharis J (2010) SVM-FuzCoC: a novel SVM-based feature selection method using a fuzzy complementary criterion. Pattern Recogn 43(11):3712–3729
20. Nakamura T, Judd K, Mees AI, Small M (2006) A comparative study of information criteria for model selection. Int J Bifur Chaos 16(8):2153–2175
21. Parekh R, Yang J, Honavar V (2000) Constructive neural-network learning algorithms for pattern classification. IEEE Trans Neural Netw 11(2):436–451
22. Platt JC (1998) Fast training of support vector machines using sequential minimal optimization. In: Advances in kernel methods: support vector learning. MIT Press, Cambridge, pp 185–208
23. Principe JC, Euliano NR, Lefebvre WC (2000) Neural and adaptive systems. Wiley, London
24. Reed R (1993) Pruning algorithms—a survey. IEEE Trans Neural Netw 4(5):740–747
25. Rocha M, Cortez P, Neves J (2007) Evolution of neural networks for classification and regression. Neurocomputing 70(16–18):1054–1060
26. Rocha Neto AR, Barreto GA (2009) On the application of ensembles of classifiers to the diagnosis of pathologies of the vertebral column: a comparative analysis. IEEE Latin Am Trans 7(4):487–496
27. Saxena A, Saad A (2007) Evolving an artificial neural network classifier for condition monitoring of rotating mechanical systems. Appl Soft Comput 7(1):441–454
28. Seghouane AK, Amari SI (2007) The AIC criterion and symmetrizing the Kullback–Leibler divergence. IEEE Trans Neural Netw 18(1):97–106
29. Stathakis D, Kanellopoulos I (2008) Global optimization versus deterministic pruning for the classification of remotely sensed imagery. Photogrammetr Eng Remote Sens 74(10):1259–1265
30. Trenn S (2008) Multilayer perceptrons: approximation order and necessary number of hidden units. IEEE Trans Neural Netw 19(5):836–844
31. Wan W, Mabu S, Shimada K, Hirasawa K, Hu J (2009) Enhancing the generalization ability of neural networks through controlling the hidden layers. Appl Soft Comput 9(1):404–414
32. Weigend AS, Rumelhart DE, Huberman AB (1990) Generalization by weight-elimination with application to forecasting. In: Lippmann RP, Moody J, Touretzky DS (eds) Advances in neural information processing systems, vol 3. Morgan Kaufmann, San Mateo, pp 875–882
33. Xiang C, Ding SQ, Lee TH (2005) Geometric interpretation and architecture selection of the MLP. IEEE Trans Neural Netw 16(1):84–96
34. Yao X (1999) Evolving artificial neural networks. Proc IEEE 87(9):1423–1447
35. Yu J, Wang S, Xi L (2008) Evolving artificial neural networks using an improved PSO and DPSO. Neurocomputing 71(4–6):1054–1060
Metadata
Title: A novel weight pruning method for MLP classifiers based on the MAXCORE principle
Publication date: 01-01-2013
Published in: Neural Computing and Applications / Issue 1/2013
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-011-0748-6
