Published in: Neural Computing and Applications 6/2014

1 May 2014 | Original Article

Multi-label incremental learning applied to web page categorization

Authors: Patrick Marques Ciarelli, Elias Oliveira, Evandro O. T. Salles

Abstract

Multi-label problems are challenging because each instance may be associated with an unknown number of categories, and the relationship among the categories is not always known. A large amount of data is necessary to infer the required information regarding the categories, but these data are normally available only in small batches distributed over a period of time. In this work, multi-label problems are tackled using an incremental neural network known as the evolving Probabilistic Neural Network (ePNN). This neural network is capable of continuous learning while maintaining a reduced architecture, so it can absorb training data whenever they become available without drastic growth of its structure. We carried out a series of experiments on web page data sets and compared the performance of ePNN to that of other multi-label categorizers. On average, ePNN outperformed the other categorizers in four out of five metrics used for evaluation, and the structure of ePNN was less complex than that of the other algorithms evaluated.
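To make the idea of incremental multi-label learning with a bounded structure concrete, here is a minimal Python sketch. It is not the authors' ePNN: the class name TinyIncrementalPNN, the merge rule (absorb a sample into the nearest Gaussian prototype when it lies within merge_radius), the relative score threshold used to pick several labels, and all parameter values and toy data are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


class TinyIncrementalPNN:
    """Toy multi-label kernel classifier trained one instance at a time.

    Illustrative only; this is NOT the ePNN algorithm from the paper.
    """

    def __init__(self, sigma=1.0, merge_radius=1.0, rel_threshold=0.5):
        self.sigma = sigma                  # Gaussian kernel width (assumed)
        self.merge_radius = merge_radius    # samples closer than this are merged
        self.rel_threshold = rel_threshold  # keep labels scoring >= this * best
        self.prototypes = defaultdict(list)  # label -> list of [centre, count]

    @staticmethod
    def _dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def partial_fit(self, x, labels):
        """Learn a single instance, which may carry several labels."""
        for label in labels:
            protos = self.prototypes[label]
            nearest = min(protos, key=lambda p: self._dist2(p[0], x), default=None)
            if nearest is not None and self._dist2(nearest[0], x) <= self.merge_radius ** 2:
                centre, count = nearest
                # Running mean: the model absorbs the sample without growing.
                nearest[0] = [c + (xi - c) / (count + 1) for c, xi in zip(centre, x)]
                nearest[1] = count + 1
            else:
                protos.append([list(x), 1])  # grow only for novel regions

    def scores(self, x):
        """Count-weighted average kernel response per label."""
        out = {}
        for label, protos in self.prototypes.items():
            total = sum(n for _, n in protos)
            out[label] = sum(
                n * math.exp(-self._dist2(c, x) / (2 * self.sigma ** 2))
                for c, n in protos
            ) / total
        return out

    def predict(self, x):
        """Return every label whose score is close enough to the best one."""
        s = self.scores(x)
        if not s:
            return set()
        best = max(s.values())
        return {label for label, v in s.items() if v >= self.rel_threshold * best}

    def size(self):
        return sum(len(p) for p in self.prototypes.values())


# Two small batches arriving at different times (made-up 2-D features).
model = TinyIncrementalPNN()
batch1 = [([0.0, 0.0], {"sports"}), ([0.2, 0.1], {"sports"})]
batch2 = [([5.0, 5.0], {"news"}), ([2.5, 2.5], {"sports", "news"})]
for batch in (batch1, batch2):
    for features, labels in batch:
        model.partial_fit(features, labels)

print(model.size(), model.predict([2.5, 2.5]))
```

The point of the sketch is the trade-off named in the abstract: each new batch updates the model in place, and the number of stored prototypes grows only when a sample falls outside the region already covered for its category.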

Footnotes
1. Times were obtained using a PC with an Intel Dual Core 2.30 GHz processor and 4 GB of RAM.
Metadata
Title: Multi-label incremental learning applied to web page categorization
Authors: Patrick Marques Ciarelli, Elias Oliveira, Evandro O. T. Salles
Publication date: 1 May 2014
Publisher: Springer London
Published in: Neural Computing and Applications, Issue 6/2014
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-013-1345-7
