Published in: Computing 6/2019

03.01.2019

Improved randomized learning algorithms for imbalanced and noisy educational data classification

Authors: Ming Li, Changqin Huang, Dianhui Wang, Qintai Hu, Jia Zhu, Yong Tang


Abstract

Although neural networks have demonstrated strong potential for constructing learners with good predictive performance, certain uncertainty issues can greatly degrade the effectiveness of supervised learning algorithms, notably class imbalance and labeling errors (class noise). Technically, imbalanced data make it more difficult for learning algorithms to distinguish between classes, while data with labeling errors can lead to an unreasonable problem formulation based on incorrect hypotheses. Indeed, noise and class imbalance are pervasive problems in educational data analytics. This study develops improved randomized learning algorithms by investigating a novel type of cost function that addresses the combined effects of class imbalance and class noise. Instead of treating these uncertainty issues in isolation, we present a convex combination of robust and imbalance-aware modelling objectives, yielding a generalized formulation of weighted least squares problems from which the improved randomized learner models can be built. Our experimental study on several educational data classification tasks verifies the advantages of the proposed algorithms over existing methods that either take no account of class imbalance and labeling errors, or consider only one of these aspects.
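The abstract describes a weighted least squares formulation whose per-sample weights convexly combine a robustness term and a class-imbalance term. The following is a minimal sketch of that idea, not the paper's algorithm: the random-basis hidden layer, the inverse-frequency imbalance weights, the residual-based robustness weights, and the mixing coefficient `lam` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced binary data: 90 negatives, 10 positives.
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(2, 1, (10, 2))])
y = np.hstack([-np.ones(90), np.ones(10)])

# Random-basis hidden layer (randomized learner): fixed random input
# weights and biases, sigmoid activation; only the output weights are trained.
L = 50
W_in = rng.uniform(-1, 1, (2, L))
b = rng.uniform(-1, 1, L)
H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))

# Robustness weights: down-weight samples with large residuals
# from an initial unweighted least squares fit (possible label noise).
beta0 = np.linalg.lstsq(H, y, rcond=None)[0]
resid = np.abs(H @ beta0 - y)
w_rob = 1.0 / (1.0 + resid ** 2)

# Imbalance weights: inverse class frequency, so the minority class counts more.
counts = {c: np.sum(y == c) for c in (-1.0, 1.0)}
w_imb = np.array([len(y) / (2 * counts[c]) for c in y])

# Convex combination of the two weighting objectives.
lam = 0.5
w = lam * w_imb + (1 - lam) * w_rob

# Weighted least squares with a small ridge term:
# beta = (H^T W H + r I)^{-1} H^T W y
Wd = np.diag(w)
r = 1e-6
beta = np.linalg.solve(H.T @ Wd @ H + r * np.eye(L), H.T @ Wd @ y)

pred = np.sign(H @ beta)
acc = np.mean(pred == y)
```

Setting `lam = 1` recovers a purely imbalance-aware fit and `lam = 0` a purely robust fit; intermediate values trade the two objectives off, mirroring the convex combination described above.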


Metadata
Title
Improved randomized learning algorithms for imbalanced and noisy educational data classification
Authors
Ming Li
Changqin Huang
Dianhui Wang
Qintai Hu
Jia Zhu
Yong Tang
Publication date
03.01.2019
Publisher
Springer Vienna
Published in
Computing / Issue 6/2019
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI
https://doi.org/10.1007/s00607-018-00698-w
