2017 | Original Paper | Book Chapter

On the Robustness of Decision Tree Learning Under Label Noise

Authors: Aritra Ghosh, Naresh Manwani, P. S. Sastry

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

In most practical classifier learning problems, the training data suffers from label noise. Most theoretical results on robustness to label noise involve either estimation of noise rates or non-convex optimization, and none of them apply to standard decision tree learning algorithms. This paper presents a theoretical analysis showing that, under some assumptions, many popular decision tree learning algorithms are inherently robust to label noise. We also present sample complexity results that bound the sample size needed for this robustness to hold with high probability. We illustrate this robustness through extensive simulations.
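The robustness claim can be checked with a small simulation in the spirit of the experiments the abstract describes. The sketch below is not taken from the paper; the dataset, symmetric noise rate eta, and tree settings (Gini impurity, a minimum leaf size) are illustrative assumptions. It injects symmetric label noise into the training labels, fits a standard impurity-based decision tree, and reports accuracy on a clean test set, which should degrade only mildly as eta grows (for eta below 0.5).

# Minimal sketch (assumed setup, not the authors' code): symmetric label-noise
# simulation with a standard impurity-based decision tree.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=20000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for eta in (0.0, 0.2, 0.4):                       # symmetric noise rates (< 0.5)
    flip = rng.random(len(y_tr)) < eta            # flip each training label with prob. eta
    y_noisy = np.where(flip, 1 - y_tr, y_tr)
    clf = DecisionTreeClassifier(criterion="gini", min_samples_leaf=50,
                                 random_state=0).fit(X_tr, y_noisy)
    acc = accuracy_score(y_te, clf.predict(X_te))  # evaluated on clean labels
    print(f"noise rate {eta:.1f}: clean-test accuracy {acc:.3f}")

The minimum leaf size plays the role of the sample-size conditions in the paper's sample complexity results: with enough points per node, impurity-based split selection is largely unaffected by symmetric label flips.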

Appendices
Available to authorized users only.
Footnotes
1. For simplicity, we do not consider pruning of the tree.
Metadata
Title
On the Robustness of Decision Tree Learning Under Label Noise
Authors
Aritra Ghosh
Naresh Manwani
P. S. Sastry
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-57454-7_53