
Mitigating Class-Boundary Label Uncertainty to Reduce Both Model Bias and Variance

Published: 05 March 2021

Abstract

The study of model bias and variance with respect to decision boundaries is critically important in supervised learning and artificial intelligence. There is generally a tradeoff between the two, as fine-tuning of the decision boundary of a classification model to accommodate more boundary training samples (i.e., higher model complexity) may improve training accuracy (i.e., lower bias) but hurt generalization against unseen data (i.e., higher variance). By focusing on just classification boundary fine-tuning and model complexity, it is difficult to reduce both bias and variance. To overcome this dilemma, we take a different perspective and investigate a new approach to handle inaccuracy and uncertainty in the training data labels, which are inevitable in many applications where labels are conceptual entities and labeling is performed by human annotators. The process of classification can be undermined by uncertainty in the labels of the training data; extending a boundary to accommodate an inaccurately labeled point will increase both bias and variance. Our novel method can reduce both bias and variance by estimating the pointwise label uncertainty of the training set and accordingly adjusting the training sample weights such that those samples with high uncertainty are weighted down and those with low uncertainty are weighted up. In this way, uncertain samples have a smaller contribution to the objective function of the model’s learning algorithm and exert less pull on the decision boundary. In a real-world physical activity recognition case study, the data present many labeling challenges, and we show that this new approach improves model performance and reduces model variance.
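
To make the reweighting idea concrete, the following minimal sketch (in Python, using NumPy and scikit-learn) estimates each training point's label uncertainty from the entropy of the class labels among its k nearest neighbors and converts that entropy into a per-sample weight supplied to an ordinary classifier. Both the k-NN label-entropy estimate and the linear entropy-to-weight mapping are illustrative assumptions standing in for the paper's pointwise uncertainty estimator and weighting scheme; they are not the authors' exact formulation.

# Sketch only: the k-NN label entropy and the linear weight mapping below are
# assumptions used for illustration, not the paper's exact method.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def knn_label_entropy(X, y, k=10):
    # Entropy of the class labels among each point's k nearest neighbors.
    # X: (n_samples, n_features) array; y: (n_samples,) array of class labels.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)            # column 0 is the query point itself
    neighbor_labels = y[idx[:, 1:]]      # shape: (n_samples, k)
    classes = np.unique(y)
    probs = np.stack([(neighbor_labels == c).mean(axis=1) for c in classes], axis=1)
    probs = np.clip(probs, 1e-12, 1.0)   # avoid log(0)
    return -(probs * np.log(probs)).sum(axis=1)

def uncertainty_weights(entropy, n_classes):
    # Map entropy in [0, log(n_classes)] to a weight in [0, 1]:
    # high local label uncertainty -> small weight, low uncertainty -> weight near 1.
    return 1.0 - entropy / np.log(n_classes)

# Usage (X_train, y_train: possibly noisily labeled training data as NumPy arrays):
# ent = knn_label_entropy(X_train, y_train, k=10)
# w = uncertainty_weights(ent, n_classes=len(np.unique(y_train)))
# clf = SVC(kernel="rbf").fit(X_train, y_train, sample_weight=w)

Down-weighted samples contribute less to the classifier's objective and therefore exert less pull on the decision boundary, which is the effect the abstract describes; the specific estimator and weighting function here are only one plausible instantiation.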

      • Published in

        ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 2 (Survey Paper and Regular Papers)
        April 2021, 524 pages
        ISSN: 1556-4681
        EISSN: 1556-472X
        DOI: 10.1145/3446665

        Copyright © 2021 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 5 March 2021
        • Accepted: 1 October 2020
        • Revised: 1 July 2020
        • Received: 1 July 2019
        Published in TKDD Volume 15, Issue 2

        Qualifiers

        • research-article
        • Research
        • Refereed
