Mitigating Class-Boundary Label Uncertainty to Reduce Both Model Bias and Variance

Abstract
The study of model bias and variance with respect to decision boundaries is critically important in supervised learning and artificial intelligence. The two generally trade off: fine-tuning a classification model's decision boundary to accommodate more boundary training samples (i.e., higher model complexity) may improve training accuracy (i.e., lower bias) but hurt generalization to unseen data (i.e., higher variance). As long as attention is restricted to boundary fine-tuning and model complexity, it is difficult to reduce bias and variance simultaneously. To overcome this dilemma, we take a different perspective and investigate a new approach to handling inaccuracy and uncertainty in training labels, which are inevitable in many applications where labels are conceptual entities and labeling is performed by human annotators. Label uncertainty can undermine classification: extending a boundary to accommodate an inaccurately labeled point increases both bias and variance. Our method reduces both by estimating the pointwise label uncertainty of the training set and adjusting the training sample weights accordingly, down-weighting samples with high uncertainty and up-weighting those with low uncertainty. Uncertain samples thus contribute less to the objective function of the model's learning algorithm and exert less pull on the decision boundary. In a real-world physical activity recognition case study whose data present many labeling challenges, we show that this approach improves model performance and reduces model variance.
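The reweighting idea described above can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: the k-nearest-neighbor label-disagreement estimator, the `1 - uncertainty` weighting rule, and the Gaussian toy data are all assumptions made for the example.

```python
import numpy as np

# Illustrative sketch only -- not the paper's method. Pointwise label
# uncertainty is approximated by the fraction of a sample's k nearest
# neighbors that carry a different label; (1 - uncertainty) then serves as a
# per-sample weight in a weighted logistic regression.

def label_uncertainty(X, y, k=10):
    """Fraction of each point's k nearest neighbors with a disagreeing label."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise sq. distances
    np.fill_diagonal(d2, np.inf)                              # exclude the point itself
    idx = np.argsort(d2, axis=1)[:, :k]                       # k nearest neighbors
    return (y[idx] != y[:, None]).mean(axis=1)                # local disagreement rate

def fit_weighted_logreg(X, y, w, lr=0.5, steps=500):
    """Gradient descent on the sample-weighted logistic loss."""
    Xb = np.hstack([X, np.ones((len(X), 1))])                 # append bias column
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ theta))                 # predicted probabilities
        theta -= lr * Xb.T @ (w * (p - y)) / w.sum()          # weighted log-loss gradient
    return theta

# Toy two-class data: overlapping Gaussian blobs, so labels near the class
# boundary are inherently uncertain.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(-1.0, 1.0, (n, 2)), rng.normal(1.0, 1.0, (n, 2))])
y = np.array([0] * n + [1] * n)

u = label_uncertainty(X, y, k=10)   # highest near the boundary
w = 1.0 - u                         # uncertain samples weighted down
theta = fit_weighted_logreg(X, y.astype(float), w)
```

Because boundary points accumulate neighbors from the opposite class, they receive low weights and exert less pull on the fitted boundary than interior points do.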