Published in: International Journal of Computer Assisted Radiology and Surgery 12/2020

23-09-2020 | Original Article

Assessing and mitigating the effects of class imbalance in machine learning with application to X-ray imaging

Authors: Wendi Qu, Indranil Balki, Mauro Mendez, John Valen, Jacob Levman, Pascal N. Tyrrell

Abstract

Purpose

Machine learning (ML) algorithms are well known to vary in prediction accuracy when trained on imbalanced training sets, which are typical in medical imaging (MI) because pathological and normal cases occur in unequal proportions. This paper presents a thorough investigation of the effects of class imbalance, and of methods for mitigating it, in ML algorithms applied to MI.

Methods

We first selected five classes from the Image Retrieval in Medical Applications (IRMA) dataset and performed multiclass classification using the random forest model (RFM); we then performed binary classification using a convolutional neural network (CNN) on a chest X-ray dataset. An imbalanced class was created in the training set by varying the number of images in that class. Methods tested to mitigate class imbalance included oversampling, undersampling, and changing class weights of the RFM. Model performance was assessed by overall classification accuracy, overall F1 score, and the specificity, recall, and precision of the imbalanced class.
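The mitigation methods and per-class metrics named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, resampling is done by simple random duplication or removal, the class weights use the common inverse-frequency heuristic (n_samples / (n_classes * class_count)), and metrics are computed one-vs-rest for the imbalanced class. The abstract does not state which specific variants the study used.

```python
import random
from collections import Counter


def random_oversample(samples, labels, target_class, target_count, seed=0):
    """Duplicate random items of target_class until it has target_count items.

    Assumption: plain random duplication; the study may have used other variants.
    """
    rng = random.Random(seed)
    idx = [i for i, y in enumerate(labels) if y == target_class]
    if not idx:
        raise ValueError("target_class not present in labels")
    extra = [rng.choice(idx) for _ in range(max(0, target_count - len(idx)))]
    keep = list(range(len(labels))) + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]


def random_undersample(samples, labels, target_class, target_count, seed=0):
    """Randomly drop items of target_class so that at most target_count remain."""
    rng = random.Random(seed)
    idx = [i for i, y in enumerate(labels) if y == target_class]
    kept = set(rng.sample(idx, min(target_count, len(idx))))
    keep = [i for i, y in enumerate(labels) if y != target_class or i in kept]
    return [samples[i] for i in keep], [labels[i] for i in keep]


def inverse_frequency_weights(labels):
    """Class weights n_samples / (n_classes * class_count).

    Assumption: one common weighting heuristic, not necessarily the paper's.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Precision, recall, specificity, and F1 for one class against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


if __name__ == "__main__":
    # Toy data: class "a" is underrepresented relative to class "b".
    labels = ["a"] * 2 + ["b"] * 6
    samples = list(range(len(labels)))
    _, over = random_oversample(samples, labels, "a", 6)
    _, under = random_undersample(samples, labels, "b", 2)
    print(Counter(over), Counter(under), inverse_frequency_weights(labels))
```

Resampling changes the training set itself, while class weights leave the data untouched and instead change how much each class contributes during training; either should be applied to the training split only, so that evaluation reflects the original class distribution.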

Results

A close-to-balanced training set resulted in the best model performance, and a large imbalance with overrepresentation was more detrimental to model performance than underrepresentation. Oversampling and undersampling methods were both effective in mitigating class imbalance, and efficacy of oversampling techniques was class specific.

Conclusion

This study systematically demonstrates the effect of class imbalance on RFM and CNN models using two public X-ray datasets, making these findings widely applicable as a reference. Furthermore, the methods employed here can guide researchers in assessing and addressing the effects of class imbalance, while taking data-specific characteristics into account to optimize imbalance-mitigating methods.

Metadata
Title
Assessing and mitigating the effects of class imbalance in machine learning with application to X-ray imaging
Authors
Wendi Qu
Indranil Balki
Mauro Mendez
John Valen
Jacob Levman
Pascal N. Tyrrell
Publication date
23-09-2020
Publisher
Springer International Publishing
Published in
International Journal of Computer Assisted Radiology and Surgery / Issue 12/2020
Print ISSN: 1861-6410
Electronic ISSN: 1861-6429
DOI
https://doi.org/10.1007/s11548-020-02260-6
