
Transfer learning for class imbalance problems with inadequate data

Regular Paper · Published in Knowledge and Information Systems

Abstract

A fundamental problem in data mining is building robust classifiers in the presence of skewed data distributions. Class imbalance classifiers are trained specifically for such skewed datasets, but existing methods assume an ample supply of training examples as a prerequisite for constructing an effective classifier. When sufficient data are not available, the unequal distribution between classes makes it even more difficult to develop a representative classifier. We provide a unified framework that exploits auxiliary data through a transfer learning mechanism while simultaneously building a robust classifier that tackles the imbalance problem when only a few training samples are available in the target domain of interest. Transfer learning methods use auxiliary data to augment learning when training examples are insufficient; in this paper, we develop a method that is optimized to simultaneously augment the training data and induce balance into skewed datasets. We propose a novel boosting-based instance-transfer classifier with a label-dependent update mechanism that simultaneously compensates for class imbalance and incorporates samples from an auxiliary domain to improve classification. We provide theoretical and empirical validation of our method and apply it to healthcare and text classification applications.
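The boosting-based instance transfer described above can be illustrated with a minimal TrAdaBoost-style sketch. This is an assumption-laden illustration, not the paper's exact algorithm: the stump learner, the `minority_factor` amplification for misclassified minority-class target instances, and all function names are hypothetical choices made here to show the general mechanism (source weights shrink multiplicatively when misclassified; target weights grow, with a label-dependent extra boost for the minority class).

```python
import numpy as np

def stump_train(X, y, w):
    """Weighted decision stump: best (feature, threshold, polarity)."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - thr) >= 0, 1, 0)
                err = np.sum(w * (pred != y)) / np.sum(w)
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(X, f, thr, pol):
    return np.where(pol * (X[:, f] - thr) >= 0, 1, 0)

def transfer_boost(Xs, ys, Xt, yt, T=10, minority_factor=2.0):
    """TrAdaBoost-style instance transfer with a label-dependent update.

    Misclassified source (auxiliary) instances are down-weighted each
    round; misclassified target instances are up-weighted, with
    minority-class (y == 1) mistakes amplified by `minority_factor`
    (an illustrative correction, not the paper's exact one).
    """
    ns = len(ys)
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.ones(len(y)) / len(y)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(ns) / T))
    hyps = []
    for _ in range(T):
        w = w / w.sum()
        _, f, thr, pol = stump_train(X, y, w)
        pred = stump_predict(X, f, thr, pol)
        # error is measured on the target instances only
        err_t = np.sum(w[ns:] * (pred[ns:] != yt)) / np.sum(w[ns:])
        err_t = min(max(err_t, 1e-3), 0.499)
        beta_t = err_t / (1.0 - err_t)
        miss = pred != y
        # source: shrink weights of misclassified auxiliary instances
        w[:ns][miss[:ns]] *= beta_src
        # target: grow weights of misclassified instances, extra for minority
        boost = np.where(yt == 1, minority_factor, 1.0)
        w[ns:][miss[ns:]] *= (1.0 / beta_t) ** boost[miss[ns:]]
        hyps.append((f, thr, pol, np.log(1.0 / beta_t)))
    return hyps

def predict(hyps, X):
    score = sum(a * (2 * stump_predict(X, f, thr, pol) - 1)
                for f, thr, pol, a in hyps)
    return (score > 0).astype(int)
```

The key design point mirrored from the abstract is that source and target instances follow different update rules, and target updates additionally depend on the label, so the sampling bias toward the majority class is counteracted while auxiliary data still contribute.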

[Figures 1–9 appear in the full article.]


Notes

  1. The terms class and label are used interchangeably in our discussion.

  2. Only concepts that are relevant for “Absolute Rarity” are discussed.

  3. In this paper, a “Rare Dataset” refers to a dataset with “Absolute Rarity”.

  4. All mentions of “convergence” refer to a sequence (weight) that converges to zero.

  5. A slower or decreased convergence rate means that a weight converges to zero over a higher number of boosting iterations.

  6. A faster or increased convergence rate means that a weight converges to zero over a lower number of boosting iterations.

  7. The Up/Down arrow next to each error measure signifies that an algorithm produced better/worse results in comparison with the other algorithm.

  8. http://people.csail.mit.edu/jrennie/20Newsgroups/.
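Notes 4–6 define convergence in terms of a weight shrinking to zero under repeated multiplicative boosting updates. A small sketch makes the slower/faster distinction concrete; the update factor `beta` here is a generic assumption for illustration, not the paper's specific update.

```python
def iterations_to_converge(beta, eps=1e-6):
    """Count boosting iterations until a unit weight, multiplied by
    beta (0 < beta < 1) each round, drops below eps — i.e., until the
    weight has "converged" in the sense of Note 4."""
    w, t = 1.0, 0
    while w >= eps:
        w *= beta
        t += 1
    return t

iterations_to_converge(0.5)  # → 20 iterations (faster convergence)
iterations_to_converge(0.9)  # → 132 iterations (slower convergence)
```

A smaller update factor drives the weight below the threshold in fewer iterations, which is exactly the "faster convergence rate" of Note 6.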


Acknowledgments

This work was supported in part by the National Cancer Institute of the National Institutes of Health under Award Number R21CA175974 and the US National Science Foundation grants IIS-1231742, IIS-1242304, and IIS-1527827. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH and NSF.

Author information


Corresponding author

Correspondence to Chandan K. Reddy.


Cite this article

Al-Stouhi, S., Reddy, C.K. Transfer learning for class imbalance problems with inadequate data. Knowl Inf Syst 48, 201–228 (2016). https://doi.org/10.1007/s10115-015-0870-3

