
Transfer learning for class imbalance problems with inadequate data

Regular Paper · Published in Knowledge and Information Systems

Abstract

A fundamental problem in data mining is building robust classifiers in the presence of skewed data distributions. Class imbalance classifiers are trained specifically for such skewed datasets, but existing methods assume an ample supply of training examples as a prerequisite for constructing an effective classifier. When sufficient data are not available, the unequal distribution between classes makes it even more difficult to develop a representative classifier. We provide a unified framework that exploits auxiliary data through a transfer learning mechanism while simultaneously building a robust classifier that tackles the imbalance problem when only a few training samples are available in the target domain of interest. Transfer learning methods use auxiliary data to augment learning when training examples are insufficient; in this paper, we develop a method that is optimized to simultaneously augment the training data and induce balance into skewed datasets. We propose a novel boosting-based instance-transfer classifier with a label-dependent update mechanism that simultaneously compensates for class imbalance and incorporates samples from an auxiliary domain to improve classification. We provide theoretical and empirical validation of our method and apply it to healthcare and text classification applications.
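The boosting-based instance transfer described above can be illustrated with a minimal TrAdaBoost-style sketch. This is an assumption-laden illustration, not the paper's exact algorithm: the stump learner, the `minority_factor` amplification for misclassified minority-class target instances, and all function names are hypothetical choices made here to show the general mechanism (source weights shrink multiplicatively when misclassified; target weights grow, with a label-dependent extra boost for the minority class).

```python
import numpy as np

def stump_train(X, y, w):
    """Weighted decision stump: best (feature, threshold, polarity)."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - thr) >= 0, 1, 0)
                err = np.sum(w * (pred != y)) / np.sum(w)
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(X, f, thr, pol):
    return np.where(pol * (X[:, f] - thr) >= 0, 1, 0)

def transfer_boost(Xs, ys, Xt, yt, T=10, minority_factor=2.0):
    """TrAdaBoost-style instance transfer with a label-dependent update.

    Misclassified source (auxiliary) instances are down-weighted each
    round; misclassified target instances are up-weighted, with
    minority-class (y == 1) mistakes amplified by `minority_factor`
    (an illustrative correction, not the paper's exact one).
    """
    ns = len(ys)
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt])
    w = np.ones(len(y)) / len(y)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(ns) / T))
    hyps = []
    for _ in range(T):
        w = w / w.sum()
        _, f, thr, pol = stump_train(X, y, w)
        pred = stump_predict(X, f, thr, pol)
        # error is measured on the target instances only
        err_t = np.sum(w[ns:] * (pred[ns:] != yt)) / np.sum(w[ns:])
        err_t = min(max(err_t, 1e-3), 0.499)
        beta_t = err_t / (1.0 - err_t)
        miss = pred != y
        # source: shrink weights of misclassified auxiliary instances
        w[:ns][miss[:ns]] *= beta_src
        # target: grow weights of misclassified instances, extra for minority
        boost = np.where(yt == 1, minority_factor, 1.0)
        w[ns:][miss[ns:]] *= (1.0 / beta_t) ** boost[miss[ns:]]
        hyps.append((f, thr, pol, np.log(1.0 / beta_t)))
    return hyps

def predict(hyps, X):
    score = sum(a * (2 * stump_predict(X, f, thr, pol) - 1)
                for f, thr, pol, a in hyps)
    return (score > 0).astype(int)
```

The key design point mirrored from the abstract is that source and target instances follow different update rules, and target updates additionally depend on the label, so the sampling bias toward the majority class is counteracted while auxiliary data still contribute.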

[Figures 1–9 appear in the full article.]


Notes

  1. The terms class and label are used interchangeably in our discussion.

  2. Only concepts that are relevant for “Absolute Rarity” are discussed.

  3. In this paper, a “Rare Dataset” refers to a dataset with “Absolute Rarity”.

  4. All mentions of “convergence” refer to a sequence (weight) that converges to zero.

  5. A slower or decreased convergence rate means that a weight converges to zero over a higher number of boosting iterations.

  6. A faster or increased convergence rate means that a weight converges to zero over a lower number of boosting iterations.

  7. The Up/Down arrow next to each error measure signifies that an algorithm produced better/worse results in comparison with the other algorithm.

  8. http://people.csail.mit.edu/jrennie/20Newsgroups/.
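Notes 4–6 define convergence in terms of a weight shrinking to zero under repeated multiplicative boosting updates. A small sketch makes the slower/faster distinction concrete; the update factor `beta` here is a generic assumption for illustration, not the paper's specific update.

```python
def iterations_to_converge(beta, eps=1e-6):
    """Count boosting iterations until a unit weight, multiplied by
    beta (0 < beta < 1) each round, drops below eps — i.e., until the
    weight has "converged" in the sense of Note 4."""
    w, t = 1.0, 0
    while w >= eps:
        w *= beta
        t += 1
    return t

iterations_to_converge(0.5)  # → 20 iterations (faster convergence)
iterations_to_converge(0.9)  # → 132 iterations (slower convergence)
```

A smaller update factor drives the weight below the threshold in fewer iterations, which is exactly the "faster convergence rate" of Note 6.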


Acknowledgments

This work was supported in part by the National Cancer Institute of the National Institutes of Health under Award Number R21CA175974 and the US National Science Foundation grants IIS-1231742, IIS-1242304, and IIS-1527827. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH and NSF.

Author information


Corresponding author

Correspondence to Chandan K. Reddy.


Cite this article

Al-Stouhi, S., Reddy, C.K. Transfer learning for class imbalance problems with inadequate data. Knowl Inf Syst 48, 201–228 (2016). https://doi.org/10.1007/s10115-015-0870-3

