Abstract
A fundamental problem in data mining is building robust classifiers in the presence of skewed data distributions. Class-imbalance classifiers are trained specifically for such skewed datasets. Existing methods assume an ample supply of training examples as a prerequisite for constructing an effective classifier. When sufficient data are not available, however, developing a representative classification algorithm becomes even more difficult because of the unequal distribution between classes. We provide a unified framework that leverages auxiliary data through a transfer learning mechanism while simultaneously building a robust classifier that addresses the imbalance problem when only a few training samples are available in the target domain of interest. Transfer learning methods use auxiliary data to augment learning when training examples are insufficient; in this paper, we develop a method that simultaneously augments the training data and induces balance into skewed datasets. We propose a novel boosting-based instance-transfer classifier with a label-dependent update mechanism that compensates for class imbalance while incorporating samples from an auxiliary domain to improve classification. We provide theoretical and empirical validation of our method and apply it to healthcare and text classification applications.
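As a rough illustration of the kind of update the abstract describes (not the authors' exact rule), the following Python sketch combines a TrAdaBoost-style multiplicative decay on misclassified source weights with an AdaBoost-style up-weighting of misclassified target weights whose exponent depends on the label. The function name, the specific beta factors, and the cost parameterization are illustrative assumptions.

```python
import numpy as np

def label_dependent_boost_round(w_src, miss_src, w_tgt, miss_tgt, y_tgt,
                                n_rounds, cost_pos=1.0, cost_neg=1.0):
    """One weight-update round of a hypothetical TrAdaBoost-style
    instance-transfer boosting scheme with a label-dependent correction.

    w_src, w_tgt       : current instance weights for source / target data
    miss_src, miss_tgt : boolean arrays, True where the weak learner erred
    y_tgt              : target labels in {+1, -1}
    cost_pos, cost_neg : label-dependent factors; a larger factor up-weights
                         that class's misclassified instances more aggressively
    """
    # Weighted error of the weak learner, measured on the target data only.
    eps = np.sum(w_tgt * miss_tgt) / np.sum(w_tgt)
    eps = np.clip(eps, 1e-10, 0.5 - 1e-10)

    beta_tgt = eps / (1.0 - eps)   # AdaBoost-style factor, < 1 when eps < 0.5
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_rounds))

    # Source instances that disagree with the target concept fade out.
    new_w_src = w_src * np.where(miss_src, beta_src, 1.0)

    # Misclassified target instances are up-weighted; the label-dependent
    # cost lets one class (e.g., the minority) be boosted harder.
    costs = np.where(y_tgt == 1, cost_pos, cost_neg)
    new_w_tgt = w_tgt * np.where(miss_tgt, beta_tgt ** (-costs), 1.0)

    return new_w_src, new_w_tgt
```

With `cost_pos > cost_neg`, a misclassified minority (+1) target instance receives a larger weight increase than a misclassified majority instance, which is the intuition behind a label-dependent update compensating for class imbalance.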
Notes
The terms class and label are used interchangeably in our discussion.
Only concepts that are relevant for “Absolute Rarity” are discussed.
In this paper, a “Rare Dataset” refers to a dataset with “Absolute Rarity”.
All mentions of “convergence” refer to a weight sequence that converges to zero.
A slower or decreased convergence rate means that a weight converges to zero over a higher number of boosting iterations.
A faster or increased convergence rate means that a weight converges to zero over a lower number of boosting iterations.
An up/down arrow next to each error measure signifies that the algorithm produced better/worse results than the other algorithm.
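The convergence terminology in the notes above can be made concrete with a small sketch: a weight that shrinks by a fixed multiplicative factor each boosting round converges more slowly (survives more iterations before reaching zero, numerically a tolerance) when the factor is closer to one. The function and tolerance below are illustrative assumptions, not part of the paper.

```python
def rounds_to_vanish(decay, tol=1e-3):
    """Count boosting iterations until a unit weight, shrunk by the
    multiplicative factor `decay` each round, drops below `tol`
    (i.e., 'converges to zero' in the sense used in the notes)."""
    w, t = 1.0, 0
    while w >= tol:
        w *= decay
        t += 1
    return t

# A decay factor closer to 1 gives a slower convergence rate:
# rounds_to_vanish(0.5) -> 10 iterations, rounds_to_vanish(0.9) -> 66.
```

In this vocabulary, a label-dependent update that multiplies a minority-class weight by a factor closer to one decreases that weight's convergence rate, keeping it influential for more boosting iterations.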
Acknowledgments
This work was supported in part by the National Cancer Institute of the National Institutes of Health under Award Number R21CA175974 and the US National Science Foundation grants IIS-1231742, IIS-1242304, and IIS-1527827. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the NSF.
Al-Stouhi, S., Reddy, C.K. Transfer learning for class imbalance problems with inadequate data. Knowl Inf Syst 48, 201–228 (2016). https://doi.org/10.1007/s10115-015-0870-3