Abstract
Churn prediction in telecom has recently gained substantial interest of stakeholders because of associated revenue losses.
Predicting telecom churners, is a challenging problem due to the enormous nature of the telecom datasets. In this regard, we propose an intelligent churn prediction system for telecom by employing efficient feature extraction technique and ensemble method. We have used Random Forest, Rotation Forest, RotBoost and DECORATE ensembles in combination with minimum redundancy and maximum relevance (mRMR), Fisher’s ratio and F-score methods to model the telecom churn prediction problem. We have observed that mRMR method returns most explanatory features compared to Fisher’s ratio and F-score, which significantly reduces the computations and help ensembles in attaining improved performance. In comparison to Random Forest, Rotation Forest and DECORATE, RotBoost in combination with mRMR features attains better prediction performance on the standard telecom datasets. The better performance of RotBoost ensemble is largely attributed to the rotation of feature space, which enables the base classifier to learn different aspects of the churners and non-churners. Moreover, the Adaboosting process in RotBoost also contributes in achieving higher prediction accuracy by handling hard instances. The performance evaluation is conducted on standard telecom datasets using AUC, sensitivity and specificity based measures. Simulation results reveal that the proposed approach based on RotBoost in combination with mRMR features (CP-MRB) is effective in handling high dimensionality of the telecom datasets. CP-MRB offers higher accuracy in predicting churners and thus is quite prospective in modeling the challenging problems of customer churn prediction in telecom.
Similar content being viewed by others
References
Reinartz WJ, Kumar V (2003) The impact of customer relationship characteristics on profitable lifetime duration. J Mark 67(1):77
Lee T-S, Chiu C-C, Chou Y-C, Lu C-J (2004) Mining the customer credit using classification and regression tree and multivariate adaptive regression splines. Comput Stat Data Anal 50(4):1113–1130
Ruta D, Nauck D, Azvine B (2006) K nearest sequence method and its application to churn prediction. In: Intelligent data engineering and automated learning—IDEAL 2006. Lecture notes in computer sciences, vol 4224, pp 207–215
Khan A, Khan MF, Choi T-S (2008) Proximity base GPCRs prediction in transform domain. Biochem Biophys Res Commun 371(3):411–415
Tan S (2006) An effective refinement strategy for KNN text classifiers. Expert Syst Appl 30(2):290–298
Zhao L, Wang L, Xu Q (2012) Data stream classification with artificial endocrine system. Appl Intell 37(3):390–404
Zhang Y, Qi J, Shu H, Cao J (2007) A hybrid KNN-LR classifier and its application in customer churn prediction. In: IEEE international conference on systems, man and cybernetics, pp 3265–3269
Mozer MC, Wolniewicz R, Grimes DB, Johnson E, Kaushansky H (2000) Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry. IEEE Trans Neural Netw 11(3):690–696
Kim Y (2006) Toward a successful CRM: variable selection, sampling, and ensemble. Decis Support Syst 41(2):542–553
Lemmens A, Croux C (2006) Bagging and boosting classification trees to predict churn. J Mark Res 43(2):276–286
Bose I, Chen X (2009) Hybrid models using unsupervised clustering for prediction of customer churn. J Organ Comput Electron Commer 19(2):133–151
Dietterich TG (2000) Ensemble methods in machine learning. In: MCS’00 proceedings of the first international workshop on multiple classifier systems. Springer, London, pp 1–15
Bauer E, Kohavi R (1999) An empirical comparison of voting classification algorithms: bagging, boosting and variants. Mach Learn 36(2):105–139
Wang C-W, You W-H (2013) Boosting-SVM: effective learning with reduced data dimension. Appl Intell. doi:10.1007/s10489-013-0425-9
Verikas A, Gelzinis A, Bacauskiene M (2011) Mining data with random forests: a survey and results of new tests. Pattern Recognit 44(2):330–349
Xie Y, Li X, Ngai EWT, Ying W (2009) Customer churn prediction using improved balanced random forests. Expert Syst Appl 36(3):5445–5449
Rodriguez JJ, Kuncheva LI, Alonso CJ (2006) Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 28(10):1619–1630
Zhang C-X, Zhang J-S (2008) RotBoost: a technique for combining rotation forest and AdaBoost. Pattern Recognit Lett 29(10):1524–1536
Bock KWD, Van den Poel D (2011) An empirical evaluation of rotation-based ensemble classifiers for customer churn prediction. Expert Syst Appl 38(10):12293–12301. doi:10.1016/j.eswa.2011.04.007
Dietterich TG (2000) An experimental comparison of three methods for constructing ensemble of decision trees: bagging, boosting and randomization. Mach Learn 40(2):139–157
Huang BQ, Kechadi TM, Buckley B, Kiernan G, Keogh E, Rashid T (2010) A new feature set with new window techniques for customer churn prediction in land-line telecommunications. Expert Syst Appl 37(5):3657–3665
Huang B, Kechadi MT, Buckley B (2012) Customer churn prediction in telecommunications. Expert Syst Appl 39(1):1414–1425. doi:10.1016/j.eswa.2011.08.024
Burez J, Van den Poel D (2009) Handling class imbalance in customer churn prediction. Expert Syst Appl 36(3):4626–4636. doi:10.1016/j.eswa.2008.05.027
Owczarczuk M (2010) Churn models for prepaid customers in the cellular telecommunication industry using large data marts. Expert Syst Appl 37(6):4710–4712
Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238
Sorokina D (2009) Application of additive groves ensemble with multiple counts feature evaluation to KDD cup ’09 small data set. In: JMLR workshop and conference proceedings, Paris, France, June 28, 2009, vol 7, pp 101–109
Vinh L, Lee S, Park Y-T, Auriol BD (2012) A novel feature selection method based on normalized mutual information. Appl Intell 37(1):100–120
Li H, Wu X, Li Z, Wu G (2013) A relation extraction method of Chinese named entities based on location and semantic features. Appl Intell 38(1):1–15
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Zhang C-X, Wang G-W, Zhang J-S (2012) An empirical bias-variance analysis of DECORATE ensemble method at different training sample sizes. J Appl Stat 39(4):829–850
Verbeke W, Dejaeger K, Martens D, Hur J, Baesens B (2012) New insights into churn prediction in the telecommunication sector: a profit driven data mining approach. Eur J Oper Res 218(1):211–229
KDDCup 2009 challenge (2009) http://kddcup-orange.com
The Center for Customer Relationship Management, Duke University. http://www.fuqua.duke.edu/centers/ccrm/
Marquez-Vera C, Cano A, Romero C, Ventura S (2013) Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data. Appl Intell 38(3):315–330
Miller H, Clarke S, Lane S, Lonie A, Lazaridiz D, Petrovski S, Jones O (2009) Predicting customer behaviour: the University of Melbourne’s KDD Cup report. In: JMLR workshop and conference proceedings, Paris, France, June 28, 2009, vol 28, pp 45–55
Busa-Fekete R, Kegl B (2009) Accelerating AdaBoost using UCB. In: JMLR workshop and conference proceedings, Paris, France, June 28, 2009, vol 7, pp 111–122
Komoto K, Sugawara T, Tetu TI, Xuejuan X (2009) Stochastic gradient boosting. http://www.kddcup-orange.com/factsheet.php?id=23>
Acknowledgement
This work is supported by the Higher Education Commission of Pakistan (HEC) as per award No. 17-5-6(Ps6-002)/HEC/Sch/2010 and Korean National Research Foundation as per grant No. (NRF-2011-0006806).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Idris, A., Khan, A. & Lee, Y.S. Intelligent churn prediction in telecom: employing mRMR feature selection and RotBoost based ensemble classification. Appl Intell 39, 659–672 (2013). https://doi.org/10.1007/s10489-013-0440-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-013-0440-x