Building comprehensible customer churn prediction models with advanced rule induction techniques

https://doi.org/10.1016/j.eswa.2010.08.023

Abstract

Customer churn prediction models aim to detect customers with a high propensity to attrite. Predictive accuracy, comprehensibility, and justifiability are three key aspects of a churn prediction model. An accurate model makes it possible to correctly target future churners in a retention marketing campaign, while a comprehensible and intuitive rule-set makes it possible to identify the main drivers of churn and to develop an effective retention strategy in accordance with domain knowledge. This paper provides an extended overview of the literature on the use of data mining in customer churn prediction modeling. It is shown that only limited attention has been paid to the comprehensibility and the intuitiveness of churn prediction models. Therefore, two novel data mining techniques are applied to churn prediction modeling and benchmarked against traditional rule induction techniques such as C4.5 and RIPPER. Both AntMiner+ and ALBA are shown to induce accurate as well as comprehensible classification rule-sets. AntMiner+ is a high-performing data mining technique based on the principles of Ant Colony Optimization that allows domain knowledge to be included by imposing monotonicity constraints on the final rule-set. ALBA, on the other hand, combines the high predictive accuracy of a non-linear support vector machine model with the comprehensibility of the rule-set format. The results of the benchmarking experiments show that ALBA improves the learning of classification techniques, resulting in comprehensible models with increased performance. AntMiner+ results in accurate, comprehensible, and, most importantly, justifiable models, unlike the other modeling techniques included in this study.

Research highlights

► The literature has paid very limited attention to the comprehensibility and justifiability of customer churn prediction models.
► ALBA improves learning, resulting in comprehensible customer churn prediction models with increased performance.
► AntMiner+ results in accurate, comprehensible, and most importantly, justifiable customer churn prediction models.

Introduction

In recent decades we have witnessed an explosion of data. This data contains valuable information, but that information is hidden in the vast collection of raw data. Data mining entails the overall process of extracting knowledge from this data. Data mining techniques have been successfully applied in many different domains. Well-known examples are breast-cancer detection in the biomedical sector, market basket analysis in the retail sector (Berry & Linoff, 2004), and credit scoring in the financial sector (Baesens et al., 2003). This paper, however, focuses on the use of data mining to predict customer churn.

Customer churn prediction models aim to detect customers with a high propensity to attrite. An accurate segmentation of the customer base allows a company to target the customers that are most likely to churn in a retention marketing campaign, which makes more efficient use of the limited resources available for such a campaign. Customer retention is profitable to a company because: (1) Attracting new clients costs five to six times more than customer retention (Athanassopoulos, 2000; Bhattacharya, 1998; Colgate & Danaher, 2000; Rasmusson, 1999). (2) Long-term customers generate higher profits, tend to be less sensitive to competitive marketing activities, become less costly to serve, and may provide new referrals through positive word-of-mouth, while dissatisfied customers might spread negative word-of-mouth (Colgate et al., 1996; Ganesh et al., 2000; Mizerski, 1982; Paulin et al., 1998; Reichheld, 1996; Stum & Thiry, 1991; Zeithaml et al., 1996). (3) Losing customers leads to opportunity costs because of reduced sales (Rust & Zahorik, 1993). A small improvement in customer retention can hence lead to a significant increase in profit (Van den Poel & Larivière, 2004). That is why churn prediction models need to be both accurate and comprehensible: accurate to identify the customers who are about to churn, and comprehensible to identify their reasons for doing so. As will be discussed in Section 2, many data mining techniques have already been tested for their churn predictive power. Much less attention has been paid, however, to the comprehensibility and the justifiability of the developed models. Note that churn prediction is just one of the applications of data mining for marketing; others include customer lifetime value prediction (Glady, Baesens, & Croux, 2009), frequent itemset mining (Agrawal & Srikant, 1994), and sales forecasting (Thomassey & Happiette, 2007).

In this paper we introduce the application of two novel data mining techniques for customer churn prediction. The first technique, AntMiner+, uses Ant Colony Optimization (ACO) to infer rules from data, and explicitly seeks to induce accurate, comprehensible, and intuitive classification rule-sets (Martens et al., 2007). So far AntMiner+ has been successfully applied to credit scoring (Martens et al., 2006), software mining (Vandecruys et al., 2008), audit mining (Martens, Bruynseels, Baesens, Willekens, & Vanthienen, 2008), and business/ICT alignment prediction (Cumps et al., 2009). An advantage of AntMiner+ is the possibility of incorporating domain knowledge (Martens et al., 2006), which ensures intuitive decision support models.
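To make the notion of an intuitive rule-set concrete, the following is a minimal sketch and not AntMiner+ itself. It represents a rule-set as an ordered list of if-then rules with a default class, and checks a monotonicity constraint of the kind that domain knowledge might impose, for example that a higher number of customer service calls should never lower the predicted churn class. The variable names (service_calls, intl_plan), the thresholds, and both rule-sets are hypothetical and are not taken from the paper. The check is performed for one base customer only; a full check would have to cover all combinations of the other variables.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Customer = Dict[str, float]
Rule = Tuple[Callable[[Customer], bool], int]  # (condition, predicted class)


def classify(rules: List[Rule], default: int, customer: Customer) -> int:
    """Return the class of the first rule that fires, else the default class."""
    for condition, label in rules:
        if condition(customer):
            return label
    return default


def is_monotone_increasing(
    rules: List[Rule],
    default: int,
    base: Customer,
    variable: str,
    values: Iterable[float],
) -> bool:
    """Check that the prediction never decreases as `variable` increases,
    with all other variables held fixed at the values in `base`."""
    previous = None
    for value in sorted(values):
        prediction = classify(rules, default, {**base, variable: value})
        if previous is not None and prediction < previous:
            return False
        previous = prediction
    return True


# Hypothetical rule-sets (assumptions for illustration); 1 = churn, 0 = no churn.
intuitive_rules: List[Rule] = [
    (lambda c: c["service_calls"] >= 4, 1),
    (lambda c: c["intl_plan"] == 1 and c["service_calls"] >= 2, 1),
]
counterintuitive_rules: List[Rule] = [
    (lambda c: c["service_calls"] >= 6, 0),  # more calls, yet predicted to stay
    (lambda c: c["service_calls"] >= 4, 1),
]

base_customer = {"service_calls": 0, "intl_plan": 0}
calls = range(0, 10)
intuitive_ok = is_monotone_increasing(
    intuitive_rules, 0, base_customer, "service_calls", calls
)
counterintuitive_ok = is_monotone_increasing(
    counterintuitive_rules, 0, base_customer, "service_calls", calls
)
print(intuitive_ok, counterintuitive_ok)  # prints: True False
```

The second rule-set may fit a sample equally well, but a domain expert would likely reject it because its prediction drops from churn back to no churn as the number of service calls grows. This is the kind of violation that a monotonicity constraint is meant to rule out.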

The second technique is an Active Learning Based Approach (ALBA) for support vector machine (SVM) rule extraction (Martens, Van Gestel, & Baesens, 2009). ALBA manipulates a dataset by replacing the class labels of the data instances with the SVM-predicted labels, and by generating additional data instances close to the class boundaries. Applying simple rule induction techniques such as C4.5 or RIPPER to the manipulated dataset results in improved learning, and thus in a more accurate, but still comprehensible, rule-set.
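The two data manipulation steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published ALBA procedure: the function name alba_style_dataset and its parameters are hypothetical, the score function stands in for a trained SVM decision function, the training points with the smallest absolute score stand in for the instances near the class boundary, and Gaussian noise is an assumed way of generating nearby points.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def alba_style_dataset(
    X: Sequence[Point],
    score: Callable[[Point], float],
    n_boundary: int = 10,
    n_extra: int = 100,
    noise: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    """Build a relabeled and augmented dataset from a trained black-box model.

    `score(x)` is an assumed stand-in for the SVM decision function:
    a positive value means the model predicts churn (class 1).
    """
    rng = random.Random(seed)

    def predict(x: Point) -> int:
        return 1 if score(x) > 0 else 0

    # Step 1: replace the original class labels with the model's predictions.
    new_X = list(X)
    new_y = [predict(x) for x in X]

    # Step 2: select the training points closest to the decision boundary.
    boundary = sorted(X, key=lambda x: abs(score(x)))[:n_boundary]

    # Step 3: generate extra points near those and label them with the model.
    for _ in range(n_extra):
        anchor = rng.choice(boundary)
        x_new = tuple(v + rng.gauss(0.0, noise) for v in anchor)
        new_X.append(x_new)
        new_y.append(predict(x_new))
    return new_X, new_y


# Toy usage with a hypothetical non-linear boundary (the unit circle).
def toy_score(x: Point) -> float:
    return x[0] ** 2 + x[1] ** 2 - 1.0


data_rng = random.Random(42)
X = [(data_rng.uniform(-2, 2), data_rng.uniform(-2, 2)) for _ in range(200)]
aug_X, aug_y = alba_style_dataset(X, toy_score)
print(len(X), len(aug_X))  # prints: 200 300
```

A rule learner such as C4.5 or RIPPER would then be trained on aug_X and aug_y instead of the original labels, so that the induced rules approximate the black-box model, with extra detail in the region where the classes meet.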

The remainder of this paper is structured as follows. First, in Section 2, the domain of customer churn prediction modeling is introduced by means of a broad literature study. Then in Section 3, the workings of AntMiner+ and ALBA are briefly explained. In Section 4 both techniques are applied to predict customer churn, and the setup and results of a series of experiments are discussed. The final section concludes the paper.

Section snippets

Customer churn prediction modeling

Customer relationship management, and customer churn prediction in particular, have received growing attention during the last decade. Table 1 provides an overview of the literature on the use of data mining techniques for customer churn prediction modeling. The table summarizes the applied modeling techniques, the characteristics of the assessed datasets, and the validation and evaluation of the results. Also included are preprocessing steps such as sampling and variable selection.

In this paper

Advanced rule induction techniques: AntMiner+ and ALBA

As churn prediction models should be both accurate and comprehensible, we will focus on the use of rule-based classification techniques. More specifically, we will induce rule-sets from a churn dataset using AntMiner+ and ALBA, as well as the more traditional rule induction techniques C4.5 and RIPPER. The workings of AntMiner+ and ALBA are explained briefly in the next two sections.

Dataset

AntMiner+ and ALBA are applied to a publicly available dataset downloaded from the KDD library. The dataset is obtained from a wireless telecom operator, and consists of 5000 observations. For each observation 21 features are available, with no missing values. 14.3% of the customers are indicated to churn in the coming three months. For a full description of the dataset, one may refer to Larose (2005).
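As a small worked example of what this class distribution implies, the snippet below derives approximate class counts from the two figures reported above and computes the accuracy of a trivial model that predicts that no customer churns. Because the reported churn percentage is rounded, the counts are approximate. The variable names are illustrative only.

```python
n_observations = 5000  # reported dataset size
churn_rate = 0.143     # reported (rounded) share of churners

# Approximate class counts implied by the rounded churn rate.
n_churners = round(n_observations * churn_rate)
n_non_churners = n_observations - n_churners

# Accuracy of a trivial model that always predicts "no churn".
baseline_accuracy = n_non_churners / n_observations

print(n_churners, n_non_churners, f"{baseline_accuracy:.1%}")
# prints: 715 4285 85.7%
```

A model therefore has to exceed roughly 85.7% accuracy before it does better than ignoring churn altogether, which is one reason accuracy alone says little about how well the churners themselves are identified.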

Data preprocessing

Data preprocessing was conducted in the form

Conclusion

As discussed in the literature review, churn prediction models should be both accurate and comprehensible in order to improve the efficiency of retention marketing campaigns. This paper presents the application of AntMiner+ and ALBA to a publicly available churn prediction dataset. Both techniques explicitly seek to induce accurate as well as comprehensible rule-sets. The results are benchmarked against C4.5, RIPPER, SVM, and logistic regression. It is shown that ALBA, combined with RIPPER or C4.5,

Acknowledgements

We extend our gratitude to the Flemish Research Council for financial support (FWO postdoctoral research grant, Odysseus Grant B.0915.09), and the National Bank of Belgium (NBB/10/006).

References (68)

  • S. Hung et al.

    Applying data mining to telecom churn management

    Expert Systems with Applications

    (2006)
  • H. Hwang et al.

    An LTV model and customer segmentation based on customer value: a case study on the wireless telecommunication industry

    Expert Systems with Applications

    (2004)
  • B. Larivière et al.

    Predicting customer retention and profitability by using random forest and regression forest techniques

    Expert Systems with Applications

    (2005)
  • D. Martens et al.

    Comprehensible credit scoring models using rule extraction from support vector machines

    European Journal of Operational Research

    (2007)
  • D. Martens et al.

    Predicting going concern opinion with data mining

    Decision Support Systems

    (2008)
  • R. Rust et al.

    Customer satisfaction, customer retention, and market share

    Journal of Retailing

    (1993)
  • T. Stützle et al.

    MAX-MIN ant system

    Future Generation Computer Systems

    (2000)
  • S. Thomassey et al.

    A neural clustering and classification system for sales forecasting of new apparel items

    Applied Soft Computing

    (2007)
  • O. Vandecruys et al.

    Mining software repositories for comprehensible software fault prediction models

    Journal of Systems and Software

    (2008)
  • D. Van den Poel et al.

    Customer attrition analysis for financial services using proportional hazard models

    European Journal of Operational Research

    (2004)
  • J. Vanthienen et al.

    A tool-supported approach to inter-tabular verification

    Expert Systems with Applications

    (1998)
  • C. Wei et al.

    Turning telecommunications call details to churn prediction: A data mining approach

    Expert Systems with Applications

    (2002)
  • A. Abraham et al.

    Web usage mining using artificial ant colony clustering

  • Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In VLDB (pp....
  • W. Au et al.

    A novel evolutionary data mining algorithm with applications to churn prediction

    IEEE Transactions on Evolutionary Computation

    (2003)
  • B. Baesens et al.

    Benchmarking state-of-the-art classification algorithms for credit scoring

    Journal of the Operational Research Society

    (2003)
  • M. Berry et al.

    Data mining techniques: For marketing, sales and customer relationship management

    (2004)
  • C. Bhattacharya

    When customers are members: Customer retention in paid membership contexts

    Journal of the Academy of Marketing Science

    (1998)
  • Bullnheimer, B., Hartl, R., & Strauss, C. (1999). Applying the ant system to the vehicle routing problem. In Voss, S.,...
  • G.D. Caro et al.

    AntNet: Distributed stigmergetic control for communications networks

    Journal of Artificial Intelligence Research

    (1998)
  • D. Cohn et al.

    Improving generalization with active learning

    Machine Learning

    (1994)
  • M. Colgate et al.

    Implementing a customer relationship strategy: The asymmetric impact of poor versus excellent execution

    Journal of the Academy of Marketing Science

    (2000)
  • M. Colgate et al.

    Customer defection: A study of the student market in Ireland

    International Journal of Bank Marketing

    (1996)
  • A. Colorni et al.

    Ant system for job-shop scheduling

    Journal of Operations Research, Statistics and Computer Science

    (1994)