A hybrid approach based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance: For bankruptcy prediction

https://doi.org/10.1016/j.eswa.2009.10.040

Abstract

This paper proposes a hybrid method for effective bankruptcy prediction that combines variable selection using decision trees with case-based reasoning using a variable-weighted Mahalanobis distance. Unlike existing case-based reasoning methods, which use the Euclidean distance, we use the Mahalanobis distance to locate the nearest neighbors, so that the covariance structure of the variables is taken into account when measuring closeness. Because hundreds of financial ratio variables are available for analyzing credit management problems, model performance also depends on the input variable selection strategy. Variables selected by decision tree induction tend to interact with one another, unlike those produced by regression approaches. The Mahalanobis distance is a more accurate measure of proximity than the Euclidean distance when variables are correlated with each other. The experimental results indicate that the proposed approach outperforms some techniques currently in use.
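
For reference, the standard textbook definition of the Mahalanobis distance between two records is given below. The symbols are ours rather than the paper's notation: x and y are the variable vectors of two firms, and S is the covariance matrix of the variables. The variable-weighted form used in the paper is not shown in this excerpt and is not reproduced here.

```latex
d_M(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{\top} S^{-1} (\mathbf{x} - \mathbf{y})}
```

When S is the identity matrix (uncorrelated variables with unit variance), this expression reduces to the Euclidean distance, which is why the two measures differ only when the variables are correlated or differently scaled.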

Introduction

Diagnosing a firm’s credit risk level for possible bankruptcy has been a major problem for both scholars and practitioners. Korea’s experience during the foreign currency crisis of the 1990s illustrates the importance of credit risk management. During that period, Korean industry as a whole went through business restructuring, which resulted in widespread bankruptcies of uncompetitive firms nationwide. The total balance of bank loans to Korean firms was estimated at US$279 billion as of January 2006, and 285 firms, on average, went bankrupt each month in 2005 (Bank of Korea, 2006).

Improving the accuracy with which the default risk of firms is predicted may yield considerable savings for an economy. The sources of savings include reduced credit analysis costs, better monitoring, and a higher debt collection rate. Traditionally, banks have used internally developed credit risk scoring systems in which both quantitative and qualitative aspects are evaluated (Treacy & Carey, 2000). To assess the credit risk level quantitatively, more systematic models were developed in statistics and machine learning. Multiple discriminant analysis was, to the best of our knowledge, the pioneering statistical approach to discovering the factors that influence bankruptcy. The logistic regression model later became popular because of its relatively relaxed assumptions. In the 1990s, a number of studies attempted to apply artificial intelligence (AI) techniques to credit risk management (Dimitras, Zanakis, & Zopounidis, 1996).

The purpose of this study is to propose a case-based reasoning approach that incorporates the covariance structure of the variables and variable weights when locating the nearest neighbors. The prediction for a given firm is made by applying a voting algorithm to the bankruptcy status of its k nearest neighbors. According to our literature review, most previous studies did not investigate how the input variable selection process affects the stability of model performance. Therefore, we evaluate the prediction accuracy of our model under two AI-based input variable selection strategies, as well as two other strategies that combine stepwise regression with expert opinions.
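
The nearest-neighbor voting idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`covariance_matrix`, `solve`, `mahalanobis`, `predict`), the toy two-variable data, and in particular the weighting scheme (scaling each variable difference by the square root of its weight) are assumptions made for illustration, since this excerpt does not specify how the paper applies variable weights. The covariance matrix is estimated from the whole case base, which is also an assumption.

```python
import math
from collections import Counter


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x, y, cov, weights=None):
    """Mahalanobis distance between x and y under covariance matrix cov.

    The optional weights rescale each difference by sqrt(weight); this
    weighting scheme is an illustrative assumption, not the paper's.
    """
    w = weights or [1.0] * len(x)
    d = [math.sqrt(wi) * (xi - yi) for wi, xi, yi in zip(w, x, y)]
    z = solve(cov, d)  # z = cov^{-1} d, without forming the inverse
    return math.sqrt(max(0.0, sum(di * zi for di, zi in zip(d, z))))


def predict(query, cases, labels, k=3, weights=None):
    """Majority vote over the labels of the k nearest stored cases."""
    cov = covariance_matrix(cases)
    ranked = sorted(
        range(len(cases)),
        key=lambda i: mahalanobis(query, cases[i], cov, weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy case base: (debt ratio, current ratio) per firm.
cases = [
    (0.30, 1.8), (0.35, 2.1), (0.40, 2.0), (0.45, 2.6),
    (0.80, 0.6), (0.85, 0.9), (0.90, 0.5), (0.75, 1.0),
]
labels = ["healthy"] * 4 + ["bankrupt"] * 4

print(predict((0.82, 0.7), cases, labels, k=3))
```

Solving the linear system instead of inverting the covariance matrix keeps the sketch short and avoids an explicit inverse; a real case base with many correlated financial ratios would need a check for a singular or near-singular covariance matrix, which is only handled here by raising an error.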


Literature review

Bankruptcy prediction determines whether or not a firm will go bankrupt. In statistics and artificial intelligence, bankruptcy prediction techniques often estimate the probability of bankruptcy and predict bankruptcy when the estimated probability exceeds a threshold value. Discriminant analysis, logistic regression, neural networks, and decision tree methods adopt this probabilistic approach. Linear or non-linear equations that capture

Variable selection method

When many variables are available for analysis, the most widely used techniques for selecting the input variables of a model are stepwise regression and field expert selection. In credit management problems, it is common to have more than 100 variables. Even after applying a stepwise selection technique, we often end up with a few dozen significant variables. This is why field expert selection methods are used jointly with statistical selection techniques. Financial

Data and input variable selection methods

For the experiment, we used yearly financial data on 1000 Korean manufacturing firms with asset sizes of US$1 million to US$7 million in fiscal years 2000–2002. The numbers of bankrupt firms and healthy (non-bankrupt) firms are balanced at 500:500 to allow learning to occur in AI techniques such as neural networks and decision trees. One hundred and thirty financial variables were available in total. To reduce the number of variables

Conclusion and discussion

The current research considered the corporate bankruptcy problem and suggested a CBR method. Previous CBR studies have used the Euclidean distance to measure the proximity between two records. The Euclidean distance is ideal only when all variables are statistically independent of each other, so that the correlation coefficients of all pairs are zero. This rarely happens in real-world data analysis. Thus, we introduce the Mahalanobis distance which incorporates the

References (35)

  • W. Treacy et al. Credit risk rating at large US banks. Journal of Banking and Finance (2000).
  • A. Tsakonas et al. Bankruptcy prediction with neural logic networks by means of grammar-guided genetic programming. Expert Systems with Applications (2006).
  • G. Zhang et al. Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research (1999).
  • Abdelwahed, T., & Amir, E. M. (2005). New evolutionary bankruptcy forecasting model based on genetic algorithms and...
  • E.I. Altman. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance (1968).
  • M. Anandarajan et al. Bankruptcy prediction of financially stressed firms: An examination of the predictive accuracy of artificial neural networks. International Journal of Intelligent Systems in Accounting, Finance and Management (2001).
  • A.F. Atiya. Bankruptcy prediction for credit risk using neural networks: A survey and new results. IEEE Transactions on Neural Networks (2001).