An improved boosting based on feature selection for corporate bankruptcy prediction

https://doi.org/10.1016/j.eswa.2013.09.033

Highlights

  • No single method has proven best overall for predicting corporate bankruptcy.

  • A new and improved Boosting method, FS-Boosting, is proposed to predict corporate bankruptcy.

  • Two datasets are selected to demonstrate the effectiveness and feasibility of FS-Boosting.

  • Experimental results reveal that FS-Boosting could be used as an alternative method.

Abstract

With the recent financial crisis and the European debt crisis, corporate bankruptcy prediction has become an increasingly important issue for financial institutions. Many statistical and intelligent methods have been proposed; however, no single method has proven best overall for predicting corporate bankruptcy. Recent studies suggest that ensemble learning methods may be applicable to corporate bankruptcy prediction. In this paper, a new and improved Boosting method, FS-Boosting, is proposed to predict corporate bankruptcy. By injecting a feature selection strategy into Boosting, FS-Boosting can achieve better performance because its base learners gain accuracy and diversity. For testing and illustration purposes, two real-world bankruptcy datasets were selected to demonstrate the effectiveness and feasibility of FS-Boosting. Experimental results reveal that FS-Boosting could be used as an alternative method for corporate bankruptcy prediction.

Introduction

Predicting corporate bankruptcy is an important management science problem whose main goal is to differentiate corporations that are likely to become distressed from healthy ones. Moreover, incorrect decision-making in financial institutions may lead to financial difficulty or distress and impose social costs on owners or shareholders, managers, government, etc. As a result, how to predict corporate bankruptcy has become a hot topic for both industrial application and academic research (Li et al., 2012, Olson et al., 2012, Zhou et al., 2014).

As there are no mature theories of corporate bankruptcy, studies of corporate bankruptcy have largely been based on iterative trial-and-error processes of selecting features and predictive models (Li and Sun, 2009, Zhou et al., 2014). With the development of statistics and artificial intelligence (AI), a number of statistical and intelligent methods have been proposed for corporate bankruptcy prediction. The statistical methods applied in corporate bankruptcy prediction mainly include Linear Discriminant Analysis (LDA), Multivariate Discriminant Analysis (MDA), Quadratic Discriminant Analysis (QDA), Logistic Regression Analysis (LRA), and Factor Analysis (FA) (Li and Sun, 2009, Zmijewski, 1984). However, the problem with applying these statistical techniques to corporate bankruptcy prediction is that some assumptions, such as the multivariate normality of the independent variables, are frequently violated in practice, which makes these techniques theoretically invalid for finite samples (Shin & Lee, 2002). In recent years, many studies have demonstrated that intelligent techniques such as Artificial Neural Networks (ANN), Decision Trees (DT), Case-Based Reasoning (CBR), and Support Vector Machines (SVM) can be used as alternative methods for corporate bankruptcy prediction (Olson et al., 2012, Tsai and Wu, 2008). In contrast with statistical techniques, intelligent techniques do not assume particular data distributions and automatically extract knowledge from training samples (Wang, Ma, Huang, & Xu, 2012).

However, there is still no single intelligent method that performs best overall in predicting corporate bankruptcy. Prediction performance depends on the details of the problem, the data structure, the characteristics used, the extent to which those characteristics can separate the classes, and the objective of the classification (Duéñez-Guzmán & Vose, 2013). Recently, integrating multiple predictors into an aggregated output, i.e. ensemble methods, has been demonstrated to be an efficient strategy for achieving high prediction performance, especially when the component predictors have different structures that lead to independent prediction errors (Breiman, 1996, Polikar, 2006). Moreover, recent studies have shown that such ensemble techniques perform better than single intelligent techniques in financial distress prediction (Deligianni and Kotsiantis, 2012, Sun and Li, 2012). In this paper, a novel and improved Boosting method, FS-Boosting, is proposed to predict corporate bankruptcy. By injecting a feature selection strategy into Boosting, FS-Boosting can achieve better performance because its base learners gain accuracy and diversity. For testing and illustration purposes, two real-world bankruptcy datasets were selected to demonstrate the effectiveness and feasibility of FS-Boosting. Among the eight methods compared, FS-Boosting achieves the best prediction accuracy on both datasets. Experimental results reveal that FS-Boosting can be used as an alternative method for corporate bankruptcy prediction.
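The text above describes the idea only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' algorithm. It assumes an AdaBoost-style loop in which every round first re-selects the top-k features on the currently weighted sample and then fits a base learner on those features only. The filter score (gap between weighted class means), the nearest-centroid base learner, the function names, and the toy data are all hypothetical choices made for illustration; features are assumed to be on comparable scales and both classes are assumed to be present.

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def select_features(X, y, w, k):
    """Hypothetical filter: keep the k features with the largest gap
    between the weighted class means (labels are +1 / -1)."""
    scores = []
    for j in range(len(X[0])):
        means = {}
        for label in (1, -1):
            vals = [x[j] for x, yi in zip(X, y) if yi == label]
            wts = [wi for wi, yi in zip(w, y) if yi == label]
            means[label] = weighted_mean(vals, wts)
        scores.append((abs(means[1] - means[-1]), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def fit_centroids(X, y, w, feats):
    """Assumed base learner: weighted class centroids on the selected features."""
    cents = {}
    for label in (1, -1):
        rows = [x for x, yi in zip(X, y) if yi == label]
        wts = [wi for wi, yi in zip(w, y) if yi == label]
        cents[label] = [weighted_mean([r[j] for r in rows], wts) for j in feats]
    return cents

def predict_one(model, x):
    feats, cents = model
    def dist(label):
        return sum((x[j] - c) ** 2 for j, c in zip(feats, cents[label]))
    return 1 if dist(1) <= dist(-1) else -1

def fs_boost(X, y, rounds=5, k=2):
    n = len(X)
    w = [1.0 / n] * n                      # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        feats = select_features(X, y, w, k)            # feature selection step
        model = (feats, fit_centroids(X, y, w, feats))
        preds = [predict_one(model, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err >= 0.5:                     # no better than chance: stop boosting
            break
        err = max(err, 1e-10)              # avoid division by zero for a perfect learner
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        # Standard AdaBoost reweighting: misclassified samples gain weight.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * predict_one(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1

# Toy data (made up): +1 = bankrupt, -1 = healthy; the third feature is noise.
X = [[0.10, 0.20, 0.5], [0.20, 0.10, 0.4], [0.15, 0.25, 0.6],
     [0.90, 0.80, 0.5], [0.80, 0.90, 0.4], [0.85, 0.75, 0.6]]
y = [1, 1, 1, -1, -1, -1]
ensemble = fs_boost(X, y)
```

Because the sample weights change from round to round, the selected feature subset can change as well, which is one way the base learners could become more diverse than in plain Boosting.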

The remainder of the paper is organized as follows. In Section 2, we review related work on corporate bankruptcy prediction. In Section 3, an improved Boosting method, FS-Boosting, is proposed for corporate bankruptcy prediction. In Section 4, we present the details of the experiment design. Sections 5 and 6 summarize, analyze, and discuss the empirical results. Based on the results and observations of these experiments, Section 7 draws conclusions and outlines future research directions.

Section snippets

Related work

Many techniques have been proposed by prior research. In this study, we classified these techniques into statistical techniques and intelligent techniques.

Feature selection

Feature selection has been an active research area in the machine learning and data mining communities (Liu & Motoda, 1998). The main idea of feature selection is to choose a subset of input variables by eliminating features with little or no predictive information. Feature selection reduces the dimensionality of the feature space and removes redundant, irrelevant, or noisy data. It brings immediate benefits for applications: speeding up an algorithm, improving the data quality and thereof the

Real world bankruptcy dataset

Two real-world datasets were used to test the performance of the proposed method. The first dataset was collected by Pietruszkiewicz (2008). It contains 240 companies, including 112 failed companies. The dataset covers the period from 1997 to 2001, before bankruptcy took place. A total of 30 financial variables were used for prediction. Detailed explanations of these financial variables are listed in Table 1.

The second dataset is selected from the CD-ROM database (Shmueli, Patel, & Bruce, 2011

Experimental results

In this paper, an improved Boosting, FS-Boosting, is proposed to predict bankruptcy and reduce the loss of financial institutions. Table 4 summarizes the performance indicators of different methods on the two bankruptcy datasets, where the values following “±” are standard deviations.
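As a rough illustration of how indicators of the kind reported in Table 4 might be computed, the sketch below derives average accuracy and the two error rates from predictions and formats a "mean ± standard deviation" summary over repeated runs. The error definitions are an assumption, since conventions differ between papers: here Type I error is taken as the share of bankrupt firms predicted healthy and Type II error as the share of healthy firms predicted bankrupt. The function names and all numbers are made up for the example and are not the paper's results.

```python
from statistics import mean, stdev

def error_rates(y_true, y_pred, bankrupt=1):
    """Return (accuracy, type_i, type_ii) under the assumed definitions above."""
    bankrupt_total = sum(1 for t in y_true if t == bankrupt)
    healthy_total = len(y_true) - bankrupt_total
    missed = sum(1 for t, p in zip(y_true, y_pred)
                 if t == bankrupt and p != bankrupt)
    false_alarm = sum(1 for t, p in zip(y_true, y_pred)
                      if t != bankrupt and p == bankrupt)
    accuracy = 1 - (missed + false_alarm) / len(y_true)
    return accuracy, missed / bankrupt_total, false_alarm / healthy_total

def summarize(values):
    """Format per-run scores as 'mean ± sample standard deviation' in percent."""
    return f"{100 * mean(values):.2f}% ± {100 * stdev(values):.2f}%"

# Toy labels: 1 = bankrupt, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
accuracy, type_i, type_ii = error_rates(y_true, y_pred)

# Toy accuracies from three hypothetical repeated runs.
report = summarize([0.80, 0.82, 0.84])
```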

First, we consider the results on the bankruptcy I dataset. As shown in Table 4, FS-Boosting achieves the highest average accuracy, 81.50%. Two other ensemble methods, i.e. Bagging and Boosting, also get the higher

Discussion

In order to ensure that the results did not arise by chance, we tested the significance of the above results by means of the paired t-test. The null hypothesis is “Model A’s mean Average Accuracy / Type I Error / Type II Error = Model B’s mean Average Accuracy / Type I Error / Type II Error”. The alternative hypothesis is “Model A’s mean Average Accuracy / Type I Error / Type II Error ≠ Model B’s mean Average Accuracy / Type I Error / Type II Error”. The column ‘improvement’ gives the relative
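For readers unfamiliar with the paired t-test used here, the following is a small sketch of the statistic, assuming the two models are scored on the same runs so that their results can be paired. The function name and the accuracy values are invented for illustration; the critical value 2.776 is the standard two-sided 5% value for 4 degrees of freedom from a t-table, hard-coded because the Python standard library has no t-distribution.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Toy accuracies of two models on the same five runs (made-up numbers).
model_a = [0.82, 0.80, 0.83, 0.81, 0.84]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80]

t_stat, dof = paired_t(model_a, model_b)
T_CRIT_5PCT_DF4 = 2.776          # two-sided 5% critical value for 4 degrees of freedom
reject_null = abs(t_stat) > T_CRIT_5PCT_DF4
```

If `reject_null` is true, the difference between the two models' mean scores is treated as statistically significant at the 5% level under the usual assumptions of the test.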

Conclusions and future directions

Owing to the recent financial crisis and the European debt crisis, bankruptcy prediction has become an increasingly important issue for financial institutions. Meanwhile, ensemble learning is a powerful machine learning paradigm that has exhibited clear advantages in many applications. In this paper, an improved Boosting method, FS-Boosting, is proposed to predict bankruptcy and reduce the losses of financial institutions. FS-Boosting integrates the advantages of Boosting and feature selection to enhance the

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (Nos. 71071045, 71131002, 71101042), the Specialized Research Fund for the Doctoral Program of Higher Education (20110111120014), the China Postdoctoral Science Foundation (2011M501041, 2013T60611), and the Special Fund of AnHui Province Key Research Institute of Humanities and Social Sciences at Universities (SK2013B400).

References (48)

  • K. Shin et al. (2001). A case-based approach using inductive indexing for corporate bond rating. Decision Support Systems.
  • K.S. Shin et al. (2002). A genetic algorithm application in bankruptcy prediction modeling. Expert Systems with Applications.
  • J. Sun et al. (2012). Financial distress prediction using support vector machines: Ensemble vs. individual. Applied Soft Computing.
  • T.C. Tang et al. (2005). Neural networks analysis in business failure prediction of Chinese importers: A between-countries approach. Expert Systems with Applications.
  • C.-F. Tsai et al. (2008). Using neural network ensembles for bankruptcy prediction and credit scoring. Expert Systems with Applications.
  • G. Wang et al. (2011). A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications.
  • G. Wang et al. (2012). Two credit scoring models based on dual strategy ensemble trees. Knowledge-Based Systems.
  • R.C. West (1985). A factor-analytic approach to bank condition. Journal of Banking & Finance.
  • Alfaro-Cid, E., Castillo, P. A., Esparcia, A., Sharman, K., Merelo, J. J., Prieto, A., Mora, A. M., & Laredo, J. L. J....
  • E.I. Altman (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance.
  • W.H. Beaver (1966). Financial ratios as predictors of failure. Journal of Accounting Research.
  • L. Breiman (1996). Bagging predictors. Machine Learning.
  • L. Breiman (2001). Random forests. Machine Learning.
  • P. Buta (1994). Mining for financial knowledge with CBR. AI Expert.