Elsevier

Decision Support Systems

Volume 102, October 2017, Pages 91-97
Solvency prediction for small and medium enterprises in banking

https://doi.org/10.1016/j.dss.2017.08.001

Highlights

  • Parametric and ensemble models to estimate default probability for SMEs

  • Local Outlier Factor (LOF) technique for credit risk modeling

  • Multivariate outlier identification to improve predictive performances of the models

  • Empirical evidence on a real data set provided by UniCredit bank

Abstract

This paper describes novel approaches to predicting default for SMEs. Multivariate outlier detection techniques based on the Local Outlier Factor are proposed to improve the out-of-sample performance of parametric and non-parametric models for credit risk estimation. The models are tested on a real data set provided by UniCredit Bank. The results confirm that our proposal improves predictive capability and supports financial institutions in decision making. Single and ensemble models are compared; in particular, among parametric models, the generalized extreme value regression model is proposed as a suitable competitor to logistic regression.

Introduction

It is well known that Basel II and Basel III [1] established that banks should develop credit risk models that are specific for SMEs [2]. Statistical models to measure the default probability in credit risk have a long history but, understandably, they have received increasing attention since the Basel II framework was delivered in the 2000s. There is a wide statistical literature in this field focused on risk prediction methods for SMEs (e.g. [2], [3], [4], [5]). More precisely, machine learning methods, data mining algorithms, artificial intelligence and operations research have been introduced to discriminate between safe and risky debtors [6], [7]. Statistical models based on parametric techniques, such as logistic regression, are considered benchmark models in credit risk analysis for their reliability and ease of interpretation.

Recent research has focused on predicting the expected profitability of investing in peer-to-peer loans and, more generally, on credit scoring models for lending (see e.g. [8]); the models proposed can of course also be used to estimate credit risk in SMEs. When constructing a credit scoring model, three common problems often arise: low frequency of defaults, non-linear relationships between the response and the predictors, and outliers in the predictor variables.

We aim to predict the probability of default of SMEs by studying parametric models (logistic regression (GLM), linear discriminant analysis (LDA) and the generalized extreme value model (BGEV) [9]), non-parametric models (Classification Tree (CT) and the k-nearest neighbour algorithm (k-NN)) and ensemble techniques (bagging and boosting). The models are compared using key performance measures that assess predictive capability, discriminatory power and the stability of the results. To improve the predictive performance of the models, we introduce a multivariate outlier analysis [10] based on the local outlier factor technique.
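The local outlier factor of Breunig et al. compares the local density around each observation with the density around its k nearest neighbours: values close to 1 suggest an inlier, while values well above 1 flag a potential outlier. The following is a minimal sketch of that general technique, assuming Euclidean distance; the function name `local_outlier_factor` and the toy data are hypothetical, and this is not the implementation used in the paper.

```python
import math


def local_outlier_factor(points, k):
    """Return one LOF score per point (illustrative, O(n^2) sketch).

    Assumes Euclidean distance and fewer than k + 1 exact duplicates of
    any point (otherwise a local density would be infinite).
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise distance matrix.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k-distance and k-neighbourhood (ties at the k-distance are kept).
    k_dist, neighbours = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbours.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability
    # distance, where reach-dist(i, j) = max(k-distance(j), d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Toy data: four corners of a unit square plus one distant point.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
toy_scores = local_outlier_factor(toy_points, k=2)
```

In a credit risk setting, a score of this kind could be computed on the financial ratios of each enterprise and then added as an extra predictor, which is one plausible reading of how an outlier index enters a model.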

Empirical evidence shows that ensemble methods and parametric models based on the BGEV outperform classical credit risk models based on logistic regression. Furthermore, the BGEV model may also appeal to regulators because it is linear in the parameters and its results are easily interpretable.

For decision makers and practitioners, the paper presents an extreme value approach based on the BGEV, which is particularly appealing when default is a rare event. On the data at hand, the BGEV approach outperforms the logistic regression model.

The paper is structured as follows: Section 2 reviews the state of the art in credit risk modeling; Section 3 describes the multivariate outlier index that is later included in the analysis to improve the performance of the models; Section 4 describes the models tested on the real data set, such as Random Forest (RF), BGEV and Gradient Boosting Machines (GBM); Section 5 describes the performance indexes employed for model selection; Sections 6 and 7 contain the data set description and the empirical evidence obtained, respectively. Section 8 discusses conclusions and ideas for further research.

Section snippets

Literature review

Measuring credit risk for SMEs has become a major challenge for financial institutions, and many statistical models have been proposed in the literature, tracing back to Altman [11] and Beaver [12]. Historically, research in this field started with the introduction of primitive reduced models and later developed through structural ones [13]. In recent years, progressively more sophisticated models have been implemented [6], [7]. A growing interest especially in credit risk assessment for SMEs has

Multivariate outlier detection in credit risk analysis

The role of credit risk analysis is to assess the potential credit risk associated with any customer or borrower, and to advise on decisions about granting credit, providing loans and borrowing facilities. Credit risk analysis is important for making correct business decisions and managing large loan portfolios. This is also a highly important problem in practice since banks can take precautions against risks by, for example, limiting the maximum amount of debt contracted by

Predictive models

Parametric models for credit risk estimation for SMEs include Logistic Regression (GLM) and Linear Discriminant Analysis (LDA). If default is a rare event, the binary generalized extreme value model (BGEV) can be a flexible approach to derive a coherent estimate of the probability of default without resorting to oversampling techniques, which typically affect the final results.
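To illustrate why an extreme value link can suit rare events, the sketch below evaluates one commonly cited form of the GEV response curve, pi(x) = exp(-[1 + tau * eta]^(-1/tau)) with linear predictor eta = beta'x, next to the symmetric logistic curve. The exact parameterization used by the authors is not shown in this excerpt, so the formula, the shape parameter name `tau` and the function names are assumptions made for illustration only.

```python
import math


def gev_response(eta, tau):
    """Assumed GEV response curve: exp(-[1 + tau*eta]^(-1/tau)).

    eta is the linear predictor and tau the shape parameter. As tau
    tends to 0 the curve reduces to the Gumbel (log-log) form
    exp(-exp(-eta)).
    """
    try:
        if abs(tau) < 1e-12:
            return math.exp(-math.exp(-eta))
        base = 1.0 + tau * eta
        if base <= 0.0:
            # Outside the support: the limit is 0 for tau > 0, 1 for tau < 0.
            return 0.0 if tau > 0 else 1.0
        return math.exp(-(base ** (-1.0 / tau)))
    except OverflowError:
        # The inner exponent diverged to +infinity, so the curve is 0.
        return 0.0


def logistic_response(eta):
    """Symmetric logistic response curve, used here as the benchmark."""
    return 1.0 / (1.0 + math.exp(-eta))


# The logistic curve is symmetric around eta = 0, the GEV curve is not.
for eta in (-2.0, 0.0, 2.0):
    print(eta, round(logistic_response(eta), 4), round(gev_response(eta, 0.25), 4))
```

Because the GEV curve approaches 0 and 1 at different rates, the shape parameter gives the model room to fit a response in which one class is much rarer than the other, without rebalancing the sample.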

Non-parametric models for binary outcomes based on a single classifier, such as the Classification Tree (CT) can

Credit risk model assessment

In classification problems, a model is evaluated through its capability to correctly predict future outcomes of a target variable. Much research has been done on the statistical metrics to assess the forecasting accuracy of a given model in terms of both measures of classification performance and measures of ranking performance.
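The distinction between classification performance and ranking performance can be made concrete with two standard metrics: accuracy at a fixed threshold, and the area under the ROC curve (AUC), which depends only on how the scores order defaulters relative to non-defaulters. The sketch below is illustrative; the metrics actually used in the paper are not listed in this excerpt, and the function names and toy data are hypothetical.

```python
def accuracy(labels, scores, threshold=0.5):
    """Classification measure: share of correct 0/1 predictions."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def auc(labels, scores):
    """Ranking measure: probability that a random defaulter (label 1)
    receives a higher score than a random non-defaulter (label 0),
    counting ties as one half. Pairwise O(n_pos * n_neg) version."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


toy_labels = [0, 0, 1, 1]
toy_pd = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted default probabilities
print(accuracy(toy_labels, toy_pd), auc(toy_labels, toy_pd))
```

Accuracy changes with the chosen threshold, whereas AUC does not, which is one reason ranking measures are often preferred when defaults are rare.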

To evaluate the accuracy of a model in forecasting corporate defaults, and thus provide a relative ranking of model performance, a performance metric should not only

Data description

A real data set, provided by UniCredit bank, concerning the credit risk of Italian SMEs has been analyzed to predict the PD. The available sample refers to enterprises with annual revenue not exceeding 5 million euros in January 2015. Although information about the enterprises was collected throughout the years, the data available for the analysis cover only two years.

Information about SMEs includes generic data (such as dimension, legal form, and default status), financial ratios derived from

Results

This section shows the results obtained by running the models described in Section 4 on the data sets at hand. The predictive ability of the models under comparison is studied within a cross-validation framework by randomly partitioning the data sets into a training set and a validation set. The two disjoint sets include 70% and 30% of the data, respectively, each reflecting the a priori default rate of the entire data set. Further validation approaches have been tested on the data (k-fold, k = 5) and
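A 70/30 partition that keeps the default rate of the full sample in both parts is usually obtained with a stratified split. The sketch below shows one simple way to do this by splitting each class separately; the function name `stratified_split`, the seed and the toy labels are hypothetical, and the procedure actually followed in the paper may differ.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, validation_indices), disjoint, such that
    each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        # Shuffle the indices of this class and cut them at the fraction.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx.extend(idx[:cut])
        valid_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(valid_idx)


# Toy sample with a 10% default rate (1 = default).
toy_default = [1] * 10 + [0] * 90
train, valid = stratified_split(toy_default)
```

With rare defaults, a purely random split can leave very few defaulters in the validation set, so splitting class by class keeps the two default rates comparable.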

Concluding remarks

This paper shows that multivariate outlier information improves the results of the predictive models in bankruptcy prediction of small and medium enterprises. Business experts can also use the LOF value to identify anomalous enterprise behavior.

Another significant result of this paper is the performance obtained using the BGEV. The out-of-sample results obtained using the BGEV are similar, in terms of predictive performance, to those of ensemble methods such as Random Forest. Moreover, modeling risks through the

Acknowledgments

The paper is the result of a collaboration among the authors. The paper has been structured from a methodological point of view and written by Silvia Figini. This research was supported by the RIDS. The authors thank the Group Chief Risk Officer and the Head of Group Credit Risk Governance of UniCredit for providing real data for research purposes. The authors also thank the referees for helping to improve the paper.

Silvia Figini – National Full Professor qualification in Applied Statistics (Abilitazione Scientifica Nazionale 2013). From 2014 Associate Professor, University of Pavia, Department of Political and Social Sciences, Italy. 2006: PhD in Statistics Bocconi University Milan. More details available at: http://www-3.unipv.it/webdsps/en/docente.php?id=figini.

References (39)

  • S. Lin et al.

    Predicting default of a small business using different definitions of financial distress

    J. Oper. Res. Soc.

    (2012)
  • C.R. Abrahams et al.

    Credit Risk Assessment: The New Lending System for Borrowers, Lenders, and Investors

    (2009)
  • R. Calabrese et al.

    Generalized extreme value regression for binary rare events data: an application to credit defaults

  • M.M. Breunig et al.

    LOF: identifying density-based local outliers

    SIGMOD Rec.

    (2000)
  • E.I. Altman

    Financial ratios, discriminant analysis and the prediction of corporate bankruptcy

    J. Financ.

    (1968)
  • W.H. Beaver

    The information content of annual earnings announcements

    J. Account. Res.

    (1968)
  • D. Duffie

    Measuring Corporate Default Risk

    (2011)
  • H.A. Abdou et al.

    Credit scoring, statistical techniques and evaluation criteria: a review of the literature

    Intell. Syst. Account. Financ. Manag.

    (2011)
  • B. Baesens et al.

    Benchmarking state-of-the-art classification algorithms for credit scoring

    J. Oper. Res. Soc.

    (2003)


    Federico Bonelli, PhD in Mathematics (Politecnico di Milano), Head of RIDS Data Science Laboratory.

    Emanuele Giovannini, Head of Italy Credit Risk Modelling, UniCredit Bank.