Solvency prediction for small and medium enterprises in banking
Introduction
Basel II and Basel III [1] established that banks should develop credit risk models specific to SMEs [2]. Statistical models for measuring the probability of default have a long history but, understandably, they have received increasing attention since the introduction of the Basel II framework in the 2000s. There is a wide statistical literature on risk prediction methods for SMEs (e.g. [2], [3], [4], [5]). In particular, machine learning methods, data mining algorithms, artificial intelligence and operations research have been introduced to discriminate between safe and risky debtors [6], [7]. Statistical models based on parametric techniques, such as logistic regression, are regarded as benchmark models in credit risk analysis because of their reliability and ease of interpretation.
Recent research focuses on predicting the expected profitability of investing in peer-to-peer loans and, more generally, on credit scoring models for lending (see e.g. [8]); the proposed models can also be used to estimate credit risk for SMEs. When constructing a credit scoring model, three common problems often arise: a low frequency of defaults, non-linear relationships between the response and the predictors, and outliers in the predictor variables.
We aim to predict the probability of default of SMEs by studying parametric models (logistic regression (GLM), linear discriminant analysis (LDA) and the binary generalized extreme value model (BGEV) [9]), non-parametric models (classification tree (CT) and the k-nearest neighbour algorithm (k-NN)) and ensemble techniques (bagging and boosting). The models are compared using key performance measures that assess predictive capability, discriminatory power and the stability of the results. To improve the predictive performance of the models, we introduce a multivariate outlier analysis [10] based on the local outlier factor technique.
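To make the local outlier factor (LOF) idea concrete, the following is a minimal sketch of the standard density-based definition, assuming Euclidean distance. The function name `local_outlier_factor`, the toy sample of two financial ratios per firm, and the choice k = 3 are illustrative assumptions and do not come from the paper or its data.

```python
import math


def _ratio(a, b):
    """Divide two densities, treating inf/inf (duplicate points) as 1."""
    if math.isinf(a) and math.isinf(b):
        return 1.0
    return a / b


def local_outlier_factor(points, k):
    """Return one LOF score per point; values well above 1 suggest outliers."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_distance = []  # distance from each point to its k-th nearest neighbour
    neighbours = []  # indices within that distance (ties included)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_distance.append(kd)
        neighbours.append([j for d, j in others if d <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: average density of the neighbours relative to the point's own density.
    return [
        sum(_ratio(lrd[j], lrd[i]) for j in neighbours[i]) / len(neighbours[i])
        for i in range(n)
    ]


# Hypothetical sample: (leverage ratio, liquidity ratio) for six firms.
firms = [(0.10, 1.20), (0.12, 1.10), (0.11, 1.30),
         (0.09, 1.25), (0.13, 1.15), (0.90, 4.00)]
scores = local_outlier_factor(firms, k=3)
for firm, score in zip(firms, scores):
    print(firm, round(score, 2))
```

In this toy sample the last firm lies far from the others and receives by far the largest score, while the clustered firms score close to 1. A score of this kind could then be added as an extra predictor, which is one plausible reading of how an outlier index enters the models.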
Empirical evidence shows that ensemble methods and parametric models based on the BGEV outperform classical credit risk models based on logistic regression. Furthermore, the BGEV model may also appeal to regulators because it is linear in the parameters and its results are easily interpretable.
For decision makers and practitioners, the paper presents an extreme value approach based on the BGEV, which is particularly appealing when default is a rare event. On the data at hand, the BGEV approach outperforms the logistic regression model.
The paper is structured as follows: Section 2 reviews the state of the art in credit risk modeling; Section 3 describes the multivariate outlier index that is later included in the analysis to improve the performance of the models; Section 4 describes the models tested on the real data set, such as Random Forest (RF), BGEV and Gradient Boosting Machines (GBM); Section 5 describes the performance indexes employed for model selection; Sections 6 and 7 contain the data set description and the empirical evidence obtained, respectively. Section 8 discusses conclusions and ideas for further research.
Section snippets
Literature review
Measuring credit risk for SMEs has become a major challenge for financial institutions, and many statistical models have been proposed in the literature, tracing back to Altman [11] and Beaver [12]. Historically, research in this field started with primitive reduced models and later developed into structural ones [13]. In recent years, progressively more sophisticated models have been implemented [6], [7]. A growing interest especially in credit risk assessment for SMEs has
Multivariate outlier detection in credit risk analysis
The role of credit risk analysis is to assess the potential credit risk associated with a customer or borrower, and to advise on decisions about granting credit, providing loans and borrowing facilities. Credit risk analysis matters both for making correct business decisions and for managing large loan portfolios. This is also a highly important problem in practice since banks can take precautions against risks by, for example, limiting the maximum amount of debt contracted by
Predictive models
Parametric models for credit risk estimation for SMEs include Logistic Regression (GLM) and Linear Discriminant Analysis (LDA). If default is a rare event, the binary generalized extreme value model (BGEV) offers a flexible way to obtain a coherent estimate of the probability of default without resorting to oversampling techniques, which typically affect the final results.
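The difference between the logistic and the extreme value approach can be sketched by comparing their response functions. The GEV-type form used below, PD = exp(-(1 + τη)^(-1/τ)) with the Gumbel limit as τ tends to 0, is the one commonly given in the GEV regression literature cited as [9]; that the paper uses exactly this parameterisation is an assumption, and the function names and the values of the linear predictor `eta` and shape `tau` are purely illustrative.

```python
import math


def logistic_pd(eta):
    """Probability of default under the logistic link (numerically stable)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def gev_pd(eta, tau):
    """Probability of default under a GEV-type response function.

    Assumed form: exp(-(1 + tau * eta) ** (-1 / tau)), defined where
    1 + tau * eta > 0; tau = 0 is taken as the Gumbel limit.
    """
    try:
        if abs(tau) < 1e-12:
            return math.exp(-math.exp(-eta))
        base = 1.0 + tau * eta
        if base <= 0.0:
            # Boundary of the support: the limiting value is 0 for tau > 0
            # and 1 for tau < 0.
            return 0.0 if tau > 0 else 1.0
        return math.exp(-(base ** (-1.0 / tau)))
    except OverflowError:
        # The inner term is astronomically large, so exp(-inner) is 0.
        return 0.0


# Compare the two curves on a small grid of hypothetical linear predictors.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(eta,
          round(logistic_pd(eta), 4),
          round(gev_pd(eta, tau=0.0), 4),
          round(gev_pd(eta, tau=0.25), 4))
```

The logistic curve is symmetric around 0.5, whereas the GEV-type curve approaches 0 and 1 at different rates. This asymmetry is what makes such a link attractive when one class, here defaults, is rare.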
Non-parametric models for binary outcomes based on a single classifier, such as the Classification Tree (CT), can
Credit risk model assessment
In classification problems, a model is evaluated through its capability to correctly predict future outcomes of a target variable. Much research has been done on the statistical metrics to assess the forecasting accuracy of a given model in terms of both measures of classification performance and measures of ranking performance.
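The distinction between ranking performance and classification performance can be illustrated with a short sketch. It computes the area under the ROC curve (a ranking measure) by pairwise comparison, and sensitivity, specificity and accuracy (classification measures) at a fixed cut-off. The function names, the labels, the scores and the 0.5 threshold are hypothetical; the excerpt does not state which specific indexes the paper adopts.

```python
def auc(y_true, scores):
    """Area under the ROC curve: the share of (defaulter, non-defaulter)
    pairs in which the defaulter gets the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classification_rates(y_true, scores, threshold):
    """Sensitivity, specificity and accuracy at a given score cut-off."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes are required")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical validation sample: 1 = default, 0 = no default.
y = [0, 0, 0, 0, 1, 0, 1, 0]
pd_hat = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.80, 0.15]

print(round(auc(y, pd_hat), 4))
print(classification_rates(y, pd_hat, threshold=0.5))
```

In this toy sample the model ranks firms well and its accuracy is high, yet it detects only one of the two defaulters at the chosen cut-off. When defaults are rare, accuracy alone can therefore be misleading, which is why the two families of measures are usually reported together.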
To evaluate the accuracy of a model in forecasting corporate defaults, and thus provide a relative ranking of model performance, a performance metric should not only
Data description
A real data set on the credit risk of Italian SMEs, provided by UniCredit bank, has been analyzed to predict the probability of default (PD). The available sample refers to enterprises with annual revenue not exceeding 5 million euros in January 2015. Although information about the enterprises was collected over several years, the data available for the analysis cover only two years.
Information about SMEs includes generic data (such as dimension, legal form, and default status), financial ratios derived from
Results
This section presents the results obtained by running the models described in Section 4 on the data sets at hand. The models under comparison are evaluated within a cross-validation framework by randomly partitioning the data sets into a training set and a validation set. The two disjoint sets contain 70% and 30% of the data, respectively, and each preserves the a priori default rate of the entire data set. Further validation approaches have been tested on the data (k-fold, k = 5) and
Concluding remarks
This paper shows that multivariate outlier information improves the results of the predictive models in bankruptcy prediction of small and medium enterprises. Business experts can also use the LOF value to identify anomalous enterprise behavior.
Another significant result of this paper is the performance obtained using BGEV. The out-of-sample results obtained using BGEV are similar in predictive performance to those of ensemble methods such as Random Forest. Moreover, modeling risks through the
Acknowledgments
The paper is the result of a collaboration among the authors. The paper was structured from a methodological point of view and written by Silvia Figini. This research was supported by the RIDS. The authors thank the Group Chief Risk Officer and the Head of Group Credit Risk Governance of UniCredit for providing real data for research purposes. The authors also thank the referees for helping to improve the paper.
Silvia Figini – National Full Professor qualification in Applied Statistics (Abilitazione Scientifica Nazionale 2013). From 2014 Associate Professor, University of Pavia, Department of Political and Social Sciences, Italy. 2006: PhD in Statistics Bocconi University Milan. More details available at: http://www-3.unipv.it/webdsps/en/docente.php?id=figini.
References (39)
- et al., Recent developments in consumer credit risk assessment, Eur. J. Oper. Res. (2007)
- et al., The use of profit scoring as an alternative to credit scoring systems in peer-to-peer (P2P) lending, Decis. Support. Syst. (2016)
- et al., Credit risk measurement and early warning of SMEs: an empirical study of listed SMEs in China, Decis. Support. Syst. (2010)
- et al., Classification methods applied to credit scoring: systematic review and overall comparison, Surv. Oper. Res. Manag. Sci. (2016)
- et al., Optimal prediction pools, J. Econ. (2011)
- et al., Multivariate outlier detection in exploration geochemistry, Comput. Geosci. (2005)
- Sound practices for backtesting counterparty credit risk models - final document
- et al., Modelling credit risk for SMEs: evidence from the US market, Abacus (2007)
- et al., Using economic-financial ratios for small enterprise default prediction modeling: an empirical analysis
- et al., Random survival forests models for SME credit risk measurement, Methodol. Comput. Appl. Probab. (2009)
- Predicting default of a small business using different definitions of financial distress, J. Oper. Res. Soc.
- Credit Risk Assessment: The New Lending System for Borrowers, Lenders, and Investors
- Generalized extreme value regression for binary rare events data: an application to credit defaults
- LOF: identifying density-based local outliers, SIGMOD Rec.
- Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, J. Financ.
- The information content of annual earnings announcements, J. Account. Res.
- Measuring Corporate Default Risk
- Credit scoring, statistical techniques and evaluation criteria: a review of the literature, Intell. Syst. Account. Financ. Manag.
- Benchmarking state-of-the-art classification algorithms for credit scoring, J. Oper. Res. Soc.
Federico Bonelli, PhD in Mathematics (Politecnico di Milano), Head of RIDS Data Science Laboratory.
Emanuele Giovannini, Head of Italy Credit Risk Modelling, UniCredit Bank.