Introduction
Literature review
Data preprocessing and unbalanced data set transformation
Data preparation and preprocessing
Index | Label | Description |
---|---|---|
Y | loan_default | Payment default in the first EMI on due date |
X1 | UniqueID | Identifier for customers |
X2 | disbursed_amount | Amount of loan disbursed |
X3 | asset_cost | Cost of the Asset |
X4 | ltv | Loan to Value of the asset |
X5 | branch_id | Branch where the loan was disbursed |
X6 | supplier_id | Vehicle dealer where the loan was disbursed |
X7 | manufacturer_id | Vehicle manufacturer (Hero, Honda, TVS) |
X8 | Current_pincode_ID | Current pincode of the customer |
X9 | Date.of.Birth | Date of birth of the customer |
X10 | Employment.Type | Employment type of the customer (Salaried/Self Employed) |
X11 | DisbursalDate | Date of disbursement |
X12 | State_ID | State of disbursement |
X13 | Employee_code_ID | Employee of the organization who logged the disbursement |
X14 | MobileNo_Avl_Flag | If Mobile no. was shared by the customer then flag as 1 |
X15 | Aadhar_flag | If Aadhaar was shared by the customer then flag as 1 |
X16 | PAN_flag | If PAN was shared by the customer then flag as 1 |
X17 | VoterID_flag | If voter ID was shared by the customer then flag as 1 |
X18 | Driving_flag | If driving licence (DL) was shared by the customer then flag as 1 |
X19 | Passport_flag | If passport was shared by the customer then flag as 1 |
X20 | PERFORM_CNS.SCORE | Bureau Score |
X21 | PERFORM_CNS.SCORE.DESCRIPTION | Bureau score description |
X22 | PRI.NO.OF.ACCTS | Count of total loans taken by the customer at the time of first disbursement |
X23 | PRI.ACTIVE.ACCTS | Count of active loans taken by the customer at the time of first disbursement |
X24 | PRI.OVERDUE.ACCTS | Count of default accounts at the time of first disbursement |
X25 | PRI.CURRENT.BALANCE | Total principal outstanding of the active loans at the time of first disbursement |
X26 | PRI.SANCTIONED.AMOUNT | Total amount that was sanctioned for all the loans at the time of first disbursement |
X27 | PRI.DISBURSED.AMOUNT | Total amount that was disbursed for all the loans at the time of first disbursement |
X28 | SEC.NO.OF.ACCTS | Count of total loans taken by the customer at the time of second disbursement |
X29 | SEC.ACTIVE.ACCTS | Count of active loans taken by the customer at the time of second disbursement |
X30 | SEC.OVERDUE.ACCTS | Count of default accounts at the time of second disbursement |
X31 | SEC.CURRENT.BALANCE | Total principal outstanding of the active loans at the time of second disbursement |
X32 | SEC.SANCTIONED.AMOUNT | Total amount that was sanctioned for all the loans at the time of second disbursement |
X33 | SEC.DISBURSED.AMOUNT | Total amount that was disbursed for all the loans at the time of second disbursement |
X34 | PRIMARY.INSTAL.AMT | Equated Monthly Installment (EMI) Amount of the primary loan |
X35 | SEC.INSTAL.AMT | EMI Amount of the secondary loan |
X36 | NEW.ACCTS.IN.LAST.SIX.MONTHS | New loans taken by the borrower in last 6 months before the disbursement |
X37 | DELINQUENT.ACCTS.IN.LAST.SIX.MONTHS | Loans defaulted in the last 6 months |
X38 | AVERAGE.ACCT.AGE | Average loan tenure |
X39 | CREDIT.HISTORY.LENGTH | Time since first loan |
X40 | NO.OF_INQUIRIES | Enquiries done by the customer for loans |
Risk range of bureau score description | Score |
---|---|
No Bureau History Available | 0 |
Not Scored: Sufficient History Not Available | 0 |
Not Scored: Not Enough Info available on the customer | 0 |
Not Scored: No Activity seen on the customer (Inactive) | 0 |
Not Scored: No Updates available in last 36 months | 0 |
Not Scored: Only a Guarantor | 0 |
Not Scored: More than 50 active Accounts found | 0 |
M-Very High Risk | 1 |
L-Very High Risk | 2 |
K-High Risk | 3 |
J-High Risk | 4 |
I-Medium Risk | 5 |
H-Medium Risk | 6 |
G-Low Risk | 7 |
F-Low Risk | 8 |
E-Low Risk | 9 |
D-Very Low Risk | 10 |
C-Very Low Risk | 11 |
B-Very Low Risk | 12 |
A-Very Low Risk | 13 |
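The mapping in the table above can be applied as a simple ordinal encoding. The following is a minimal sketch: the names `NOT_SCORED`, `RISK_BANDS`, `GRADE` and `credit_risk_grade` are illustrative, and raising an error on an unseen description is an assumption rather than something the table specifies.

```python
# Ordinal encoding of PERFORM_CNS.SCORE.DESCRIPTION, following the table above.
# All names here are illustrative; only the label-to-score pairs come from the table.
NOT_SCORED = [
    "No Bureau History Available",
    "Not Scored: Sufficient History Not Available",
    "Not Scored: Not Enough Info available on the customer",
    "Not Scored: No Activity seen on the customer (Inactive)",
    "Not Scored: No Updates available in last 36 months",
    "Not Scored: Only a Guarantor",
    "Not Scored: More than 50 active Accounts found",
]

# Ordered from highest risk (score 1) to lowest risk (score 13).
RISK_BANDS = [
    "M-Very High Risk", "L-Very High Risk",
    "K-High Risk", "J-High Risk",
    "I-Medium Risk", "H-Medium Risk",
    "G-Low Risk", "F-Low Risk", "E-Low Risk",
    "D-Very Low Risk", "C-Very Low Risk", "B-Very Low Risk", "A-Very Low Risk",
]

GRADE = {label: 0 for label in NOT_SCORED}
GRADE.update({label: score for score, label in enumerate(RISK_BANDS, start=1)})


def credit_risk_grade(description: str) -> int:
    """Return the ordinal score for a bureau score description.

    Assumption: an unknown description raises KeyError instead of being
    silently mapped to 0.
    """
    return GRADE[description.strip()]
```

Keeping the unscored categories at 0 preserves the table's distinction between "no usable bureau information" and the thirteen graded risk bands.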
Index | Label | Description |
---|---|---|
X41 | Loan_to_asset_ratio | Ratio of loan disbursed amount to the asset cost |
X42 | Total_no_of_accts | Count of total loans taken by the customer at the first and second time of disbursement |
X43 | Pri_inactive_accts | Count of total inactive loans taken by the customer at the first time of disbursement |
X44 | Sec_inactive_accts | Count of total inactive loans taken by the customer at the second time of disbursement |
X45 | Total_inactives_accts | Count of total inactive loans taken by the customer at the first and second time of disbursement |
X46 | Total_actives_accts | Count of total active loans taken by the customer at the first and second time of disbursement |
X47 | Total_current_balance | Total principal outstanding amount of the active loans at the first and second time of disbursement |
X48 | Total_sanctioned_amount | Total amount that was sanctioned for all the loans at the first and second time of disbursement |
X49 | Total_disbursed_amount | Total amount that was disbursed for all the loans at the first and second time of disbursement |
X50 | Total_instal_amt | EMI amount of the primary and secondary loan |
X51 | Pri_loan_proportions | Proportion of the primary total loans to the principal |
X52 | Sec_loan_proportions | Proportion of the secondary total loan to the principal |
X53 | Active_to_inactive_act_ratio | Ratio of the customer’s total active loans to total inactive loans |
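Several of the derived indexes above are sums or differences of the raw bureau fields. The sketch below shows one way they could be computed for a single record. It assumes that an inactive account is a total account that is not active (total minus active), which the table does not state explicitly. The function name `derive_features` is hypothetical, and the output keys follow the lower-case spellings used for the Z indexes.

```python
def derive_features(row: dict) -> dict:
    """Compute a subset of the derived indexes from one raw record.

    Assumption: inactive accounts = total accounts - active accounts.
    The proportion and ratio indexes (X51 to X53) are left out because
    their exact definitions are not given in the table.
    """
    pri_inactive = row["PRI.NO.OF.ACCTS"] - row["PRI.ACTIVE.ACCTS"]
    sec_inactive = row["SEC.NO.OF.ACCTS"] - row["SEC.ACTIVE.ACCTS"]
    return {
        "loan_to_asset_ratio": row["disbursed_amount"] / row["asset_cost"],
        "total_no_of_accts": row["PRI.NO.OF.ACCTS"] + row["SEC.NO.OF.ACCTS"],
        "pri_inactive_accts": pri_inactive,
        "sec_inactive_accts": sec_inactive,
        "total_inactive_accts": pri_inactive + sec_inactive,
        "total_active_accts": row["PRI.ACTIVE.ACCTS"] + row["SEC.ACTIVE.ACCTS"],
        "total_current_balance": row["PRI.CURRENT.BALANCE"] + row["SEC.CURRENT.BALANCE"],
        "total_sanctioned_amount": row["PRI.SANCTIONED.AMOUNT"] + row["SEC.SANCTIONED.AMOUNT"],
        "total_disbursed_amount": row["PRI.DISBURSED.AMOUNT"] + row["SEC.DISBURSED.AMOUNT"],
        "total_instal_amt": row["PRIMARY.INSTAL.AMT"] + row["SEC.INSTAL.AMT"],
    }


# Made-up record, for illustration only.
example = {
    "disbursed_amount": 50000, "asset_cost": 62500,
    "PRI.NO.OF.ACCTS": 5, "PRI.ACTIVE.ACCTS": 2,
    "SEC.NO.OF.ACCTS": 1, "SEC.ACTIVE.ACCTS": 1,
    "PRI.CURRENT.BALANCE": 1000, "SEC.CURRENT.BALANCE": 200,
    "PRI.SANCTIONED.AMOUNT": 3000, "SEC.SANCTIONED.AMOUNT": 500,
    "PRI.DISBURSED.AMOUNT": 2800, "SEC.DISBURSED.AMOUNT": 450,
    "PRIMARY.INSTAL.AMT": 150, "SEC.INSTAL.AMT": 40,
}
derived = derive_features(example)
```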
Pre-screening credit risk assessment indexes
Index | Label | Index | Label |
---|---|---|---|
Z1 | Aadhar_flag | Z24 | VoterID_flag |
Z2 | DELINQUENT.ACCTS.IN.LAST.SIX.MONTHS | Z25 | age |
Z3 | Driving_flag | Z26 | asset_cost |
Z4 | Employment.Type | Z27 | average_acct_age_month |
Z5 | NEW.ACCTS.IN.LAST.SIX.MONTHS | Z28 | credit_history_length_month |
Z6 | NO.OF_INQUIRIES | Z29 | credit_risk_grade |
Z7 | PAN_flag | Z30 | disbursal_months_passed |
Z8 | PERFORM_CNS.SCORE | Z31 | disbursed_amount |
Z9 | PRI.ACTIVE.ACCTS | Z32 | ltv |
Z10 | PRI.CURRENT.BALANCE | Z33 | loan_to_asset_ratio |
Z11 | PRI.DISBURSED.AMOUNT | Z34 | total_no_of_accts |
Z12 | PRI.NO.OF.ACCTS | Z35 | pri_inactive_accts |
Z13 | PRI.OVERDUE.ACCTS | Z36 | sec_inactive_accts |
Z14 | PRI.SANCTIONED.AMOUNT | Z37 | total_inactive_accts |
Z15 | PRIMARY.INSTAL.AMT | Z38 | total_active_accts |
Z16 | Passport_flag | Z39 | total_current_balance |
Z17 | SEC.ACTIVE.ACCTS | Z40 | total_sanctioned_amount |
Z18 | SEC.CURRENT.BALANCE | Z41 | total_disbursed_amount |
Z19 | SEC.DISBURSED.AMOUNT | Z42 | total_instal_amt |
Z20 | SEC.INSTAL.AMT | Z43 | pri_loan_proportion |
Z21 | SEC.NO.OF.ACCTS | Z44 | sec_loan_proportion |
Z22 | SEC.OVERDUE.ACCTS | Z45 | active_to_inactive_act_ratio |
Z23 | SEC.SANCTIONED.AMOUNT | | |
Transforming unbalanced data set
SMOTE-Tomek Link algorithm
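As a rough illustration of the two ingredients, the sketch below shows the SMOTE interpolation step and the detection and removal of Tomek links in plain Python. It is a simplified sketch under stated assumptions, not the implementation used for the experiments: distances are Euclidean, neighbours are found by brute force, both members of each link are dropped (one common convention), and all function names are hypothetical.

```python
from math import dist


def smote_point(x, neighbor, u):
    """Synthetic minority sample on the segment between x and a minority
    neighbour; u is a number in [0, 1] (drawn at random in practice)."""
    return tuple(a + u * (b - a) for a, b in zip(x, neighbor))


def nearest_neighbor(i, X):
    """Index of the sample closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i), key=lambda j: dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j) that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (an assumed convention)."""
    drop = {k for pair in tomek_links(X, y) for k in pair}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Toy data: two clean clusters plus one overlapping pair at x = 1.0 / 1.1.
X = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (1.1, 0.0), (2.0, 0.0), (2.2, 0.0)]
y = [0, 0, 0, 1, 1, 1]

links = tomek_links(X, y)                 # -> [(2, 3)]
X_clean, y_clean = remove_tomek_links(X, y)
```

In the combined procedure, oversampling with `smote_point` would run first and the link removal would then clean the boundary between the two classes.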
Unbalanced data set transformation based on the SMOTE-Tomek Link algorithm
Actual category | Prediction category | |
---|---|---|
 | Positive | Negative |
Positive | TP | FN |
Negative | FP | TN |
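For reference, three metrics commonly used on unbalanced data (F1, G-mean and MCC) can be computed directly from the four cells of the confusion matrix. The sketch below uses the standard textbook definitions. The function name and the example counts are made up for illustration, and AUC is omitted because it requires predicted scores rather than a single confusion matrix.

```python
from math import sqrt


def imbalance_metrics(tp, fn, fp, tn):
    """F1, G-mean and MCC from confusion-matrix counts (standard definitions)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = sqrt(recall * specificity)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"F1": f1, "G-mean": g_mean, "MCC": mcc}


# Hypothetical counts, for illustration only.
scores = imbalance_metrics(tp=80, fn=20, fp=10, tn=90)
```

A classifier that labels almost everything as the majority class has a recall close to zero, so F1, G-mean and MCC all collapse toward zero even when accuracy remains high.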
Data set | F1 | G-means | MCC | AUC |
---|---|---|---|---|
Panel A: LR model | | | | |
Unprocessed data set | 0.012352 | 0.079060 | 0.032097 | 0.502130 |
SMOTE algorithm | 0.620625 | 0.618822 | 0.237656 | 0.618827 |
SMOTE-Tomek Link algorithm | 0.624523 | 0.620736 | 0.239479 | 0.619983 |
Panel B: RF model | | | | |
Unprocessed data set | 0.039540 | 0.143128 | 0.066204 | 0.507563 |
SMOTE algorithm | 0.842869 | 0.842489 | 0.684974 | 0.842489 |
SMOTE-Tomek Link algorithm | 0.851321 | 0.848532 | 0.670156 | 0.857371 |
Feature selection method of credit risk assessment index
Improved Filter-Wrapper feature selection method
$\gamma_k$ | Description |
---|---|
1.0 | $U_{k-1}$ is just as important as $U_k$ |
1.2 | $U_{k-1}$ is slightly more important than $U_k$ |
1.4 | $U_{k-1}$ is obviously more important than $U_k$ |
1.6 | $U_{k-1}$ is strongly more important than $U_k$ |
1.8 | $U_{k-1}$ is extremely more important than $U_k$ |
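The scale above has the form of the importance-ratio scale used in order-relation analysis (often called the G1 method). Assuming that is how it is used here, the weights follow from the ratios as $w_m = \bigl(1 + \sum_{k=2}^{m} \prod_{i=k}^{m} \gamma_i\bigr)^{-1}$ and $w_{k-1} = \gamma_k w_k$, where the criteria are ordered from most to least important. The sketch below implements that recursion. The function name and the example ratios are illustrative assumptions.

```python
def g1_weights(gammas):
    """Weights from importance ratios gammas = [gamma_2, ..., gamma_m].

    Criteria are assumed to be ordered from most important (U_1) to least
    important (U_m); the result is [w_1, ..., w_m] and sums to 1.
    """
    total = 1.0   # becomes 1 + sum over k of prod_{i=k..m} gamma_i
    prod = 1.0
    for g in reversed(gammas):
        prod *= g
        total += prod
    weights = [1.0 / total]                # w_m
    for g in reversed(gammas):
        weights.append(weights[-1] * g)    # w_{k-1} = gamma_k * w_k
    return weights[::-1]


# Illustrative case: three criteria, each "slightly more important"
# than the next one (gamma = 1.2).
w = g1_weights([1.2, 1.2])
```

With equal ratios of 1.0 the recursion reduces to uniform weights, which is a quick sanity check on the implementation.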
Analysis of selection of credit risk assessment indexes
Comprehensive ranking of features in Filter stage
Feature | Relief | MIC | Quasi-separable method | Fusion evaluation value | Rank |
---|---|---|---|---|---|
Z1 | 0.656250 | 0.057894 | 0.036012 | 0.287389 | 7 |
Z2 | 0.157813 | 0.002484 | 0.166743 | 0.118088 | 26 |
Z3 | 0.000000 | 0.014015 | 0.084077 | 0.031570 | 45 |
Z4 | 1.000000 | 0.116463 | 0.741940 | 0.672210 | 1 |
Z5 | 0.158703 | 0.031215 | 0.157836 | 0.123396 | 24 |
Z6 | 0.159447 | 0.000837 | 0.000001 | 0.063307 | 44 |
Z7 | 0.218750 | 0.034355 | 0.206417 | 0.164030 | 16 |
Z8 | 0.505691 | 0.378509 | 0.015686 | 0.309199 | 5 |
Z9 | 0.156034 | 0.027407 | 0.133023 | 0.113114 | 27 |
Z10 | 0.156250 | 0.096885 | 0.253947 | 0.172153 | 15 |
Z11 | 0.156250 | 0.179913 | 0.004739 | 0.112797 | 29 |
Z12 | 0.156013 | 0.020293 | 0.065588 | 0.088918 | 40 |
Z13 | 0.158750 | 0.000000 | 0.276819 | 0.154069 | 18 |
Z14 | 0.156250 | 0.203496 | 0.048480 | 0.133697 | 20 |
Z15 | 0.156256 | 0.085870 | 0.502350 | 0.251028 | 9 |
Z16 | 0.125000 | 0.001669 | 0.131155 | 0.093150 | 34 |
Z17 | 0.159247 | 0.000472 | 0.034478 | 0.074495 | 41 |
Z18 | 0.156297 | 0.039070 | 0.000001 | 0.072564 | 42 |
Z19 | 0.156253 | 0.041563 | 0.158975 | 0.125645 | 23 |
Z20 | 0.156250 | 0.034128 | 0.000000 | 0.071187 | 43 |
Z21 | 0.161916 | 0.002349 | 0.079134 | 0.090790 | 37 |
Z22 | 0.168713 | 0.001104 | 0.071656 | 0.090671 | 38 |
Z23 | 0.156263 | 0.043679 | 0.165562 | 0.128402 | 21 |
Z24 | 0.250000 | 0.021885 | 0.136613 | 0.149953 | 19 |
Z25 | 0.393644 | 0.059518 | 0.071771 | 0.195738 | 12 |
Z26 | 0.156303 | 0.128334 | 0.000007 | 0.097089 | 33 |
Z27 | 0.166422 | 0.007934 | 0.099641 | 0.100868 | 32 |
Z28 | 0.160613 | 0.033051 | 0.059533 | 0.092245 | 36 |
Z29 | 0.717303 | 0.157933 | 0.413192 | 0.463379 | 3 |
Z30 | 0.356469 | 0.064443 | 0.014473 | 0.163493 | 17 |
Z31 | 0.156378 | 0.547479 | 1.000000 | 0.541956 | 2 |
Z32 | 0.191141 | 0.425568 | 0.230247 | 0.268431 | 8 |
Z33 | 0.189450 | 0.106937 | 0.229650 | 0.180038 | 13 |
Z34 | 0.156056 | 0.019809 | 0.067408 | 0.089402 | 39 |
Z35 | 0.155972 | 0.021773 | 0.479931 | 0.225917 | 11 |
Z36 | 0.159753 | 0.003789 | 0.118671 | 0.103365 | 30 |
Z37 | 0.155953 | 0.021338 | 0.497559 | 0.231602 | 10 |
Z38 | 0.156181 | 0.027768 | 0.132485 | 0.113094 | 28 |
Z39 | 0.156253 | 0.096018 | 0.264068 | 0.175253 | 14 |
Z40 | 0.156250 | 0.202428 | 0.522285 | 0.289617 | 6 |
Z41 | 0.156250 | 0.179345 | 0.051005 | 0.127895 | 22 |
Z42 | 0.156256 | 0.085308 | 0.052022 | 0.102401 | 31 |
Z43 | 0.156250 | 0.112157 | 0.000000 | 0.092622 | 35 |
Z44 | 0.156350 | 0.035648 | 0.151286 | 0.121523 | 25 |
Z45 | 0.160081 | 1.000000 | 0.038522 | 0.350729 | 4 |
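The fusion evaluation values in the table are consistent with a weighted sum of the three filter scores. The sketch below recomputes a few rows. The weights (0.3956, 0.2747, 0.3297) for Relief, MIC and the quasi-separable method are not stated in this section; they were back-solved from the tabulated rows and should be read as an assumption. Successive weights differ by a factor close to 1.2. The names `WEIGHTS`, `SCORES` and `fusion_value` are hypothetical.

```python
# Assumed weights for (Relief, MIC, quasi-separable method), back-solved
# from the table rather than taken from the text.
WEIGHTS = (0.3956, 0.2747, 0.3297)

# feature: (Relief, MIC, quasi-separable), copied from the table above.
SCORES = {
    "Z4": (1.000000, 0.116463, 0.741940),
    "Z31": (0.156378, 0.547479, 1.000000),
    "Z29": (0.717303, 0.157933, 0.413192),
    "Z45": (0.160081, 1.000000, 0.038522),
    "Z3": (0.000000, 0.014015, 0.084077),
}


def fusion_value(scores, weights=WEIGHTS):
    """Weighted sum of the three normalized filter scores."""
    return sum(w * s for w, s in zip(weights, scores))


fused = {name: round(fusion_value(s), 6) for name, s in SCORES.items()}

# Rank features by descending fusion value (rank 1 = most relevant).
ranking = sorted(fused, key=fused.get, reverse=True)
```

For the five rows used here, the recomputed values agree with the table to within rounding, and the resulting order (Z4, Z31, Z29, Z45, Z3) matches the tabulated ranks 1, 2, 3, 4 and 45.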
Feature selection in Wrapper stage
Credit risk assessment of personal auto loans using PSO-XGBoost model
XGBoost model
PSO-XGBoost model
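To make the idea of tuning hyperparameters with particle swarm optimization concrete, the following is a minimal PSO sketch in plain Python. It is not the configuration used in the experiments: the inertia weight, acceleration coefficients, swarm size and search bounds are illustrative assumptions, and `toy_loss` is a hypothetical stand-in for a cross-validated XGBoost loss over two hyperparameters.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clip the particle to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_loss(params):
    """Hypothetical surrogate for a validation loss; minimum at (0.1, 6)."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + ((max_depth - 6) / 10) ** 2


best_params, best_loss = pso(toy_loss, bounds=[(0.01, 0.5), (2, 12)])
```

In a real setting the objective would train and validate a model for each particle, and integer-valued hyperparameters such as tree depth would need to be rounded before use.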
Analysis of credit risk assessment of personal auto loans
(1) Data set partitioning
Data set | Number of features | Number of samples | Positive/negative ratio | Missing value |
---|---|---|---|---|
Training | 34 | 130,641 | 1.00 | NA |
Test | 34 | 55,989 | 1.00 | NA |
Performance evaluation of PSO-XGBoost model
Actual category | Prediction category | |
---|---|---|
 | Positive example (Default: “1”) | Negative example (No-default: “0”) |
Positive example (Default: “1”) | TP | FN |
Negative example (No-default: “0”) | FP | TN |
Values | PSO-XGBoost | | XGBoost | | RF | | LR | |
---|---|---|---|---|---|---|---|---|
 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
1 (Defaulted) | 21,392 | 6,704 | 20,745 | 7,351 | 20,977 | 7,119 | 19,369 | 8,727 |
0 (No-default) | 2,753 | 25,140 | 2,794 | 25,099 | 3,702 | 24,191 | 7,699 | 20,194 |
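Accuracy, precision and recall follow directly from these counts. The sketch below recomputes them, assuming that rows are actual classes and columns are predicted classes, so that each row reads (TP, FN) for defaulted loans and (FP, TN) for non-defaulted loans. The names `CONFUSION` and `basic_metrics` are illustrative.

```python
# model: (TP, FN, FP, TN), copied from the confusion matrices above.
CONFUSION = {
    "PSO-XGBoost": (21392, 6704, 2753, 25140),
    "XGBoost": (20745, 7351, 2794, 25099),
    "RF": (20977, 7119, 3702, 24191),
    "LR": (19369, 8727, 7699, 20194),
}


def basic_metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


METRICS = {model: basic_metrics(*cells) for model, cells in CONFUSION.items()}

for model, m in METRICS.items():
    print(f"{model}: " + ", ".join(f"{k}={v:.4f}" for k, v in m.items()))
```

Each matrix sums to 55,989, the size of the test set, which is a useful check that the cells were transcribed correctly.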
Evaluation index | PSO-XGBoost | XGBoost | RF | LR |
---|---|---|---|---|
Accuracy | 0.8311 | 0.8188 | 0.8067 | 0.7066 |
Precision | 0.8860 | 0.8813 | 0.8500 | 0.7156 |
Recall | 0.7614 | 0.7384 | 0.7466 | 0.6894 |
Time complexity (running time) | 9 s | 5 s | 6 s | 3 s |
Space complexity (memory usage) | 77 M | 74.3 M | 66 M | 36 M |
Further analysis of model performance
Data processing and feature selection
Index | Label | Index | Label |
---|---|---|---|
Z1 | main_account_loan_no | Z16 | Driving_flag |
Z2 | main_account_active_loan_no | Z17 | passport_flag |
Z3 | main_account_overdue_no | Z18 | credit_score |
Z4 | main_account_outstanding_loan | Z19 | main_account_monthly_payment |
Z5 | main_account_sanction_loan | Z20 | sub_account_monthly_payment |
Z6 | main_account_disbursed_loan | Z21 | last_six_month_new_loan_no |
Z7 | sub_account_loan_no | Z22 | last_six_month_defaulted_no |
Z8 | sub_account_active_loan_no | Z23 | average_age |
Z9 | sub_account_overdue_no | Z24 | credit_history |
Z10 | sub_account_outstanding_loan | Z25 | enquirie_no |
Z11 | sub_account_sanction_loan | Z26 | loan_to_asset_ratio |
Z12 | sub_account_disbursed_loan | Z27 | total_account_loan_no |
Z13 | disbursed_amount | Z28 | sub_account_inactive_loan_no |
Z14 | asset_cost | Z29 | total_inactive_loan_no |
Z15 | ltv | Z30 | main_account_inactive_loan_no |
The decision-making process of risk assessment
Data set | Number of features | Number of samples | Positive/negative ratio | Missing value |
---|---|---|---|---|
Training | 30 | 139,802 | 1.00 | NA |
Test | 30 | 59,915 | 1.00 | NA |
Evaluation index | PSO-XGBoost | XGBoost | RF | LR |
---|---|---|---|---|
Accuracy | 0.7805 | 0.7458 | 0.7733 | 0.6527 |
Precision | 0.7827 | 0.7498 | 0.7645 | 0.6418 |
Recall | 0.7745 | 0.7353 | 0.7676 | 0.6853 |
Time complexity (running time) | 24 s | 12 s | 13 s | 4 s |
Space complexity (memory usage) | 116.2 M | 110.4 M | 111.7 M | 54.4 M |