Published in: The International Journal of Advanced Manufacturing Technology 1/2022

26.04.2022 | ORIGINAL ARTICLE

Anomaly credit data detection based on enhanced Isolation Forest

Authors: Xiaodong Zhang, Yuan Yao, Congdong Lv, Tao Wang

Abstract

Real-world credit data often contains false or erroneous records, which degrade the performance of credit evaluation models. To address this, we propose an outlier detection algorithm that accounts for two characteristics of credit data: class imbalance and cost sensitivity. We use an anomaly detection model called EIF to optimize credit evaluation models. EIF uses the EasyEnsemble algorithm to construct balanced datasets and trains Isolation Forest models for anomaly detection on these balanced datasets, which carry different disturbances. On the one hand, building the balanced datasets by undersampling addresses the class-imbalance problem; on the other hand, each sub-model learns from all minority-class samples, which addresses the cost-sensitivity problem. Experiments were performed on the UCI German dataset, and a test set containing fake data was constructed by correlation. Compared with other anomaly detection algorithms applied to common credit evaluation models, the EIF-optimized model achieves a higher F1 score and a lower cost-sensitive error rate. In conclusion, the EIF model is effective in enhancing the performance of credit evaluation models on forged credit datasets.
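The subset-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `balanced_subsets`, the toy data, and the parameters `n_subsets` and `seed` are assumptions made for illustration. It shows only EasyEnsemble-style undersampling, in which every subset keeps all minority-class samples and adds an equally sized random draw from the majority class. Training one Isolation Forest per subset is left out.

```python
import random


def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Build balanced training sets in the style of EasyEnsemble.

    Each subset contains every minority sample plus an equally sized
    random draw (without replacement) from the majority class.
    Hypothetical helper, not taken from the paper.
    """
    if len(minority) > len(majority):
        raise ValueError("minority class must not be larger than majority class")
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        # Undersample the majority class down to the minority class size.
        drawn = rng.sample(majority, len(minority))
        subsets.append(drawn + list(minority))
    return subsets


# Toy data (assumed for illustration): 700 majority and 300 minority records.
majority = [("good", i) for i in range(700)]
minority = [("bad", i) for i in range(300)]

subsets = balanced_subsets(majority, minority, n_subsets=5)
```

Each of the five subsets could then be passed to a separate anomaly detector, so that every sub-model sees the full minority class while the majority class is covered across the ensemble.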

Metadata
Title
Anomaly credit data detection based on enhanced Isolation Forest
Authors
Xiaodong Zhang
Yuan Yao
Congdong Lv
Tao Wang
Publication date
26.04.2022
Publisher
Springer London
Published in
The International Journal of Advanced Manufacturing Technology / Issue 1/2022
Print ISSN: 0268-3768
Electronic ISSN: 1433-3015
DOI
https://doi.org/10.1007/s00170-022-09251-8
