
2018 | Original Paper | Book Chapter

Scalable Machine Learning Techniques for Highly Imbalanced Credit Card Fraud Detection: A Comparative Study

Authors: Rafiq Ahmed Mohammed, Kok-Wai Wong, Mohd Fairuz Shiratuddin, Xuequn Wang

Published in: PRICAI 2018: Trends in Artificial Intelligence

Publisher: Springer International Publishing


Abstract

In real-world credit card fraud detection, fraudulent transactions make up only a small minority of all transactions, which creates a class imbalance problem. As transaction volumes grow to a massive scale, the amount of imbalanced data becomes immense, raising a challenging question: how well can Machine Learning (ML) techniques scale up to learn efficiently to detect fraud from massive incoming data, respond faster with high prediction accuracy, and reduce misclassification costs? This paper reports experiments that compared several popular ML techniques and investigated their suitability as a “scalable algorithm” when working with highly imbalanced, massive or “Big” datasets. The experiments were conducted on two highly imbalanced datasets using Random Forest, Balanced Bagging Ensemble, and Gaussian Naïve Bayes. We observed that many detection algorithms performed well on a medium-sized dataset but struggled to maintain similar predictive performance when the dataset was massive.
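The abstract frames fraud detection as a class imbalance problem and names Balanced Bagging Ensemble and Gaussian Naïve Bayes among the compared techniques. The pure-Python sketch below is not the authors' code and does not reproduce their experimental setup; it only illustrates the two ideas on a hypothetical synthetic dataset with roughly 1% fraud. A majority-class baseline reaches high accuracy while catching no fraud, whereas a simplified balanced-bagging ensemble (each bag pairs a bootstrap of the fraud cases with an equal-sized random draw of legitimate cases) of from-scratch Gaussian Naïve Bayes learners recovers most fraud cases at the cost of precision. The data generator, the 1% ratio, the estimator count, and all names (`TinyGaussianNB`, `balanced_bagging_fit`, and so on) are assumptions for illustration only.

```python
import math
import random
from statistics import mean, pvariance


class TinyGaussianNB:
    """Minimal from-scratch Gaussian Naive Bayes (illustrative, not scikit-learn)."""

    def fit(self, X, y):
        n = len(y)
        self.stats = {}
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            # Log prior, per-feature means and variances (epsilon avoids zero variance).
            self.stats[c] = (
                math.log(len(rows) / n),
                [mean(col) for col in cols],
                [pvariance(col) + 1e-9 for col in cols],
            )
        return self

    def predict_one(self, x):
        # Pick the class with the highest log posterior under independent Gaussians.
        best, best_score = None, -math.inf
        for c, (log_prior, means, variances) in self.stats.items():
            score = log_prior
            for v, m, var in zip(x, means, variances):
                score -= 0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
            if score > best_score:
                best, best_score = c, score
        return best


def balanced_bagging_fit(X, y, n_estimators=15, seed=0):
    """Each bag: bootstrap of the fraud cases plus an equal-sized random draw of legit cases."""
    rng = random.Random(seed)
    fraud = [i for i, label in enumerate(y) if label == 1]
    legit = [i for i, label in enumerate(y) if label == 0]
    models = []
    for _ in range(n_estimators):
        idx = [rng.choice(fraud) for _ in fraud] + [rng.choice(legit) for _ in fraud]
        models.append(TinyGaussianNB().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def balanced_bagging_predict(models, x):
    # Majority vote over the ensemble (labels are 0 = legitimate, 1 = fraud).
    votes = sum(m.predict_one(x) for m in models)
    return 1 if 2 * votes > len(models) else 0


def precision_recall(y_true, y_pred):
    # Precision and recall for the fraud (positive) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def make_data(n_legit, n_fraud, rng):
    # Hypothetical 2-feature data: fraud is shifted away from the legitimate cluster.
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_legit)]
    X += [[rng.gauss(2.5, 1.0), rng.gauss(2.5, 1.0)] for _ in range(n_fraud)]
    return X, [0] * n_legit + [1] * n_fraud


rng = random.Random(42)
X_train, y_train = make_data(2000, 20, rng)  # roughly 1% fraud
X_test, y_test = make_data(2000, 20, rng)

baseline = [0] * len(y_test)  # always predict "legitimate"
accuracy = sum(p == t for p, t in zip(baseline, y_test)) / len(y_test)
print(f"baseline accuracy={accuracy:.3f}, (precision, recall)={precision_recall(y_test, baseline)}")

models = balanced_bagging_fit(X_train, y_train)
preds = [balanced_bagging_predict(models, x) for x in X_test]
precision, recall = precision_recall(y_test, preds)
print(f"balanced bagging: precision={precision:.2f}, recall={recall:.2f}")
```

The baseline scores about 99% accuracy yet flags no fraud, which is why minority-class metrics such as precision and recall say more than accuracy on data this skewed. In practice a maintained implementation, for example imbalanced-learn's `BalancedBaggingClassifier` (which likewise trains each estimator on a randomly undersampled, balanced subset) together with scikit-learn's `GaussianNB`, would replace the hand-rolled pieces.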


Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-97310-4_27