18-12-2024 | Original Research Paper

An engine to simulate insurance fraud network data

Authors: Bavo D. C. Campo, Katrien Antonio

Published in: European Actuarial Journal | Issue 1/2025

Abstract

The article introduces a simulation engine designed to generate synthetic insurance fraud network data, addressing the scarcity of publicly available data. It emphasizes the importance of social network analytics in detecting fraud patterns and the benefits of using synthetic data for research and model evaluation. The simulation engine mimics the structure and properties of real-world non-life motor insurance data, allowing users to control various data-generating mechanisms. The article showcases the engine's capabilities by generating two types of data sets, one with and one without a social network effect, and demonstrates the development and evaluation of a fraud detection model using synthetic data. This approach enables researchers to benchmark and improve fraud detection methods, addressing key challenges in the field.

Footnotes
1. The midmean, or interquartile mean, is the average of the observations between the 25th and 75th percentiles.
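
As an illustration of the midmean definition, here is a minimal Python sketch. The footnote does not say how the percentiles are computed or how observations lying exactly on a cut-off are treated, so the choices below (interpolated quartiles via `statistics.quantiles` with `method="inclusive"`, boundary values kept, plain mean for fewer than three observations) are assumptions made for this example, and the function name `midmean` is hypothetical.

```python
from statistics import fmean, quantiles


def midmean(values):
    """Mean of the observations lying between the 25th and 75th percentiles."""
    data = sorted(values)
    if not data:
        raise ValueError("midmean requires at least one observation")
    if len(data) < 3:
        # Assumption: too few points to trim, so fall back to the plain mean.
        return fmean(data)
    # Assumption: quartiles are interpolated over the sample itself
    # (method="inclusive"); other conventions shift the cut-offs slightly.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    # Assumption: observations equal to a quartile are kept.
    middle = [x for x in data if q1 <= x <= q3]
    return fmean(middle)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(midmean(sample))  # 5.0: the outlier 100 lies outside the middle half
```

In this sample the ordinary mean is pulled upward by the value 100, while the midmean only averages 3, 4, 5, 6 and 7.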
 
2. For the gamma distribution, we use the parameterization with the density function \(f(x) = \tau^\alpha x^{\alpha - 1} \exp(-\tau x) / \Gamma(\alpha)\), where \(\tau_i = \alpha / \exp({}_{cs}\boldsymbol{x}_{ij}^\top \boldsymbol{\beta}_{cs} + N_{ij} \zeta)\) and \(\Gamma(\cdot)\) denotes the gamma function.
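
Under this parameterization, \(\tau\) is a rate, so the mean \(\alpha / \tau\) equals the exponentiated linear predictor. The Python sketch below illustrates that relationship. It is not the authors' implementation: the function names are hypothetical, the linear predictor is passed in as a single number `eta`, and the numerical values are made up for the example. Note that the standard library's `random.gammavariate` takes a shape and a scale, so the scale is set to \(1 / \tau\).

```python
import math
import random


def gamma_rate(alpha, eta):
    """Rate tau = alpha / exp(eta), so that the mean alpha / tau equals exp(eta)."""
    return alpha / math.exp(eta)


def gamma_pdf(x, alpha, tau):
    """Density tau^alpha * x^(alpha - 1) * exp(-tau * x) / Gamma(alpha) for x > 0."""
    if x <= 0:
        return 0.0
    log_density = (alpha * math.log(tau) + (alpha - 1) * math.log(x)
                   - tau * x - math.lgamma(alpha))
    return math.exp(log_density)


def draw_claim_size(alpha, eta, rng):
    """Draw one value; gammavariate expects shape and scale, with scale = 1 / tau."""
    tau = gamma_rate(alpha, eta)
    return rng.gammavariate(alpha, 1.0 / tau)


# Illustrative values only (not taken from the article).
alpha = 2.0
eta = math.log(1000.0)  # stands in for the linear predictor in the footnote
rng = random.Random(42)
draws = [draw_claim_size(alpha, eta, rng) for _ in range(20_000)]
print(sum(draws) / len(draws))  # should be close to exp(eta) = 1000
```

Working on the log scale inside `gamma_pdf` avoids overflow in \(\tau^\alpha\) and \(\Gamma(\alpha)\) for large shape values.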
 
Metadata
Title: An engine to simulate insurance fraud network data
Authors: Bavo D. C. Campo, Katrien Antonio
Publication date: 18-12-2024
Publisher: Springer Berlin Heidelberg
Published in: European Actuarial Journal | Issue 1/2025
Print ISSN: 2190-9733
Electronic ISSN: 2190-9741
DOI: https://doi.org/10.1007/s13385-024-00399-z