2021 | OriginalPaper | Book chapter

Sammon Mapping-Based Gradient Boosted Trees for Tax Crime Prediction in the City of São Paulo

Written by: André Ippolito, Augusto Cezar Garcia Lozano

Published in: Enterprise Information Systems

Publisher: Springer International Publishing

Abstract

With the vast volume of data now available, many institutions, including those in the public sector, use information to improve decision-making. Machine Learning strengthens data-driven decision-making with its predictive power. In this work, our principal motivation was to apply Machine Learning to improve fiscal audit planning for the municipality of São Paulo: we predicted crimes against the city’s service tax system. Our methodology comprised the following steps: data extraction; data preparation; dimensionality reduction; model training and testing; model evaluation; model selection. Our experimental findings revealed that Sammon Mapping (SM) combined with Gradient Boosted Trees (GBT) outperformed other state-of-the-art works, classifiers, and dimensionality reduction techniques in classification performance. We believe that GBT’s ensemble of classifiers, combined with SM’s ability to identify relevant dimensions in the data, contributed to the higher prediction scores. These scores enable São Paulo’s tax administration to rank fiscal audits by the probability of tax crime occurrence, which can help raise tax revenue.
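To make the dimensionality reduction step concrete, here is a minimal sketch of Sammon Mapping in pure Python. It is not the authors' implementation: the function names (`pairwise_distances`, `sammon_stress`, `sammon_map`), the learning rate, and the iteration count are illustrative assumptions, and it uses plain gradient descent on Sammon's stress rather than the second-order update of Sammon's original paper. It also assumes no two input points coincide, since the stress divides by the original distances. The classification step (GBT) is omitted.

```python
import math
import random


def pairwise_distances(points):
    """Full symmetric matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def sammon_stress(d_high, d_low):
    """Sammon's stress: sum over pairs i < j of (D - d)^2 / D, divided by sum of D.

    Assumes every off-diagonal entry of d_high is strictly positive.
    """
    n = len(d_high)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(d_high[i][j] for i, j in pairs)
    err = sum((d_high[i][j] - d_low[i][j]) ** 2 / d_high[i][j] for i, j in pairs)
    return err / total


def sammon_map(points, dim=2, iters=500, lr=0.1, seed=0):
    """Project points to `dim` dimensions by gradient descent on Sammon's stress.

    lr and iters are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    n = len(points)
    d_high = pairwise_distances(points)
    total = sum(d_high[i][j] for i in range(n) for j in range(i + 1, n))
    # Random starting layout in the low-dimensional space.
    y = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        grads = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d_low = max(math.dist(y[i], y[j]), 1e-9)  # guard against division by zero
                # dE/dy_i = -(2/c) * sum_j (D_ij - d_ij) / (D_ij * d_ij) * (y_i - y_j)
                coeff = -2.0 / total * (d_high[i][j] - d_low) / (d_high[i][j] * d_low)
                for k in range(dim):
                    grads[i][k] += coeff * (y[i][k] - y[j][k])
        y = [[y[i][k] - lr * grads[i][k] for k in range(dim)] for i in range(n)]
    return y


# Toy usage: five made-up 3-D points mapped to 2-D.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
embedding = sammon_map(points)
print(sammon_stress(pairwise_distances(points), pairwise_distances(embedding)))
```

In a pipeline like the one the abstract describes, the low-dimensional coordinates returned by such a mapping would serve as input features for the classifier. A lower stress value means the pairwise distances of the original data are better preserved in the projection.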

Metadata
Title
Sammon Mapping-Based Gradient Boosted Trees for Tax Crime Prediction in the City of São Paulo
Written by
André Ippolito
Augusto Cezar Garcia Lozano
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-75418-1_14