Review of Accounting Studies

Published: 02-10-2020

Using machine learning to detect misstatements

Authors: Jeremy Bertomeu, Edwige Cheynel, Eric Floyd, Wenqiang Pan

Published in: Review of Accounting Studies | Issue 2/2021

Abstract

Machine learning offers empirical methods to sift through accounting datasets with a large number of variables and limited a priori knowledge about functional forms. In this study, we show that these methods help detect and interpret patterns present in ongoing accounting misstatements. We use a wide set of variables from accounting, capital markets, governance, and auditing datasets to detect material misstatements. A primary insight of our analysis is that accounting variables, while they do not detect misstatements well on their own, become important with suitable interactions with audit and market variables. We also analyze differences between misstatements and irregularities, compare algorithms, examine one-year- and two-year-ahead predictions, and interpret groups at greater risk of misstatements.

Footnotes

Another possible solution is to fit the model using a rolling window or to exclude observations from firms used to build the model. However, both choices severely restrict the sample effectively available for cross-validation.
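A minimal Python sketch of the two alternatives mentioned above, under assumed conventions: the `Obs` record, its fields, and the two-year window are hypothetical and are not taken from the paper. It makes the cost concrete: the rolling window discards older fiscal years, and the firm-level filter discards test observations from firms already seen in training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:
    firm: str   # hypothetical firm identifier
    year: int   # fiscal year
    label: int  # 1 if the firm-year is misstated, else 0


def rolling_window_split(data, test_year, window):
    """Train on the `window` fiscal years just before `test_year`; test on `test_year`."""
    train = [o for o in data if test_year - window <= o.year < test_year]
    test = [o for o in data if o.year == test_year]
    return train, test


def exclude_training_firms(train, test):
    """Drop every test observation whose firm also appears in the training sample."""
    seen = {o.firm for o in train}
    return [o for o in test if o.firm not in seen]


if __name__ == "__main__":
    data = [Obs("A", 2001, 0), Obs("A", 2002, 1), Obs("B", 2002, 0),
            Obs("C", 2003, 1), Obs("A", 2003, 0)]
    train, test = rolling_window_split(data, test_year=2003, window=2)
    print(len(train), len(test))                                   # 3 2
    print([o.firm for o in exclude_training_firms(train, test)])   # ['C']
```

In this toy sample, the firm filter removes firm A's 2003 observation, leaving only firm C in the test set.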

In a previous version of the manuscript, we focused on all restatements, including those not reported in Form 8-K filings; however, many of these restatements need not involve large events. We thank Andy Imdieke for this suggestion.

Audit Analytics provides the restated amount for each year only for the five most recent years impacted by the restatement. However, firms’ restatements can often impact more than five years of financial data. The impact on accounting numbers prior to the most recent five years is usually reported as a cumulative charge to retained earnings, and, in practice, firms need not retrospectively adjust all prior years. To account for this, we assume that the cumulative effect to retained earnings is distributed evenly across the misstatement span identified in the restatement filing. If the span is missing, we allocate the unexplained cumulative change to the year prior to the last year with an income effect.
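The allocation rule can be sketched as below. This is an illustration under assumptions, not the authors' code: the function name and the `fallback_year` parameter are hypothetical. The caller is assumed to supply the years in the misstatement span and, when the span is missing, the year that receives the unexplained amount.

```python
def allocate_cumulative_effect(cumulative, span_years=None, fallback_year=None):
    """Spread a cumulative retained-earnings charge evenly over the misstatement span.

    cumulative    -- cumulative effect charged to retained earnings
    span_years    -- fiscal years in the misstatement span, if disclosed
    fallback_year -- hypothetical parameter: the year that receives the whole
                     amount when no span is available (chosen by the caller)
    """
    if span_years:
        share = cumulative / len(span_years)
        return {year: share for year in span_years}
    if fallback_year is None:
        raise ValueError("need either a misstatement span or a fallback year")
    return {fallback_year: cumulative}


if __name__ == "__main__":
    print(allocate_cumulative_effect(-12.0, [1998, 1999, 2000]))
    # {1998: -4.0, 1999: -4.0, 2000: -4.0}
    print(allocate_cumulative_effect(-5.0, None, fallback_year=2000))
    # {2000: -5.0}
```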

RUSBoost combines boosting with random undersampling (RUS): balanced samples are constructed by randomly drawing observations from the majority class. With a heavily imbalanced dataset, however, nonrandom undersampling may perform better than random undersampling. In untabulated results, we used the sampling method of Perols et al. (2016), but it did not perform better than RUSBoost in our dataset. Under this alternative sampling method, the AUC is 69.4%, and the detection rates are 60.0% for restatements and 81.1% for AAERs. This method ranks better than the logistic models but slightly worse than GBRT and random undersampling in Tables 10 and 15. An important difference is that there are more material misstatements than AAERs, so the benefits of nonrandom sampling in alleviating class imbalance are more muted.
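The random undersampling step can be sketched as follows, using a hypothetical standard-library helper. In RUSBoost, a draw of this kind is repeated before each boosting iteration, and the weak learner is fit on the balanced draw. Neither the boosting loop nor the nonrandom scheme of Perols et al. (2016) is shown here.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Balance a binary dataset by keeping every minority-class row and a random
    draw (without replacement) of equally many majority-class rows."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    rows = list(range(10))                 # stand-ins for firm-year feature vectors
    labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    bal_rows, bal_labels = random_undersample(rows, labels)
    print(len(bal_rows), sum(bal_labels))  # 4 2
```

Because every minority observation is kept and the majority class is drawn at random, each call with a different seed yields a different balanced subsample. This variation is what a boosting ensemble built on RUS exploits.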

We report the summary statistics of the important predictors in Table 6.

As in any multivariate descriptive analysis with multiple correlated variables, interpretation requires some caution since the method may select one variable over another for reasons that relate primarily to the fitting procedure. Later on, we list the set of important variables in other methods and observe that, while many variables are common to multiple algorithms, there are also some differences.

For variables combining market and accounting information, such as book-to-market and earnings-to-price, we allocate half weight to each category.
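The half-weight convention can be made concrete with a small aggregation sketch. The variable names, category labels, and importance scores below are made-up illustrations. A variable tagged with two categories contributes half of its importance to each.

```python
from collections import defaultdict


def importance_by_category(importance, categories):
    """Sum variable importances by category.

    A variable tagged with k categories contributes 1/k of its importance to
    each, so a variable tagged both "accounting" and "market" gets half weight in each.
    """
    totals = defaultdict(float)
    for var, score in importance.items():
        tags = categories[var]
        for tag in tags:
            totals[tag] += score / len(tags)
    return dict(totals)


if __name__ == "__main__":
    # Made-up importance scores for illustration only.
    importance = {"soft_assets": 0.30, "audit_fee": 0.25,
                  "book_to_market": 0.20, "stock_return": 0.25}
    categories = {"soft_assets": ["accounting"], "audit_fee": ["audit"],
                  "book_to_market": ["accounting", "market"],
                  "stock_return": ["market"]}
    print(importance_by_category(importance, categories))
```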

The theoretical model of Bertomeu and Marinovic (2015) also predicts this relation, as firms that endogenously retain more soft assets tend to be more credible.

We report only the results of the backward logistic model because the forward logistic and simple logistic models yield the same results. Backward and forward logistic models are much sparser; that is, they use fewer variables than the GBRT and simple logistic models. However, they do not appear to perform better than a simple logistic model. This finding suggests that misstatements are captured by complex interactions across the entire set of potential predictors.

We report bootstrapped standard errors in Table 10, retraining and testing the model 200 times on randomly drawn datasets. Differences between the performance of most models tend to exceed two standard errors, indicating that these differences are significant. In untabulated analyses, we bootstrap the differences in performance between models and confirm that they are significantly different from zero at conventional levels.
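A sketch of this bootstrap, under stated assumptions. `fit_and_score` is a hypothetical user-supplied function that retrains a model on the redrawn training set and returns its test-sample AUC. Each redrawn test set is assumed to contain both classes. The AUC helper uses the Mann-Whitney form of the area under the ROC curve. The toy `score_only` stand-in does not refit anything; it only exercises the resampling loop.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count as 1/2).

    Assumes both classes are present in `labels`.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_se(train, test, fit_and_score, n_boot=200, seed=0):
    """Standard deviation of a performance metric across bootstrap replications.

    Each replication redraws the training and test samples with replacement and
    calls `fit_and_score(train_draw, test_draw)`, which is expected to refit the
    model and return a metric such as AUC.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        train_draw = [rng.choice(train) for _ in train]
        test_draw = [rng.choice(test) for _ in test]
        draws.append(fit_and_score(train_draw, test_draw))
    return statistics.stdev(draws)


def score_only(train_draw, test_draw):
    """Toy stand-in for fit_and_score: the 'model' is the raw score, so nothing is refit."""
    return auc([s for s, _ in test_draw], [y for _, y in test_draw])


if __name__ == "__main__":
    toy_test = [(i % 7, i % 2) for i in range(40)]  # made-up (score, label) pairs
    print(round(bootstrap_se([], toy_test, score_only), 3))
```

To bootstrap the difference between two models, `fit_and_score` can return the AUC of one model minus the AUC of the other on the same draw, so that both models are evaluated on identical samples.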

In untabulated results, we also estimate the model by separating the restatement sample according to the sign of the period income effect, under the conjecture that positive effects may reflect reversals or incentives to push the stock price downward. See Kasznik (1999) and the extensive literature that follows it. We divide restatements into three categories: negative income effects (overstatement), zero income effects, and positive income effects (understatement). We then build three models and predict the probability of overstatement, understatement, and a zero income effect separately. We do not find any notable improvement in predictive power in the test sample, likely because splitting the sample reduces the size of the dataset used to estimate each model.
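The three-way split can be sketched as one-vs-rest targets. This is a hypothetical illustration. `None` marks a firm-year without a restatement and is assumed to be a negative in all three models. The sign convention follows the text: a negative income effect is an overstatement.

```python
def income_effect_category(effect):
    """Classify a restatement by the sign of its period income effect.

    `effect` is None for a firm-year with no restatement (an assumed encoding).
    """
    if effect is None:
        return None
    if effect < 0:
        return "overstatement"   # negative income effect
    if effect > 0:
        return "understatement"  # positive income effect
    return "zero"


def one_vs_rest_labels(effects, category):
    """Binary target for the model that predicts `category`; all other firm-years are 0."""
    return [1 if income_effect_category(e) == category else 0 for e in effects]


if __name__ == "__main__":
    effects = [None, -3.2, 0.0, 1.5, None]  # made-up income effects
    for cat in ("overstatement", "zero", "understatement"):
        print(cat, one_vs_rest_labels(effects, cat))
```

Each target has fewer positives than a single pooled restatement indicator, which illustrates why the split shrinks the effective sample for each model.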

Panel B of Table 12 shows similar results after excluding firms with restatements in the training sample. Machine learning algorithms continue to outperform the logistic model, but with lower catch rates.

In untabulated analyses, we count the caught misstatements that are detected at least a year before the corresponding AAER. The 29 misstatements caught by GBRT in the test sample relate to 20 AAER filings, and all of them are detected at least a year (often more than a year) before the AAER is filed.

We still estimate the models as in Panel A, using the entire population of misstatements and AAERs. One alternative would have been to estimate a model using only AAER-misstatement pairs as irregularities. However, the number of such observations is too small to build a model with reasonable out-of-sample performance.

In untabulated results, we find very low predictive ability when we predict the first misstatement year.

The inTrees procedure can produce redundant conditions when an inequality is repeated or is implied by another inequality. In these cases, we report only the stricter condition.
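A small Python sketch of this simplification, written independently of the inTrees package itself (an R package whose own rule format is not reproduced here). Conditions are assumed to be `(variable, operator, threshold)` triples. For each variable and direction, only the stricter bound survives; at equal thresholds, a strict inequality beats a non-strict one.

```python
def simplify_rule(conditions):
    """Drop redundant inequalities, keeping the strictest bound per variable and direction.

    conditions -- list of (variable, operator, threshold), operator in {"<", "<=", ">", ">="}
    """
    best = {}
    for var, op, t in conditions:
        upper = op in ("<", "<=")  # upper bounds and lower bounds are tracked separately
        key = (var, upper)
        if key not in best:
            best[key] = (var, op, t)
            continue
        _, op0, t0 = best[key]
        if upper:
            stricter = t < t0 or (t == t0 and op == "<")
        else:
            stricter = t > t0 or (t == t0 and op == ">")
        if stricter:
            best[key] = (var, op, t)
    return list(best.values())


if __name__ == "__main__":
    rule = [("soft_assets", ">", 0.4), ("soft_assets", ">", 0.6),
            ("audit_fee", "<=", 2.0), ("audit_fee", "<", 2.0)]
    print(simplify_rule(rule))
    # [('soft_assets', '>', 0.6), ('audit_fee', '<', 2.0)]
```

The variable names in the example are hypothetical. An upper and a lower bound on the same variable are both kept, since neither implies the other.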

This result coincides with the Stata package Boost, with code boost Res EP Soft , distribution(logistic) train(1) bag(1) interaction(2) maxiter(1) shrink(1) predict(pred).

References

Abbasi, A., Albrecht, C., Vance, A., & Hansen, J. (2012). MetaFraud: A meta-learning framework for detecting financial fraud. MIS Quarterly, 36(4), 1293–1327.

Avramov, D., Chordia, T., Jostova, G., & Philipov, A. (2009). Credit ratings and the cross-section of stock returns. Journal of Financial Markets, 12(3), 469–499.

Bao, Y., Ke, B., Li, B., Yu, Y.J., & Zhang, J. (2020). Detecting accounting fraud in publicly traded U.S. firms using a machine learning approach. Journal of Accounting Research, 58(1), 199–235.

Barton, J., & Simko, P.J. (2002). The balance sheet as an earnings management constraint. The Accounting Review, 77(s-1), 1–27.

Beneish, M.D. (1999). The detection of earnings manipulation. Financial Analysts Journal, 55(5), 24–36.

Bertomeu, J., & Marinovic, I. (2015). A theory of hard and soft information. The Accounting Review, 91(1), 1–20.

Blackburne, T., Kepler, J., Quinn, P., & Taylor, D. (2020). Undisclosed SEC investigations. Management Science, forthcoming.

Cheffers, M., Whalen, D., & Usvyatsky, O. (2010). 2009 financial restatements: A nine year comparison. Audit Analytics Sales (February).

Cheynel, E., & Levine, C. (2020). Public disclosures and information asymmetry: A theory of the mosaic. The Accounting Review, 95(1), 79–99.

Dechow, P.M., & Dichev, I.D. (2002). The quality of accruals and earnings: The role of accrual estimation errors. The Accounting Review, 77(s-1), 35–59.

Dechow, P.M., Ge, W., Larson, C.R., & Sloan, R.G. (2011). Predicting material accounting misstatements. Contemporary Accounting Research, 28(1), 17–82.

DeFond, M.L., Raghunandan, K., & Subramanyam, K.R. (2002). Do non-audit service fees impair auditor independence? Evidence from going concern audit opinions. Journal of Accounting Research, 40(4), 1247–1274.

Deng, H. (2018). Interpreting tree ensembles with inTrees. International Journal of Data Science and Analytics, 1–11.

Ding, K., Lev, B., Peng, X., Sun, T., & Vasarhelyi, M.A. (2020). Machine learning improves accounting estimates. Review of Accounting Studies, 1–37.

Dutta, I., Dutta, S., & Raahemi, B. (2017). Detecting financial restatements using data mining techniques. Expert Systems with Applications, 90, 374–393.

Ettredge, M.L., Sun, L., Lee, P., & Anandarajan, A.A. (2008). Is earnings fraud associated with high deferred tax and/or book minus tax levels? Auditing: A Journal of Practice & Theory, 27(1), 1–33.

Fanning, K.M., & Cogger, K.O. (1998). Neural network detection of management fraud using published financial data. Intelligent Systems in Accounting, Finance & Management, 7(1), 21–41.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874.

Frankel, R.M., Johnson, M.F., & Nelson, K.K. (2002). The relation between auditors’ fees for nonaudit services and earnings management. The Accounting Review, 77(s-1), 71–105.

Friedman, J.H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.

Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1). New York: Springer Series in Statistics.

Garfinkel, J.A. (2009). Measuring investors’ opinion divergence. Journal of Accounting Research, 47(5), 1317–1348.

Glosten, L.R., & Milgrom, P.R. (1985). Bid, ask and transaction prices in a specialist market with heterogeneously informed traders. Journal of Financial Economics, 14(1), 71–100.

Green, B.P., & Choi, J.H. (1997). Assessing the risk of management fraud through neural network technology. Auditing: A Journal of Practice & Theory, 16, 14–28.

Guelman, L. (2012). Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Systems with Applications, 39(3), 3659–3667.

Gupta, R., & Gill, N.S. (2012). A solution for preventing fraudulent financial reporting using descriptive data mining techniques. International Journal of Computer Applications.

Hribar, P., Kravet, T., & Wilson, R. (2014). A new measure of accounting quality. Review of Accounting Studies, 19(1), 506–538.

Johnson, V.E., Khurana, I.K., & Reynolds, J.K. (2002). Audit-firm tenure and the quality of financial reports. Contemporary Accounting Research, 19(4), 637–660.

Kasznik, R. (1999). On the association between voluntary disclosure and earnings management. Journal of Accounting Research, 37(1), 57–81.

Kim, Y.J., Baik, B., & Cho, S. (2016). Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Systems with Applications, 62, 32–43.

Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., & Mullainathan, S. (2017). Human decisions and machine predictions. The Quarterly Journal of Economics, 133(1), 237–293.

Kornish, L.J., & Levine, C.B. (2004). Discipline with common agency: The case of audit and nonaudit services. The Accounting Review, 79(1), 173–200.

Larcker, D.F., Richardson, S.A., & Tuna, I. (2007). Corporate governance, accounting outcomes, and organizational performance. The Accounting Review, 82(4), 963–1008.

Laux, V., & Newman, P.D. (2010). Auditor liability and client acceptance decisions. The Accounting Review, 85(1), 261–285.

Lin, J.W., Hwang, M.I., & Becker, J.D. (2003). A fuzzy neural network for assessing the risk of fraudulent financial reporting. Managerial Auditing Journal, 18(8), 657–665.

Lobo, G.J., & Zhao, Y. (2013). Relation between audit effort and financial report misstatements: Evidence from quarterly and annual restatements. The Accounting Review, 88(4), 1385–1412.

Perols, J. (2011). Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Auditing: A Journal of Practice & Theory, 30(2), 19–50.

Perols, J.L., Bowen, R.M., Zimmermann, C., & Samba, B. (2016). Finding needles in a haystack: Using data analytics to improve fraud prediction. The Accounting Review, 92(2), 221–245.

Ragothaman, S., & Lavin, A. (2008). Restatements due to improper revenue recognition: A neural networks perspective. Journal of Emerging Technologies in Accounting, 5(1), 129–142.

Romanus, R.N., Maher, J.J., & Fleming, D.M. (2008). Auditor industry specialization, auditor changes, and accounting restatements. Accounting Horizons, 22(4), 389–413.

Samuels, D., Taylor, D.J., & Verrecchia, R.E. (2018). Financial misreporting: Hiding in the shadows or in plain sight?

Van Rijsbergen, C.J. (2004). The geometry of information retrieval. Cambridge University Press.

Whiting, D.G., Hansen, J.V., McDonald, J.B., Albrecht, C., & Albrecht, W.S. (2012). Machine learning methods for detecting patterns of management fraud. Computational Intelligence, 28(4), 505–527.

Zhang, Y., & Haghani, A. (2015). A gradient boosting method to improve travel time prediction. Transportation Research Part C: Emerging Technologies, 58, 308–324.

Title: Using machine learning to detect misstatements
Authors: Jeremy Bertomeu, Edwige Cheynel, Eric Floyd, Wenqiang Pan
Publication date: 02-10-2020
Publisher: Springer US
Published in: Review of Accounting Studies / Issue 2/2021
Print ISSN: 1380-6653
Electronic ISSN: 1573-7136
DOI: https://doi.org/10.1007/s11142-020-09563-8

Other articles in Issue 2/2021

Measuring credit risk using qualitative disclosure

Real effects of auditor conservatism

Analyst teams

Management forecasts of volatility

Heterogeneity in expertise in a credence goods setting: evidence from audit partners

Correction to: Management forecasts of volatility