
Data Quality in Process Mining

  • Chapter
Interactive Process Mining in Healthcare

Part of the book series: Health Informatics (HI)

Abstract

To cope with challenges such as tightening budgets and increased care needs, healthcare organizations are becoming increasingly aware of the need to understand their processes in order to improve them. In this respect, process mining has the unique potential to retrieve process-related insights from process execution data. Despite the wide range of algorithms that have been developed over the past decade, the reliability of process mining outcomes ultimately depends on the quality of the input data. Consistent with the notion of “Garbage In, Garbage Out”, applying process mining algorithms to low-quality data can lead to counter-intuitive or even misleading decisions. Real-life healthcare event logs typically suffer from a multitude of data quality issues, such as missing events, incorrect timestamps, and incorrect resource information. Against this background, this chapter provides an introduction to data quality in the process mining field. Three key topics are discussed: (1) data quality taxonomies, i.e., frameworks outlining potential data quality issues; (2) data quality assessment, i.e., the identification of data quality issues; and (3) data cleaning, i.e., efforts towards alleviating the data quality issues that are present in an event log.
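As a rough illustration of the second and third topics, the sketch below flags a few of the issue types named above in a toy event log and then applies one crude cleaning step. It is not the chapter's method: the field names (`case_id`, `activity`, `start`, `complete`, `resource`), the sample records, and the helper functions `assess` and `clean` are all assumptions made for this example.

```python
from datetime import datetime

# Hypothetical toy event log; field names and values are illustrative assumptions.
EVENT_LOG = [
    {"case_id": "p1", "activity": "Registration",
     "start": "2020-01-06 09:00", "complete": "2020-01-06 09:05", "resource": "clerk_1"},
    {"case_id": "p1", "activity": "Triage",
     "start": "2020-01-06 09:10", "complete": "2020-01-06 09:02", "resource": "nurse_2"},
    {"case_id": "p2", "activity": "Registration",
     "start": "2020-01-06 09:30", "complete": None, "resource": "clerk_1"},
    {"case_id": "p2", "activity": "Consultation",
     "start": "2020-01-06 10:00", "complete": "2020-01-06 10:20", "resource": ""},
]


def parse(ts):
    """Parse a timestamp string, returning None when the value is missing."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M") if ts else None


def assess(log):
    """Data quality assessment: return the indices of events showing each issue."""
    issues = {"missing_timestamp": [], "missing_resource": [], "complete_before_start": []}
    for idx, event in enumerate(log):
        start, complete = parse(event.get("start")), parse(event.get("complete"))
        if start is None or complete is None:
            issues["missing_timestamp"].append(idx)
        elif complete < start:
            # An activity cannot finish before it starts: likely an incorrect timestamp.
            issues["complete_before_start"].append(idx)
        if not event.get("resource"):
            issues["missing_resource"].append(idx)
    return issues


def clean(log, issues):
    """Data cleaning (one crude option): drop events with unusable timestamps."""
    flagged = set(issues["missing_timestamp"]) | set(issues["complete_before_start"])
    return [event for idx, event in enumerate(log) if idx not in flagged]


issues = assess(EVENT_LOG)
cleaned_log = clean(EVENT_LOG, issues)
```

Dropping flagged events is only one possible response; depending on the analysis question, repairing or imputing values may be preferable, and removing events can itself distort the discovered process.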


Author information

Corresponding author

Correspondence to Niels Martin.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Martin, N. (2021). Data Quality in Process Mining. In: Fernandez-Llatas, C. (eds) Interactive Process Mining in Healthcare. Health Informatics. Springer, Cham. https://doi.org/10.1007/978-3-030-53993-1_5

  • DOI: https://doi.org/10.1007/978-3-030-53993-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-53992-4

  • Online ISBN: 978-3-030-53993-1

  • eBook Packages: Medicine, Medicine (R0)
