Abstract
To cope with challenges such as tightening budgets and increased care needs, healthcare organizations are becoming increasingly aware of the need to understand their processes in order to improve them. In this respect, process mining has the unique potential to retrieve process-related insights from process execution data. Despite the wide range of algorithms developed over the past decade, the reliability of process mining outcomes ultimately depends on the quality of the input data. Consistent with the notion of “Garbage In, Garbage Out”, applying process mining algorithms to low-quality data can lead to counter-intuitive or even misleading decisions. Real-life healthcare event logs typically suffer from a multitude of data quality issues, such as missing events, incorrect timestamps, and incorrect resource information. Against this background, this chapter provides an introduction to data quality in the process mining field. Three key topics are discussed: (1) data quality taxonomies, i.e., frameworks outlining potential data quality issues; (2) data quality assessment, i.e., the identification of data quality issues; and (3) data cleaning, i.e., efforts to alleviate the data quality issues present in an event log.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Martin, N. (2021). Data Quality in Process Mining. In: Fernandez-Llatas, C. (eds) Interactive Process Mining in Healthcare. Health Informatics. Springer, Cham. https://doi.org/10.1007/978-3-030-53993-1_5
DOI: https://doi.org/10.1007/978-3-030-53993-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-53992-4
Online ISBN: 978-3-030-53993-1
eBook Packages: Medicine, Medicine (R0)