2017 | OriginalPaper | Chapter

Toward a New Generation of Log Pre-processing Methods for Process Mining

Authors: Paolo Ceravolo, Ernesto Damiani, Mohammadsadegh Torabi, Sylvio Barbon Jr.

Published in: Business Process Management Forum

Publisher: Springer International Publishing

Abstract

Real-life processes are typically less structured and more complex than stakeholders expect. For this reason, process discovery techniques often deliver models that are less understandable and useful than expected. To address this issue, we propose a method based on statistical inference for pre-processing event logs. We measure the distance between different segments of the event log by computing the probability distribution of observing activities in specific positions. Because segments are generated based on the time domain, business rules, or business management system properties, we obtain a characterisation of these segments in terms of both business and process aspects. We demonstrate the applicability of this approach by developing a case study with real-life event logs and show that our method offers interesting properties in terms of computational complexity.
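
As an illustration of the idea in the abstract, the following minimal Python sketch estimates, for each position in a trace, the probability of observing each activity within a segment, and then compares two segments. It is not the authors' implementation: the function names, the `END` padding label, and the choice of mean total variation distance are all assumptions made for this example, since the abstract does not say which distance is used.

```python
from collections import Counter

END = "<end>"  # hypothetical padding label for traces shorter than `length`


def positional_distribution(segment, length):
    """Return one {activity: probability} dict per position 0..length-1.

    `segment` is a non-empty list of traces; each trace is a list of
    activity labels. Shorter traces are padded with END.
    """
    n = len(segment)
    dist = []
    for pos in range(length):
        counts = Counter(
            trace[pos] if pos < len(trace) else END for trace in segment
        )
        dist.append({act: c / n for act, c in counts.items()})
    return dist


def segment_distance(seg_a, seg_b):
    """Mean total variation distance between positional distributions.

    Assumption: total variation is used only for illustration. Both
    segments must be non-empty lists and contain a non-empty trace.
    """
    length = max(len(trace) for trace in seg_a + seg_b)
    dist_a = positional_distribution(seg_a, length)
    dist_b = positional_distribution(seg_b, length)
    per_position = [
        0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in set(p) | set(q))
        for p, q in zip(dist_a, dist_b)
    ]
    return sum(per_position) / length


# Two toy segments, e.g. the traces recorded in two different months.
january = [["register", "check", "pay"], ["register", "check", "pay"]]
february = [["register", "pay", "check"], ["register", "check", "pay"]]
print(segment_distance(january, february))
```

The result lies between 0 (identical positional distributions) and 1 (no activity shared at any position), so values for different pairs of segments are comparable.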

Footnotes
1
Some works, such as [22], use “Clustering” to mean the identification of similar activities. This is also a pre-processing task relevant to our discussion; in this paper, however, we use “Clustering” to refer exclusively to the process of segmenting event logs.
 
2
The Python implementation of the algorithms adopted to implement and test our method is available at http://www.uel.br/grupo-pesquisa/remid/wp-content/uploads/LightPMClustering.rar.
 
4
The edit distance between two strings is the minimum number of operations required to transform one string into the other.
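
A minimal Python sketch of this definition follows. The footnote does not list the permitted operations, so the sketch assumes the common Levenshtein variant (insertion, deletion, and substitution, each with cost 1); the function name `edit_distance` is chosen for this example.

```python
def edit_distance(s, t):
    """Levenshtein distance between two sequences (strings or lists).

    Assumption: insertion, deletion and substitution each cost 1.
    """
    # previous[j] holds the distance between the processed prefix of s
    # and the first j items of t.
    previous = list(range(len(t) + 1))
    for i, item_s in enumerate(s, start=1):
        current = [i]
        for j, item_t in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,                       # delete item_s
                current[j - 1] + 1,                    # insert item_t
                previous[j - 1] + (item_s != item_t),  # substitute or match
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Because the function only compares items for equality, it also accepts traces given as lists of activity labels, not just character strings.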
 
5
Clearly, by generating segments the information on the control-flow encoded in matrices is aggregated using a compensative approach that can bias the comparisons. We plan to address this problem in future studies by using intra- and inter-segment similarity metrics.
 
6
Note that, when we do not reject \(H_{0}\), it does not mean that \(H_{0}\) is true. It means that the sample data have failed to provide sufficient evidence to cast serious doubt about the truthfulness of \(H_{0}\).
 
7
The rates provided are 3 “Neither agree nor disagree”, 4 “Agree”, and 4 “Agree”.
 
8
The rates provided are 1 “Strongly disagree”, 2 “Disagree”, and 2 “Disagree”.
 
Literature
1.
Appice, A., Malerba, D.: A co-training strategy for multiple view clustering in process mining. IEEE Trans. Serv. Comput. 9(6), 832–845 (2016)
2.
Bernardi, S., Requeno, J.I., Joubert, C., Romeu, A.: A systematic approach for performance evaluation using process mining: the POSIDONIA operations case study. In: Proceedings of the 2nd International Workshop on Quality-Aware DevOps, pp. 24–29. ACM (2016)
3.
Bogarín, A., Romero, C., Cerezo, R., Sánchez-Santillán, M.: Clustering for improving educational process mining. In: Proceedings of the Fourth International Conference on Learning Analytics And Knowledge, pp. 11–15. ACM (2014)
4.
Bose, R.P.J.C., Mans, R.S., van der Aalst, W.M.P.: Wanna improve process mining results? In: 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 127–134. IEEE (2013)
5.
Bose, R.P.J.C., van der Aalst, W.M.P.: Context aware trace clustering: towards improving process mining results. In: Proceedings of the 2009 SIAM International Conference on Data Mining, pp. 401–412. SIAM (2009)
6.
Ceravolo, P., Azzini, A., Damiani, E., Lazoi, M., Marra, M., Corallo, A.: Translating process mining results into intelligible business information. In: Proceedings of the 11th International Knowledge Management in Organizations Conference on The changing face of Knowledge Management Impacting Society, p. 14. ACM (2016)
7.
Ceravolo, P., Fugazza, C., Leida, M.: Modeling semantics of business rules. In: Digital EcoSystems and Technologies Conference, DEST 2007, Inaugural IEEE-IES, pp. 171–176. IEEE (2007)
8.
Cha, S.-H.: Comprehensive survey on distance/similarity measures between probability density functions. City 1(2), 1 (2007)
9.
Chen, J., Yan, Y., Liu, X., Yu, Y.: A method of process similarity measure based on task clustering abstraction. In: Ouyang, C., Jung, J.-Y. (eds.) AP-BPM 2014. LNBIP, vol. 181, pp. 89–102. Springer, Cham (2014). doi:10.1007/978-3-319-08222-6_7
10.
Damiani, E., Ceravolo, P., Fugazza, C., Reed, K.: Representing and validating digital business processes. In: Filipe, J., Cordeiro, J. (eds.) WEBIST 2007. LNBIP, vol. 8, pp. 19–32. Springer, Heidelberg (2008). doi:10.1007/978-3-540-68262-2_2
11.
de Leoni, M., van der Aalst, W.M.P., Dees, M.: A general process mining framework for correlating, predicting and clustering dynamic behavior based on event logs. Inf. Syst. 56, 235–257 (2016)
14.
Jain, A.K., Hong, L., Pankanti, S.: IEEE draft standard for XES - extensible event stream - for achieving interoperability in event logs and event streams. Technical report P1849, IEEE-SA (2016)
15.
Joe, H.: Dependence Modeling with Copulas. CRC Press (2014)
16.
Knight, W.R.: A computer method for calculating Kendall’s tau with ungrouped data. J. Am. Stat. Assoc. 61(314), 436–439 (1966)
17.
Luengo, D., Sepúlveda, M.: Applying clustering in process mining to find different versions of a business process that changes over time. In: Daniel, F., Barkaoui, K., Dustdar, S. (eds.) BPM 2011. LNBIP, vol. 99, pp. 153–158. Springer, Heidelberg (2012). doi:10.1007/978-3-642-28108-2_15
18.
Rebuge, Á., Ferreira, D.R.: Business process analysis in healthcare environments: a methodology based on process mining. Inf. Syst. 37(2), 99–116 (2012)
19.
Rojas, E., Munoz-Gama, J., Sepúlveda, M., Capurro, D.: Process mining in healthcare: a literature review. J. Biomed. Inform. 61, 224–236 (2016)
20.
Song, M., Günther, C.W., van der Aalst, W.M.P.: Trace clustering in process mining. In: Ardagna, D., Mecella, M., Yang, J. (eds.) BPM 2008. LNBIP, vol. 17, pp. 109–120. Springer, Heidelberg (2009). doi:10.1007/978-3-642-00328-8_11
21.
Van der Aalst, W.M.P.: Process Mining. Data Science in Action. Springer, Heidelberg (2016)
22.
Dongen, B.F., Adriansyah, A.: Process mining: fuzzy clustering and performance visualization. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. LNBIP, vol. 43, pp. 158–169. Springer, Heidelberg (2010). doi:10.1007/978-3-642-12186-9_15
23.
Whissell, J.S., Clarke, C.L.A.: Effective measures for inter-document similarity. In: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pp. 1361–1370. ACM (2013)
24.
Yoo, S., Cho, M., Kim, E., Kim, S., Sim, Y., Yoo, D., Hwang, H., Song, M.: Assessment of hospital processes using a process mining technique: outpatient process analysis at a tertiary hospital. Int. J. Med. Inform. 88, 34–43 (2016)
Metadata
Title
Toward a New Generation of Log Pre-processing Methods for Process Mining
Authors
Paolo Ceravolo
Ernesto Damiani
Mohammadsadegh Torabi
Sylvio Barbon Jr.
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-65015-9_4