Elsevier

Information Systems

Volume 36, Issue 2, April 2011, Pages 450-475
Information Systems

Time prediction based on process mining

https://doi.org/10.1016/j.is.2010.09.001Get rights and content

Abstract

Process mining allows for the automated discovery of process models from event logs. These models provide insights and enable various types of model-based analysis. This paper demonstrates that the discovered process models can be extended with information to predict the completion time of running instances. There are many scenarios where it is useful to have reliable time predictions. For example, when a customer phones her insurance company for information about her insurance claim, she can be given an estimate for the remaining processing time. In order to do this, we provide a configurable approach to construct a process model, augment this model with time information learned from earlier instances, and use this to predict e.g., the completion time. To provide meaningful time predictions we use a configurable set of abstractions that allow for a good balance between “overfitting” and “underfitting”. The approach has been implemented in ProM and through several experiments using real-life event logs we demonstrate its applicability.

Research highlights

► Process mining allows for the automated discovery of process models from event logs. ► Discovered process models can be extended with information to predict the completion time of running instances. ► Annotated transition systems are used to predict the completion time. ► The approach has been implemented in ProM.

Introduction

More and more information about processes is recorded by information systems in the form of so-called “event logs”. A wide variety of process-aware information systems (PAISs) [14] is recording events as they are taking place. ERP (enterprise resource planning), WFM (workflow management), BPM (business process management), CRM (customer relationship management), SCM (supply chain management), and PDM (product data management) systems are examples of PAISs that log detailed information about events in a structured manner. Despite the omnipresence and richness of these event logs, most software vendors use this information for answering only relatively simple questions under the assumption that the process is fixed and known, e.g., the calculation of simple performance metrics like utilization and flow time for a well-known process. However, in many domains processes are evolving and people typically have an oversimplified and incorrect view of the actual business processes. Therefore, process mining techniques attempt to extract non-trivial and useful information from event logs. One aspect of process mining is control-flow discovery, i.e., automatically constructing a process model (e.g., a Petri net or BPMN model) describing the causal dependencies between activities [6], [4], [7], [9], [11], [38], [37]. The basic idea of control-flow discovery is very simple: given an event log containing a set of traces, automatically construct a suitable process model “describing the behavior” seen in the log. Such discovered processes have proven to be very useful for the understanding, redesign, and continuous improvement of business processes [3].

As process mining techniques are getting more mature, there is the desire to use the discovered models in an operational setting. This means that the PAIS is using results from process mining at runtime. An example is the recommendation service presented in [31]. This service, which is implemented in the process mining tool ProM, gives advice on the best possible next activity. Unlike the fixed routing in a workflow management system which is strictly enforced, the recommendation service merely gives an advice what to do next. Another example is the prediction service presented in [13]. This service is using non-parametric regression to predict the completion time of partially executed process instances. Using the same regression technique it is also estimated whether activities will be executed in the future and, if so, the time it takes to reach them.

The goal of the approach presented in this paper is similar to the prediction service described in [13]. However, we want to overcome some of the limitations of this earlier approach and therefore use a completely new approach to time prediction. Unlike existing approaches, the problem is not reduced to a simple heuristic (e.g., always estimating half of the average flow time or the average flow time minus the already elapsed time) or a regression model. Instead an annotated transition system is generated that represents an abstraction of the process with time annotations.

Fig. 1 shows an overview of the approach. There is an information system supporting some operational process. Events that take place in this process are recorded (e.g., starting some activity). This recorded information is used to derive a process model. In the process mining field, many process discovery algorithms have been proposed. Here we use a variant of the approach described in [4], i.e., using various abstractions a transition system is derived. This transition system is annotated with information about elapsed times (e.g., the average time to reach a particular state), sojourn times (e.g., the average time spent in a particular state), and remaining times (e.g., the average time to reach the end from this state). The annotated transition system can be used to predict the remaining flow time of all or some of the running cases (i.e., process instances). In our approach, we heavily rely on the transition system generation and corresponding abstractions presented in [4]. This allows us to find a balance between a transition system that is too specific (overfitting) and one that is too general (underfitting) with respect to the log and thus provide better predictions. As will be demonstrated, our approach performs much better than simple heuristics and also outperforms regressions models in terms of efficiency and precision.

The approach can be easily extended to predict other features of a case. For example, the transition system can also be annotated with information about the occurrence of particular event or the time until a particular event. This can lead to predictions such as: “there is a 90% probability that your claim will be rejected”, “there is 90% probability that this activity will be conducted in the next three days”, and “the expected time until the submission of some intermediate report is 11 days”. Although these additional features can easily be added, we focus here on predicting the completion of a case.

The approach presented in this paper has been implemented in ProM, cf. www.processmining.org. ProM serves as a testbed for our process mining research [1] and has been applied in various domains, e.g., hospitals (AMC and Catherina hospitals), banks (ING), several municipalities (Heusden, Alkmaar, etc.), high-tech system manufacturers (ASML and Philips Medical Systems), software repositories for open-source projects, etc. To show the applicability of our results, we present an experimental evaluation using a simulation model and two real-life case studies. In these case studies, we use our techniques to predict the completion time of cases in two different Dutch municipalities.

The remainder of this paper is organized as follows. Related work is discussed in Section 2. Section 3 introduces process mining and shows that a transition system can be derived after (potentially) making some abstractions. Section 4 presents the main idea of time prediction based on annotated transition systems. The implementation in ProM is described in Section 5. Section 6 discusses various ways of defining and measuring the quality of predictions. The approach has been validated and tested using several (real-life) examples. The results of our evaluation are given in Section 7. Section 8 discusses limitations and possible extensions of our approach. Section 9 concludes the paper.

Section snippets

Related work

Since the mid-nineties several groups have been working on techniques for process mining [6], [4], [7], [9], [11], [38], [17], [37], i.e., discovering process models based on observed events. In [5] an overview is given of the early work in this domain. The idea to apply process mining in the context of workflow management systems was introduced in [7]. In parallel, Datta [11] looked at the discovery of business process models. Cook et al. investigated similar issues in the context of software

Transition system generation using abstractions

This section describes the basis for our prediction approach. We first show how to construct a transition system based on an event log. In Section 4, we annotate this transition system and use it to predict, e.g., completion times.

Time prediction

In the previous section, we showed how a transition system can be generated on the basis of an event log. In this section, we show how this transition system can be annotated and used for prediction purposes. In Fig. 2, we already showed an overview of the overall approach. Based on the event log, an annotated transition system is generated. Whenever we want to predict the completion time of some process instance, we take its partial trace (i.e., the sequence of events executed thus far) and

Implementation

The work presented in this paper is supported by various plug-ins of our process mining tool ProM [1], [3], [26]. The first version of ProM was released in 2004. The initial goal of ProM was to unify process mining efforts at Eindhoven University of Technology and other cooperating groups [5]. Traditionally, most analysis tools focusing on processes are restricted to model-based analysis, i.e., a model is used as the starting point of analysis. Such analysis is only useful if the model reflects

Quality of predictions

As shown thus far, our approach allows for the prediction of various things including the remaining time until completion. Depending on the abstraction chosen, different predictions are possible. Moreover, under certain circumstances one prediction may be less reliable than another. Therefore, this section elaborates on the quality of predictions.

Intuitively, the quality of the prediction depends on the number of measurements per state. If there are only few observations, the predictive value

Experiments

In this section, we demonstrate the applicability of our approach using one synthetic event log obtained via simulation and two real-life logs used in two case studies. For the case studies we analyzed processes in two different Dutch municipalities.

Discussion

After presenting our approach and ProM plug-ins for time prediction and illustrating these using some case studies, we now discuss potential limitations and data requirements.

The previous section clearly showed that our prediction approach outperforms more naive approaches like always predicting half of average total flow time. Moreover, in Section 6 we discussed various techniques to measure the quality of predictions using cross-validation and error measures such as RMSE and MAE. However,

Conclusion

In this paper, we presented a new method for predicting the “future of a running instance”. Given a running case, our prediction approach allows answering questions like “When will this case be finished?”, “How long does it take before activity A is completed?”, “How likely is it that activity B will be performed in the next two days?”, etc. In this paper, we mainly focused on the remaining time until completion. However, our approach can easily be used for other types of predictions. The basic

Acknowledgements

This research is supported by EIT, NWO-EW, SUPER, and the Technology Foundation STW. We would like to thank many people involved in the development of ProM. In particular we would like to thank Eric Verbeek for implementing the FSM Miner, Maja Pesic for her work on operational support, and Boudewijn van Dongen and Ronald Crooy for their work on the regression-based prediction service.

References (38)

  • J.E. Cook et al.

    Discovering models of software processes from event-based data

    ACM Transactions on Software Engineering and Methodology

    (1998)
  • R. Crooy, Predictions in information systems: a process mining perspective, Master's Thesis, Eindhoven University of...
  • A. Datta

    Automating the discovery of As-Is business process models: probabilistic and algorithmic approaches

    Information Systems Research

    (1998)
  • S.N. den Hertog, Case prediction in BPM systems: research to the predictability of the remaining time of individual...
  • B.F. van Dongen et al.

    Cycle time prediction: when will this case finally be finished?

  • M. Dumas et al.

    Process-aware Information Systems: Bridging People and Software Through Process Technology

    (2005)
  • J. Eder et al.

    Time constraints in workflow systems

  • J. Eder, H. Pichler, Probabilistic calculation of execution intervals for workflows, in: Proceedings of the 12th...
  • C.W. Günther et al.

    Fuzzy mining: adaptive process simplification based on multi-perspective metrics

  • Cited by (439)

    • Predictive Monitoring of Business Process Execution Delays

      2024, Lecture Notes in Business Information Processing
    View all citing articles on Scopus
    View full text