1 Introduction
Gaining knowledge from data is an emerging topic in many disciplines (Hashem et al.
2015), high on many organizations’ agendas, and a macro-economic game-changer (Lund et al.
2013). Many researchers use data-driven techniques such as machine learning (ML), currently at the top of Gartner’s Hype Cycle (Gartner Inc.
2018), to mine information from large datasets (Shmueli and Koppius
2016). Over the past decade, sophisticated ML techniques commonly referred to as deep learning (DL) have yielded breakthroughs in diverse data-driven applications. The application of such techniques in fields such as natural language processing or pattern recognition in images has shown that DL can solve increasingly complex problems (Goodfellow et al.
2016).
In business process management (BPM), lifecycle activities such as the identification, discovery, analysis, improvement, implementation, monitoring, and controlling of business processes rely on data, even though, until recently, such data had to be collected manually (Dumas et al.
2018). Today, these activities are supported by process-aware information systems that record events and additional attributes, e.g., resources or process outcomes (van der Aalst et al.
2011a). Process outcomes generally reflect the positive or negative result delivered to actors involved in a process (Dumas et al.
2018). While early data-driven approaches leveraged data for process discovery or analysis (van der Aalst et al.
2011a), interest has risen in using data-driven approaches in lifecycle phases such as monitoring to gain predictive insights (Grigori et al.
2013). Thus, there is a shift from design-time-oriented phases (e.g., discovery, analysis, and improvement), where data is exploited in offline mode, to runtime-oriented phases (e.g., monitoring), where data is used in real time to forecast process behavior, performance, and outcomes (van der Aalst
2013). As for runtime use cases, predictive process monitoring is growing in importance (Maggi et al.
2014). Predicting the behavior, performance, and outcomes of process instances – e.g., remaining cycle time (van der Aalst et al.
2011b), compliance (Ly et al.
2015), sequence of process activities (Polato et al.
2018), the final or partial outcome (Teinemaa et al.
2019), or the prioritization of processes (Kratsch et al.
2017) – helps organizations act proactively in fast-changing environments.
Various predictive process monitoring approaches use ML techniques because, in contrast to rule-based monitoring techniques, they do not need to rely on subjective expert-defined decision rules (Kang et al.
2012). Moreover, the increasing availability of data lowers the barriers to using ML. Although the popularity of DL has increased in predictive process monitoring, most works still use classical ML techniques such as decision trees, random forests (RF), or support vector machines (SVM) (Evermann et al.
2016). However, a drawback of such techniques is that their performance heavily depends on manual feature engineering in the case of low-level feature representations (Goodfellow et al.
2016). From a BPM perspective, DL promises to leverage process data for predictive purposes.
Some predictive process monitoring approaches already use DL (Di Francescomarino et al.
2018). Most of them strive for insights that help predict the next events during process execution (Evermann et al.
2017a; Mehdiyev et al.
2018; Pasquadibisceglie et al.
2019; Tax et al.
2017). To the best of our knowledge, only one approach uses DL for outcome prediction (Hinkka et al.
2019). Outcome prediction is an important prediction task, as early predictions may entail substantial savings related to cost, time, and corporate resources (Teinemaa et al.
2019). The rare use of DL, especially for outcome-oriented predictive process monitoring, reflects a lack of understanding about when the use of DL is sensible. While most papers propose new approaches, only two studies compare existing approaches, but without considering DL techniques (Metzger et al.
2015; Teinemaa et al.
2019). Moreover, these studies use only one or a few logs for evaluation, although logs are known to have different properties (van der Aalst et al.
2011a). Thus, we investigate the following research question:
Which event log properties facilitate the use of DL techniques for outcome-oriented predictive process monitoring?
To address this question, we compared the performance of different ML and DL techniques for a diverse set of logs in terms of established evaluation metrics. To obtain transferable results and related propositions, we combined data-to-description (Level-1 inference) and description-to-theory (Level-2 inference) generalization, as included in Lee and Baskerville's (
2003) generalization framework for information systems research. This required purposively sampling both techniques and logs. As for the techniques, we selected long short-term memory networks (LSTM) and simple feedforward deep neural networks (DNN) as representatives of DL, as well as RF and SVM as representatives of classical ML techniques. Regarding the event logs, we selected five publicly available logs that cover most conceivable log types, e.g., in terms of the number of process instances, number of events, or data attributes. These logs were the BPI Challenge 2011 (BPIC11), the BPI Challenge 2013 (BPIC13), the road traffic fine management process log (RTFM), the production log (PL), and the review log (RL).
Our study is structured as follows: In Sect.
2, we outline the background on data-driven approaches in BPM and ML as a predictive process monitoring technique. Section
3 outlines our study design, followed by details related to data collection and analysis in Sect.
4. In Sect.
5, we present our results. In Sect.
6, we discuss the results, highlight limitations, and point to future research.
3 Study Design
To infer propositions about which log properties facilitate the use of DL techniques for outcome-oriented predictive process monitoring, we compared the performance of DL and classical ML techniques for multiple event logs. To do so, we followed the reference process for big data analysis in information systems research proposed by Müller et al. (
2016), which comprises three phases: data collection, data analysis, and result interpretation.
In the data collection phase, we first compiled five event logs (i.e., BPIC11, BPIC13, RTFM, PL, and RL). Detailed information about these logs is shown in Sect.
4.1, including rationales for why outcome-oriented predictive process monitoring makes sense. To sample the event logs purposively, we derived properties from diverse event logs and classified the selected logs accordingly, as shown in Sect.
4.2. As discussed in Sect.
2.1, we classified the logs according to a data and a control-flow perspective, which led to two pairs of log properties per perspective. Hence, we ensured that the most conceivable log types were covered. To conclude the data collection, we preprocessed the event data and defined input features as well as targets required for the application of supervised learning techniques.
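To make the feature and target definition more concrete, the following minimal sketch derives fixed-length prefix samples from an event log. It assumes a pandas DataFrame with case_id, activity, and outcome columns and a simple position-wise one-hot encoding of activities; the study's actual preprocessing is documented in Sect. 4.3.

```python
# Illustrative sketch only: column names and the simple index-based one-hot
# encoding are assumptions for this example, not the study's exact pipeline.
import pandas as pd

def build_prefix_samples(log: pd.DataFrame, prefix_len: int):
    """Turn an event log into fixed-length prefix samples for supervised learning.

    Each case contributes one sample consisting of its first `prefix_len`
    activities (one-hot encoded per position) and the case-level outcome label.
    """
    activities = sorted(log["activity"].unique())
    features, targets = [], []
    for _, case in log.groupby("case_id"):
        events = case["activity"].tolist()[:prefix_len]
        vector = []
        for i in range(prefix_len):
            one_hot = [0] * len(activities)
            if i < len(events):
                one_hot[activities.index(events[i])] = 1
            vector.extend(one_hot)  # shorter cases are padded with all-zero vectors
        features.append(vector)
        targets.append(case["outcome"].iloc[0])  # one outcome label per case
    return features, targets
```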
In the data analysis phase, we first documented the preprocessing in Sect.
4.3 (Müller et al.
2016). We then built classifiers after each executed activity contained in the logs. In the sense of purposive sampling, we chose RF and SVM as representatives of classical ML techniques, as they are commonly used for predictive monitoring (Marquez-Chamorro et al.
2018). As for DL, we chose LSTM, as they count among the most advanced techniques, and a simple feedforward DNN as an entry-level technique (Tax et al.
2017). To analyze the classifiers, we evaluated their performance after each executed activity. We stopped at the tenth activity, as we were particularly interested in early predictions. From a business perspective, the rationale is that the earlier reliable outcome predictions can be made, the more valuable they are. For the RTFM log, we built classifiers only for the first six activities, as approximately 91% of the contained instances terminate before this point. To prevent issues that result from an unfavorable configuration of classifiers, we performed a random search-based optimization of hyperparameters (Bergstra and Bengio
2012). When training and testing classifiers, we also employed tenfold cross-validation as recommended by Fushiki (
2011).
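As an illustration of this setup, the sketch below combines a random search over an assumed hyperparameter space for an RF classifier with tenfold cross-validation using scikit-learn. The search space, iteration budget, and scoring metric are examples for illustration and may differ from the configurations used in the study.

```python
# Minimal sketch of the tuning setup, assuming scikit-learn; the actual search
# spaces, iteration budget, and scoring used in the study may differ.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

def tune_random_forest(X, y, n_iter=20):
    """Random search over an assumed RF hyperparameter space with tenfold CV."""
    param_distributions = {
        "n_estimators": [100, 200, 500],
        "max_depth": [None, 10, 20, 50],
        "min_samples_leaf": [1, 2, 5, 10],
    }
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions=param_distributions,
        n_iter=n_iter,
        cv=10,                 # tenfold cross-validation
        scoring="roc_auc",     # assumed metric for illustration
        random_state=0,
    )
    search.fit(X, y)
    return search.best_estimator_, search.best_params_
```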
In the result interpretation phase, we built on Lee and Baskerville's (
2003) generalization framework for information systems research. By combining an empirical (E) and a theoretical (T) layer, this framework proposes four generalization strategies: data-to-description (EE), theory-to-description (TE), description-to-theory (ET), and concepts-to-theory (TT). In our work, we used the data-to-description and description-to-theory strategies. Also referred to as Level-1 inference (Yin
1994), data-to-description generalization takes empirical data as input, which is condensed into higher-level yet still empirical observations or descriptions. This strategy also covers the well-known statistical sample-to-population generalization. Description-to-theory generalization, which is also referred to as analytical generalization or Level-2 inference (Yin
1994), aims at inferring theoretical statements in the form of propositions, i.e., “variables and the relationships among them” (Lee and Baskerville
2003, p. 236), from empirical observations or descriptions. As for Level-1 inference, we analyzed the performance of the selected techniques per event log in terms of evaluation metrics and related statistical measures (i.e., mean and standard deviation) (Sect.
5.1). As for Level-2 inference, we identified relationships between the techniques’ performance across the logs and related these cross-log observations to the log properties introduced in Sect.
2.1. Moreover, we analyzed the distribution of class labels as the target variable of outcome-oriented predictive process monitoring. On this foundation, we inferred propositions about which log properties facilitate the use of DL techniques for outcome-oriented predictive process monitoring (Sect.
5.2). Due to the purposive sampling of logs and techniques, we can claim that these propositions also hold for event logs outside those we used in our research (Lee and Baskerville
2003).
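As a sketch of the Level-1 analysis, the following example evaluates a classifier per prefix length with tenfold cross-validation and records the mean and standard deviation of the resulting scores. The metric and the data layout (e.g., samples built with the prefix-encoding sketch above) are assumptions for illustration rather than the study's exact analysis.

```python
# Illustrative aggregation of Level-1 results: mean and standard deviation of a
# cross-validated score per prefix length; the metric choice is an assumption.
import numpy as np
from sklearn.model_selection import cross_val_score

def evaluate_per_prefix(make_classifier, samples_by_prefix):
    """For each prefix length, run tenfold CV and report mean/std of the score.

    `samples_by_prefix` maps a prefix length to an (X, y) tuple.
    """
    results = {}
    for prefix_len, (X, y) in samples_by_prefix.items():
        scores = cross_val_score(make_classifier(), X, y, cv=10, scoring="roc_auc")
        results[prefix_len] = (float(np.mean(scores)), float(np.std(scores)))
    return results
```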