
This is the second edition of Wil van der Aalst’s seminal book on process mining, which now also discusses the field in the broader context of data science and big data approaches. It includes several additions and updates, e.g., on inductive mining techniques and the notion of alignments, a considerably expanded section on software tools, and a completely new chapter on process mining in the large. It is self-contained, while at the same time covering the entire process-mining spectrum from process discovery to predictive analytics.

After a general introduction to data science and process mining in Part I, Part II provides the basics of business process modeling and data mining necessary to understand the remainder of the book. Next, Part III focuses on process discovery as the most important process mining task, while Part IV moves beyond discovering the control flow of processes, highlighting conformance checking, and organizational and time perspectives. Part V offers a guide to successfully applying process mining in practice, including an introduction to the widely used open-source tool ProM and several commercial products. Lastly, Part VI takes a step back, reflecting on the material presented and the key open challenges.

Overall, this book provides a comprehensive overview of the state of the art in process mining. It is intended for business process analysts, business consultants, process managers, graduate students, and BPM researchers.

### Chapter 1. Data Science in Action

Abstract
In recent years, data science has emerged as a new and important discipline. It can be viewed as an amalgamation of classical disciplines like statistics, data mining, databases, and distributed systems. Existing approaches need to be combined to turn abundantly available data into value for individuals, organizations, and society. Moreover, new challenges have emerged, not just in terms of size (“Big Data”) but also in terms of the questions to be answered. This book focuses on the analysis of behavior based on event data. Process mining techniques use event data to discover processes, check compliance, analyze bottlenecks, compare process variants, and suggest improvements. In later chapters, we will show that process mining provides powerful tools for today’s data scientist. However, before introducing the main topic of the book, we provide an overview of the data science discipline.
Wil van der Aalst

### Chapter 2. Process Mining: The Missing Link

Abstract
Information systems are becoming more and more intertwined with the operational processes they support. As discussed in the previous chapter, multitudes of events are recorded by today’s information systems. Nevertheless, organizations have problems extracting value from these data. The goal of process mining is to use event data to extract process-related information, e.g., to automatically discover a process model by observing events recorded by some enterprise system. A small example is used to explain the basic concepts. These concepts will be elaborated in later chapters.
Wil van der Aalst

### Chapter 3. Process Modeling and Analysis

Abstract
The plethora of process modeling notations available today illustrates the relevance of process modeling. Some organizations may use only informal process models to structure discussions and to document procedures. However, organizations that operate at a higher BPM maturity level use models that can be analyzed and used to enact operational processes. Today, most process models are made by hand and are not based on a rigorous analysis of existing process data. This chapter serves two purposes. On the one hand, preliminaries are presented that will be used in later chapters. For example, various process modeling notations are introduced and some analysis techniques are reviewed. On the other hand, the chapter reveals the limitations of classical approaches, thus motivating the need for process mining.
Wil van der Aalst

### Chapter 4. Data Mining

Abstract
Process mining builds on two pillars: (a) process modeling and analysis (as described in Chap. 3) and (b) data mining. This chapter introduces some basic data mining approaches and structures the field. The motivation for doing so is twofold. On the one hand, some process mining techniques build on classical data mining techniques, e.g., discovery and enhancement approaches focusing on data and resources. On the other hand, ideas originating from the data mining field will be used for the evaluation of process mining results. For example, one can adopt various data mining approaches to measure the quality of the discovered or enhanced process models. Existing data mining techniques are of little use for control-flow discovery, conformance checking, and other process mining tasks. Nevertheless, a basic understanding of data mining is most helpful for fully understanding the process mining techniques presented in subsequent chapters.
Wil van der Aalst

### Chapter 5. Getting the Data

Abstract
Process mining is impossible without proper event logs. This chapter describes the information that should be present in such event logs. Depending on the process mining technique used, these requirements may vary. The challenge is to extract such data from a variety of data sources, e.g., databases, flat files, message logs, transaction logs, ERP systems, and document management systems. When merging and extracting data, both syntax and semantics play an important role. Moreover, depending on the questions one seeks to answer, different views on the available data are needed. Process mining, like any other data-driven analysis approach, needs to deal with data quality problems. We discuss typical data quality challenges encountered in reality. The insights provided in this chapter help to get the event data assumed to be present in later chapters.
Wil van der Aalst
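As a rough illustration of the kind of event data the abstract above refers to, the sketch below assumes that each event carries at least a case identifier, an activity name, and a timestamp, and groups events into one trace per case. The field names (`case`, `activity`, `timestamp`), the sample events, and the helper `to_traces` are hypothetical and chosen only for this example; real logs usually need far more cleaning.

```python
from collections import defaultdict

# Hypothetical raw events, e.g. as extracted from a database table.
events = [
    {"case": "c2", "activity": "register", "timestamp": "2024-01-01T09:30"},
    {"case": "c1", "activity": "check",    "timestamp": "2024-01-01T10:00"},
    {"case": "c1", "activity": "register", "timestamp": "2024-01-01T09:00"},
    {"case": "c2", "activity": "reject",   "timestamp": "2024-01-01T11:00"},
    {"case": "c1", "activity": "pay",      "timestamp": "2024-01-01T12:00"},
]

def to_traces(events):
    """Group events by case and order each case by timestamp.

    Assumes ISO 8601 timestamps in one uniform format, so that plain
    string comparison matches chronological order.
    """
    by_case = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        by_case[event["case"]].append(event["activity"])
    return dict(by_case)

traces = to_traces(events)
```

Each value in `traces` is the ordered list of activities for one case, which is the form assumed by the other sketches on this page.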

### Chapter 6. Process Discovery: An Introduction

Abstract
Process discovery is one of the most challenging process mining tasks. Based on an event log, a process model is constructed, thus capturing the behavior seen in the log. This chapter introduces the topic using the rather naïve α-algorithm. This algorithm nicely illustrates some of the general ideas used by many process mining algorithms and helps to understand the notion of process discovery. Moreover, the α-algorithm serves as a stepping stone for discussing challenges related to process discovery.
Wil van der Aalst
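To give a feel for the ideas behind the α-algorithm, the sketch below computes only its first ingredient: the directly-follows relation of a log and the ordering relations derived from it (causality, parallelism, and unrelatedness). It does not construct a Petri net. The function names and the small example log are assumptions made for this illustration.

```python
from itertools import product

def directly_follows(log):
    """Pairs (a, b) such that b immediately follows a in some trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

def footprint(log):
    """Classify every ordered pair of activities.

    "->" : a is directly followed by b, but never the other way round
    "<-" : b is directly followed by a, but never the other way round
    "||" : both orders occur (potential parallelism)
    "#"  : the two activities never directly follow each other
    """
    df = directly_follows(log)
    activities = sorted({a for trace in log for a in trace})
    relations = {}
    for a, b in product(activities, repeat=2):
        forward, backward = (a, b) in df, (b, a) in df
        if forward and not backward:
            relations[(a, b)] = "->"
        elif backward and not forward:
            relations[(a, b)] = "<-"
        elif forward and backward:
            relations[(a, b)] = "||"
        else:
            relations[(a, b)] = "#"
    return relations

# Illustrative log with three distinct traces.
log = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
fp = footprint(log)
```

In this example, `b` and `c` appear in both orders and are therefore classified as parallel, while `b` and `e` never follow each other directly.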

### Chapter 7. Advanced Process Discovery Techniques

Abstract
The α-algorithm nicely illustrates some of the main ideas behind process discovery. However, this simple algorithm is unable to manage the trade-offs involving the four quality dimensions described in Chap. 6 (fitness, simplicity, precision, and generalization). To successfully apply process mining in practice, one needs to deal with noise and incompleteness. This chapter focuses on more advanced process discovery techniques. The goal is not to present one particular technique in detail, but to provide an overview of the most relevant approaches. This will assist the reader in selecting the appropriate process discovery technique. Moreover, insights into the strengths and weaknesses of the various approaches support the correct interpretation and effective use of the discovered models.
Wil van der Aalst

### Chapter 8. Conformance Checking

Abstract
After covering control-flow discovery in depth in Part III, this chapter looks at the situation in which both a process model and an event log are given. The model may have been constructed by hand or may have been discovered. Moreover, the model may be normative or descriptive. Conformance checking relates events in the event log to activities in the process model and compares both. The goal is to find commonalities and discrepancies between the modeled behavior and the observed behavior. Conformance checking is relevant for business alignment and auditing. For example, the event log can be replayed on top of the process model to find undesirable deviations suggesting fraud or inefficiencies. Moreover, conformance checking techniques can also be used for measuring the performance of process discovery algorithms and to repair models that are not aligned well with reality.
Wil van der Aalst
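The idea of comparing observed with modeled behavior can be sketched in a deliberately simplified form. The code below is not token-based replay or alignment computation; it assumes a hypothetical model given only as a set of allowed start activities, end activities, and directly-follows pairs, and reports which traces fit and where they deviate. All names and the example process are illustrative assumptions.

```python
def deviations(trace, allowed):
    """Directly-follows steps in the trace that the model does not allow."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in allowed]

def trace_fits(trace, start, end, allowed):
    """A trace fits if it starts and ends correctly and has no deviating step."""
    if not trace or trace[0] not in start or trace[-1] not in end:
        return False
    return not deviations(trace, allowed)

def fitness(log, start, end, allowed):
    """Fraction of traces in the log that fit the model (trace-level only)."""
    return sum(trace_fits(t, start, end, allowed) for t in log) / len(log)

# Hypothetical normative model.
start = {"register"}
end = {"pay", "reject"}
allowed = {
    ("register", "check"),
    ("check", "decide"),
    ("decide", "pay"),
    ("decide", "reject"),
}

log = [
    ["register", "check", "decide", "pay"],
    ["register", "decide", "pay"],          # skips the check
    ["register", "check", "decide", "reject"],
    ["register", "check", "decide", "pay"],
]

score = fitness(log, start, end, allowed)
skipped = deviations(log[1], allowed)
```

Here the second trace skips the check, so three of the four traces fit and `skipped` pinpoints the offending step.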

### Chapter 9. Mining Additional Perspectives

Abstract
Whereas the main focus of process discovery is on the control-flow perspective, event logs may contain a wealth of information relating to other perspectives such as the organizational perspective, the case perspective, and the time perspective. Therefore, we now shift our attention to these other perspectives. Organizational mining can be used to get insight into typical work patterns, organizational structures, and social networks. Timestamps and frequencies of activities can be used to identify bottlenecks and diagnose other performance-related problems. Case data can be used to better understand decision-making and analyze differences among cases. Moreover, the different perspectives can be merged into a single model providing an integrated view on the process. Such an integrated model can be used for “what if” analysis using simulation.
Wil van der Aalst

### Chapter 10. Operational Support

Abstract
Most process-mining techniques work on “post mortem” event data, i.e., they analyze events that belong to cases that have already completed. Obviously, it is not possible to influence the execution of “post mortem” cases. Moreover, cases that are still in the pipeline cannot be guided on the basis of “post mortem” event data only. Today, however, many data sources are updated in (near) real-time and sufficient computing power is available to analyze events when they occur. Therefore, process mining should not be restricted to off-line analysis and can also be used for online operational support. This chapter broadens the scope of process mining to include online decision support. For example, for a running case the remaining flow time can be predicted and suitable actions can be recommended to minimize costs.
Wil van der Aalst
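One very simple way to predict the remaining flow time of a running case is sketched below: from completed cases, learn the average time between each activity and the end of its case, then look up the last activity of the running case. This is only an illustrative baseline under assumed data, not the technique presented in the chapter. Timestamps are plain numbers (say, hours), and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def learn_remaining_times(completed_cases):
    """Average remaining time until case completion, per activity."""
    samples = defaultdict(list)
    for events in completed_cases:
        case_end = events[-1][1]          # timestamp of the final event
        for activity, timestamp in events:
            samples[activity].append(case_end - timestamp)
    return {activity: mean(values) for activity, values in samples.items()}

def predict_remaining(model, running_case):
    """Predict from the last observed activity; None if it was never seen."""
    last_activity = running_case[-1][0]
    return model.get(last_activity)

# Hypothetical completed cases: lists of (activity, timestamp in hours).
history = [
    [("register", 0), ("check", 2), ("decide", 5), ("pay", 6)],
    [("register", 0), ("check", 4), ("decide", 6), ("reject", 7)],
]

model = learn_remaining_times(history)
running = [("register", 10), ("check", 11)]
estimate = predict_remaining(model, running)
```

In the two historical cases, 4 and 3 hours remained after `check`, so the estimate for the running case is their mean.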

### Chapter 11. Process Mining Software

Abstract
The successful application of process mining relies on good tool support. Traditional Business Intelligence (BI) tools are data-centric and focus on rather simplistic forms of analysis. Mainstream data mining and machine learning tools provide more sophisticated forms of analysis, but are also not tailored towards the analysis and improvement of processes. Fortunately, there are dedicated process mining tools able to transform event data into actionable process-related insights. For example, ProM is an open-source process mining tool supporting all of the techniques mentioned in this book. Process discovery, conformance checking, social network analysis, organizational mining, clustering, decision mining, prediction, and recommendation are all supported by ProM plug-ins. However, the usability of the hundreds of available plug-ins varies and the complexity of the tool may be overwhelming for end-users. In recent years, several vendors released dedicated process mining tools (e.g., Celonis, Disco, EDS, Fujitsu, Minit, myInvenio, Perceptive, PPM, QPR, Rialto, and SNP). These tools typically provide less functionality than ProM, but are easier to use while focusing on data extraction, performance analysis and scalability. This chapter provides an overview of available tools and trends.
Wil van der Aalst

### Chapter 12. Process Mining in the Large

Abstract
Process mining provides the technology to leverage the ever-increasing amounts of event data in modern organizations and societies. Despite the growing capabilities of modern computing infrastructures, event logs may be too large or too complex to be handled using conventional approaches. This chapter focuses on handling “Big Event Data” and relates process mining to Big Data technologies. Moreover, it is shown that process mining problems can be decomposed in two ways: case-based decomposition and activity-based decomposition. Many of the analysis techniques described can be made scalable using such decompositions. Other performance-related topics, such as streaming process mining and process cubes, are also discussed. The chapter shows that the lion’s share of process mining techniques can be “applied in the large” by using the right infrastructure and approach.
Wil van der Aalst
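Case-based decomposition can be illustrated with directly-follows counting: because each trace contributes to the counts independently, the log can be split by case, each part processed separately (potentially on different machines), and the partial results added up. The sketch below runs the parts sequentially and uses hypothetical helper names; it only demonstrates that the merged result equals the result on the whole log.

```python
from collections import Counter

def df_counts(traces):
    """Count how often each activity pair occurs as a directly-follows step."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts

def split_by_case(log, n_parts):
    """Distribute whole traces (cases) round-robin over n_parts sublogs."""
    parts = [[] for _ in range(n_parts)]
    for index, trace in enumerate(log):
        parts[index % n_parts].append(trace)
    return parts

def decomposed_df_counts(log, n_parts):
    """Compute counts per sublog and merge them by addition."""
    total = Counter()
    for part in split_by_case(log, n_parts):
        total.update(df_counts(part))     # Counter.update adds the counts
    return total

log = [
    ("a", "b", "c", "d"),
    ("a", "c", "b", "d"),
    ("a", "e", "d"),
    ("a", "b", "c", "d"),
]

merged = decomposed_df_counts(log, n_parts=3)
```

The key design point is that cases are never split across parts, so no directly-follows step is lost at a partition boundary.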

### Chapter 13. Analyzing “Lasagna Processes”

Abstract
Lasagna processes are relatively structured and the cases flowing through such processes are handled in a controlled manner. Therefore, it is possible to apply all of the process mining techniques presented in the preceding chapters. This chapter characterizes Lasagna processes and discusses typical use cases for process mining. Moreover, the different stages of a process mining project for improving a Lasagna process are described. The resulting life-cycle model guides users of process mining tools like ProM. Moreover, different application scenarios are discussed.
Wil van der Aalst

### Chapter 14. Analyzing “Spaghetti Processes”

Abstract
Spaghetti processes are the counterpart of Lasagna processes. Because Spaghetti processes are less structured, only a subset of the process mining techniques described in this book is applicable. For instance, it makes no sense to aim at operational support activities if there is too much variability. Nevertheless, process mining can help to realize dramatic process improvements by uncovering key problems.
Wil van der Aalst

### Chapter 15. Cartography and Navigation

Abstract
Process models can be seen as the “maps” describing the operational processes of organizations. Similarly, information systems can be looked at as “navigation systems” guiding the flow of work in organizations. Unfortunately, many organizations fail in creating and maintaining accurate business process maps. Often process models are outdated and have little to do with reality. Moreover, most information systems fail to provide the functionality offered by today’s navigation systems. For instance, workers are not guided by the information system and need to work behind the system’s back to get things done. Moreover, useful information such as the “estimated arrival time” of a running case is not provided. Process mining can help to overcome some of these problems.
Wil van der Aalst

### Chapter 16. Epilogue

Abstract
To conclude this book we summarize the main reasons for using process mining. Process mining can be seen as the “missing link” between data mining and traditional model-driven BPM. Although mature process mining techniques and tools are available, several challenges remain to further improve the applicability of the techniques presented in the preceding chapters. Therefore, we list the most important challenges. Finally, we encourage the reader to start using process mining today. For organizations that store event data in some form, the threshold to get started is really low.
Wil van der Aalst