This section discusses process discovery techniques that use partial orders directly or explicitly exploit the notion of concurrency. We first briefly discuss classical process discovery algorithms. Secondly, we cover techniques that explicitly assume activity lifecycles, i.e., enabling these techniques to observe true concurrency. Thereafter, we focus on partial-order-based process discovery algorithms.
4.2.1 Classical process discovery
Most classical discovery techniques (see [
2] for a detailed overview) use the total order of events in an event log and derive concurrency based on the context of events, i.e., as covered in Sect.
4.1. It has been shown that concurrency can be reliably discovered [
29], as long as the concurrency involves more complex structures than just activities, i.e., otherwise classical process discovery techniques cannot distinguish between activities being
concurrent (i.e., potentially overlapping) and being
interleaved (i.e., both being executed yet non-overlapping).
Nevertheless, discovering concurrency remains challenging due to the information required from the event log: a process of 10 concurrent activities has
\(10!{=}3\,628\,800\) different orders of execution. In techniques that use the directly follows abstraction, e.g., the previously presented
\(\alpha \) miner [
14] and its derivatives, this amount of information is alleviated to
\(10*9 = 90\) observations. In comparison, the same information can be captured using only
one partially ordered trace.
Discovery techniques that detect concurrency using totally ordered traces inherently use uncertain semantics: at least one partially ordered run must match the given totally ordered trace in the discovered model.
In contrast, [
30] uses the eventually follows abstraction to directly construct a partially ordered model, which, due to label splitting, supports looping behavior.
4.2.2 Discovery based on lifecycle information
In event logs, events can be annotated with an attribute indicating the life cycle information of that event. In particular, an event can indicate the start of the execution of an activity (an
activity instance) and the completion of an activity instance. For instance, the trace
\(\langle a_s, b_s, b_c, a_c\rangle \) denotes a trace of two activity instances (
a and
b), such that
a started, after which
b started and completed, after which
a completed. More elaborate life cycle models, for instance, supporting pausing and resuming execution, have been proposed [
31]. However, such models have thus far not been leveraged for process discovery.
In [
32], the authors propose to revise the internal data structure used of a classical process discovery algorithm, i.e., the
Heuristic Miner [
33], to be aware of activity instances. Other examples of techniques that use life cycle information are Tsinghua alpha (T
\(\alpha \)) [
34] and Inductive Miner—life cycle (IMlc) [
24]. IMlc uses the concurrent semantics, whereas T
\(\alpha \) uses the interleaved semantics. T
\(\alpha \) and IMlc do not need to know which start event belongs to which completion event, as they abstract the behavior in the event log on an activity level (rather than the activity instance level). They only consider when an activity starts and completes, not when a particular activity instance starts or ends. Such techniques do not use this information as it is not available in event logs with life cycle information: for instance, in the trace
\(\langle a_s, a_s, a_c, a_c\rangle \) it is unknown which one of the start events
\(a_s\) link to which completion event
\(a_c\).
In contrast, such information is necessary to construct a partially ordered trace. On the other hand, a partial order cannot express that a should start before b but should end only after b completes. Hence, life cycle information and partially ordered traces are orthogonal and partially ordered traces in which the events are annotated with life cycle information could be defined.
4.2.3 Partial-order-based process discovery
This section considers process discovery techniques that directly work on partial orders. It is important to note that a large amount of works focuses on the
Petri net synthesis problem based on
labeled partial orders [
35‐
43]. These techniques discover a Petri net that describes a (partial order) language that is as close as possible to the input LPOs. Typically, the resulting models are hard to interpret by a human analyst. We divide the work covered in this section,i.e., partial-order-based process discovery techniques, according to the usage of partial orders under
certain semantics and
uncertain semantics.
Certain semantics
This section covers process discovery techniques that assume certain trace semantics. As this category covers a large amount of work, we order the work chronologically.
In [
44,
45], Herbst describes, i.e., as one of the first authors on process discovery, various
classes of process discovery problems, explicitly assuming the existence of a partial order over the instances of a workflow. In [
46], partial orders are first transformed to eventually follows relations, transitively reduced, de-duplicated and transformed into a process model.
In [
47], the authors do not use partial orders as an intermediary object. However, they do account for multiple
activity stages, e.g., ready, started, etc. These states are used to detect concurrency among two activities in the process. The relation (and others) is used to construct an execution graph, which can be seen as the transitive closure of a partial order over the observed activities (if activities only occur once). The execution graphs are combined into a workflow graph.
In [
48], the authors introduce the
Multi-Phase Miner, which aggregates instance graphs (partially ordered traces) into Petri nets. This aggregation first collapses the instance graphs into projected instance graphs. Each activity of the instance graph is a node (rather than each activity instance), and the edges are annotated with how often they appeared in the corresponding instance graph. Second, the union of all projected instance graphs is taken. Third, the union is transformed into an EPC using structural transformation rules. However, the models that preserve all behavior in the input event log tend to be imprecise. The authors show that these steps preserve behavior, and fitness is guaranteed; however, the method might generalize and sacrifice precision.
In [
49], the author proposes to discover
block-structured workflow models. The algorithm assumes that the event data capture start and end times. In the first step of the algorithm, repeated executions of activities are grouped together by an overarching fresh activity, i.e., representing the repeated behavior. In the second step, the ordering relations between the different activities in the event log are used to create trace clusters. The clusters are merged, e.g., based on observations of interleaving. Every cluster is transformed into a block-structured process model, which are combined together into a single resulting process model.
The authors in [
9] use Message Sequence Charts (MSCs), which denote messages sent between processes, with the messages sent for one process being totally ordered. Each MSC is translated into a partially ordered trace. Using a set of these translated MSCs, the Multi Phase Miner can be applied, with two limitations: Each message label must be unique in each trace, and the derived partial orders must be transitively reduced.
In [
50], the authors propose to construct Petri nets from partially ordered traces using synthesis: Using linear programming, a Petri net is constructed that can replay at least all partially ordered traces in the log.
In [
51], the authors introduce three algorithms to discover process models from partially ordered event logs. To this end, first, a collection of conclusions is derived from the partially ordered runs—a conclusion expresses equality between tokens produced and tokens consumed, corresponding to the edges of the partially ordered trace. Second, a Petri net is constructed that adheres to these conclusions.
In [
52], an event log is translated to a first-order logic expression, which is subsequently used to update a workflow model incrementally. In [
53,
54], the authors propose to learn labeled partial orders which are subsequently converted into event structures. From the event structures,
occurrence nets are deduced, which are subsequently folded into a Petri net (allowing the integration of negative trace information).
In [
27], the authors use partial orders enriched with choices and conflict relations (
prime event structures (PESs)). Events in these PESes that are equivalent (or equivalent enough to allow for imprecisions to be included) are combined, and the result is translated to a Petri net. Note that to construct partially ordered traces, the presence of a concurrency oracle for each activity is assumed. Thus, duplicate labels are not possible. In [
55], a public implementation of the proposal of [
27] is presented.
In [
56], the authors use conditional partial order graphs (CPOGs) to visualize event data and as an intermediate step towards process mining. In a (potentially cyclic) graph, the edges can be annotated with boolean variables, such that for each combination of boolean variable assignments, a partial order results. The language of a CPOG is the set of total ordered traces resulting from all possible variable assignments. Finding the smallest CPOG given a set of partially ordered traces is called CPOG synthesis, and several approaches have been proposed. A CPOG describes an acyclic partially ordered language and can thus be seen as a process model. Finally, data mining techniques are applied to provide intuition to the variables of the CPOG.
In [
57], a more generic approach extending partial orders is adopted. Event logs are defined as a partial order over the events (rather than a total order). Yet, the paper primarily focuses on assigning regions to events that help further decompose the process discovery problem. In this context, since using a process discovery algorithm is seen as a black box in this paper, partial orders have no added benefit over total orders.
Prime Miner [
58] first uses life-cycle information to create partially ordered runs. Second, it folds the most frequently (up to varying thresholds) partially ordered runs into a prime event structure. Third, the prime event structure is synthesized into a Petri net using the theory of compact token flow regions.
Uncertain semantics
Interestingly, almost all work in partial-order-based process discovery assumes complete certainty in the event data logging. In [
59], the authors assume a partially ordered event log, in which multiple trace notions might be present, i.e., events can be linked to various artifacts. A directly follows-based model is discovered from partially ordered and multi-trace event logs. This approach assumes that a partial order indicates an order’s absence, thus using uncertain semantics.
In [
60], the authors assume that event data contains uncertainty. The authors assume
simple uncertainty, i.e., the exact activity may not be known, or the exact timestamp may not be known (i.e., an interval is assumed). The authors propose to build
behavior graphs as an intermediate representation of the data, which is a transitive reduction of a partial order representation of the behavior.