1 Introduction
2 Preliminaries
2.1 Trace, event, and event log
2.2 Classification and regression
3 Specifying the desired prediction tasks
3.1 Overview: prediction task specification language
3.2 Specification language requirements
3.3 Towards formalizing the condition and target expressions
3.3.1 Adding aggregate functions
- \(\mathbf{\mathsf{sum}}(\mathsf{numSrc};~\mathbf{\mathsf{where}}~x=\mathsf{st}:\mathsf{ed})\)
- \(\mathbf{\mathsf{avg}}(\mathsf{numSrc};~\mathbf{\mathsf{where}}~x=\mathsf{st}:\mathsf{ed})\)
- \(\mathbf{\mathsf{min}}(\mathsf{numSrc};~\mathbf{\mathsf{where}}~x=\mathsf{st}:\mathsf{ed})\)
- \(\mathbf{\mathsf{max}}(\mathsf{numSrc};~\mathbf{\mathsf{where}}~x=\mathsf{st}:\mathsf{ed})\)
- \(\mathbf{\mathsf{concat}}(\mathsf{nonNumSrc};~\mathbf{\mathsf{where}}~x=\mathsf{st}:\mathsf{ed})\)
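
One possible reading of the five aggregate functions listed above can be sketched in Python. The sketch assumes that \(x\) ranges over the event indices from \(\mathsf{st}\) to \(\mathsf{ed}\) inclusive, that indices are 1-based, and that the source expression is evaluated once per index. The function names (`agg_sum`, `agg_concat`, and so on), the dictionary-based trace, the empty-range behaviour, and the separator-free concatenation are all assumptions made for illustration, not part of the specification language itself.

```python
from typing import Callable, Optional


def _values(src: Callable[[int], object], st: int, ed: int) -> list:
    # Evaluate the source expression at every index x in st..ed (inclusive).
    return [src(x) for x in range(st, ed + 1)]


def agg_sum(num_src: Callable[[int], float], st: int, ed: int) -> float:
    return sum(_values(num_src, st, ed))


def agg_avg(num_src: Callable[[int], float], st: int, ed: int) -> Optional[float]:
    vals = _values(num_src, st, ed)
    # Assumption: an empty range yields no value rather than an error.
    return sum(vals) / len(vals) if vals else None


def agg_min(num_src: Callable[[int], float], st: int, ed: int) -> Optional[float]:
    vals = _values(num_src, st, ed)
    return min(vals) if vals else None


def agg_max(num_src: Callable[[int], float], st: int, ed: int) -> Optional[float]:
    vals = _values(num_src, st, ed)
    return max(vals) if vals else None


def agg_concat(non_num_src: Callable[[int], object], st: int, ed: int) -> str:
    # Assumption: values are joined without a separator.
    return "".join(str(v) for v in _values(non_num_src, st, ed))


# Hypothetical three-event trace; attribute names are made up.
trace = [
    {"name": "a", "cost": 10},
    {"name": "b", "cost": 5},
    {"name": "c", "cost": 7},
]
cost = lambda x: trace[x - 1]["cost"]  # plays the role of numSrc (1-based index)
name = lambda x: trace[x - 1]["name"]  # plays the role of nonNumSrc

total_cost = agg_sum(cost, 1, 3)
all_names = agg_concat(name, 1, 3)
```

Passing the source expression as a callable keeps the range handling in one place (`_values`), so each aggregate differs only in how it reduces the collected values.
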
3.4 First-Order Event Expression (FOE)
3.4.1 Checking whether a closed FOE formula is satisfied
3.5 Formalizing the analytic rule
4 Building the prediction model
5 Showcases and multi-perspective prediction service
5.1 Predicting unexpected behaviour/situation
5.2 Predicting SLA/business constraints compliance
5.3 Predicting time-related information
5.4 Predicting workload-related information
5.5 Predicting resource-related information
5.6 Predicting value-added related information
6 Implementation and experiments
6.1 Experiment on BPIC 2013 event log
Model | First encoding (more features) | | | | | Second encoding (fewer features) | | | | |
---|---|---|---|---|---|---|---|---|---|---|
| | AUC | Accuracy | W. Prec | W. Rec | F-Measure | AUC | Accuracy | W. Prec | W. Rec | F-Measure |
Experiments with the analytic rule \(R _{\text {E1}}\) (change of group while the concept:name is not ‘queued’) | ||||||||||
ZeroR | 0.50 | 0.82 | 0.68 | 0.82 | 0.75 | 0.50 | 0.82 | 0.68 | 0.82 | 0.75 |
Logistic Reg. | 0.64 | 0.81 | 0.75 | 0.81 | 0.76 | 0.55 | 0.82 | 0.68 | 0.82 | 0.75 |
Naive Bayes | 0.51 | 0.21 | 0.80 | 0.21 | 0.12 | 0.54 | 0.19 | 0.79 | 0.19 | 0.09 |
Decision Tree | 0.67 | 0.78 | 0.80 | 0.78 | 0.79 | 0.68 | 0.82 | 0.76 | 0.82 | 0.77 |
Random Forest | 0.83 | 0.84 | 0.83 | 0.84 | 0.83 | 0.68 | 0.82 | 0.76 | 0.82 | 0.77 |
AdaBoost | 0.73 | 0.81 | 0.77 | 0.81 | 0.78 | 0.66 | 0.82 | 0.75 | 0.82 | 0.75 |
Extra Trees | 0.81 | 0.83 | 0.81 | 0.83 | 0.82 | 0.68 | 0.82 | 0.76 | 0.82 | 0.77 |
Voting | 0.81 | 0.81 | 0.81 | 0.81 | 0.81 | 0.68 | 0.82 | 0.76 | 0.82 | 0.77 |
Deep Neural Net. | 0.73 | 0.83 | 0.81 | 0.83 | 0.81 | 0.68 | 0.83 | 0.78 | 0.83 | 0.75 |
Experiments with the analytic rule \(R _{\text {E2}}\) (change of people/group and change back to the original person/group) | ||||||||||
ZeroR | 0.50 | 0.79 | 0.63 | 0.79 | 0.70 | 0.50 | 0.79 | 0.63 | 0.79 | 0.70 |
Logistic Reg. | 0.77 | 0.82 | 0.80 | 0.82 | 0.80 | 0.62 | 0.81 | 0.78 | 0.81 | 0.76 |
Naive Bayes | 0.69 | 0.79 | 0.75 | 0.79 | 0.75 | 0.63 | 0.80 | 0.77 | 0.80 | 0.76 |
Decision Tree | 0.73 | 0.82 | 0.82 | 0.82 | 0.82 | 0.76 | 0.82 | 0.80 | 0.82 | 0.80 |
Random Forest | 0.85 | 0.86 | 0.85 | 0.86 | 0.85 | 0.78 | 0.82 | 0.80 | 0.82 | 0.80 |
AdaBoost | 0.81 | 0.84 | 0.83 | 0.84 | 0.83 | 0.68 | 0.81 | 0.79 | 0.81 | 0.77 |
Extra Trees | 0.85 | 0.86 | 0.85 | 0.86 | 0.86 | 0.78 | 0.82 | 0.80 | 0.82 | 0.80 |
Voting | 0.85 | 0.86 | 0.85 | 0.86 | 0.85 | 0.77 | 0.82 | 0.81 | 0.82 | 0.81 |
Deep Neural Net. | 0.77 | 0.86 | 0.86 | 0.86 | 0.85 | 0.78 | 0.83 | 0.82 | 0.83 | 0.80 |
Experiments with the analytic rule \(R _{\text {E3}}\) (involves at least three different groups) | ||||||||||
ZeroR | 0.50 | 0.74 | 0.54 | 0.74 | 0.63 | 0.50 | 0.74 | 0.54 | 0.74 | 0.63 |
Logistic Reg. | 0.78 | 0.78 | 0.76 | 0.78 | 0.76 | 0.77 | 0.79 | 0.77 | 0.79 | 0.77 |
Naive Bayes | 0.75 | 0.76 | 0.73 | 0.76 | 0.70 | 0.76 | 0.77 | 0.75 | 0.77 | 0.73 |
Decision Tree | 0.79 | 0.82 | 0.83 | 0.82 | 0.83 | 0.81 | 0.82 | 0.82 | 0.82 | 0.82 |
Random Forest | 0.92 | 0.87 | 0.87 | 0.87 | 0.87 | 0.83 | 0.82 | 0.82 | 0.82 | 0.82 |
AdaBoost | 0.89 | 0.86 | 0.86 | 0.86 | 0.86 | 0.83 | 0.81 | 0.80 | 0.81 | 0.80 |
Extra Trees | 0.91 | 0.87 | 0.87 | 0.87 | 0.87 | 0.82 | 0.82 | 0.82 | 0.82 | 0.82 |
Voting | 0.91 | 0.85 | 0.85 | 0.85 | 0.85 | 0.82 | 0.82 | 0.81 | 0.82 | 0.82 |
Deep Neural Net. | 0.85 | 0.85 | 0.84 | 0.85 | 0.84 | 0.83 | 0.83 | 0.82 | 0.83 | 0.82 |
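
In the classification results, the W. Rec column coincides with the Accuracy column in every row. This is expected, because recall weighted by class support reduces to overall accuracy. The following sketch illustrates the identity on a ZeroR-style majority predictor. The label counts are made up (an 82/18 split chosen only to resemble the ZeroR accuracy reported for \(R _{\text {E1}}\)), and the convention that a class which is never predicted gets precision 0 is an assumption; the evaluation code used for the experiments may differ.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, weighted precision, weighted recall)."""
    n = len(y_true)
    support = Counter(y_true)  # number of true instances per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    w_prec = 0.0
    w_rec = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        # Assumption: precision is 0 for a class that is never predicted.
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        w_prec += (count / n) * precision
        w_rec += (count / n) * recall
    return accuracy, w_prec, w_rec


# Made-up labels: 82 negative cases and 18 positive cases.
y_true = [0] * 82 + [1] * 18
# ZeroR always predicts the majority class.
y_pred = [0] * 100

acc, w_prec, w_rec = weighted_metrics(y_true, y_pred)
```

With these labels the weighted recall equals the accuracy (0.82), while the weighted precision is roughly the square of the majority share, which is why a majority-class baseline shows a weighted precision well below its accuracy.
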

Model | First encoding (more features) | | Second encoding (fewer features) | |
---|---|---|---|---|
| | MAE (in days) | RMSE (in days) | MAE (in days) | RMSE (in days) |
Experiments with the analytic rule \(R _{\text {E4}}\) (the remaining duration of all waiting-related events) | ||||
ZeroR | 5.977 | 6.173 | 5.977 | 6.173 |
Linear Reg. | 5.946 | 6.901 | 6.16 | 6.462 |
Decision Tree | 5.431 | 17.147 | 5.8 | 7.227 |
Random Forest | 4.808 | 8.624 | 5.81 | 7.114 |
AdaBoost | 14.011 | 18.349 | 14.181 | 15.164 |
Extra Trees | 4.756 | 8.612 | 5.799 | 7.132 |
Deep Neural Net. | 2.205 | 4.702 | 4.064 | 4.596 |
Experiments with the analytic rule \(R _{\text {E5}}\) (the remaining duration of all events in which the status is “wait”) | ||||
ZeroR | 1.061 | 1.164 | 1.061 | 1.164 |
Linear Reg. | 1.436 | 1.974 | 1.099 | 1.233 |
Decision Tree | 0.685 | 5.165 | 1.003 | 1.66 |
Random Forest | 0.713 | 3.396 | 1.016 | 1.683 |
AdaBoost | 1.507 | 3.89 | 1.044 | 1.537 |
Extra Trees | 0.843 | 3.719 | 1.005 | 1.649 |
Deep Neural Net. | 0.37 | 2.037 | 0.683 | 0.927 |
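
Several rows in the regression results combine a comparatively low MAE with a much higher RMSE. A minimal sketch of the two error measures shows how this pattern can arise: RMSE squares each error before averaging, so a few large misses dominate it. The durations below are made up for illustration and are not taken from the event logs, and the sketch makes no claim about why any particular model in the experiments behaves this way.

```python
import math


def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    # Root mean squared error: squaring emphasises large errors.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up remaining durations in days: nine short cases and one long case.
y_true = [1.0] * 9 + [50.0]

# Hypothetical model A: exact on the short cases, misses the long case by 49 days.
model_a = [1.0] * 10
# Hypothetical model B: off by 5 days on every case.
model_b = [6.0] * 9 + [45.0]

mae_a, rmse_a = mae(y_true, model_a), rmse(y_true, model_a)
mae_b, rmse_b = mae(y_true, model_b), rmse(y_true, model_b)
```

Here model A has the lower MAE (4.9 against 5.0 days) but a far higher RMSE, because its single 49-day miss is squared before averaging. Reading both columns together therefore says more about the error distribution than either column alone.
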
6.2 Experiment on BPIC 2012 event log
Model | First encoding (more features) | | Second encoding (fewer features) | |
---|---|---|---|---|
| | MAE (in days) | RMSE (in days) | MAE (in days) | RMSE (in days) |
Experiments with the analytic rule \(R _{\text {E6}}\) (total duration of all remaining activities named ‘W_Completeren aanvraag’) | ||||
ZeroR | 3.963 | 5.916 | 3.963 | 5.916 |
Linear Reg. | 3.613 | 5.518 | 3.677 | 5.669 |
Decision Tree | 2.865 | 5.221 | 2.876 | 5.228 |
Random Forest | 2.863 | 5.198 | 2.877 | 5.213 |
AdaBoost | 3.484 | 5.655 | 3.484 | 5.655 |
Extra Trees | 2.857 | 5.185 | 2.868 | 5.191 |
Deep Neural Net. | 2.487 | 5.683 | 2.523 | 5.667 |

Model | First encoding (more features) | | | | | Second encoding (fewer features) | | | | |
---|---|---|---|---|---|---|---|---|---|---|
| | AUC | Accuracy | W. Prec | W. Rec | F-Measure | AUC | Accuracy | W. Prec | W. Rec | F-Measure |
Experiments with the analytic rule \(R _{\text {E7}}\) (predict whether an application will be eventually ‘DECLINED’) | ||||||||||
ZeroR | 0.50 | 0.78 | 0.61 | 0.78 | 0.68 | 0.50 | 0.78 | 0.61 | 0.78 | 0.68 |
Logistic Reg. | 0.69 | 0.78 | 0.75 | 0.78 | 0.76 | 0.69 | 0.77 | 0.71 | 0.77 | 0.71 |
Naive Bayes | 0.67 | 0.33 | 0.74 | 0.33 | 0.30 | 0.67 | 0.33 | 0.73 | 0.33 | 0.30 |
Decision Tree | 0.70 | 0.78 | 0.76 | 0.78 | 0.77 | 0.70 | 0.78 | 0.76 | 0.78 | 0.77 |
Random Forest | 0.71 | 0.79 | 0.77 | 0.79 | 0.78 | 0.71 | 0.79 | 0.77 | 0.79 | 0.78 |
AdaBoost | 0.71 | 0.81 | 0.78 | 0.81 | 0.78 | 0.71 | 0.80 | 0.78 | 0.80 | 0.78 |
Extra Trees | 0.71 | 0.79 | 0.77 | 0.79 | 0.78 | 0.71 | 0.79 | 0.77 | 0.79 | 0.78 |
Voting | 0.71 | 0.79 | 0.77 | 0.79 | 0.78 | 0.71 | 0.79 | 0.77 | 0.79 | 0.77 |
Deep Neural Net. | 0.71 | 0.80 | 0.77 | 0.80 | 0.78 | 0.71 | 0.80 | 0.78 | 0.80 | 0.78 |
6.3 Experiment on BPIC 2015 event log
Model | First encoding (more features) | | | | | Second encoding (fewer features) | | | | |
---|---|---|---|---|---|---|---|---|---|---|
| | AUC | Accuracy | W. Prec | W. Rec | F-Measure | AUC | Accuracy | W. Prec | W. Rec | F-Measure |
Experiments with the analytic rule \(R _{\text {E8}}\) (predicting whether a process is complex) | ||||||||||
ZeroR | 0.50 | 0.57 | 0.32 | 0.57 | 0.41 | 0.50 | 0.57 | 0.32 | 0.57 | 0.41 |
Logistic Reg. | 0.92 | 0.83 | 0.85 | 0.83 | 0.83 | 0.90 | 0.84 | 0.84 | 0.84 | 0.83 |
Naive Bayes | 0.81 | 0.72 | 0.82 | 0.72 | 0.71 | 0.93 | 0.68 | 0.81 | 0.68 | 0.66 |
Decision Tree | 0.80 | 0.79 | 0.80 | 0.79 | 0.80 | 0.84 | 0.85 | 0.85 | 0.85 | 0.85 |
Random Forest | 0.95 | 0.89 | 0.89 | 0.89 | 0.89 | 0.95 | 0.90 | 0.90 | 0.90 | 0.90 |
AdaBoost | 0.92 | 0.87 | 0.87 | 0.87 | 0.87 | 0.93 | 0.88 | 0.88 | 0.88 | 0.88 |
Extra Trees | 0.95 | 0.88 | 0.88 | 0.88 | 0.88 | 0.95 | 0.88 | 0.89 | 0.88 | 0.88 |
Voting | 0.94 | 0.85 | 0.86 | 0.85 | 0.86 | 0.95 | 0.88 | 0.88 | 0.88 | 0.88 |
Deep Neural Net. | 0.89 | 0.84 | 0.84 | 0.84 | 0.84 | 0.92 | 0.84 | 0.84 | 0.84 | 0.84 |

Model | First encoding (more features) | | Second encoding (fewer features) | |
---|---|---|---|---|
| | MAE | RMSE | MAE | RMSE |
Experiments with the analytic rule \(R _{\text {E9}}\) (the number of the remaining events/activities) | ||||
ZeroR | 11.21 | 13.274 | 11.21 | 13.274 |
Linear Reg. | 6.003 | 7.748 | 14.143 | 18.447 |
Decision Tree | 6.972 | 9.296 | 6.752 | 9.167 |
Random Forest | 4.965 | 6.884 | 4.948 | 6.993 |
AdaBoost | 4.971 | 6.737 | 4.879 | 6.714 |
Extra Trees | 4.684 | 6.567 | 4.703 | 6.627 |
Deep Neural Net. | 6.325 | 8.185 | 5.929 | 7.835 |