1 Introduction
2 Representing Process Log Data
2.1 Event-Logs Standards
2.2 Semantics-Aware Process Log Representations
2.3 The EBTIC-BPM Process Vocabulary
The vocabulary has two basic concepts: ebtic-bpm:Process, which represents the business process, and ebtic-bpm:Task, which represents the activities that compose the process. Each concept has a basic set of attributes: ebtic-bpm:startTime and ebtic-bpm:endTime, which represent the beginning and termination times. The relation between these two basic concepts defines the simplest way to represent a business process. The ebtic-bpm:hasTask relation links ebtic-bpm:Process to ebtic-bpm:Task and represents the set of tasks belonging to a process. Three more relations apply to the ebtic-bpm:Task concept: ebtic-bpm:followedBy, ebtic-bpm:precededBy and ebtic-bpm:hasSubTask, which respectively indicate which tasks follow and precede a given one and which tasks are its subtasks. These relations allow layering our model with specific ontologies describing the business activities addressed by an organization, or with additional domain knowledge originating from normative or contractual regulations [20].
2.4 From Sensor Events to Semantics-Aware Log Entries
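To illustrate how the EBTIC-BPM vocabulary described above can be instantiated, the following Python sketch maps a single monitored task event to log entries. The TaskEvent record, its field names, the ex: identifiers and the choice of asserting both ebtic-bpm:followedBy and ebtic-bpm:precededBy for each ordering are assumptions made for illustration; terms are kept as prefixed names in plain tuples rather than handled by an RDF library.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class TaskEvent:
    """Hypothetical record emitted by a monitor when a task completes."""
    process_id: str
    task_id: str
    task_type: str
    start_time: str
    end_time: str
    previous_task_id: Optional[str] = None  # task that precedes this one, if known


def event_to_triples(ev: TaskEvent) -> List[Triple]:
    """Map one event to triples using the EBTIC-BPM terms (prefixed names only)."""
    triples = [
        (ev.process_id, "rdf:type", "ebtic-bpm:Process"),
        (ev.process_id, "ebtic-bpm:hasTask", ev.task_id),
        (ev.task_id, "rdf:type", "ebtic-bpm:Task"),
        (ev.task_id, "rdf:type", ev.task_type),
        (ev.task_id, "ebtic-bpm:startTime", ev.start_time),
        (ev.task_id, "ebtic-bpm:endTime", ev.end_time),
    ]
    if ev.previous_task_id is not None:
        # Assumption: both directions of the ordering are asserted explicitly.
        triples.append((ev.previous_task_id, "ebtic-bpm:followedBy", ev.task_id))
        triples.append((ev.task_id, "ebtic-bpm:precededBy", ev.previous_task_id))
    return triples


def tasks_of(graph: Set[Triple], process_id: str) -> Set[str]:
    """Tasks linked to a process through ebtic-bpm:hasTask."""
    return {o for s, p, o in graph if s == process_id and p == "ebtic-bpm:hasTask"}


if __name__ == "__main__":
    log: Set[Triple] = set()
    log.update(event_to_triples(TaskEvent(
        "ex:process1", "ex:taskA", "ex:Submit", "2010-03-01T09:00", "2010-03-01T09:05")))
    log.update(event_to_triples(TaskEvent(
        "ex:process1", "ex:taskB", "ex:Approve", "2010-03-02T10:00", "2010-03-02T10:30",
        previous_task_id="ex:taskA")))
    print(sorted(tasks_of(log, "ex:process1")))  # ['ex:taskA', 'ex:taskB']
```

Storing the log as a set means the repeated rdf:type ebtic-bpm:Process triple of the shared process is kept only once, as it would be in a triple store.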
2.5 Software Architecture
3 PMon Security and Access Control
3.1 Access Control to Business Process Mining and Monitoring
3.2 RDF Access Control
- a rewriting is sound \(\iff Q_0(G) \subseteq Q(V(G))\);
- a rewriting is complete \(\iff Q(V(G)) \subseteq Q_0(G)\).
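
The two conditions can be checked mechanically on a toy example. The sketch below assumes that \(Q_0\) is the rewritten query evaluated on the original graph \(G\), \(V\) the view granted to the Requestor and \(Q\) the Requestor's query; queries and views are plain Python functions over sets of triples rather than SPARQL, and the graph, view and rewritings are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Query = Callable[[Graph], FrozenSet[Triple]]  # answers modelled as sets of triples
View = Callable[[Graph], Graph]


def is_sound(q0: Query, q: Query, v: View, g: Graph) -> bool:
    """Sound: the rewriting returns nothing beyond what Q returns on the view."""
    return q0(g) <= q(v(g))


def is_complete(q0: Query, q: Query, v: View, g: Graph) -> bool:
    """Complete: every answer of Q on the view is also returned by the rewriting."""
    return q(v(g)) <= q0(g)


G: Graph = frozenset({
    ("bpi:process1", "rdf:type", "bpi:LoanProcess"),
    ("bpi:process1", "bpi:amount", "120000"),
    ("bpi:process1", "ebtic-bpm:hasTask", "bpi:task1"),
    ("bpi:task1", "rdf:type", "bpi:A_SUBMITTED"),
})


def hide_amounts(g: Graph) -> Graph:
    """Toy view: the Requestor may not see loan amounts."""
    return frozenset(t for t in g if t[1] != "bpi:amount")


def about_process1(g: Graph) -> FrozenSet[Triple]:
    """Toy Requestor query: every triple about bpi:process1."""
    return frozenset(t for t in g if t[0] == "bpi:process1")


def rewrite_exact(g: Graph) -> FrozenSet[Triple]:
    """Rewriting evaluated on G that drops the hidden triples: sound and complete."""
    return frozenset(t for t in about_process1(g) if t[1] != "bpi:amount")


def rewrite_leaky(g: Graph) -> FrozenSet[Triple]:
    """Ignores the view: complete but not sound (it leaks the amount)."""
    return about_process1(g)


def rewrite_partial(g: Graph) -> FrozenSet[Triple]:
    """Keeps only rdf:type triples: sound but not complete."""
    return frozenset(t for t in about_process1(g) if t[1] == "rdf:type")
```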
4 Research Contributions
5 The Policy Language
Policy element | Description |
---|---|
Task | This element can assume two values: allow or deny. It states the effect of the policy, i.e., whether the Requestor is allowed or denied access to the resources |
Match | This element is the resource to which the policy is applied; it represents a set of RDF triples that the Requestor is or is not allowed to access (according to the value of Task) |
Condition | This element contains a set of graph patterns on the process flow that need to be satisfied in order to apply the policy; for their evaluation, they are translated into SPARQL ASK queries\(^{\mathrm{a}}\). Conditions can be connected with logical operators (And, Or and Not) |
Alternative | This element is used to sanitize the RDF graph when the Requestor is not authorized to access the triples in the Match block. It has two children, Find and Replace, each containing a graph pattern. The Find element tells the Filter Updater which part of the original RDF data needs to be replaced with the Requestor-specific Replace pattern |
DecisionPoint | This element contains a graph pattern that acts as a terminator: when it is found in the flow, the PEP stops the evaluation of Conditions\(^{\mathrm{b}}\) |
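
The policy elements in the table can be captured by a small data structure. The following sketch is hypothetical: it models graph patterns as lists of triples with ?-prefixed variables, replaces SPARQL ASK with a naive backtracking matcher, and evaluates each ASK of the Condition block independently (variables are not shared with the Match block), which is a simplification of the actual evaluation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

Triple = Tuple[str, str, str]
Pattern = List[Triple]  # terms starting with "?" are variables


def _unify(term: str, value: str, binding: Dict[str, str]) -> bool:
    """Bind a variable, or compare a constant, against one RDF term."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def ask(pattern: Pattern, graph: Set[Triple]) -> bool:
    """Naive stand-in for a SPARQL ASK over a basic graph pattern."""
    def extend(binding: Dict[str, str], remaining: Pattern) -> bool:
        if not remaining:
            return True
        head, rest = remaining[0], remaining[1:]
        for triple in graph:
            candidate = dict(binding)  # fresh copy per attempt (backtracking)
            if all(_unify(t, v, candidate) for t, v in zip(head, triple)):
                if extend(candidate, rest):
                    return True
        return False
    return extend({}, pattern)


@dataclass
class Ask:
    pattern: Pattern


@dataclass
class And:
    items: List["Condition"]


@dataclass
class Or:
    items: List["Condition"]


@dataclass
class Not:
    item: "Condition"


Condition = Union[Ask, And, Or, Not]


def holds(cond: Condition, graph: Set[Triple]) -> bool:
    """Evaluate a Condition block by combining ASK results with And/Or/Not."""
    if isinstance(cond, Ask):
        return ask(cond.pattern, graph)
    if isinstance(cond, And):
        return all(holds(c, graph) for c in cond.items)
    if isinstance(cond, Or):
        return any(holds(c, graph) for c in cond.items)
    if isinstance(cond, Not):
        return not holds(cond.item, graph)
    raise TypeError(f"unknown condition: {cond!r}")


@dataclass
class Policy:
    """Hypothetical in-memory form of the policy elements."""
    policy_id: str
    task: str  # "allow" or "deny"
    match: Pattern
    condition: Condition
    alternative_find: Pattern = field(default_factory=list)
    alternative_replace: Pattern = field(default_factory=list)
    decision_point: Pattern = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.task not in ("allow", "deny"):
            raise ValueError("Task must be 'allow' or 'deny'")
```

Keeping Condition as a small tree of And/Or/Not nodes mirrors the logical connectives allowed in the Condition block; a real engine would instead compile each leaf into a SPARQL ASK query.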
6 Overview of the Approach
6.1 The Filter Updater
Each Requestor \(u\) corresponds to a filter \(V_u = \langle V_{allow}, V_{deny}, V_{alternative} \rangle\), where \(V_{allow}\), \(V_{deny}\) and \(V_{alternative}\) are sets of selectors representing the policies (or, more precisely, the obligations) \(\{P_1,\ldots,P_i,\ldots,P_n\}\) that apply to the Requestor. We also introduce a mutual-exclusion constraint between \(V_{allow}\) and \(V_{deny}\), so that \(V_{allow} = \emptyset \iff V_{deny} \neq \emptyset\), because all obligations associated with a given Requestor have to be consistent with respect to the Task value (allow or deny). This is enforced both when the policies are loaded into the system, by checking that only one type of obligation (allow or deny) is present, and when the policies are edited, by preventing the policy editor from including both allow and deny obligations. Accordingly, the Filter Updater creates and updates the filter \(V_u\) by maintaining two selectors, \(Q_i\) and \(QA_i\), for each policy \(P_i\): depending on the Task value, the selectors \(Q_i\) form either \(V_{allow} = \{Q_1,\ldots,Q_i,\ldots,Q_n\}\) or \(V_{deny} = \{Q_1,\ldots,Q_i,\ldots,Q_n\}\), which are mutually exclusive, while the selectors \(QA_i\) form \(V_{alternative} = \{QA_1,\ldots,QA_i,\ldots,QA_n\}\). We represent a selector \(Q_i\) or \(QA_i\) as a SPARQL CONSTRUCT query composed of the following elements:
\(t=0\). At \(t=0\), all access policies are enforced simultaneously and all filters are empty. Each time a monitor generates a new triple, the Filter Updater checks whether the Requestor \(u\) is allowed to access it by invoking the PEP, and the results of these calls update the selectors \(Q_i\) and \(QA_i\). Each call passes the triple to be filtered to the PEP, which returns an element containing the policy's unique identifier, the Task type (allow or deny), the Match block and the satisfied conditions, or the Alternative block if the triple needs to be filtered out. The Filter Updater uses these results to update the two selectors \(Q_i\) and \(QA_i\) of the policy \(P_i\), represented by the \(V_{allow}\), \(V_{deny}\) and \(V_{alternative}\) sets in the filter \(V_u\). The selectors are updated according to the templates shown in Table 2, into which each element of the PEP response is substituted. If the Match block is returned, the element [RD] is replaced by the graph pattern contained in the Match block, and the element [GP] by the graph pattern in the Match block combined with the graph patterns of the elements in the Condition block. If an Alternative block is returned, [RD] is replaced by the triples in the graph pattern of the Replace part of the Alternative block, and [GP] by the graph pattern of its Find part minus the graph pattern \(GP(Q_i)\); this ensures that the selectors contained in \(V_{alternative}\) only replace triples to which the policy conditions apply. If the policy does not apply to the triple, a null result is returned and the selectors \(Q_i\) and \(QA_i\) are not updated.

s | p | o |
---|---|---|
bpi:process1 | rdf:type | bpi:LoanProcess |
bpi:process1 | bpi:amount | 120000 |
bpi:task1 | rdf:type | bpi:A_SUBMITTED |
bpi:agent1 | rdf:type | bpi:human |
bpi:task1 | bpi:performed_by_agent | bpi:agent1 |
bpi:process1 | ebtic-bpm:hasTask | bpi:task1 |
bpi:process1 | ebtic-bpm:endTime | 2010-03-02 |
bpi:process2 | rdf:type | bpi:LoanProcess |
bpi:process2 | bpi:amount | 100000 |
bpi:task2 | rdf:type | bpi:A_SUBMITTED |
bpi:agent2 | rdf:type | bpi:human |
bpi:task2 | bpi:performed_by_agent | bpi:agent2 |
bpi:process2 | ebtic-bpm:hasTask | bpi:task2 |
bpi:process2 | ebtic-bpm:endTime | 2010-03-02 |
bpi:process3 | rdf:type | bpi:LoanProcess |
bpi:process3 | bpi:amount | 200000 |
bpi:task3 | rdf:type | bpi:A_APPROVED |
bpi:agent3 | rdf:type | bpi:human |
bpi:task3 | bpi:performed_by_agent | bpi:agent3 |
bpi:process3 | ebtic-bpm:hasTask | bpi:task3 |
bpi:process3 | ebtic-bpm:endTime | 2010-03-02 |
bpi:process4 | rdf:type | bpi:LoanProcess |
bpi:process4 | bpi:amount | 100000 |
bpi:task4 | rdf:type | bpi:A_ACCEPTED |
bpi:agent4 | rdf:type | bpi:human |
bpi:task4 | bpi:performed_by_agent | bpi:agent4 |
bpi:process4 | ebtic-bpm:hasTask | bpi:task4 |
bpi:process4 | ebtic-bpm:endTime | 2010-03-02 |
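
The buffering of undecided triples and the selector update described in Sect. 6.1 can be sketched as follows. Everything here is an assumption for illustration: the PEP is a plain callback that receives the triple and the flow seen so far and returns a response, None (policy not applicable) or an UNDECIDED marker; the CONSTRUCT template stands in for the actual templates of Table 2; one selector string is appended per decision instead of merging selectors; and PREFIX declarations are omitted, so the generated queries are not directly executable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Union

Triple = Tuple[str, str, str]

# Hypothetical templates: the real ones are listed in Table 2.
Q_TEMPLATE = "CONSTRUCT { [RD] } WHERE { [GP] }"
QA_TEMPLATE = "CONSTRUCT { [RD] } WHERE { [GP] }"


@dataclass
class PEPResponse:
    """Hypothetical PEP answer carrying the elements described in Sect. 6.1."""
    policy_id: str
    task: str                      # "allow" or "deny"
    match: str                     # Match block graph pattern
    conditions: str                # graph patterns of the satisfied conditions
    find: Optional[str] = None     # Alternative/Find pattern, if returned
    replace: Optional[str] = None  # Alternative/Replace pattern, if returned


class Undecided:
    """Returned while neither a condition nor the decision point is satisfied."""


UNDECIDED = Undecided()
PEPResult = Union[PEPResponse, None, Undecided]


def fill(template: str, rd: str, gp: str) -> str:
    """Substitute the [RD] and [GP] placeholders of a selector template."""
    return template.replace("[RD]", rd).replace("[GP]", gp)


class FilterUpdater:
    def __init__(self, pep: Callable[[Triple, List[Triple]], PEPResult]) -> None:
        self.pep = pep
        self.flow: List[Triple] = []        # every triple seen so far
        self.buffer: List[Triple] = []      # triples still waiting for a decision
        self.q: Dict[str, List[str]] = {}   # policy id -> Q_i selectors
        self.qa: Dict[str, List[str]] = {}  # policy id -> QA_i selectors

    def on_triple(self, triple: Triple) -> None:
        self.flow.append(triple)
        self.buffer.append(triple)
        pending: List[Triple] = []
        for t in self.buffer:               # re-test every buffered triple
            result = self.pep(t, self.flow)
            if isinstance(result, Undecided):
                pending.append(t)           # keep it until more of the flow arrives
            elif result is not None:
                self._update(result)
            # None: the policy does not apply, selectors stay unchanged
        self.buffer = pending

    def _update(self, r: PEPResponse) -> None:
        gp = f"{r.match} {r.conditions}"    # Match combined with the Condition patterns
        self.q.setdefault(r.policy_id, []).append(fill(Q_TEMPLATE, r.match, gp))
        if r.find is not None and r.replace is not None:
            gp_alt = f"{r.find} MINUS {{ {gp} }}"  # Find part minus GP(Q_i)
            self.qa.setdefault(r.policy_id, []).append(
                fill(QA_TEMPLATE, r.replace, gp_alt))


def toy_pep(triple: Triple, flow: List[Triple]) -> PEPResult:
    """Hypothetical deny policy on agents: condition = agent is human,
    decision point = an ebtic-bpm:endTime triple has been seen."""
    s, p, o = triple
    if p != "bpi:performed_by_agent":
        return None
    if (o, "rdf:type", "bpi:human") in flow:
        return PEPResponse("P1", "deny", f"{s} {p} {o} .", f"{o} rdf:type bpi:human .")
    if any(p2 == "ebtic-bpm:endTime" for _, p2, _ in flow):
        return None
    return UNDECIDED
```

With toy_pep, a bpi:performed_by_agent triple that arrives before the matching rdf:type bpi:human triple stays in the buffer; when the latter arrives, the buffered triple is re-tested and a selector is generated for policy P1. If an ebtic-bpm:endTime triple arrives first, the decision point ends the wait and no selector is produced.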
?task and ?agent replaced with the subject and object of the triple, respectively), the conditions are tested until one is satisfied. In this case, all the conditions are tested but none of them is satisfied, so the filter cannot decide whether the user is allowed to see the triple; the triple is therefore kept in a buffer until the decision point is satisfied (seventh triple). The decision point is tested in the same way as the conditions, for each triple kept in the buffer. The selectors are not updated until the arrival of the triple that satisfies the first condition. At this point, the PEP returns to the Filter Updater a response stating that the user is not allowed to see the triple, together with the elements required to update the selectors, which at this point are those reported in Table 4.
6.2 The RDF Stream Demultiplexer
7 Validation
7.1 Correctness and Completeness
\(u\) is the graph \(G_u\) returned by the execution of all the policies, we can define:
7.2 Performances
OPTIONAL operators are present in the query, while it is PSPACE-complete in the presence of OPTIONAL operators. Still, efficient execution of SPARQL has been achieved in many practical systems by using data partitioning heuristics [69].
OPTIONAL operator, which can be present in the conditions block of the policy. However, the boolean operators used to combine conditions inside the conditions block can limit the use of the OPTIONAL operator to exceptional cases.
In order to validate our approach, we designed a set of experiments to measure the performance improvement provided by our dynamic filter enforcement with respect to an approach that applies filters statically, such as SQR [65]. SQR can generate queries dynamically, but it enforces the filtering procedure statically on every triple it processes. This means that SQR can be applied to a stream of RDF triples, but each triple is filtered by applying the entire policy, which is inefficient. Our approach, instead, builds the filter dynamically by applying only the necessary part of the policy. The experiment presented in this section is intended to show that, despite the high complexity of SPARQL query answering, our approach provides a viable solution for a practical PMon system, which can be further improved, e.g., by data partitioning heuristics. Since stream filter execution heavily impacts the overall performance of PMon, an improvement in this aspect benefits the behavior of the entire system.
We implemented our AC mechanism as a set of components for the Zeus process analyzer [43]. In our implementation, Requestors request a SPARQL endpoint address or submit a SPARQL query through a Web Service interface, after standard authentication. The Web Service passes the Requestor's credentials to the other components, where the policies are extracted and applied. In our experiment, the Demultiplexer physically decouples the RDF graph \(G_u\) from the original graph \(G\) of triples. The data used for this experiment are the BPI Challenge 2012 log [67] introduced in Sect. 2. The log is composed of 13,087 process instances and 262,200 activity instances divided into 24 different activity types. The log is available in the OpenXES\(^{8}\) format and was therefore converted into RDF to be used with our system. The resulting RDF graph is composed of 2,379,557 triples.
Experiments were carried out on a desktop PC with an Intel Core i5 processor at 2.53 GHz, 8 GB of RAM (1067 MHz) and a 500 GB hard disk (5400 rpm). The test suite was developed in Java\(^{9}\) version 7. It consists of a Java implementation of the AC module, represented by the conceptual architecture in Fig. 2 and described in detail in Sect. 6, the flow of events, a log re-player and a flow listener. The flow of events is implemented as a message queue\(^{10}\), which ensures that the overhead it introduces does not influence the performance of our approach. The events (represented by the triples in the RDF log) are inserted into the message queue by the log re-player, which reads the RDF representation of the BPI Challenge log and submits the triples to the message queue with a configurable delay between them. From our experience in a real deployment of the analyzer, we observed that the arrival rate of triples is not constant: process monitors normally generate bursts of triples to represent new activities in the monitored process. In the BPI Challenge log, an activity is defined by a block of 10 triples providing the identifier of the activity, its type, the process it belongs to, the preceding activities, its start and end times and the values of its attributes. The log re-player simulates this behavior by sending blocks of 10 triples to the message queue, with a variable delay averaging 1000 ms between blocks. The benchmark represents the worst-case scenario for our approach: the number of different activity types is relatively small. The Listener is invoked every time a triple is detected in the message queue; it tests the triple against the PEP (Fig. 3) and informs the Filter Updater of the obligations to apply to the triple flow.
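The bursty arrival pattern produced by the log re-player can be approximated as in the sketch below. The uniform distribution of the pauses, the None end-of-log sentinel and the time_scale parameter (used to run the simulation faster than real time) are assumptions; the description above only fixes blocks of 10 triples and an average pause of 1000 ms. Python's standard queue.Queue stands in for the message queue used in the test suite.

```python
import queue
import random
import threading
import time
from typing import Callable, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


def blocks_of(triples: Sequence[Triple], size: int = 10) -> List[List[Triple]]:
    """Split the log into consecutive blocks (one block per activity)."""
    return [list(triples[i:i + size]) for i in range(0, len(triples), size)]


def replay(triples: Sequence[Triple], q: queue.Queue,
           mean_delay_ms: float = 1000.0, block_size: int = 10,
           time_scale: float = 1.0, seed: Optional[int] = None) -> List[float]:
    """Send blocks of triples to q with a random pause between blocks.

    Assumption: pauses are drawn uniformly from [0, 2 * mean_delay_ms], so they
    average mean_delay_ms; time_scale shrinks the real sleep (0 disables it).
    Returns the drawn pauses in milliseconds.
    """
    rng = random.Random(seed)
    delays: List[float] = []
    for i, block in enumerate(blocks_of(triples, block_size)):
        if i > 0:
            delay = rng.uniform(0.0, 2.0 * mean_delay_ms)
            delays.append(delay)
            time.sleep(delay / 1000.0 * time_scale)
        for triple in block:
            q.put(triple)
    q.put(None)  # sentinel marking the end of the log
    return delays


def listen(q: queue.Queue, on_triple: Callable[[Triple], None]) -> int:
    """Hand each queued triple to on_triple (e.g. the PEP call); return the count."""
    count = 0
    while True:
        triple = q.get()
        if triple is None:
            return count
        on_triple(triple)
        count += 1


if __name__ == "__main__":
    log = [(f"bpi:task{i}", "rdf:type", "bpi:A_SUBMITTED") for i in range(30)]
    q: queue.Queue = queue.Queue()
    consumer = threading.Thread(target=listen, args=(q, print))
    consumer.start()
    replay(log, q, time_scale=0.001, seed=42)  # about 1 ms instead of 1 s per pause
    consumer.join()
```

Running the producer and the listener in separate threads, as in the main block, keeps the listener's processing time from delaying the arrival of the next burst.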
To simulate a stream analysis environment, the cache of the event log in which the conditions are tested is cleared, and the RDF triples are made persistent in the final triple store. The policy used for the experiments is the one defined in 1. The queries we used in the static SQR test are reported in Appendix A; they are equal to the selectors obtained when the entire policy is applied. We executed three runs for each test and compared the average times.
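The measurement protocol (three runs per test, averages compared) amounts to a simple harness such as the following. The functions static_fn and dynamic_fn are hypothetical placeholders for running the static SQR queries and the dynamic filter over the replayed log; time.perf_counter measures wall-clock time.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_runs(fn: Callable[[], object], runs: int = 3) -> List[float]:
    """Wall-clock duration in seconds of each of `runs` executions of fn."""
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def compare(static_fn: Callable[[], object], dynamic_fn: Callable[[], object],
            runs: int = 3) -> Tuple[float, float]:
    """Average time of the static (SQR-style) and of the dynamic filtering runs."""
    return (statistics.mean(time_runs(static_fn, runs)),
            statistics.mean(time_runs(dynamic_fn, runs)))


if __name__ == "__main__":
    # Placeholder workloads; in the experiment these would replay the BPI log
    # through the SQR queries and through the dynamic filter, respectively.
    static_avg, dynamic_avg = compare(lambda: sum(range(10**6)), lambda: sum(range(10**5)))
    print(f"static {static_avg:.4f} s, dynamic {dynamic_avg:.4f} s")
```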