Elsevier

Computers in Industry

Volume 63, Issue 9, December 2012, Pages 867-881
Computers in Industry

On efficient processing of BPMN-Q queries

https://doi.org/10.1016/j.compind.2012.06.002Get rights and content

Abstract

Business processes are central to the operation of both public and private organizations. With the rapid growth in the number of process models developed by different process designers, it becomes crucial for business process designers to be able to look up the repository for models that could handle a similar situation before developing new ones. Therefore, providing support for querying these business process repositories is a crucial requirement. BPMN-Q is a visual language that has been designed to query repositories of business process models.

In this paper, we present a novel approach for efficient evaluation of BPMN-Q queries. The approach is based on indexing process models by the transitive closure of their control flow relation as well as path indexes. The closure index is precomputed while the path index is incrementally built through processing of queries. These indexes are utilized to achieve an effective filtering process and an efficient verification check for BPMN-Q queries. The results of our experimental evaluation show the effectiveness of our proposed approach.

Introduction

Business process modeling is an essential first phase in the business process engineering chain. Models are created by business analysts with an objective to capture business requirements, enable a better understanding of business processes, facilitate communication between business analysts and IT experts, identify process improvement options and serve as a basis for executable business processes. Designing a new process model is a highly complex, time consuming and error prone task. As the number of business process models increases, providing business process designers with a capability of querying and reusing previously designed process models is of great practical value. Reusing implies the need for querying a process repository in order to find a suitable previous work that can be the basis for a new design. Moreover, the querying is on the content of the process model rather than a sort of keyword matching on meta data associated to process models. Additionally, with the huge size of process repositories, it is of utmost importance to have efficient querying techniques. Having a query technique(language) that takes a considerable amount of time to process a query would only encourage the user, e.g., the process designer, to give up looking for something to reuse from the repository and instead start from scratch designing a new process.

In a previous work, we introduced BPMN-Q [2] as a visual language to query repositories of business process models. The language has been applied in different scenarios where reuse or retrieval of process models based on a query pattern was needed [6], [7], [22], [18]. In general the query processing of BPMN-Q matches a query graph to the process graph and identifies the sub-graph(s) that represent a match to the query, if there are any. Due to its richness in expressing queries against process models, the evaluation process of a BPMN-Q query can be quite expensive one. Thus, it is of crucial importance to develop an efficient query processor for BPMN-Q queries especially with the continuous growth in the sizes of process model repositories.

In principle, the evaluation process of a BPMN-Q query can be intuitively mapped to a graph matching problem where a BPMN-Q query is represented as a query graph against a process model repository (e.g. Oryx [11]) representing a graph database. Owing to the very expensive cost of pair-wise comparison of graph data structures, most of the proposed graph query processing techniques (e.g. [14], [23], [30]) rely on a strategy which consists of two main steps: filtering and verification. For a given graph database D and a graph query q, the main objective of the filtering and verification methodology is to avoid comparing the query graph q with each graph database member gi  D to check whether gi satisfies the conditions of q. Hence, the processing of these querying techniques goes through the following three main steps:

  • 1.

    Off-line index construction: generates a feature set F from the graph database D. An inverted index is then built to map each feature to its associated list of related graphs.

  • 2.

    Filtering: probes the constructed index of (step 1) to eliminate, as much as possible, members that are identified as being of the false results. The indexed features are used to produce a set of candidate graphs which is related to the feature of the query q.

  • 3.

    Verification: checks each candidate graph g from the filtering phase to verify whether it indeed satisfies the conditions of the query q.

Since the candidate answer set is in general smaller than the entire graph database, query processing using the indexes is significantly more efficient than the sequential scan approach. However, candidate verification is still very expensive since the size of the candidate answer set is at least equal to that of the exact answer set in the optimal case but it is usually larger for most of the queries. Therefore, reducing the size of the candidate answer set by removing as much as possible of the false positive graphs is the main criteria to evaluate the effectiveness of the filtering techniques.

In principle, the query processing of BPMN-Q queries is more challenging than the traditional graph queries in the sense that the edges between the nodes of the BPMN-Q queries are not always simple or direct connections that can be evaluated using intuitive retrieval mechanisms. However, BPMN-Q query edges can represent more complex and recursive types of connections between the nodes of the business process graph models.

In this article, we describe a novel approach for efficient evaluation of BPMN-Q queries. Our approach is based on indexing process models by the transitive closure of their control flow relation as well as path indexes. The closure index is precomputed as a feature for all process models in the repository. The path index, however, is incrementally built based on the queries that are run against the repository. The path index was meant to be incrementally built to avoid the overhead of huge computation costs of unused paths in queries. Using these indexes, we show how we can achieve a more effective filtering process and a more efficient verification check for BPMN-Q queries. In particular, we summarize the main contributions of this article as follows:

  • We present two efficient indexing mechanisms for business process model graphs. Namely, transitive closure index and incremental path index.

  • We describe two algorithms for efficient processing of BPMN-Q queries in both of its two phases for the query evaluation process: the filtration phase and the verification phase. To the best of our knowledge, all relevant graph indexing techniques are able to only utilize their indexing structures at the filtering phase.

  • We present a new algorithm to evaluate path edges in a BPMN-Q query based on the available information of the transitive closure index.

  • We present a novel mechanism for incrementally building the path index according to the queries that are posed against the process model repository. The main aim of this incremental building mechanism is to avoid the overhead of the huge computation costs of unnecessary paths that are rarely or never used in the querying workloads.

  • We conduct an extensive set of experiments to evaluate the efficiency and performance of our new algorithms using a data set of 1256 real business processes using a set of 10 BPMN-Q queries with different complexities.

The remainder of this article is organized as follows. Section 3 describes BPMN-Q constructs as well as the query processing. The major contribution of the paper comes in Section 4 where we describe how to index process models based on their transitive closure and paths and how an efficient query evaluation can be achieved using these indexes. The evaluation of our approach is presented in Section 5. Related work is discussed in Section 2 showing the limitations of the current graph indexing approaches for BPMN-Q query processing before we conclude the article in Section 6.

Section snippets

Related work

Process models represent the blueprint for subsequent executable processes. Designing new process models is a complex and time consuming task. This task could be accelerated if process modelers can lookup repositories of process models for similar situations to the one in hand. Moreover, a process models residing in the repository might be large in size. It might be difficult for the process modeler to manually traverse it and check whether it matches what is in his mind. There are different

BPMN-Q: a visual language for querying business processes

BPMN-Q [2], [26] is a visual language which is based on the concrete syntax of BPMN (Business Process Modeling Notation) [1], i.e., the visual notation. However, the language is designed to query core concepts of business process models, as shown in Definition 1. Thus, it is possible to use a concrete syntax based on EPCs and then the language can be EPC-Q, etc.

BPMN-Q is used to query business process diagrams by matching a business process diagram to a query graph. BPMN-Q is designed to access

An efficient query processor

In general, it is clearly an inefficient and a very time-consuming approach to perform a sequential scan over the entire process model repository R to check whether each process model M belongs to the answer set of a BPMN-Q query q. Most of the proposed graph query processing techniques rely on indexing the graph repository using metrics and features from the set of graphs (e.g. Paths [14], Trees [31], [33], Sub-graphs [10], [15], [29]) which is then followed by the step of using the index

Evaluation

We start this section by describing the architecture of BPMN-Q and its implementation. Next, we evaluate our approach by conducting a comprehensive set of experiments for querying business process models. The experiments of this section have the following goals:

  • 1.

    Demonstrating the expressive power of our framework in dealing with different use cases.

  • 2.

    Demonstrating the efficiency of our query engine and the effectiveness of the various optimization techniques.

Conclusions

In this paper we have proposed indexing approaches for business process models for the sake of efficient processing of BPMN-Q queries. Two indexes were developed, the closure and path indexes. The two indexes helped to prune candidates at the filtration and verification phase of query processing. By filtering processes not just by node labels and sequence flow edges but also by path and negative path edges, a smaller set of candidates can be achieved. Even if it is not possible to eliminate

Dr. Ahmed Awad is an assistant professor at the Information Systems department, Faculty of Computers and Information, Cairo University, Egypt. He received his Ph.D. in 2010 from Potsdam University, Germany. His research interest is in compliance management of business processes and in querying repositories of business processes to support reuse based modeling of business processes.

References (37)

  • A. Awad et al.

    Visually specifying compliance rules and explaining their violations for business processes

    Journal of Visual Languages and Computing

    (2011)
  • R. Laue et al.

    Visual suggestions for improvements in business process diagrams

    Journal of Visual Languages and Computing

    (2011)
  • Business Process Modeling Notation 1.2 (BPMN 1.2) Specification, Final Adopted Specification. Technical report, OMG,...
  • A. Awad

    BPMN-Q: a language to query business processes

  • A. Awad et al.

    Efficient compliance checking using BPMN-Q and temporal logic

  • A. Awad et al.

    Semantic querying of business process models

  • A. Awad et al.

    Querying graph-based repositories of business process models

  • A. Awad et al.

    Design by selection: a reuse-based approach for business process modeling

  • C. Beeri et al.

    Querying business processes with BP-QL

  • C. Beeri et al.

    Querying business processes

  • J. Cheng et al.

    FG-Index: towards verification-free query processing on graph databases

  • G. Decker et al.

    Oryx – sharing conceptual models on the web

  • E. Rik et al.

    Structural matching of BPEL processes

  • C. Di Francescomarino et al.

    Crosscutting concern documentation by visual query of business processes

  • R. Giugno et al.

    GraphGrep: a fast and universal method for querying graphs

  • H. Jiang et al.

    GString: a novel approach for efficient search in graph databases

  • T. Jin et al.

    Efficient and accurate retrieval of business process models through indexing

  • T. Jin et al.

    Efficient and accurate retrieval of business process models through indexing – (short paper)

  • Cited by (0)

    Dr. Ahmed Awad is an assistant professor at the Information Systems department, Faculty of Computers and Information, Cairo University, Egypt. He received his Ph.D. in 2010 from Potsdam University, Germany. His research interest is in compliance management of business processes and in querying repositories of business processes to support reuse based modeling of business processes.

    Sherif Sakr is a Research Scientist in the Software Systems Research Group at National ICT Australia (NICTA). He is also a Conjoint Lecturer in the School of Computer Science and Engineering (CSE) at the University of New South Wales (UNSW), Australia. He held a Visiting Scientist position at Microsoft Research, Redmond. Dr. Sakr received his Ph.D. degree in Computer Science from Konstanz University, Germany in 2007. His research interests is data and information management in general, particularly in areas of indexing techniques, query processing and optimization techniques, graph data management and the large scale data management in cloud platforms..

    View full text