Querying business processes with BP-QL
Introduction
Background: A business process (BP for short) consists of a group of business activities undertaken by one or more organizations in pursuit of some particular goal. It usually depends upon various business functions for support, e.g. personnel, accounting, inventory, and interacts with other BPs/activities carried by the same or other organizations. Consequently, the software implementing such BPs typically operate in a cross-organization, distributed environment.
It is common practice to use XML for data exchange between BPs, and Web services for interaction with remote processes [1]. The recent BPEL standard (business process execution language [2], also identified as BPELWS or BPEL4WS), developed jointly by BEA Systems, IBM, and Microsoft, combines and replaces IBM's webservices flow language (WSFL) [3] and Microsoft's XLANG [4]. It provides an XML-based language to describe not only the interface between the participants in a process, but also the full operational logic of the process and its execution flow.
Commercial vendors offer systems that allow to design BPEL specification via a visual interface, using a conceptual, intuitive view of the process, as a graph of data and activity nodes, connected by control and data flow edges. Designs are automatically converted to BPEL specifications. These can be automatically compiled into executable code that implements the described BP [5].
Motivation: Declarative BPEL specifications greatly simplify the task of software development for BPs. More interestingly from an information management perspective, they also provide an important new mine of information. Consider for instance a user who tries to understand how a particular travel agency operates. She may want to find answers to questions such as: Can I get a price quote without giving first my credit card details? What should one do to confirm a purchase? What kind of credit services are used by the agency, directly or indirectly, (i.e. by the other processes it interacts with)? Obviously, such queries are of great interest to both individual users and to organizations interested in using or analyzing BPs. Answering them is extremely hard (if not impossible) when the BP logic is coded in a complex program. It is potentially much easier given a declarative specification like BPEL. For an organization that has access to its own BPEL specifications, as well to those of cooperating organizations, the ability to answer such queries, in a possibly distributed environment, is of great practical potential.
To support such queries, one needs an adequate query language, and an efficient execution engine for it. To address this need, we present in this paper BP-QL, a new query language which allows for an intuitive formulation of queries on BP specifications, and query execution in a distributed cross-organization environment.
Challenges: We briefly highlight some of the challenges in querying BP specifications in general, and BPEL ones in particular.
- •
Flexible granularity: BP specifications may be abstractly viewed as a set of nested graphs, possibly with recursion: the graphs structure captures the execution flow of the process components; the nesting comes from the fact that the operations/services used in a process are not necessarily atomic and may have a complex internal structure (which may itself be represented by a graph); the recursion is due to the fact that a process may call itself indirectly, through calls it makes to other processes. Users may wish to ask coarse-grain queries that consider certain process components as black boxes and allow for high level abstraction, as well as fine-grained queries that “zoom-in” on all the process components, possibly recursively (e.g. “Which services are called by the travel agency?” vs. Which services are called directly/indirectly?”). An adequate query language must thus allow users to query the processes at different, flexible, granularity levels.
- •
Distribution: As mentioned above, BPs typically operate in a cross-organization, distributed environment, where each server holds a set of BPs and may provide services to other organizations, or use services that reside on remote servers. If a service's internal flow has been defined in BPEL, and the service providers make this specification available to their cooperating organizations (say via a web service), users may wish to zoom-in on these remote components as well to query the service specification.
- •
Paths extraction: When querying BPs, users may be interested in retrieving, as an answer, the qualifying flow paths (as for instance in the query “What should I do to confirm my purchase?”). As the number of relevant paths may be large (or even infinite in recursive processes) a major challenge is to provides the users with a compact finite representation of the (possible infinite) answer.
- •
Ease of querying: As mentioned above, the BPEL standard offers an XML-based language for describing the operational logic of a BP. Since a BPEL specification is essentially an XML document, a natural question is why not query it directly, using XQuery? A key observation is that the BPEL XML format is (1) very complex and (2) was designed with ease of automatic code generation in mind; however, it is extremely inconvenient for querying: to express even a very simple inquiry about a process execution flow, one needs to write a fairly complex XQuery query that performs an excessive number of joins. Furthermore, even if a more query-friendly XML representation for it had been chosen (as indeed is done internally in our implementation), XQuery, as is, would still not be adequate for the task: XQuery only returns document elements, but not paths, it does not support querying at different levels of granularity, and it does not offer tools for controlling distributed querying. Last but not least, querying an XML representation is much more difficult than querying directly a conceptual model. Essentially, ease of querying requires an intuitive, conceptual, data model, coupled with a matching, equally intuitive, query language.
At the core of the BP-QL language are BP patterns that allow users to describe the pattern of activities/data flow that are of interest. BP patterns are similar to the tree- and graph-patterns offered by existing query languages for XML [6] and graph-shaped data [7], [8], [9], but include two novel features designed to address the issues mentioned above. First, BP-QL supports navigation along two axis: (1) the standard path-based axis, that allows to navigate through, and query, paths in process graphs, and (2) a novel zoom-in axis, that allows to navigate (transitively) inside process components (local as well as remote ones) and query them at any depth of nesting. Second, paths are considered first class objects in BP-QL and can be retrieved, and represented compactly, even when involving activities performed on remote servers.
Together, these features allow for simple formulation of queries on BPs. However, they make the evaluation of queries much more intricate than that of traditional XML/graph patterns. Indeed, some queries that can easily be evaluated on flat graphs/trees may become computationally expensive (or even undecidable) when nested graphs are concerned. To keep the evaluation of queries tractable, we had identified these problematic scenarios and carefully designed the language so that they are avoided, and polynomial-time query evaluation is guaranteed. Our analysis is based on modeling systems of processes and queries as graph grammars [10].
Observe that, in general, several modes of querying BPs are possible. One can query the specifications as data (e.g. “does the specification include a path from activity A to activity B”). One can also ask about patterns that may occur when the processes are executed (e.g. “can there be a run of the system where activity A is followed by activity B”). One can also monitor runs as processes execute, or pose queries on logs of past runs.
BP-QL is a query language for process specifications,1 not about their possible runs. This is for two main reasons. First, querying the possible runs of a system is a verification problem [11] and is typically of very high complexity (from NP-hard for very simple specifications to undecidable in the general case [12]). Second, the analysis of runs requires a specification to have a well-defined semantics. Unfortunately, BPEL is not based on a formal model [12]. To avoid these obstacles and guarantee complexity that is polynomial in the size of the data, BP-QL ignores the run-time semantics of certain BPEL constructs such as conditional execution and variable values and focuses on the given specification flow. We believe this approach offers a reasonable balance between expressibility and complexity. Note that querying of specifications in fact “approximates” the querying of runs (e.g. only specifications that contain two given activities may potentially have runs where both occur). Hence, even when full run verification is desired, BP-QL can be used as an efficient means to narrow the search space for the more costly, interpretation dependent, verification. It can also be used to select the process parts to be monitored at run time [13].
The query language presented above has been fully implemented and tested in the BP-QL distributed system. All the examples presented in this paper are screenshots that were taken with our BP-QL visual designer and query tool. A first prototype of BP-QL was demonstrated in [14], where only a very high level view of the language was presented.
Contribution: To summarize, the contributions of the BP-QL system are the following:
- (1)
Query language: The BP-QL query language is a new graphical language that allows for intuitive querying of process specifications, by offering a data model and an interface similar to those used for BPs specification. It allows to retrieve paths, and offers facilities for querying at different levels of granularity, and for controlling distributed querying.
- (2)
Model and algorithms: BP-QL is based on a formal model based on graph grammars [10] for systems of processes, and for our query language on such systems. This model allows to distinguish between query features that can be efficiently supported, and those that incur a prohibitively high cost, or cannot be computed at all. Using this model, we explain how to construct a finite and intuitive representation of the (possibly infinite) answers of queries in time polynomial in the size of the specifications.
- (3)
Implementation: The BP-QL system implementation is based on Active XML (AXML) [15], a special type of XML with embedded calls to Web services. We describe here the system implementation, highlighting the main challenges faced and the solutions taken.
The paper is organized as follows. Section 2 introduces BP-QL informally via a running example. The underlying formal model is presented in Section 3. We present the algorithm for construction of compact results in Section 5, and discuss some extensions to our model in Section 6. The system implementation is described in Section 7. We conclude in Section 8, considering related and future work.
Section snippets
System overview
We present here an informal overview of BP-QL via a running example. To illustrate the features of BP-QL, we will consider a set of BPs used by a consortium offering travel-related services. These include flight and hotel reservation, car rental, credit and accounting services. The processes, and their BPEL specifications, reside and operate on remote servers. The specifications include the interactions between the various processes.
We first show how processes are specified, via the system's
The formal model
In this section we briefly present the formal model underlying the BP-QL query language. We discuss the properties of the various language components and explain how these influenced our system's design. In particular we distinguish features that can be efficiently supported, and those that incur a prohibitively high cost, or cannot be computed at all. To simplify the presentation we first consider a basic data model and query language, then enrich them to obtain the full fledged model.
Issues in answer computation and representation
The result associated with an embedding is, in general, a system of graphs. As there is no limit on the size of refinements, a result may be large, in terms of the sizes of the system and the query. Further, the number of results may be large, or even infinite. The latter may occur when the implementation function is recursive (i.e. cyclic), since then the number of refinements of a system may be infinite. Thus, we face a double challenge: to compute answers efficiently, and to represent them
The answer construction algorithm
This section describes in detail how answers are represented, and the answer construction algorithm, following the ideas presented in the previous section. Following the observation in Section 4.2.1, the construction of answer representations can be structured as follows: First, for each and , construct the representations and for the two sets of BP embeddings and . Then, construct images from these sets to represent , respectively, .
A richer model
The query model discussed in 3 The formal model, 5 The answer construction algorithm was rather simple. This simplicity allowed us to concentrate in Section 5 on the main ideas of how answers of queries are represented. We now present some useful extensions that enhance the expressive power of the language, and facilitate the querying of real life BPs. Some of these were illustrated in Section 2.
Negation: In a query with negation, the patterns have some nodes and edges that are distinguished as
Implementation
The query language presented above has been fully implemented and tested in the BP-QL distributed setting. The system provides persistent storage for BPEL specifications, allows users to design new processes, and to query existing specifications.
The visual interface of the system is implemented as an Eclipse [21] plug-in. It allows to: design new BPs and store their specifications in the repository; import existing BPEL documents to the repository; formulate queries, run them and view the
Related work and conclusion
We presented BP-QL, a novel graphical Query Language for querying BPs. BP-QL allows users to query BPs visually, in a manner very close to how such processes are typically specified, and can be employed in a distributed P2P setting. We described the formal model underlying the BP-QL query language, studied the properties of the language components, and explained how they influenced the language design. We have also described the system implementation, highlighting the main challenges faced and
References (45)
- et al.
Analysis and simulation of web services
Computer Networks
(2003) Statecharts: A visual formalism for complex systems
Sci. Comp. Programming
(1987)The monadic second-order logic of graphs i: Recognizable sets of finite graphs
Inf. Comput.
(1990)- et al.
The language intersection problem for non-recursive context-free grammars
Inf. Comput.
(2004) - The World Wide Web Consortium,...
- Business Process Execution Language for Web Services, 〈...
- F. Leymann, Web Services Flow Language (WSFL) 1.1,〈 http://www-3.ibm.com/software/solutions/webservices/pdf/WSFL.pdf〉...
- XLANG: Web Services for Business Process Design, 〈...
- Oracle BPEL Process Manager 2.0 Quick Start Tutorial, 〈...
- XML Path Language (XPath) Version 1.0,...
A graphical query language supporting recursion
The g/graphlog visual query system
G-log: A graph-based query language
IEEE Trans. Knowl. Data Eng.
Handbook of graph grammars and computing by graph transformation
Analysis of interacting BPEL web services
Business Process Cockpit
Querying Business Processes with BP-QL demo
Querying business processes
Lazy evaluation of active XML queries
Cited by (78)
Natural language querying of process execution data
2023, Information SystemsScenario-based process querying for compliance, reuse, and standardization
2020, Information SystemsCitation Excerpt :In contrast, the proposed process querying approach generates an execution that justifies that the requested property indeed holds in the retrieved model, refer to Section 6.3 for details. BP-QL [91] and BPMN-Q [92] are two typical examples of query languages that can be used to express intents to retrieve process models based on paths and structural patterns in their underlying graphs. These languages can be seen as adaptations of graph query languages to the specifics of graphs that are used to compose process models.
Process querying: Enabling business intelligence through query-based process analytics
2017, Decision Support SystemsMulti-Context Systems for Consistency Validation and Querying of Business Process Models
2017, Procedia Computer Science