
2017 | Book

Advanced Information Systems Engineering

29th International Conference, CAiSE 2017, Essen, Germany, June 12-16, 2017, Proceedings


About this book

This book constitutes the refereed proceedings of the 29th International Conference on Advanced Information Systems Engineering, CAiSE 2017, held in Essen, Germany, in June 2017.

The 37 papers presented in this volume, together with 3 keynote papers, were carefully reviewed and selected from 175 submissions. The papers are organized in topical sections on information systems architecture; business process alignment; user knowledge discovery; business process performance; big data exploration; process variability management; information systems transformation and evolution; business process modeling readability; business process adaptation; data mining; process discovery; business process modeling notation.

Table of Contents

Frontmatter

Keynotes

Frontmatter
Digital Transformation at thyssenkrupp: Challenges, Strategies and Examples

The digital transformation is changing the world at a continuously accelerating pace. Traditional industrial companies have a good chance to emerge as winners of the digital transformation. They can create additional value for their customers by optimizing and extending their current business and by creating new business models that offer smart services. The paper describes thyssenkrupp's strategy for the digital transformation, illustrated by real examples.

Reinhold Achatz
Information Systems for Retail Companies
Challenges in the Era of Digitization

Worldwide, the retail sector is driven by strong intra-competition among existing retailers and inter-competition between traditional and new, purely digital players. The challenges for retail companies can be differentiated into a business perspective and an application system (architecture) perspective. Based on a domain-oriented architecture that covers all steps of value creation through to the customer, the potential influence of digitization on the tasks of Retail Information Systems is examined from five different perspectives. The domain perspective is divided into five levels: master data, technical processes, value-based processes, administrative processes, and decision-oriented tasks. The technical challenges of application systems are characterized not least by the complexity of such architectures. The traditional mass-data problem in retail is growing in times of big data and a variety of omni-channel scenarios. This leads towards very large enterprise systems, which require an understanding of the main future challenges, so that IT managers can gain and retain flexibility in applications (and the application architecture) as well as their maintainability.

Reinhard Schütte

Information Systems Architecture

Frontmatter
Understanding the Blockchain Using Enterprise Ontology

Blockchain technology is regarded as highly disruptive, but there is a lack of formalization and standardization of terminology. This is not only because there are several (sometimes proprietary) implementation platforms, but also because the academic literature so far is predominantly written from either a purely technical or an economic application perspective. The result of this confusion is a proliferation of blockchain solutions, types, roadmaps and interpretations. For blockchain to be accepted as a technology standard in established industries, it is pivotal that ordinary internet users and business executives have a basic yet fundamental understanding of the workings and impact of blockchain. This conceptual paper provides a theoretical contribution and guidance on what blockchain actually is by taking an ontological approach. Enterprise Ontology is used to make a clear distinction between the datalogical, infological and essential levels of blockchain transactions and smart contracts.

Joost de Kruijff, Hans Weigand
Accommodating Openness Requirements in Software Platforms: A Goal-Oriented Approach

Open innovation is becoming an important strategy in software development. Following this strategy, software companies are increasingly opening up their platforms to third-party products. However, opening up software platforms to third-party applications raises serious concerns about critical quality requirements, such as security, performance, privacy and proprietary ownership. Adopting appropriate openness design strategies, which fulfill open-innovation objectives while maintaining quality requirements, calls for deliberate analysis of openness requirements from early on in opening up software platforms. We propose to treat openness as a distinct class of non-functional requirements, and to refine and analyze it in parallel with other design concerns using a goal-oriented approach. We extend the Non-Functional Requirements (NFR) analysis method with a new set of catalogues for specifying and refining openness requirements in software platforms. We apply our approach to revisit the design of the data provision service in two real-world open software platforms and discuss the results.

Mahsa H. Sadi, Eric Yu
Development of Mobile Data Collection Applications by Domain Experts: Experimental Results from a Usability Study

Despite their drawbacks, paper-based questionnaires are still used to collect data in many application domains. In the QuestionSys project, we develop an advanced framework that enables domain experts to transform paper-based instruments into mobile data collection applications, which then run on smart mobile devices. The framework empowers domain experts to develop robust mobile data collection applications on their own, without the need to involve programmers. To realize this vision, a configurator component applying a model-driven approach is developed. As this component shall relieve domain experts from technical issues, it has to be shown that domain experts are actually able to use the configurator properly. The experiment presented in this paper investigates the mental effort required to create such data collection applications by comparing novices and experts. Results reveal that even novices are able to model instruments with an acceptable number of errors. Altogether, the QuestionSys framework empowers domain experts to develop sophisticated mobile data collection applications orders of magnitude faster than current mobile application development practices.

Johannes Schobel, Rüdiger Pryss, Winfried Schlee, Thomas Probst, Dominic Gebhardt, Marc Schickler, Manfred Reichert

Business Process Alignment

Frontmatter
Checking Process Compliance on the Basis of Uncertain Event-to-Activity Mappings

A crucial requirement for compliance checking techniques is that observed behavior, captured in event traces, can be mapped to the process models that specify allowed behavior. Without a mapping, it is not possible to determine if observed behavior is compliant or not. A considerable problem in this regard is that establishing a mapping between events and process model activities is an inherently uncertain task. Since the use of a particular mapping directly influences the compliance of a trace to a specification, this uncertainty represents a major issue for compliance checking. To overcome this issue, we introduce a probabilistic compliance checking method that can deal with uncertain mappings. Our method avoids the need to select a single mapping, but rather works on a spectrum of possible mappings. A quantitative evaluation demonstrates that our method can be applied on a considerable number of real-world processes where traditional compliance checking methods fail.

Han van der Aa, Henrik Leopold, Hajo A. Reijers
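To make the core idea concrete, the following is a minimal sketch (not the authors' actual method) of compliance checking over a spectrum of mappings: each candidate event-to-activity mapping is weighted by its probability, and the individual compliance verdicts are aggregated. All names, the traces, and the compliance check itself are hypothetical stand-ins.

```python
# Illustrative only: instead of committing to one event-to-activity
# mapping, weight the compliance verdict of each candidate mapping by
# its probability and return the resulting compliance probability.
def probabilistic_compliance(trace, candidate_mappings, is_compliant):
    """Probability that `trace` is compliant, marginalized over a
    spectrum of possible mappings.

    candidate_mappings: list of (mapping, probability) pairs whose
        probabilities sum to 1.
    is_compliant: function(mapped_trace) -> bool, e.g., a conformance
        check of the mapped trace against the specification.
    """
    return sum(
        prob
        for mapping, prob in candidate_mappings
        if is_compliant([mapping.get(e, e) for e in trace])
    )

# Dummy example: two plausible interpretations of the logged event "chk".
trace = ["reg", "chk", "pay"]
mappings = [({"chk": "check invoice"}, 0.7), ({"chk": "check order"}, 0.3)]
allowed = {("reg", "check invoice", "pay"), ("reg", "check order", "pay")}
print(probabilistic_compliance(trace, mappings,
                               lambda t: tuple(t) in allowed))  # 1.0
```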
Aligning Modeled and Observed Behavior: A Compromise Between Computation Complexity and Quality

Certifying that a process model is aligned with the real process executions is perhaps the most desired feature a process model may have: aligned process models are crucial for organizations, since strategic decisions are easier to make on models than on plain data. In spite of its importance, current algorithmic support for computing alignments is limited: the only alternatives are techniques that explicitly explore the model behavior (which may be worst-case exponential with respect to the model size), or heuristic approaches that cannot guarantee a solution. In this paper we propose a solution that sits right in the middle of the complexity spectrum of alignment techniques; it can always guarantee a solution, whose quality depends on the exploration depth used and on the local decisions taken at each step. We use linear algebraic techniques in combination with an iterative search that focuses on progressing towards a solution. The experiments show a clear reduction in the time required for reaching a solution, without significantly sacrificing the quality of the alignment obtained.

Boudewijn van Dongen, Josep Carmona, Thomas Chatain, Farbod Taymouri
Multi-party Business Process Resilience By-Design: A Data-Centric Perspective

Nowadays every business organization operates in ecosystems, and cooperation is mandatory. If, on the one side, this increases the opportunities for the involved organizations, on the other side, every actor is a potential source of failures with impacts on the entire ecosystem. For this reason, resilience is a feature that multi-party business processes must enforce today. As resilience concerns the ability to cope with unplanned situations, managing the critical issues is usually a run-time task. The aim of this work is to raise awareness of resilience in multi-party business processes also at design-time, when a proper analysis of the involved data allows the process designer to identify possible failures and their impact, and thus to improve the process model. Using a data-centric, collaboration-oriented process modeling language, i.e., OMG CMMN (Case Management Model and Notation), our approach allows the designer to model a flexible business process that, at run-time, is easier to manage in case of failures.

Pierluigi Plebani, Andrea Marrella, Massimo Mecella, Marouan Mizmizi, Barbara Pernici

User Knowledge Discovery

Frontmatter
Identifying Domains and Concepts in Short Texts via Partial Taxonomy and Unlabeled Data

Accurate and real-time identification of the domains and concepts discussed in microblogging texts is crucial for many important applications such as earthquake monitoring, influenza surveillance and disaster management. Existing techniques such as machine learning and keyword generation are application-specific and require a significant amount of training in order to achieve high accuracy. In this paper, we propose to use a multiple domain taxonomy (MDT) to capture general user knowledge. We formally define the problems of domain classification and concept tagging. Using the MDT, we devise domain-independent pure frequency count methods that require neither training data nor annotations and that are not sensitive to misspellings or shortened word forms. Our extensive experimental analysis on real Twitter data shows that both methods achieve significantly better identification accuracy with lower runtime than existing methods on large datasets.

Yihong Zhang, Claudia Szabo, Quan Z. Sheng, Wei Emma Zhang, Yongrui Qin
User Interests Clustering in Business Intelligence Interactions

It is quite common these days for experts, casual analysts, executives and data enthusiasts to analyze large datasets using user-friendly interfaces on top of Business Intelligence (BI) systems. However, current BI systems do not adequately detect and characterize user interests, which may lead to tedious and unproductive interactions. In this paper, we propose to identify such user interests by characterizing the intent of the interaction with the BI system. With an eye on user modeling for proactive search systems, we identify a set of features for an adequate description of intents, and a similarity measure for grouping intents into coherent interests. We experimentally validate our approach with a user study in which we analyze traces of BI navigation. We show that our similarity measure outperforms a state-of-the-art query similarity measure and yields very good precision with respect to expressed user interests.

Krista Drushku, Julien Aligon, Nicolas Labroche, Patrick Marcel, Veronika Peralta, Bruno Dumant
Analysis of Online Discussions in Support of Requirements Discovery

Feedback about software applications and services that end-users express through web-based communication platforms represents an invaluable knowledge source for diverse software engineering tasks, including requirements elicitation. Research on the automated analysis of textual messages in app store reviews, open source software (OSS) mailing lists and user forums has been increasing rapidly in the last five years. NLP techniques are applied to filter out irrelevant data; text mining and automated classification techniques are then used to classify messages into different categories, such as bug reports and feature requests. Our research focuses on online discussions that take place in user forums and OSS mailing lists, and aims at providing automated analysis techniques to discover the requirements they contain. In this paper, we present a speech-act-based analysis technique and experimentally evaluate it on a dataset taken from a widely used OSS project.

Itzel Morales-Ramirez, Fitsum Meshesha Kifetew, Anna Perini

Business Process Performance

Frontmatter
Discovering Causal Factors Explaining Business Process Performance Variation

Business process performance may be affected by a range of factors, such as the volume and characteristics of ongoing cases or the performance and availability of individual resources. Event logs collected by modern information systems provide a wealth of data about the execution of business processes. However, extracting root causes for performance issues from these event logs is a major challenge. Processes may change continuously due to internal and external factors. Moreover, there may be many resources and case attributes influencing performance. This paper introduces a novel approach based on time series analysis to detect cause-effect relations between a range of business process characteristics and process performance indicators. The scalability and practical relevance of the approach have been validated by a case study involving a real-life insurance claims handling process.

Bart F. A. Hompes, Abderrahmane Maaradji, Marcello La Rosa, Marlon Dumas, Joos C. A. M. Buijs, Wil M. P. van der Aalst
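As an illustration of detecting cause-effect relations between process characteristics and performance with time series analysis, the sketch below applies a Granger causality test. This is one standard technique of that kind, shown here on dummy data with hypothetical column names; it is not necessarily the authors' exact method.

```python
# Illustrative only: test whether a process characteristic (daily
# resource workload) helps predict a performance indicator (daily
# median case duration). Data and names are hypothetical.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(42)
workload = rng.poisson(20, 200).astype(float)
# Make case duration partly depend on the workload two days earlier.
duration = 5 + 0.3 * np.roll(workload, 2) + rng.normal(0, 1, 200)

series = pd.DataFrame({"duration": duration, "workload": workload})

# grangercausalitytests expects the potentially caused variable in
# column 0; test lags 1..3.
results = grangercausalitytests(series[["duration", "workload"]], maxlag=3)
for lag, res in results.items():
    p_value = res[0]["ssr_ftest"][1]
    print(f"lag {lag}: p-value {p_value:.4f}")
```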
Enriching Decision Making with Data-Based Thresholds of Process-Related KPIs

The continuous performance improvement of business processes usually involves the definition of a set of process performance indicators (PPIs) with their target values. These PPIs can be classified into lag PPIs, which establish a goal that the organization is trying to achieve but are not directly influenceable by process performers, and lead PPIs, which are influenceable by process performers and have a predictable impact on the lag indicator. Determining thresholds for lead PPIs that enable the fulfillment of the related lag PPI is a key task, which is usually done based on the experience and intuition of the process owners. However, the amount and nature of currently available data make it possible to take data-driven decisions in this regard. This paper proposes a method that applies statistical threshold-determination techniques successfully employed in other domains. Its applicability has been evaluated in a real case study, where data from more than a thousand process executions was used.

Adela del-Río-Ortega, Félix García, Manuel Resinas, Elmar Weber, Francisco Ruiz, Antonio Ruiz-Cortés
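One simple way to derive such a data-based threshold, shown here purely as an illustration (the paper's statistical techniques may differ), is to fit a depth-1 decision tree that separates process executions which met the lag PPI from those which did not; the learned split point serves as a candidate lead-PPI threshold. All variable names are hypothetical.

```python
# Illustrative sketch: derive a threshold for a lead PPI ("first
# response time") that best separates executions which met a lag PPI
# ("case resolved within SLA") from those which did not.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Dummy data for ~1000 process executions.
first_response_h = rng.exponential(8, 1000)             # lead PPI values
sla_met = (first_response_h + rng.normal(0, 2, 1000) < 10).astype(int)

# A depth-1 tree (decision stump) finds the single best split point.
stump = DecisionTreeClassifier(max_depth=1)
stump.fit(first_response_h.reshape(-1, 1), sla_met)

# The learned split point is the candidate threshold for the lead PPI.
threshold = stump.tree_.threshold[0]
print(f"respond within {threshold:.1f} hours to likely meet the SLA")
```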
Characterizing Drift from Event Streams of Business Processes

Early detection of business process drifts from event logs enables analysts to identify changes that may negatively affect process performance. However, detecting a process drift without characterizing its nature is not enough to support analysts in understanding and rectifying process performance issues. We propose a method to characterize process drifts from event streams, in terms of the behavioral relations that are modified by the drift. The method builds upon a technique for online drift detection, and relies on a statistical test to select the behavioral relations extracted from the stream that have the highest explanatory power. The selected relations are then mapped to typical change patterns to explain the detected drifts. An extensive evaluation on synthetic and real-life logs shows that our method is fast and accurate in characterizing process drifts, and performs significantly better than alternative techniques.

Alireza Ostovar, Abderrahmane Maaradji, Marcello La Rosa, Arthur H. M. ter Hofstede

Big Data Exploration

Frontmatter
Massively Distributed Environments and Closed Itemset Mining: The DCIM Approach

Data analytics in general, and data mining primitives in particular, are a major source of bottlenecks in the operation of information systems. This is mainly due to their high complexity and intensive use of I/O operations, particularly in massively distributed environments. Moreover, an important application of data analytics is to discover key insights from the running traces of information systems in order to improve their engineering. Mining closed frequent itemsets (CFIs) is one of these data mining techniques, and it is associated with great challenges. It allows discovering itemsets with better efficiency and result compactness. However, discovering such itemsets in massively distributed data poses a number of issues that are not addressed by traditional methods. One solution for dealing with such characteristics is to take advantage of parallel frameworks such as MapReduce. We address the problem of distributed CFI mining by introducing a new parallel algorithm, called DCIM, which uses a prime-number-based approach. A key feature of DCIM is the deep combination of data mining properties with the principles of massive data distribution. We carried out exhaustive experiments over real-world datasets to illustrate the efficiency of DCIM for large datasets with up to 53 million documents.

Mehdi Zitouni, Reza Akbarinia, Sadok Ben Yahia, Florent Masseglia
Uncovering the Runtime Enterprise Architecture of a Large Distributed Organisation
A Process Mining-Oriented Approach

Process mining mainly focuses on analyzing a single process that runs through an organization. Often, organizations consist of multiple departments that need to work together to deliver a process. ArchiMate introduced the Business Process Cooperation Viewpoint for this purpose. However, such models tend to capture design-time rather than runtime behavior. Additionally, many approaches exist to analyze multiple departments in isolation, or the social network they form, but the cooperation between processes has received little attention. In this paper we take a different approach: we analyze runtime execution data with process mining techniques and introduce a new visualization, the Runtime Enterprise Architecture, to uncover cooperation between departments. We apply the presented approach in a real-life case study at a large logistics organization.

Robert van Langerak, Jan Martijn E. M. van der Werf, Sjaak Brinkkemper
Summarisation and Relevance Evaluation Techniques for Big Data Exploration: The Smart Factory Case Study

The increasing connections among systems that produce high volumes of real-time data have raised the importance of addressing data abundance research challenges. In the Industry 4.0 application domain, for example, the high volume and velocity of data collected from machines, as well as the value of data, which declines very quickly, put Big Data issues among the new challenges for the factory of the future as well. While many approaches have been developed to investigate data analysis, data visualisation, and data collection and management, the impact of Big Data exploration is still underestimated. In this paper, we propose an approach to support and ease the exploration of real-time data in a dynamic context of interconnected systems, such as the Industry 4.0 domain, where large amounts of data must be incrementally collected, organized and analysed on-the-fly. The approach relies on: (i) a multi-dimensional model suited for supporting the iterative and multi-step exploration of Big Data; (ii) novel data summarisation techniques based on clustering; (iii) a model of relevance, aimed at focusing the attention of the user only on the relevant data being explored. We describe the application of the approach in the smart factory as a case study.

Ada Bagozi, Devis Bianchini, Valeria De Antonellis, Alessandro Marini, Davide Ragazzi

Process Variability Management

Frontmatter
Instance-Based Process Matching Using Event-Log Information

Process model matching provides the basis for many process analysis techniques, such as inconsistency detection and process querying. The matching task refers to the automatic identification of correspondences between activities in two process models. Numerous techniques have been developed for this purpose, all of which share a focus on process-level information. In this paper we introduce instance-based process matching, which specifically focuses on information related to the instances of a process. In particular, we introduce six similarity metrics that each use a different type of instance information stored in the event logs associated with the processes. The proposed metrics can be used as standalone matching techniques or to complement existing process model matching techniques. A quantitative evaluation on real-world data demonstrates that the use of information from event logs is essential for identifying a considerable number of correspondences.

Han van der Aa, Avigdor Gal, Henrik Leopold, Hajo A. Reijers, Tomer Sagi, Roee Shraga
Analyzing Process Variants to Understand Differences in Key Performance Indices

Service delivery organizations run similar processes for several clients. Process variants may manifest due to differences in the nature of clients, heterogeneity in the type of cases, etc. The organization's operational Key Performance Indices (KPIs) may vary across these variants; e.g., KPIs for some variants may be better than for others. There is a need to gain insights into such variance in performance, to seek opportunities to learn from well-performing process variants (e.g., to establish best practices and standardize processes), and to leverage these insights for underperforming ones. In this paper, we present an approach to analyze two or more process variants, presented as annotated process maps. Our approach identifies and reasons about the key differences among these variants, manifested in both the control-flow (e.g., frequent paths) and performance (e.g., flow time, activity execution times) perspectives. The fragments within process variants where the key differences manifest are targets for process redesign and re-engineering. The proposed approach has been implemented as a plug-in for the process mining framework ProM and applied to real-life case studies.

Nithish Pai Ballambettu, Mahima Agumbe Suresh, R. P. Jagadeesh Chandra Bose
Discovering Hierarchical Consolidated Models from Process Families

Process families consist of different related variants that represent the same process. These might include, for example, processes executed similarly by different organizations, or different versions of the same process with varying features. Motivated by the need to manage variability in process families, recent advances in process mining make it possible to discover, from a collection of event logs, a generic process model that explicitly describes the commonalities and differences across variants. However, existing approaches often result in flat, complex models in which it is hard to obtain a comparative insight into the common and differing parts, especially when the family consists of a large number of process variants. This paper presents a decomposition-driven approach to discover hierarchical consolidated process models from collections of event logs. The discovered hierarchy consists of nested process fragments and allows browsing the variability at different levels of abstraction. The approach has been implemented as a plugin in ProM and was evaluated using synthetic and real-life event logs.

Nour Assy, Boudewijn F. van Dongen, Wil M. P. van der Aalst

Information Systems Transformation and Evolution

Frontmatter
Survival in Schema Evolution: Putting the Lives of Survivor and Dead Tables in Counterpoint

How can we plan development over an evolving schema? In this paper, we study the history of the schemata of eight open source software projects that include relational databases and extract patterns related to the survival or death of their tables. Our findings are mostly summarized by a pattern, which we call the "electrolysis pattern" due to its diagrammatic representation, stating that dead and survivor tables live quite different lives: tables typically die shortly after birth, with short durations and mostly no updates, whereas survivors mostly live quiet lives with few updates – except for a small group of tables with high update ratios that are characterized by long durations and survival. Based on our findings, we recommend that development over newborn tables should be restrained and, wherever possible, encapsulated by views, to buffer both infant mortality and the high update rate of hyperactive tables. Once a table matures, developers can rely on a typical pattern of gravitation towards rigidity, producing fewer evolution-induced disturbances to the surrounding code.

Panos Vassiliadis, Apostolos V. Zarras
On the Similarity of Process Change Operations

Process flexibility is vital in almost any business area. Change logs are a central asset for documenting adaptations in processes, since they capture key information about the associated change operations. Comparing multiple change operations offers interesting data for many analysis questions, e.g., for analyzing previously applied change operations and for supporting users in future adaptations. In this paper, we discuss different change perspectives and present metrics for comparing change operations. Their applicability and feasibility are evaluated based on a prototypical implementation and on real-world process logs.

Georg Kaes, Stefanie Rinderle-Ma
Agile Transformation Success Factors: A Practitioner’s Survey

An agile transformation process presents challenges to organizations around the world. Research on agile success factors is not conclusive, and there is still a need for guidelines that help in the transformation process while considering the organizational context. This research uses a survey among practitioners to identify how difficult success factors are to implement in organizations in order to create a fertile environment for agile transformation. We conducted a survey with 457 practitioners, resulting in 328 valid responses. The findings show that the rankings of success-factor implementation difficulty generated for all practitioners and for expert practitioners are highly correlated. According to expert practitioners, the measurement model and changes in the mindset of project managers are the hardest success factors to implement, while incentives and motivation to adopt agile methods and management buy-in are the easiest. The contribution of this research is a ranking that organizations can use as a reference for their agile transformation processes.

Amadeu Silveira Campanelli, Dairton Bassi, Fernando Silva Parreiras
Crossing the Boundaries – Agile Methods in Large-Scale, Plan-Driven Organizations: A Case Study from the Financial Services Industry

Selecting the software development methodology best-suited for a project or organization is a fundamental decision in the context of Information Systems (IS) engineering. In many industries and organizations, agile software development models are already well-established and commonly used for this purpose. However, large-scale, plan-driven organizations face additional challenges when implementing agile methods. To analyze how such organizations could make the implementation more effective, the results of a qualitative case study performed in a large-scale financial institution are presented in this paper. Based on these results, a best-practice model for their effective implementation in a complex environment is proposed. An organization-specific agile development framework and continuous stakeholder involvement are identified as crucial success factors. In addition, a successful implementation of agile methods in practice needs to be performed by dedicated individuals and cross-functional teams should be established in order to support a common understanding across organizational boundaries.

Sina Katharina Weiss, Philipp Brune

Business Process Modeling Readability

Frontmatter
Structural Descriptions of Process Models Based on Goal-Oriented Unfolding

Business processes are normally managed by designing, operating and analysing corresponding process models. When these process models are delivered, an understanding gap arises depending on how familiar different users are with modeling languages, which may slow down or even halt the normal functioning of processes. To address this, a method for automatically generating texts from process models was previously proposed. However, that method covers only ordinary model patterns, so the coverage of the generated text is low and information is lost. In this paper, we propose an improved transformation algorithm named Goun to tackle the problem of describing process models automatically. The experimental results demonstrate that the Goun algorithm not only supports more elements and complex structures, but also remarkably improves the coverage of the generated text.

Chen Qian, Lijie Wen, Jianmin Wang, Akhil Kumar, Haoran Li
Aligning Textual and Graphical Descriptions of Processes Through ILP Techniques

With the aim of having individuals from different backgrounds and expertise levels examine the operations in an organization, different representations of business processes are maintained. Keeping these different representations aligned is not only a desired feature, but also a real challenge due to the contrasting nature of each process representation. In this paper we present an efficient technique for aligning a textual description and a graphical model of a process. The technique is grounded in natural language processing techniques that extract linguistic features of each representation, and encodes the search as a mathematical optimization problem using Integer Linear Programming (ILP), whose resolution ensures an optimal alignment between both descriptions. The technique has been implemented, and the experiments demonstrate the significance of the approach with respect to the state-of-the-art technique for the same task.

Josep Sànchez-Ferreres, Josep Carmona, Lluís Padró
Use Cases for Understanding Business Process Models

Process models are used by people for many different purposes. Depending on the purpose, users may look at process models in different ways. However, the current stream of research into process model comprehension does not explicitly consider the type of information a user is seeking. By failing to do so, attempts to improve the readability of process models may be lopsided at best. To overcome this situation, we propose a list of 17 so-called process model comprehension use cases. These capture the different types of information-seeking behavior of users. We validated the list through interview and focus group studies, which included 24 participants from 8 organizations. Based on our findings, we present implications for researchers re-investigating the comprehension topic. The use cases may also benefit the development of modeling tools and help process modelers better support user needs.

Banu Aysolmaz, Hajo A. Reijers

Business Process Adaptation

Frontmatter

Open Access

Predictive Business Process Monitoring Considering Reliability Estimates

Predictive business process monitoring aims at predicting potential problems during process execution so that these problems can be proactively managed and mitigated. Compared to aggregate prediction accuracy indicators (e.g., precision or recall), prediction reliability estimates provide additional information about the prediction error for an individual business process. Intuitively, it appears appealing to consider reliability estimates when deciding on whether to adapt a running process instance or not. However, we lack empirical evidence to support this intuition, as research on predictive business process monitoring focused on aggregate prediction accuracy. We experimentally analyze the effect of considering prediction reliability estimates for proactive business process adaptation. We use ensemble prediction techniques, which we apply to an industry data set from the transport and logistics domain. In our experiments, proactive business process adaptation in general had a positive effect on cost in 52.5% of the situations. In 82.9% of these situations, considering reliability estimates increased the positive effect, leading to cost savings of up to 54%, with 14% savings on average.

Andreas Metzger, Felix Föcker
Leveraging Game-Tree Search for Robust Process Enactment

A robust machinery for process enactment should ideally be able to anticipate and account for possible ways in which the execution environment might impede a process from achieving its desired effects or outcomes. At critical decision points in a process, it is useful for the enactment machinery to compute alternative flows by viewing the problem as an adversarial game pitting the process (or its enactment machinery) against the process execution environment. We show how both minimax search and Monte Carlo game tree search, coupled with a novel conception of an evaluation function, delivers useful results.

Yingzhi Gou, Aditya Ghose, Hoa Khanh Dam
Predictive Business Process Monitoring with LSTM Neural Networks

Predictive business process monitoring methods exploit logs of completed cases of a process in order to make predictions about running cases thereof. Existing methods in this space are tailor-made for specific prediction tasks. Moreover, their relative accuracy is highly sensitive to the dataset at hand, thus requiring users to engage in trial-and-error and tuning when applying them in a specific setting. This paper investigates Long Short-Term Memory (LSTM) neural networks as an approach to build consistently accurate models for a wide range of predictive process monitoring tasks. First, we show that LSTMs outperform existing techniques to predict the next event of a running case and its timestamp. Next, we show how to use models for predicting the next task in order to predict the full continuation of a running case. Finally, we apply the same approach to predict the remaining time, and show that this approach outperforms existing tailor-made methods.

Niek Tax, Ilya Verenich, Marcello La Rosa, Marlon Dumas
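The following is a minimal sketch of the general idea of next-activity prediction with an LSTM, assuming integer-encoded activities and fixed-length prefixes; the authors' actual architecture and feature set (including timestamp prediction) are richer than this illustration.

```python
# Minimal sketch: an LSTM that maps a prefix of a running case to a
# distribution over the next activity. Vocabulary size, prefix length
# and the random data are assumptions for illustration only.
import torch
import torch.nn as nn

NUM_ACTIVITIES = 10  # size of the activity vocabulary (assumption)

class NextActivityLSTM(nn.Module):
    def __init__(self, num_activities, emb=16, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(num_activities, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_activities)

    def forward(self, prefixes):            # prefixes: (batch, seq_len)
        x = self.embed(prefixes)
        _, (h, _) = self.lstm(x)            # h: (layers, batch, hidden)
        return self.out(h[-1])              # logits over next activity

model = NextActivityLSTM(NUM_ACTIVITIES)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters())

# One dummy training step over random prefixes and next-activity labels.
prefixes = torch.randint(0, NUM_ACTIVITIES, (32, 5))
labels = torch.randint(0, NUM_ACTIVITIES, (32,))
opt.zero_grad()
loss = loss_fn(model(prefixes), labels)
loss.backward()
opt.step()
print(float(loss))
```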

Data Mining

Frontmatter
Searching Linked Data with a Twist of Serendipity

Serendipity is defined as the discovery of a thing when one is not searching for it. In other words, serendipity means the discovery of information that provides valuable insights by unveiling previously unknown knowledge. This paper focuses on the problem of serendipitous search over Linked Data. It first discusses how to capture a set of serendipity patterns in the context of Linked Data. Then, the paper introduces a Linked Data serendipitous search application, called the Serendipity Over Linked Data Search tool (SOL-Tool). Finally, the paper describes experiments with the tool to illustrate the serendipity effect using DBpedia. The experimental results show a promising unexpectedness score of 90% for real-world scenarios in the music domain.

Jeronimo S. A. Eichler, Marco A. Casanova, Antonio L. Furtado, Lívia Ruback, Luiz André P. Paes Leme, Giseli Rabello Lopes, Bernardo Pereira Nunes, Alessandra Raffaetà, Chiara Renso
Extraction of Embedded Queries via Static Analysis of Host Code

Correctly identifying the embedded queries within the source code of an information system is a significant aid to developers and administrators, as it can facilitate the visualization of a map of the information system, the identification of areas affected by schema evolution, code migration, and the planning of the joint maintenance of code and data. In this paper, we provide a generic, language-independent method that identifies the location and semantics of the embedded queries of a data-intensive ecosystem, regardless of the programming style and the host language, and represents them in a universal, likewise language-independent manner that facilitates the aforementioned maintenance, evolution and migration tasks with minimal user effort and significant effectiveness.

Petros Manousis, Apostolos Zarras, Panos Vassiliadis, George Papastefanatos
Table Identification and Reconstruction in Spreadsheets

Spreadsheets are one of the most successful content generation tools, used in almost every enterprise to perform data transformation, visualization, and analysis. The high degree of freedom provided by these tools results in very complex sheets, intermingling the actual data with formatting, formulas, layout artifacts, and textual metadata. To unlock the wealth of data contained in spreadsheets, a human analyst will often have to understand and transform the data manually. To overcome this cumbersome process, we propose a framework that is able to automatically infer the structure and extract the data from these documents in a canonical form. In this paper, we describe our heuristics-based method for discovering tables in spreadsheets, given that each cell is classified as either header, attribute, metadata, data, or derived. Experimental results on a real-world dataset of 439 worksheets (858 tables) show that our approach is feasible and effectively identifies tables within partially structured spreadsheets.

Elvis Koci, Maik Thiele, Oscar Romero, Wolfgang Lehner

Process Discovery

Frontmatter
Data-Driven Process Discovery - Revealing Conditional Infrequent Behavior from Event Logs

Process discovery methods automatically infer process models from event logs. Often, event logs contain so-called noise, e.g., infrequent outliers or recording errors, which obscure the main behavior of the process. Existing methods filter this noise based on the frequency of event labels: infrequent paths and activities are excluded. However, infrequent behavior may reveal important insights into the process. Thus, not all infrequent behavior should be considered as noise. This paper proposes the Data-aware Heuristic Miner (DHM), a process discovery method that uses the data attributes to distinguish infrequent paths from random noise by using classification techniques. Data- and control-flow of the process are discovered together. We show that the DHM is, to some degree, robust against random noise and reveals data-driven decisions, which are filtered by other discovery methods. The DHM has been successfully tested on several real-life event logs, two of which we present in this paper.

Felix Mannhardt, Massimiliano de Leoni, Hajo A. Reijers, Wil M. P. van der Aalst
An Approach for Incorporating Expert Knowledge in Trace Clustering

Trace clustering techniques are a set of approaches for partitioning traces or process instances into similar groups. Typically, this partitioning is based on certain patterns or on similarity between the traces, or is done by discovering a process model for each cluster of traces. In general, however, it is likely that clustering solutions obtained by these approaches will be hard to understand or difficult to validate given an expert's domain knowledge. Therefore, we propose a novel semi-supervised trace clustering technique based on expert knowledge. Our approach is validated using a case on tablet reading behaviour, but it is widely applicable in other contexts. In an experimental evaluation, the technique is shown to provide a beneficial trade-off between performance and understandability.

Pieter De Koninck, Klaas Nelissen, Bart Baesens, Seppe vanden Broucke, Monique Snoeck, Jochen De Weerdt
Mining Business Process Stages from Event Logs

Process mining is a family of techniques to analyze business processes based on event logs recorded by their supporting information systems. Two recurrent bottlenecks of existing process mining techniques when confronted with real-life event logs are scalability and interpretability of the outputs. A common approach to tackle these limitations is to decompose the process under analysis into a set of stages, such that each stage can be mined separately. However, existing techniques for automated discovery of stages from event logs produce decompositions that are very different from those that domain experts would produce manually. This paper proposes a technique that, given an event log, discovers a stage decomposition that maximizes a measure of modularity borrowed from the field of social network analysis. An empirical evaluation on real-life event logs shows that the produced decompositions more closely approximate manual decompositions than existing techniques.

Hoang Nguyen, Marlon Dumas, Arthur H. M. ter Hofstede, Marcello La Rosa, Fabrizio Maria Maggi
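To illustrate the underlying idea of modularity-based stage decomposition, the sketch below builds a directly-follows graph from a few dummy traces and groups activities into candidate stages with a standard modularity-maximizing community detection algorithm; the paper's technique is more elaborate, and the traces and activity names here are invented.

```python
# Illustrative only: group activities of a directly-follows graph into
# candidate process stages by maximizing modularity.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

traces = [
    ["register", "check", "approve", "pack", "ship"],
    ["register", "check", "reject"],
    ["register", "check", "approve", "pack", "ship"],
]

# Weighted directly-follows graph: edge weight counts how often one
# activity directly follows another in the log.
G = nx.Graph()
for trace in traces:
    for a, b in zip(trace, trace[1:]):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

# Each community of activities is a candidate process stage.
for i, stage in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(f"stage {i}: {sorted(stage)}")
```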

Business Process Modeling Notation

Frontmatter
Visual Modeling of Instance-Spanning Constraints in Process-Aware Information Systems

Instance-Spanning Constraints (ISCs) have only recently attracted attention, although they are omnipresent in practice for defining conditions across multiple instances or processes, e.g., the bundling of cargo. It is crucial to convey ISC information, e.g., on shared instance resources, to users. However, no approach for visualizing ISCs has been presented yet. To close this gap, we analysed the literature and derived visualization requirements for constraints on multiple instances of the same or different processes. The proposed language ISC_Viz is based on BPMN-Q and incorporates existing visual notations to reduce the cognitive load on the user. The applicability of ISC_Viz is shown along 114 ISC modeling examples. Moreover, a questionnaire-based study with 42 participants was conducted in order to assess the usability of ISC_Viz.

Manuel Gall, Stefanie Rinderle-Ma
Linking Data and BPMN Processes to Achieve Executable Models

We describe a formally well-founded approach to conceptually link data and processes, based on adopting UML class diagrams to represent data and BPMN to represent the process. The UML class diagram, together with a set of additional process variables, called Artifact, forms the information model of the process. All activities of the BPMN process refer to this information model by means of OCL operation contracts. We show that the resulting semantics, while abstract, is fully executable. We also provide an implementation of the executor.

Giuseppe De Giacomo, Xavier Oriol, Montserrat Estañol, Ernest Teniente
Discovery of Fuzzy DMN Decision Models from Event Logs

Successful business process management is highly dependent on effective decision making. The recent Decision Model and Notation (DMN) standard prescribes that decisions be documented and executed complementary to processes. However, the decision logic is often implicitly contained in event logs, and "as-is" decision knowledge needs to be retrieved. Commonly, decision logic is represented by rules based on Boolean algebra. The formal nature of such decisions is often hard to interpret and utilize in practice, because imprecision is intrinsic to real-life decisions. Operations research considers fuzzy logic, based on fuzzy algebra, a tool for dealing with partial knowledge. In this paper, we explore the possibility of incorporating fuzziness into DMN decision models. Further, we propose a methodology for discovering fuzzy DMN decision models from event logs. The evaluation of our approach on a use case from the banking domain shows high comprehensibility and accuracy of the output decision model.

Ekaterina Bazhenova, Stephan Haarmann, Sven Ihde, Andreas Solti, Mathias Weske
Backmatter
Metadata
Title
Advanced Information Systems Engineering
edited by
Eric Dubois
Klaus Pohl
Copyright Year
2017
Electronic ISBN
978-3-319-59536-8
Print ISBN
978-3-319-59535-1
DOI
https://doi.org/10.1007/978-3-319-59536-8
