
2018 | Book

Advanced Information Systems Engineering

30th International Conference, CAiSE 2018, Tallinn, Estonia, June 11-15, 2018, Proceedings


About this book

This book constitutes the refereed proceedings of the 30th International Conference on Advanced Information Systems Engineering, CAiSE 2018, held in Tallinn, Estonia, in June 2018.
The 37 papers presented in this volume were carefully reviewed and selected from 175 submissions. The papers are organized in topical sections on Process Execution, User-Oriented IS Development, Social Computing and Personalization, the Cloud and Data Services, Process Discovery, Decisions and the Blockchain, Process and Multi-level Modelling, Data Management and Visualization, Big Data and Intelligence, Data Modelling and Mining, Quality Requirements and Software, and Tutorials.

Table of Contents

Frontmatter

Process Execution

Frontmatter
Association Rules for Anomaly Detection and Root Cause Analysis in Process Executions

Existing business process anomaly detection approaches typically fall short in supporting experts when analyzing identified anomalies. As a result, false positives and insufficient anomaly countermeasures might impact an organization in a severely negative way. This work tackles this limitation by basing anomaly detection on association rule mining. It will be shown that doing so makes it possible to explain anomalies, to support process change and flexible executions, and to facilitate the estimation of anomaly severity. As a consequence, the risk of choosing an inappropriate countermeasure is likely reduced, which, for example, helps to avoid terminating benign process executions because of false positives. The feasibility of the proposed approach is shown based on a publicly available prototypical implementation as well as by analyzing real-life logs with injected artificial anomalies.
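
The paper's own rule language and severity estimation are not reproduced here. Purely as a sketch of the general idea, the following Python snippet mines simple "if activity A occurs, activity B occurs" rules with support and confidence from a set of traces and flags traces that violate a high-confidence rule, so the violated rule itself serves as the explanation. All names and thresholds are illustrative.

from itertools import permutations
from collections import Counter

def mine_cooccurrence_rules(traces, min_support=0.3, min_confidence=0.9):
    """Mine simple 'A occurs => B occurs' rules from a list of traces (lists of activity names)."""
    n = len(traces)
    activity_sets = [set(t) for t in traces]
    single = Counter(a for s in activity_sets for a in s)
    pair = Counter((a, b) for s in activity_sets for a, b in permutations(s, 2))
    rules = []
    for (a, b), count_ab in pair.items():
        support = count_ab / n
        confidence = count_ab / single[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

def explainable_anomalies(trace, rules):
    """Return the violated rules for a trace; each violation doubles as an explanation."""
    events = set(trace)
    return [(a, b, conf) for a, b, _, conf in rules if a in events and b not in events]

# Toy usage: the last trace skips the 'ship' activity that reliably follows 'pay'.
log = [["order", "pay", "ship"], ["order", "pay", "ship"], ["order", "pay"]]
rules = mine_cooccurrence_rules(log[:2])
print(explainable_anomalies(log[2], rules))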

Kristof Böhmer, Stefanie Rinderle-Ma
AB Testing for Process Versions with Contextual Multi-armed Bandit Algorithms

Business process improvement ideas can be validated through sequential experiment techniques like AB Testing. Such approaches have the inherent risk of exposing customers to an inferior process version, which is why the inferior version should be discarded as quickly as possible. In this paper, we propose a contextual multi-armed bandit algorithm that can observe the performance of process versions and dynamically adjust the routing policy so that the customers are directed to the version that can best serve them. Our algorithm learns the best routing policy in the presence of complications such as multiple process performance indicators, delays in indicator observation, incomplete or partial observations, and contextual factors. We also propose a pluggable architecture that supports such routing algorithms. We evaluate our approach with a case study. Furthermore, we demonstrate that our approach identifies the best routing policy given the process performance and that it scales horizontally.
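
The routing algorithm proposed in the paper handles delayed and partial reward observations and multiple performance indicators; the snippet below is not that algorithm, only a minimal epsilon-greedy contextual sketch (with a hypothetical context feature and reward signal) showing how a router can gradually shift traffic towards the process version that performs best for each customer context.

import random
from collections import defaultdict

class EpsilonGreedyRouter:
    """Minimal contextual routing sketch: one value estimate per (context, version) pair."""
    def __init__(self, versions, epsilon=0.1):
        self.versions = versions
        self.epsilon = epsilon
        self.counts = defaultdict(int)     # (context, version) -> observations
        self.values = defaultdict(float)   # (context, version) -> mean reward

    def route(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.versions)                              # explore
        return max(self.versions, key=lambda v: self.values[(context, v)])   # exploit

    def observe(self, context, version, reward):
        key = (context, version)
        self.counts[key] += 1
        # incremental mean update
        self.values[key] += (reward - self.values[key]) / self.counts[key]

# Toy usage: customers of segment "premium" respond better to version "B".
router = EpsilonGreedyRouter(["A", "B"])
for _ in range(1000):
    ctx = random.choice(["premium", "standard"])
    v = router.route(ctx)
    reward = 1.0 if (ctx == "premium" and v == "B") or (ctx == "standard" and v == "A") else 0.3
    router.observe(ctx, v, reward)
print(router.values)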

Suhrid Satyal, Ingo Weber, Hye-young Paik, Claudio Di Ciccio, Jan Mendling
Filtering Spurious Events from Event Streams of Business Processes

Process mining aims at gaining insights into business processes by analysing event data recorded during process execution. The majority of existing process mining techniques works offline, i.e. using static, historical data stored in event logs. Recently, the notion of online process mining has emerged, whereby techniques are applied on live event streams, as process executions unfold. Analysing event streams allows us to gain instant insights into business processes. However, current techniques assume the input stream to be completely free of noise and other anomalous behaviours. Hence, applying these techniques to real data leads to results of inferior quality. In this paper, we propose an event processor that enables us to filter out spurious events from a live event stream. Our experiments show that we are able to effectively filter out spurious events from the input stream and, as such, enhance online process mining results.

Sebastiaan J. van Zelst, Mohammadreza Fani Sani, Alireza Ostovar, Raffaele Conforti, Marcello La Rosa
The Relational Process Structure

Using data-centric process paradigms, small processes such as artifacts, object lifecycles, or Proclets have become an alternative to large, monolithic models. In these paradigms, a business process arises from the interactions between small processes. However, many-to-many relationships may exist between different process types, requiring careful consideration to ensure that the interactions between processes can be purposefully coordinated. Although several concepts exist for modeling interrelated processes, a concept that considers both many-to-many relationships and cardinality constraints is missing. Furthermore, existing concepts focus on design-time, neglecting the complexity introduced by many-to-many relationships when enacting extensive process structures at run-time. Knowledge of which process instances are related to which other process instances is essential. This paper proposes the relational process structure, a concept providing full support for many-to-many relationships and cardinality constraints at both design- and run-time. The relational process structure represents a cornerstone for the proper coordination of interrelated processes.

Sebastian Steinau, Kevin Andrews, Manfred Reichert

User-Oriented IS Development

Frontmatter
Support of Justification Elicitation: Two Industrial Reports

The result of productive processes is commonly accompanied by a set of justifications which can be, depending on the product, process-related qualities, traceability documents, product-related experiments, tests or expert reports, etc. In critical contexts, it is mandatory to substantiate that a product's development has been carried out appropriately, which results in an inflation of the quantity of justification documents. This mass of documents and information is difficult to manage and difficult to assess (in terms of soundness). In this paper, we report on the experience gained from two industrial case studies, in which we applied a justification elicitation approach based on justification diagrams and justification pattern diagrams in order to identify necessary and sufficient justification documentation.

Clément Duffau, Thomas Polacsek, Mireille Blay-Fornarino
Configurations of User Involvement and Participation in Relation to Information System Project Success

Information system (IS) project success is crucial given the importance of these projects for many organizations. We examine the role of user involvement and participation (UIP) for IS project success in terms of perceived usability in 16 cases, where an IS has been implemented in an organization. Qualitative Comparative Analysis (QCA) enables us to research multiple IS project configurations. We identify the participation of the appropriate users in the requirements analysis phase as the key condition for IS project success. Our research corroborates anecdotal evidence on key factors and informs practitioners about the most effective way to conduct UIP.

Phillip Haake, Johanna Kaufmann, Marco Baumer, Michael Burgmaier, Kay Eichhorn, Benjamin Mueller, Alexander Maedche
Human and Value Sensitive Aspects of Mobile App Design: A Foucauldian Perspective

Value sensitive concerns remain relatively neglected by software design processes, leading to potential failure of technology acceptance. By drawing upon an inter-disciplinary study that employed participatory design methods to develop mobile apps in the domain of youth justice, this paper examines a critical example of an unintended consequence that created user concerns around Foucauldian concepts including power, authority, surveillance and governmentality. The primary aim of this study was to design, deploy and evaluate social technology that may help to promote better engagement between case workers and young people to help reduce recidivism, and support young people's transition towards social inclusion in society. A total of 140 participants including practitioners (n = 79) and young people (n = 61) contributed to the data collection via surveys, focus groups and one-on-one interviews. The paper contributes an important theoretically located discussion around both how co-design is helpful in giving 'voice' to key stakeholders in the research process and observing the risk that competing voices may lead to tensions and unintended outcomes. In doing so, software developers are exposed to theories from social science that have significant impact on their products.

Balbir S. Barn, Ravinder Barn
Context-Aware Access to Heterogeneous Resources Through On-the-Fly Mashups

Current scenarios for app development are characterized by rich resources that often overwhelm the final users, especially in mobile app usage situations. It is therefore important to define design methods that enable dynamic filtering of the pertinent resources and appropriate tailoring of the retrieved content. This paper presents a design framework based on the specification of the possible contexts deemed relevant to a given application domain and on their mapping onto an integrated schema of the resources underlying the app. The context and the integrated schema enable the runtime instantiation of app page templates as a function of the context characterizing the user's current situation of use.

Florian Daniel, Maristella Matera, Elisa Quintarelli, Letizia Tanca, Vittorio Zaccaria

Social Computing and Personalization

Frontmatter
CrowdSheet: An Easy-To-Use One-Stop Tool for Writing and Executing Complex Crowdsourcing

Developing crowdsourcing applications with dataflows among tasks requires requesters to submit tasks to crowdsourcing services, obtain results, write programs to process the results, and often repeat this process. This paper proposes CrowdSheet, an application that provides a spreadsheet interface to easily write and execute such complex crowdsourcing applications. We prove that a natural extension to existing spreadsheets, with only two types of new spreadsheet functions, allows us to write a fairly wide range of real-world applications. Our experimental results indicate that many spreadsheet users can easily write complex crowdsourcing applications with CrowdSheet.

Rikuya Suzuki, Tetsuo Sakaguchi, Masaki Matsubara, Hiroyuki Kitagawa, Atsuyuki Morishima
Educating Users to Formulate Questions in Q&A Platforms: A Scaffold for Google Sheets

Different studies point out that spreadsheets are easy to use but difficult to master. When difficulties arise, home users and small-and-medium organizations might not always resort to a help desk. Alternatively, Question&Answer platforms (e.g. Stack Overflow) come in handy. Unfortunately, we cannot always expect home users to properly formulate good questions. However, examples can be a substitute for long explanations. This is particularly important for our target audience, who might lack the skills to describe their needs in abstract terms, and hence examples might be the easiest way to get the idea through. This motivates the effort to extend existing spreadsheet tools with example-centric inline question posting. This paper describes such an effort for Google Sheets. The extension assists users in posing their example-based questions without leaving Google Sheets. These questions are then transparently channeled to Stack Overflow.

Oscar Díaz, Jeremías P. Contell
News Recommendation with CF-IDF+

Traditionally, content-based recommendation is performed using term occurrences, which are leveraged in the TF-IDF method. This method is the de facto standard in text mining and information retrieval. Valuable additional information from domain ontologies, however, is not employed by default. The TF-IDF-based CF-IDF method successfully utilizes the semantics of a domain ontology for news recommendation by detecting ontological concepts instead of terms. However, like other semantics-based methods, CF-IDF fails to consider the different concept relationship types. In this paper, we extend CF-IDF to additionally take into account concept relationship types. Evaluation is performed using Ceryx, an extension to the Hermes news personalization framework. Using a custom news data set, our CF-IDF+ news recommender outperforms the CF-IDF and TF-IDF recommenders in terms of F_1 and Kappa.
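
CF-IDF reuses the familiar TF-IDF weighting but counts ontology concepts instead of terms; the relationship-type weighting added by CF-IDF+ is not attempted below. The sketch only contrasts plain TF-IDF scoring over raw terms with the same scoring over concept annotations, where the ontology lookup is a hand-written stub and all document and concept names are made up.

import math
from collections import Counter

def tf_idf_vectors(docs):
    """docs: list of token lists (terms or ontology concept IDs). Returns a weight dict per doc."""
    n = len(docs)
    df = Counter(tok for d in docs for tok in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({t: (c / len(d)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values())) or 1.0
    return dot / (norm(u) * norm(v))

# TF-IDF over raw terms vs. concept-level scoring over stubbed ontology annotations.
term_docs = [["apple", "unveils", "new", "iphone"],
             ["samsung", "launches", "new", "phone"],
             ["stock", "markets", "rally"]]
concept_map = {"apple": "Company", "samsung": "Company", "iphone": "Smartphone", "phone": "Smartphone"}
concept_docs = [[concept_map[t] for t in d if t in concept_map] for d in term_docs]

term_vecs = tf_idf_vectors(term_docs)
concept_vecs = tf_idf_vectors(concept_docs)
print("term similarity doc0/doc1:   ", round(cosine(term_vecs[0], term_vecs[1]), 3))
print("concept similarity doc0/doc1:", round(cosine(concept_vecs[0], concept_vecs[1]), 3))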

Emma de Koning, Frederik Hogenboom, Flavius Frasincar

The Cloud and Data Services

Frontmatter
Model-Driven Elasticity for Cloud Resources

Elasticity is a key distinguishing feature of cloud services. It represents the power to dynamically reconfigure resources to adapt to varying resource requirements. However, the implementation of this feature has become complex, since various non-standard interfaces are provided to deal with cloud resources. To alleviate this, we believe that elasticity features should be provided at the resource description level. In this paper, we propose a Cloud Resource Description Model (cRDM) based on the state machine formalism. This novel abstraction allows representing cloud resources while considering their elasticity behavior over time. Our prototype implementation shows the feasibility of the approach, and experiments illustrate the productivity and expressiveness of our cRDM model in comparison to traditional solutions.

Hayet Brabra, Achraf Mtibaa, Walid Gaaloul, Boualem Benatallah
Fog Computing and Data as a Service: A Goal-Based Modeling Approach to Enable Effective Data Movements

Data as a Service (DaaS) organizes the data management life-cycle around the Service Oriented Computing principles. Data providers are supposed to take care not only of performing the life-cycle phases, but also of the data movements from where data are generated, to where they are stored, and, finally, consumed. Data movements become more frequent especially in Fog environments, i.e., where data are generated by devices at the edge of the network (e.g., sensors), processed on the cloud, and consumed at the customer premises. This paper proposes a goal-based modeling approach for enabling effective data movements in Fog environments. The model considers the requirements of several customers to move data at the right time and in the right place, taking into account the heterogeneity of the resources involved in data management.

Pierluigi Plebani, Mattia Salnitri, Monica Vitali
An Ontology-Based Framework for Describing Discoverable Data Services

Data services are applications in charge of retrieving certain data when they are called. They are found in different communities such as the Internet of Things, Cloud Computing, Big Data, etc. So, there is a real need to discover how an application that requires some data can automatically find a data service that provides it. To our knowledge, the problem of automatically discovering these data services is still open. To make a step forward in this direction, we propose an ontology-based framework to address this problem. In our framework, input and output values of the request are mapped into concepts of the domain ontology. Then, data services specify how to obtain the output from the input by stating the relationship between the mapped concepts of the ontology.

Xavier Oriol, Ernest Teniente

Process Discovery

Frontmatter
How Much Event Data Is Enough? A Statistical Framework for Process Discovery

With the increasing availability of business process related event logs, the scalability of techniques that discover a process model from such logs becomes a performance bottleneck. In particular, exploratory analysis that investigates manifold parameter settings of discovery algorithms, potentially using a software-as-a-service tool, relies on fast response times. However, common approaches for process model discovery always parse and analyse all available event data, whereas a small fraction of a log could have already led to a high-quality model. In this paper, we therefore present a framework for process discovery that relies on statistical pre-processing of an event log and significantly reduces its size by means of sampling. It thereby reduces the runtime and memory footprint of process discovery algorithms, while providing guarantees on the introduced sampling error. Experiments with two public real-world event logs reveal that our approach speeds up state-of-the-art discovery algorithms by a factor of up to 20.
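
The framework above comes with formal guarantees on the sampling error; the sketch below has none and only illustrates the intuition: sample traces from the log until a stretch of consecutive samples stops contributing new directly-follows pairs, then hand the reduced log to any discovery algorithm. The stopping rule and the patience threshold are hypothetical.

import random

def directly_follows(trace):
    return {(trace[i], trace[i + 1]) for i in range(len(trace) - 1)}

def sample_log(log, patience=50, seed=0):
    """Sample traces until `patience` consecutive samples add no new directly-follows pair."""
    rng = random.Random(seed)
    seen_pairs, sample, stale = set(), [], 0
    pool = list(log)
    while pool and stale < patience:
        trace = pool.pop(rng.randrange(len(pool)))
        new_pairs = directly_follows(trace) - seen_pairs
        sample.append(trace)
        if new_pairs:
            seen_pairs |= new_pairs
            stale = 0
        else:
            stale += 1
    return sample

# Toy usage: a log with many repetitions of two variants is reduced drastically.
log = [["a", "b", "c"]] * 500 + [["a", "c", "b"]] * 500
reduced = sample_log(log)
print(len(log), "->", len(reduced), "traces")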

Martin Bauer, Arik Senderovich, Avigdor Gal, Lars Grunske, Matthias Weidlich
Process Discovery from Low-Level Event Logs

The discovery of a control-flow model for a process is here faced in a challenging scenario where each trace in the given log L_E encodes a sequence of low-level events without referring to the process' activities. To this end, we define a framework for inducing a process model that describes the process' behavior in terms of both activities and events, in order to effectively support the analysts (who typically find it more convenient to reason at the abstraction level of activities than at that of low-level events). The proposed framework is based on modeling the generation of L_E with a suitable Hidden Markov Model (HMM), from which statistics on precedence relationships between the hidden activities that triggered the events reported in L_E are retrieved. These statistics are passed to the well-known Heuristics Miner algorithm, in order to produce a model of the process at the abstraction level of activities. The process model is eventually augmented with probabilistic information on the mapping between activities and events, encoded in the discovered HMM. The framework is formalized and experimentally validated in the case that activities are “atomic” (i.e., an activity instance triggers a unique event), and several variants and extensions (including the case of “composite” activities) are discussed.
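
Without reproducing the HMM part, the sketch below only shows the downstream step, under the assumption that each low-level event has already been mapped to its most likely activity: it counts directly-follows occurrences at the activity level and computes the classic Heuristics Miner dependency measure (|a>b| - |b>a|) / (|a>b| + |b>a| + 1). The decoded traces are invented for illustration.

from collections import Counter

def dependency_measures(activity_traces):
    """Heuristics-Miner-style dependency measures from activity-level traces."""
    follows = Counter()
    for trace in activity_traces:
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1
    deps = {}
    for (a, b), ab in follows.items():
        if a == b:                       # self-loop: |a>a| / (|a>a| + 1)
            deps[(a, b)] = ab / (ab + 1)
        else:
            ba = follows.get((b, a), 0)
            deps[(a, b)] = (ab - ba) / (ab + ba + 1)
    return deps

# Assume a (hypothetical) decoding step already abstracted low-level events to activities.
decoded = [["register", "check", "decide"], ["register", "check", "check", "decide"]]
for pair, d in sorted(dependency_measures(decoded).items(), key=lambda x: -x[1]):
    print(pair, round(d, 2))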

Bettina Fazzinga, Sergio Flesca, Filippo Furfaro, Luigi Pontieri
Detection and Interactive Repair of Event Ordering Imperfection in Process Logs

Many forms of data analysis require timestamp information to order the occurrences of events. The process mining discipline uses historical records of process executions, called event logs, to derive insights into business process behaviours and performance. Events in event logs must be ordered, typically achieved using timestamps. The importance of timestamp information means that it needs to be of high quality. To the best of our knowledge, no (semi-)automated support exists for detecting and repairing ordering-related imperfection issues in event logs. We describe a set of timestamp-based indicators for detecting event ordering imperfection issues in a log and our approach to repairing identified issues using domain knowledge. Lastly, we evaluate our approach implemented in the open-source process mining framework, ProM, using two publicly available logs.

Prabhakar M. Dixit, Suriadi Suriadi, Robert Andrews, Moe T. Wynn, Arthur H. M. ter Hofstede, Joos C. A. M. Buijs, Wil M. P. van der Aalst
Fusion-Based Process Discovery

Information systems record the execution of transactions as part of business processes in event logs. Process mining analyses such event logs, e.g., by discovering process models. Recently, various discovery algorithms have been proposed, each with specific advantages and limitations. In this work, we argue that, instead of relying on a single algorithm, the outcomes of different algorithms shall be fused to combine the strengths of individual approaches. We propose a general framework for such fusion and instantiate it with two new discovery algorithms: the Exhaustive Noise-aware Inductive Miner (exNoise), which exhaustively searches for model improvements, and the Adaptive Noise-aware Inductive Miner (adaNoise), a computationally tractable version of exNoise. For both algorithms, we formally show that they outperform each of the individual mining algorithms used by them. Our empirical evaluation further illustrates that fusion-based discovery yields models of better quality than state-of-the-art approaches.

Yossi Dahari, Avigdor Gal, Arik Senderovich, Matthias Weidlich

Decisions and the Blockchain

Frontmatter
On the Relationships Between Decision Management and Performance Measurement

Decision management is of utmost importance for the achievement of strategic and operational goals in any organisational context. Therefore, decisions should be considered as first-class citizens that need to be modelled, analysed, monitored to track their performance, and redesigned if necessary. Up to now, existing literature that studies decisions in the context of business processes has focused on the analysis of the definition of decisions themselves, in terms of accuracy, certainty, consistency, covering and correctness. However, to the best of our knowledge, no prior work exists that analyses the relationship between decisions and performance measurement. This paper identifies and analyses this relationship from three different perspectives, namely: the impact of decisions on process performance, the performance measurement of decisions, and the use of performance indicators in the definition of decisions. Furthermore, we also introduce solutions for the representation of these relationships based, amongst others, on the DMN standard.

Bedilia Estrada-Torres, Adela del-Río-Ortega, Manuel Resinas, Antonio Ruiz-Cortés
DMN Decision Execution on the Ethereum Blockchain

Recently blockchain technology has been introduced to execute interacting business processes in a secure and transparent way. While the foundations for process enactment on blockchain have been researched, the execution of decisions on blockchain has not been addressed yet. In this paper we argue that decisions are an essential aspect of interacting business processes, and, therefore, also need to be executed on blockchain. The immutable representation of decision logic can be used by the interacting processes, so that decision taking will be more secure, more transparent, and better auditable. The approach is based on a mapping of the DMN language S-FEEL to Solidity code to be run on the Ethereum blockchain. The work is evaluated by a proof-of-concept prototype and an empirical cost evaluation.
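
The paper maps S-FEEL to Solidity for on-chain execution; no Solidity is reproduced here. Purely as a language-neutral illustration of the kind of decision logic involved, the Python sketch below evaluates a tiny first-hit decision table whose input entries use simplified S-FEEL-like unary tests (comparisons, ranges, and "-" for "any value"); every rule, attribute name and value is made up.

def matches(test, value):
    """Evaluate a simplified S-FEEL-style unary test against a numeric value."""
    test = test.strip()
    if test == "-":                       # 'any value'
        return True
    if test.startswith("[") and test.endswith("]"):
        lo, hi = (float(x) for x in test[1:-1].split(".."))
        return lo <= value <= hi
    for op, fn in (("<=", lambda v, b: v <= b), (">=", lambda v, b: v >= b),
                   ("<", lambda v, b: v < b), (">", lambda v, b: v > b)):
        if test.startswith(op):
            return fn(value, float(test[len(op):]))
    return value == float(test)

def decide(rules, inputs):
    """First-hit decision table: return the output of the first rule whose tests all match."""
    for tests, output in rules:
        if all(matches(t, inputs[name]) for name, t in tests.items()):
            return output
    return None

# Hypothetical decision table with two inputs.
rules = [({"amount": "< 1000", "score": "-"}, "accept"),
         ({"amount": "[1000..5000]", "score": ">= 600"}, "review"),
         ({"amount": "-", "score": "-"}, "reject")]
print(decide(rules, {"amount": 2500, "score": 700}))   # -> review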

Stephan Haarmann, Kimon Batoulis, Adriatik Nikaj, Mathias Weske
Shared Ledger Accounting - Implementing the Economic Exchange Pattern in DL Technology

DLT suggests a new way to implement the Accounting Information System, but after reviewing the current literature our conclusion is that an ontologically sound consensus-based design is missing to date. Against this research gap, the paper introduces a DLT-based shared ledger solution in a formal way and compliant with Financial Reporting Standards. We build on the COFRIS accounting ontology (grounded in UFO-S) and the blockchain ontology developed by De Kruijff & Weigand that distinguishes between a Datalogical level, an Infological and an Essential (conceptual) level. It is shown how consensual and agent-specific parts of the business exchange transaction can be represented in a concise way, and how this pattern can be implemented using Smart Contracts.

Hans Weigand, Ivars Blums, Joost de Kruijff

Process and Multi-level Modelling

Frontmatter
Exploring New Directions in Traceability Link Recovery in Models: The Process Models Case

Traceability Links Recovery (TLR) has been a topic of interest for many years. However, TLR in Process Models has not received enough attention yet. Through this work, we study TLR between Natural Language Requirements and Process Models through three different approaches: a Models-specific baseline, and two techniques based on Latent Semantic Indexing, used successfully over code. We adapted said code techniques to work for Process Models, and propose them as novel techniques for TLR in Models. The three approaches were evaluated by applying them to an academic set of Process Models, and to a set of Process Models from a real-world industrial case study. Results show that our techniques retrieve better results than the baseline Models technique in both case studies. We also studied why this is the case, and identified Process Models particularities that could potentially lead to improvement opportunities.

Raúl Lapeña, Jaime Font, Carlos Cetina, Óscar Pastor
Clinical Processes - The Killer Application for Constraint-Based Process Interactions?

For more than a decade, the interest in aligning information systems in a process-oriented way has been increasing. To enable operational support for business processes, the latter are usually specified in an imperative way. The resulting process models, however, tend to be too rigid to meet the flexibility demands of the actors involved. Declarative process modeling languages, in turn, provide a promising alternative in scenarios in which a high level of flexibility is demanded. In the scientific literature, declarative languages have been used for modeling rather simple processes or synthetic examples. However, to the best of our knowledge, they have not been used to model complex, real-world scenarios that comprise constraints going beyond control-flow. In this paper, we propose the use of a declarative language for modeling a sophisticated healthcare process scenario from the real world. The scenario is subject to complex temporal constraints and entails the need for coordinating the constraint-based interactions among the processes related to a patient treatment process. As demonstrated in this work, the selected real process scenario can be suitably modeled through a declarative approach.

Andres Jimenez-Ramirez, Irene Barba, Manfred Reichert, Barbara Weber, Carmelo Del Valle
Formal Executable Theory of Multilevel Modeling

Multi-Level Modeling (MLM) conceptualizes software models as layered architectures of sub-models that are inter-related by the instance-of relation, which breaks monolithic class hierarchies midway between subtyping and interfaces. This paper introduces a formal theory of MLM, rooted in a set-theoretic semantics of class models. The MLM theory is validated by a provably correct translation into the FOML executable logic. We show how FOML accounts for inter-level constraints, rules, and queries. In that sense, FOML is an organic executable extension for MLM that incorporates all MLM services. As much as the page budget permits, the paper illustrates how multilevel models are represented and processed in FOML.

Mira Balaban, Igal Khitron, Michael Kifer, Azzam Maraee

Data Management and Visualization

Frontmatter
An LSH-Based Model-Words-Driven Product Duplicate Detection Method

The online shopping market is growing rapidly in the 21st century, leading to a huge amount of duplicate products being sold online. An important component for aggregating online products is duplicate detection, although this is a time-consuming process. In this paper, we focus on reducing the number of possible duplicates that can be used as an input for the Multi-component Similarity Method (MSM), a state-of-the-art duplicate detection solution. To find the candidate pairs, Locality Sensitive Hashing (LSH) is employed. A previously proposed LSH-based algorithm makes use of binary vectors based on the model words in the product titles. This paper proposes several extensions to this, by performing advanced data cleaning and additionally using information from the key-value pairs. Compared to MSM, the MSMP+ method proposed in this paper leads to a minor reduction of 6% in the F_1-measure whilst reducing the number of needed computations by 95%.
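
The method above builds binary vectors from model words in titles and key-value pairs before applying LSH; that preprocessing is not reproduced. The sketch below only shows the generic MinHash-plus-banding step (with hypothetical parameters and token sets) that turns token sets into candidate duplicate pairs for a downstream similarity method.

import random
from collections import defaultdict
from itertools import combinations

def minhash_signature(tokens, hash_seeds):
    """One min-hash value per seeded hash function over the token set."""
    return tuple(min(hash((seed, t)) for t in tokens) for seed in hash_seeds)

def candidate_pairs(token_sets, n_hashes=20, bands=5, seed=42):
    """LSH banding: products sharing an identical band of their signature become candidates."""
    rng = random.Random(seed)
    seeds = [rng.random() for _ in range(n_hashes)]
    rows = n_hashes // bands
    buckets = defaultdict(set)
    for pid, tokens in token_sets.items():
        sig = minhash_signature(tokens, seeds)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(pid)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

# Toy "model word" sets extracted from product titles (illustrative only).
products = {"p1": {"55inch", "4k", "uhd", "x900f"},
            "p2": {"55inch", "4k", "x900f"},
            "p3": {"32inch", "hd"}}
print(candidate_pairs(products))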

Aron Hartveld, Max van Keulen, Diederik Mathol, Thomas van Noort, Thomas Plaatsman, Flavius Frasincar, Kim Schouten
A Manageable Model for Experimental Research Data: An Empirical Study in the Materials Sciences

As in many other research areas, the materials sciences produce vast amounts of experimental data. The corresponding findings are then published, while the data remains in heterogeneous formats within institutes and is neither shared nor reused by the scientific community. To address this issue we have developed and deployed a scientific data management environment for the materials sciences at various test facilities. Unlike other systems, this one explicitly models every facet of the experiment and the materials used therein - thereby supporting the initial design of the experiment, its execution and the ensuing results. Consequently, the collection of the structured data becomes an integral part of the research workflow rather than a post hoc nuisance. In this paper we report on an empirical study that was performed to test the effects of a paradigm change in the data model to align it better with the actual scientific practice at hand.

Susanne Putze, Robert Porzel, Gian-Luca Savino, Rainer Malaka
VizDSL: A Visual DSL for Interactive Information Visualization

The development of systems of systems or the replacement of processes or systems can create unknowns, risks, delays and costs which are difficult to understand and characterise, and which frequently lead to unforeseen issues, overspend or avoidance. Yet maintaining state-of-the-art processes and systems and utilising best-of-breed component systems is essential. Visualization of disparate data, systems, processes and standards can help end users to better understand relationships such as class hierarchy or communication across system components. There are many visualization tools and libraries available, but they are either a black box when it comes to specifying possible interactions between end users and the visualization, or they require significant programming skills and manual effort to implement. In this paper we propose a visual language called VizDSL that is based on the Interaction Flow Modeling Language (IFML) for creating highly interactive visualizations. VizDSL can be used to model, share and implement interactive visualizations based on model-driven engineering principles. The language has been evaluated based on interaction patterns for visualizations.

Rebecca Morgan, Georg Grossmann, Michael Schrefl, Markus Stumptner, Timothy Payne

Big Data and Intelligence

Frontmatter
Evaluating Several Design Patterns and Trends in Big Data Warehousing Systems

The Big Data characteristics, namely volume, variety and velocity, currently highlight the severe limitations of traditional Data Warehouses (DWs). Their strict relational model, costly scalability, and, sometimes, inefficient performance open the way for emerging techniques and technologies. Recently, the concept of Big Data Warehousing is gaining traction, aiming to study and propose new ways of dealing with the Big Data challenges in Data Warehousing contexts. The Big Data Warehouse (BDW) can be seen as a flexible, scalable and highly performant system that uses Big Data techniques and technologies to support mixed and complex analytical workloads (e.g., streaming analysis, ad hoc querying, data visualization, data mining, simulations) in several emerging contexts like Smart Cities and Industry 4.0. However, due to the almost embryonic state of this topic, the ambiguity of the constructs and the lack of common approaches still prevail. In this paper, we discuss and evaluate some design patterns and trends in Big Data Warehousing systems, including data modelling techniques (e.g., star schemas, flat tables, nested structures) and some streaming considerations for BDWs (e.g., Hive vs. NoSQL databases), aiming to foster and align future research, and to help practitioners in this area.

Carlos Costa, Maribel Yasmina Santos
KAYAK: A Framework for Just-in-Time Data Preparation in a Data Lake

A data lake is a loosely-structured collection of data at large scale that is usually fed with almost no requirement of data quality. This approach aims at eliminating any human effort before the actual exploitation of data, but the problem is only delayed since preparing and querying a data lake is usually a hard task. We address this problem by introducing Kayak, a framework that helps data scientists in the definition and optimization of pipelines of data preparation. Since in many cases approximations of the results, which can be computed rapidly, are informative enough, Kayak allows the users to specify their needs in terms of accuracy over performance and produces previews of the outputs satisfying such a requirement. In this way, the pipeline is executed much faster and the process of data preparation is shortened. We discuss the design choices of Kayak, including execution strategies, optimization techniques, scheduling of operations, and metadata management. With a set of preliminary experiments, we show that the approach is effective and scales well with the number of datasets in the data lake.

Antonio Maccioni, Riccardo Torlone
Defining Interaction Design Patterns to Extract Knowledge from Big Data

The Big Data domain offers many opportunities to gain valuable knowledge. The User Interface (UI), the place where the user interacts to extract knowledge from data, must be adapted to address the domain complexities. Designing UIs for Big Data becomes a challenge that involves identifying and designing the user-data interaction implicated in the knowledge extraction. To design such an interaction, one widely used approach is design patterns. Design patterns describe solutions to common interaction design problems. This paper proposes a set of patterns to design UIs aimed at extracting knowledge from the Big Data systems' data conceptual schemas. As a practical example, we apply the patterns to design UIs for the Diagnosis of Genetic Diseases domain, since it is a clear case of extracting knowledge from a complex set of genetic data. Our patterns provide valuable design guidelines for Big Data UIs.

Carlos Iñiguez-Jarrín, José Ignacio Panach, Oscar Pastor López
Continuous Improvement, Business Intelligence and User Experience for Health Care Quality

Long-term health care organizations are facing increased complexity in providing new, high-quality services (required by regional laws) while keeping costs under control. They have to deal with many internal/external procedures involving outsourced services. We develop a BI solution with a "global" approach to face the different issues arising from the deep impact on processes, systems, the organization's structure and job roles. We propose the combination of different methodologies such as Balanced Scorecard, Change Management, Lean tools and User Experience with the classical Data Warehouse design and development cycle. This new approach can: create a "cascading improvement process" for all departments (medical, administrative); allow timely and easy access to providers' information; and bring governing institutions considerable time savings and accurate control of social services' quality. Furthermore, it can develop a new culture towards processes and non-value activities to increase overall quality and efficiency. The method has been applied to organizations in the north of Italy.

Annamaria Chiasera, Elisa Creazzi, Marco Brandi, Ilaria Baldessarini, Cinzia Vispi

Data Modelling and Mining

Frontmatter
Embedded Cardinality Constraints

Cardinality constraints express bounds on the number of data patterns that occur in application domains. They improve the consistency dimension of data quality by enforcing these bounds within database systems. Much research has examined the semantics of integrity constraints over incomplete relations in which null markers can occur. Unfortunately, relying on some fixed interpretation of null markers leads frequently to doubtful results. We introduce the class of embedded cardinality constraints which hold on incomplete relations independently of how null marker occurrences are interpreted. Two major technical contributions are made as well. Firstly, we establish an axiomatic and an algorithmic characterization of the implication problem associated with embedded cardinality constraints. This enables humans and computers to reason efficiently about such business rules. Secondly, we exemplify the occurrence of embedded cardinality constraints in real-world benchmark data sets both qualitatively and quantitatively. That is, we show how frequently they occur, and exemplify their semantics.
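
Only as a rough illustration of the idea, not of the paper's formal semantics: the sketch below checks a constraint of the form "at most b tuples agree on the attributes X", restricted to tuples that are complete (no null markers) on a given embedding set E, so null occurrences never need to be interpreted. The attribute names, data and completeness rule are assumptions for this example.

from collections import Counter

NULL = None  # null marker

def violates_embedded_card(rows, embedding, attrs, bound):
    """True if more than `bound` rows that are complete on `embedding` agree on `attrs`."""
    scope = [r for r in rows if all(r.get(a) is not NULL for a in embedding)]
    groups = Counter(tuple(r.get(a) for a in attrs) for r in scope)
    return any(count > bound for count in groups.values())

# Hypothetical relation: at most 2 fully specified bookings per (customer, day).
rows = [
    {"customer": "c1", "day": "Mon", "room": "101"},
    {"customer": "c1", "day": "Mon", "room": "102"},
    {"customer": "c1", "day": "Mon", "room": NULL},   # incomplete on 'room', outside the scope
]
print(violates_embedded_card(rows, embedding=["customer", "day", "room"],
                             attrs=["customer", "day"], bound=2))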

Ziheng Wei, Sebastian Link
Relationship Matching of Data Sources: A Graph-Based Approach

Relationship matching is a key procedure in the process of transforming structured data sources, such as relational databases and spreadsheets, into a common data model. The matching task refers to the automatic identification of correspondences between relationships of source columns and the relationships of the common data model. Numerous techniques have been developed for this purpose. However, work is missing on recognizing relationship types between entities in information obtained from data sources at the instance level and on resolving ambiguities. In this paper, we develop a method for resolving ambiguous relationship types between entity instances in structured data. The proposed method can be used as a standalone matching technique or to complement existing relationship matching techniques for data sources. The result of an evaluation on a large real-world data set demonstrates the high accuracy of our approach (>80%).

Zaiwen Feng, Wolfgang Mayer, Markus Stumptner, Georg Grossmann, Wangyu Huang
Business Credit Scoring of Estonian Organizations

Recent hype in social analytics has modernized personal credit scoring to take advantage of rapidly changing non-financial data. At the same time business credit scoring still relies on financial data and is based on traditional methods. Such approaches, however, have the following limitations. First, financial reports are compiled typically once a year, hence scoring is infrequent. Second, since there is a delay of up to two years in publishing financial reports, scoring is based on outdated data and is not applied to young businesses. Third, quality of manually crafted models, although human-interpretable, is typically inferior to the ones constructed via machine learning. In this paper we describe an approach for applying extreme gradient boosting with Bayesian hyper-parameter optimization and ensemble learning for business credit scoring with frequently changing/updated data such as debts and network metrics from board membership/ownership networks. We report accuracy of the learned model as high as 99.5%. Additionally we discuss lessons learned and limitations of the approach.
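
A minimal sketch of the modelling recipe named above, not the authors' pipeline: extreme gradient boosting (via the third-party xgboost package) tuned with Bayesian hyper-parameter optimization (via the third-party hyperopt package). The synthetic data, feature semantics and search space are placeholders standing in for debt and board/ownership-network features.

import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Synthetic stand-in for debt and network features with a binary default label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

space = {
    "max_depth": hp.quniform("max_depth", 2, 8, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
    "n_estimators": hp.quniform("n_estimators", 50, 300, 50),
}

def objective(params):
    model = XGBClassifier(
        max_depth=int(params["max_depth"]),
        learning_rate=params["learning_rate"],
        n_estimators=int(params["n_estimators"]),
    )
    auc = cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()
    return {"loss": -auc, "status": STATUS_OK}   # hyperopt minimizes the loss

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=25)
print("best hyper-parameters:", best)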

Jüri Kuusik, Peep Küngas

Quality Requirements and Software

Frontmatter
A Behavior-Based Framework for Assessing Product Line-Ability

Systems are typically not developed from scratch, so different kinds of similarities between them exist, challenging their maintenance and future development. Software Product Line Engineering (SPLE) proposes methods and techniques for developing reusable artifacts that can be systematically reused in similar systems. Despite the potential benefits of SPLE to decrease time-to-market and increase product quality, it requires a high up-front investment and hence SPLE techniques are commonly adopted in a bottom-up approach, after individual systems have already been developed. Deciding whether to turn existing systems into a product line – referred to as product line-ability – involves many aspects and requires some tooling for analyzing similarities and differences among systems. In this paper we propose a framework for the identification of “similarly behaving” artifacts and analyzing their potential reuse in the context of product lines. This framework provides metrics for calculating behavior similarity and a method for analyzing the product line-ability of a set of products. The framework has been integrated into a tool named VarMeR – Variability Mechanisms Recommender, whose aim is to systematically guide reuse.

Iris Reinhartz-Berger, Anna Zamansky
Data-Driven Elicitation, Assessment and Documentation of Quality Requirements in Agile Software Development

Quality Requirements (QRs) are difficult to manage in agile software development. Given the pressure to deploy fast, quality concerns are often sacrificed for the sake of richer functionality. Besides, artefacts such as user stories are not particularly well-suited for representing QRs. In this exploratory paper, we envisage a data-driven method, called Q-Rapids, for QR elicitation, assessment and documentation in agile software development. Q-Rapids proposes: (1) the collection and analysis of design and runtime data in order to raise quality alerts; (2) the suggestion of candidate QRs to address these alerts; (3) a strategic analysis of the impact of such requirements by visualizing their effect on a set of indicators rendered in a dashboard; (4) the documentation of the requirements (if finally accepted) in the backlog. The approach is illustrated with scenarios evaluated through a questionnaire by experts from a telecom company.

Xavier Franch, Cristina Gómez, Andreas Jedlitschka, Lidia López, Silverio Martínez-Fernández, Marc Oriol, Jari Partanen
A Situational Approach for the Definition and Tailoring of a Data-Driven Software Evolution Method

Successful software evolution heavily depends on the selection of the right features to be included in the next release. Such selection is difficult, and companies often report bad experiences about user acceptance. To overcome this challenge, there is an increasing number of approaches that propose intensive use of data to drive evolution. This trend has motivated the SUPERSEDE method, which proposes the collection and analysis of user feedback and monitoring data as the baseline to elicit and prioritize requirements, which are then used to plan the next release. However, every company may be interested in tailoring this method depending on factors like project size, scope, etc. In order to provide a systematic approach, we propose the use of Situational Method Engineering to describe SUPERSEDE and guide its tailoring to a particular context.

Xavier Franch, Jolita Ralyté, Anna Perini, Alberto Abelló, David Ameller, Jesús Gorroñogoitia, Sergi Nadal, Marc Oriol, Norbert Seyff, Alberto Siena, Angelo Susi
Backmatter
Metadata
Title
Advanced Information Systems Engineering
Editors
John Krogstie
Prof. Dr. Hajo A. Reijers
Copyright Year
2018
Electronic ISBN
978-3-319-91563-0
Print ISBN
978-3-319-91562-3
DOI
https://doi.org/10.1007/978-3-319-91563-0
