
2020 | Book

Research Challenges in Information Science

14th International Conference, RCIS 2020, Limassol, Cyprus, September 23–25, 2020, Proceedings


About this book

This book constitutes the proceedings of the 14th International Conference on Research Challenges in Information Science, RCIS 2020, held in Limassol, Cyprus, during September 23–25, 2020. The conference was originally scheduled for May 2020, but the organizing committee was forced to postpone it due to the outbreak of the COVID-19 pandemic.

The scope of RCIS 2020 is summarized by the thematic areas of information systems and their engineering; user-oriented approaches; data and information management; business process management; domain-specific information systems engineering; data science; information infrastructures; and reflective research and practice.

The 26 full papers and 3 work-in-progress papers presented in this volume were carefully reviewed and selected from 106 submissions. They are organized in the following topical sections: Data Analytics and Business Intelligence; Digital Enterprise and Technologies; Human Factors in Information Systems; Information Systems Development and Testing; Machine Learning and Text Processing; Process Mining, Discovery, and Simulation; and Security and Privacy. The volume also contains 12 poster and demo papers, and 4 Doctoral Consortium papers.

Table of Contents

Frontmatter

Data Analytics and Business Intelligence

Frontmatter
Toward Becoming a Data-Driven Organization: Challenges and Benefits

Organizations are looking for ways to harness the power of big data and to incorporate the shift that big data brings into their competitive strategies, in order to seek competitive advantage and to improve their decision making by becoming data-driven organizations. Despite the potential benefits, the number of organizations that use big data efficiently and successfully transform into data-driven organizations remains low. The emphasis in the literature has mostly been technology-oriented, with limited attention paid to the organizational challenges the transformation entails. This paper presents an empirical study that investigates the challenges organizations face, and the benefits they can gain, when moving toward becoming a data-driven organization. Data were collected through semi-structured interviews with 15 practitioners from nine software-developing companies. The study identifies 49 challenges an organization may face when becoming data-driven in practice, and 23 potential benefits of a data-driven organization compared to a non-data-driven one.

Richard Berntsson Svensson, Maryam Taghavianfar
A Big Data Conceptual Model to Improve Quality of Business Analytics

As big data becomes an important part of business analytics for gaining insights about business practices, the quality of big data is an essential factor impacting the outcomes of business analytics. Although this is quite challenging, conceptual modeling has much potential to address it, since good data quality stems from good model quality. However, existing conceptual-level data models have limitations when it comes to incorporating quality aspects into big data models. In this paper, we focus on the challenges caused by the Variety dimension of big data and propose IRIS, a conceptual modeling framework for big data models which enables us to define three modeling quality notions – relevance, comprehensiveness, and relative priorities – and to incorporate such qualities into a big data model in a goal-oriented approach. Big data models explored on the basis of these qualities are integrated with existing data along three conventional organizational dimensions, creating a virtual big data model. An empirical study has been conducted using the shipping decision process of a worldwide retail chain, to gain an initial understanding of the applicability of this approach.

Grace Park, Lawrence Chung, Haan Johng, Vijayan Sugumaran, Sooyong Park, Liping Zhao, Sam Supakkul
How to Measure Influence in Social Networks?

Today, social networks are a valuable source of social data that can be used to understand the interactions among people and communities. People can influence or be influenced through interactions, shared opinions and emotions. However, one of the main problems in social network analysis is finding the most influential people. This work reports on the results of a literature review whose goal was to identify and analyse the metrics, algorithms and models used to measure user influence on social networks. The search was carried out in three databases: Scopus, IEEE Xplore, and ScienceDirect. We restricted the search to articles published between 2014 and 2020, in English, and used the following keywords: social networks analysis, influence, metrics, measurements, and algorithms. A backward-snowballing process was applied to complement the search, considering inclusion and exclusion criteria. As a result of this process, we obtained 25 articles: 12 in the initial search and 13 in the backward process. The literature review resulted in the collection of 21 influence metrics, 4 influence algorithms, and 8 models of influence analysis. We start by defining influence and presenting its properties and applications. We then describe, analyse and categorize the metrics, algorithms, and models found for measuring influence in social networks. Finally, we present a discussion of these metrics, algorithms, and models. This work helps researchers quickly gain a broad perspective on metrics, algorithms, and models for influence in social networks and their relative potentialities and limitations.

Ana Carolina Ribeiro, Bruno Azevedo, Jorge Oliveira e Sá, Ana Alice Baptista
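
A brief illustration of the kind of metrics such a review catalogues: the sketch below computes three centrality measures commonly used as influence proxies on a toy follower graph. It is illustrative only; the graph, its edge semantics, and the choice of metrics are assumptions, not the 21 metrics collected by the paper.

```python
import networkx as nx

# Hypothetical follower graph: an edge u -> v means "u follows v".
G = nx.DiGraph([("a", "b"), ("c", "b"), ("b", "d"), ("c", "d"), ("a", "d")])

# Three centrality measures frequently used as influence proxies:
print(nx.in_degree_centrality(G))    # share of users following each node
print(nx.betweenness_centrality(G))  # brokerage position between other users
print(nx.pagerank(G))                # recursive importance via who follows you
```
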
Developing a Real-Time Traffic Reporting and Forecasting Back-End System

This work describes the architecture of the back-end engine of a real-time traffic data processing and satellite navigation system. The role of the engine is to process real-time feedback, such as speed and travel time, provided by in-vehicle devices, and to derive real-time reports and traffic predictions by also leveraging historical data. We present the main building blocks and the versatile set of data sources and processing platforms that need to be combined to form a working and scalable solution. We also present performance results, focusing on meeting system requirements while keeping the need for computing resources low. The lessons and results presented are of value to other real-time applications that rely on both recent and historical data.

Theodoros Toliopoulos, Nikodimos Nikolaidis, Anna-Valentini Michailidou, Andreas Seitaridis, Anastasios Gounaris, Nick Bassiliades, Apostolos Georgiadis, Fotis Liotopoulos
IoT Analytics Architectures: Challenges, Solution Proposals and Future Research Directions

The Internet of Things (IoT) presents an extensive area for research, based on its growing importance in a multitude of different domains of everyday life, business and industry. In this context, different aspects of data analytics, e.g. algorithms or system architectures, as well as their scientific investigation, play a pivotal role in the advancement of the IoT. Past research has therefore presented a multitude of architectural approaches to enable data processing and analytics in various IoT domains, addressing different architectural challenges. In this paper, we identify and present an overview of these challenges as well as existing architectural proposals. Furthermore, we categorize the identified architectural proposals along various dimensions in order to highlight the evolution of research in this field and pinpoint architectural shortcomings. The results of this paper show that several challenges have been addressed by a large number of IoT system architectures for data analytics, while others are either not relevant for certain domains or need further investigation. Finally, we offer points of reference for future research based on the findings of this paper.

Theo Zschörnig, Robert Wehlitz, Bogdan Franczyk

Digital Enterprise and Technologies

Frontmatter
Structural Coupling, Strategy and Fractal Enterprise Modeling

The concept of structural coupling, which comes from biological cybernetics, has been found useful for higher-level organizational decision making, such as management of organizational identity and strategy development. However, there is currently no systematic procedure for finding all elements (other organizations, markets, etc.) of the environment to which a given organization is structurally coupled, or will be coupled after a redesign. The paper tries to fill this gap by employing enterprise modeling to identify structural couplings. More specifically, an extended Fractal Enterprise Model (FEM) is used to this end. FEM connects enterprise processes with assets that are used in and managed by these processes. The extended FEM adds concepts to represent external elements and their connections to the enterprise. The paper drafts rules for identifying structural couplings in the model by analyzing FEMs that represent different phases of the development of a company that the author co-founded and worked at for over 20 years.

Ilia Bider
Systems-Thinking Heuristics for the Reconciliation of Methodologies for Design and Analysis for Information Systems Engineering

Many competing, complementary, generic, or specific methodologies for design and analysis co-exist in the field of Information Systems Engineering. The idea of reconciling these methodologies and their underlying theories has crossed the minds of researchers many times. In this paper, we inquire into the nature of such reconciliation using the interpretivist research paradigm. This paradigm acknowledges the existence of diverse points of view as ways of seeing and experiencing the world through different contexts. We examine why it might be impossible to reconcile these methodologies, each of which represents a point of view. Instead of searching for the one (overarching, universal, global, ultimate) methodology that reconciles all others, we explain why we should think of reconciliation as an ongoing practice. We propose to the community a set of heuristics for this practice. The heuristics are the result of our experience in reconciling a number of methods that we created as part of our research over the past 20 years. We illustrate the use of the heuristics with an example involving use cases and user stories. We believe these heuristics to be of interest to the Information Systems Engineering community.

Blagovesta Kostova, Irina Rychkova, Andrey Naumenko, Gil Regev, Alain Wegmann
An Ontology of IS Design Science Research Artefacts

From a design science perspective, information systems and their components are viewed as artefacts. However, not much has been written yet on the ontological status of artefacts or their structure. After March & Smith’s (1995) initial classification of artefacts in terms of constructs, models, methods and instantiations, there have been only a few attempts to come up with a more systematic approach. After reviewing previous work, this conceptual paper introduces an ontology of IS artefacts. It starts with an ontological characterization of artefacts and technical objects in general and proceeds to introduce a systematic classification of IS artefacts, comparing it with existing work. We end with some practical implications for design research.

Hans Weigand, Paul Johannesson, Birger Andersson
Evolution of Enterprise Architecture for Intelligent Digital Systems

Intelligent systems and services are the strategic targets of many current digitalization efforts and part of massive digital transformations based on digital technologies with artificial intelligence. Digital platform architectures and ecosystems provide an essential base for intelligent digital systems. The paper raises an important question: Which development paths are induced by current innovations in the field of artificial intelligence and digitalization for enterprise architectures? Digitalization disrupts existing enterprises, technologies, and economies and promotes the architecture of cognitive and open intelligent environments. This has a strong impact on new opportunities for value creation and the development of intelligent digital systems and services. Digital technologies such as artificial intelligence, the Internet of Things, service computing, cloud computing, blockchains, big data with analysis, mobile systems, and social business network systems are essential drivers of digitalization. We investigate the development of intelligent digital systems supported by a suitable digital enterprise architecture. We present methodological advances and an evolutionary path for architectures with an integral service and value perspective to enable intelligent systems and services that effectively combine digital strategies and digital architectures with artificial intelligence.

Alfred Zimmermann, Rainer Schmidt, Dierk Jugel, Michael Möhring

Human Factors in Information Systems

Frontmatter
Online Peer Support Groups for Behavior Change: Moderation Requirements

Technology-assisted behaviour awareness and change is on the rise. Examples include apps and sites for fitness, healthy eating, mental health and smoking cessation. These information systems recreate principles of influence and persuasion in a digital form, allowing real-time observation, interactivity and intervention. Peer support groups are one of the behavioural influence techniques that have shown various benefits, including hope installation and relapse prevention. However, unmoderated groups may become a vehicle for comparisons and unmanaged interactions, leading to digression, normalisation of the negative behaviour, and lowered self-esteem. A typical requirement of such groups is to be of a social and supportive nature, whereas moderation, through humans or artificial agents, may risk being seen as a centralised and overly managed governance approach. In this paper, we explore the requirements and different preferences about moderators as seen by members. We follow a mixed-methods approach consisting of a qualitative phase that included two focus groups and 16 interviews, followed by a quantitative phase including a survey with 215 participants who declared having well-being issues. We report on the qualitative-phase findings, achieved through thematic analysis. We also report and discuss the survey results, which study the role of gender, self-control, personality traits, culture, perceived usefulness and willingness to join the group as predictors of the members’ expectations of moderators that resulted from the qualitative phase.

Manal Aldhayan, Mohammad Naiseh, John McAlaney, Raian Ali
User-Experience in Business Intelligence - A Quality Construct and Model to Design Supportive BI Dashboards

Business Intelligence (BI) intends to provide business managers with timely information about their company. Considerable research effort has been devoted to the modeling and specification of BI systems, with the objective of improving the quality of the resulting BI output and decreasing the risk of BI project failure. In this paper, we focus on the specification and modeling of one component of the BI architecture: the dashboards. These are the interface between the whole BI system and end-users, and they have received less attention from the scientific community. We report preliminary results from an Action-Research project conducted since February 2019 with three Belgian companies. Our contribution is threefold: (i) we introduce BIXM, an extension of the existing Business Intelligence Model (BIM) that accounts for BI user-experience aspects, (ii) we propose a quality framework for BI dashboards, and (iii) we review existing BI modeling notations and map them to our quality framework as a way to identify existing gaps in the literature.

Corentin Burnay, Sarah Bouraga, Stéphane Faulkner, Ivan Jureta
FINESSE: Fair Incentives for Enterprise Employees

Service enterprises typically motivate their employees by providing incentives in addition to their basic salary. Generally speaking, an incentive scheme should reflect enterprise-wide objectives, e.g., maximize productivity, ensure fairness, etc. Oftentimes, after an incentive scheme is rolled out, non-intuitive outcomes (e.g., low performers getting high incentives) may become visible, which are undesired for an organization. A poorly designed incentive mechanism can hurt the operations of a service business in many ways, including: (a) de-motivating the top performers from delivering high volume and high quality of work, (b) allowing the mid-performers not to push themselves to the limit of what they can deliver, and (c) potentially increasing the number of low performers and thereby reducing the profit of the organization. This paper describes FINESSE, a systematic framework to evaluate the fairness of a given incentive scheme. Fairness is quantified in terms of the employee ordering with respect to a notion of employee utility, as captured through disparate key performance indicators (KPIs, e.g., work duration, work quality). Our approach uses a multi-objective formulation via Pareto-optimal front generation, followed by front refinement with domain-specific constraints. We evaluate FINESSE by comparing two candidate incentive schemes: (a) an operational scheme that is known for non-intuitive disbursements, and (b) a contender scheme that is aimed at filling the gaps of the operational scheme. Using a real, anonymized dataset from a BPO services business, we show that FINESSE can effectively distinguish between the fairness (or lack thereof) of the two schemes across a set of metrics. Finally, we build and demonstrate a prototype dashboard that implements FINESSE and can be used by business leaders in practice.

Soumi Chattopadhyay, Rahul Ghosh, Ansuman Banerjee, Avantika Gupta, Arpit Jain
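
As a rough illustration of the Pareto-optimal front generation step mentioned in the abstract, the sketch below filters hypothetical employee KPI tuples down to the non-dominated set; FINESSE's actual utility notion, KPIs, and domain-specific front refinement are described in the paper.

```python
def pareto_front(employees):
    """Keep employees whose KPI tuples are not dominated: a tuple is
    dominated if another tuple is >= in every KPI and differs somewhere."""
    items = list(employees.items())
    front = {}
    for name, kpis in items:
        dominated = any(
            other != kpis and all(o >= k for o, k in zip(other, kpis))
            for _, other in items
        )
        if not dominated:
            front[name] = kpis
    return front

# Hypothetical KPI tuples: (work quality score, monthly work volume).
employees = {"ann": (0.9, 120), "bob": (0.7, 150), "carl": (0.6, 100)}
print(pareto_front(employees))  # carl is dominated by ann -> ann and bob remain
```
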
Explainable Recommendations in Intelligent Systems: Delivery Methods, Modalities and Risks

With the increase in data volume, velocity and types, intelligent human-agent systems have become popular and have been adopted in different application domains, including critical and sensitive areas such as health and security. Humans’ trust, consent and receptiveness to recommendations are the main requirements for the success of such services. Recently, the demand for explaining recommendations to humans has increased, both from humans interacting with these systems, so that they can make informed decisions, and from owners and system managers, who seek to increase transparency and, consequently, trust and user retention. Existing systematic reviews in the area of explainable recommendations have focused on the goal of providing explanations, their presentation, and their informational content. In this paper, we review the literature with a focus on two user-experience facets of explanations: delivery methods and modalities. We then focus on the risks of explanation, both for user experience and for decision making. Our review revealed that explanation delivery to end-users is mostly designed to accompany the recommendation in push and pull styles, while archiving explanations for later accountability and traceability is still limited. We also found that the emphasis has mainly been on the benefits of recommendations, while risks and potential concerns, such as over-reliance on machines, are still a new area to explore.

Mohammad Naiseh, Nan Jiang, Jianbing Ma, Raian Ali
Participation in Hackathons: A Multi-methods View on Motivators, Demotivators and Citizen Participation

Hackathons are problem-focused programming events that allow conceiving, implementing, and presenting digital innovations. The number of participants is one of the key success factors of hackathons. In order to maximize that number, it is essential to understand what motivates people to participate. Previous work on the matter focused on quantitative studies and addressed neither the topic of demotivators nor the relationship between participation in hackathons and citizen participation, although hackathons constitute a promising participation method, alongside others such as meetings or online platforms, through which citizens can build their own projects. Therefore, in this study, we examined a specific hackathon organized in Belgium and collected data about the motivators and demotivators of the participants through a questionnaire and in-depth interviews, following a multi-methods approach. This study contributes to the scarce theoretical discussion on the topic by precisely defining the motivators and demotivators, and it provides recommendations to help hackathon organizers bring in more participants. Furthermore, from our exploration of the relationship between participation in hackathons and citizen participation, we suggest a citizen participation ecosystem embedding hackathons to provide benefits for society.

Anthony Simonofski, Victor Amaral de Sousa, Antoine Clarinval, Benoît Vanderose

Information Systems Development and Testing

Frontmatter
A Systematic Literature Review of Blockchain-Enabled Smart Contracts: Platforms, Languages, Consensus, Applications and Choice Criteria

At the time of its emergence, blockchain technology was touted to revolutionize the financial sector. Since then, its area of application has expanded to include Supply Chain Management (SCM), healthcare, e-commerce, IoT, and more. Smart contracts are now used by different industries, not only for their high transparency and accuracy but also for their capability to exclude third-party involvement. Blockchain-enabled smart contracts are being adopted in different kinds of projects but still face many challenges and technical issues. This gap stems mostly from the lack of standards in smart contracts, despite the Ethereum Foundation’s efforts. When seeking to use this technology, it is a challenge for companies to find their way through this multiplicity. This paper is a tentative response to this problem: we conduct a systematic review of the literature and propose a preliminary guidance framework. The framework is applied to three illustrative cases to demonstrate its feasibility and relevance.

Samya Dhaiouir, Saïd Assar
Scriptless Testing at the GUI Level in an Industrial Setting

TESTAR is a traversal-based, scriptless tool for test automation at the Graphical User Interface (GUI) level. It differs from existing test approaches in that no test cases need to be defined before testing; instead, the tests are generated on-the-fly during execution. This paper presents an empirical case study in a realistic industrial context, where we compare TESTAR to a manual test approach for a web-based application in the rail sector. Both qualitative and quantitative research methods are used to investigate learnability, effectiveness, efficiency, and satisfaction. The results show that TESTAR was able to detect more faults and achieve higher functional test coverage than the manual test approach used. As far as efficiency is concerned, the preparation time of both test approaches is identical, but TESTAR can carry out test execution without the use of human resources. Finally, TESTAR turns out to be a learnable test approach. As a result of the study described in this paper, the TESTAR technology was successfully transferred, and the company will use both test approaches in a complementary way in the future.

Hatim Chahim, Mehmet Duran, Tanja E. J. Vos, Pekka Aho, Nelly Condori Fernandez
Improving Performance and Scalability of Model-Driven Generated Web Applications
An Experience Report

Context. Performance and scalability are of critical value for distributed, multi-user systems like web applications. Posity is a model-driven development tool that allows software engineers to specify a set of graphical diagrams for the automatic generation of web and/or desktop software applications. Posity provides the benefits of model-driven engineering (MDE) tools in terms of high-quality code generation, implementation speed, support for traceability and debuggability, etc. However, web applications generated with Posity do not scale properly to satisfy unpredictable performance demands. As a result, Posity’s industrial adoption is hindered. Objective. Design a treatment for improving the performance and scalability of web applications generated with Posity. Method. We investigate current problems of web applications generated with Posity. Results from our investigation suggest candidate architectures, which we evaluate by applying the Architecture Trade-off Analysis Method (ATAM). The outcome of the ATAM evaluation guides the design and implementation of a thick-client architecture for the Posity runtime environment for web applications, which we validate by means of a laboratory demonstration. Results. (i) We contribute criteria for selecting a proper architecture for solving performance and scalability problems, and (ii) we report on the experience of designing, developing and validating an architecture for the Posity runtime environment. Conclusions. Results from the laboratory demonstration show tangible improvements in the performance and scalability of web applications generated by Posity. These advancements are promising and motivate further development of the thick-client architecture for the Posity runtime environment for web applications. This experience report concludes with lessons learnt on promoting the adoption of model-driven development tools.

Gioele Moretti, Marcela Ruiz, Jürgen Spielberger
TesCaV: An Approach for Learning Model-Based Testing and Coverage in Practice

Academia and industry constantly stress the importance of software-testing techniques to improve software quality and to reduce development and maintenance costs. A testing method to be considered for this purpose is Model-Based Testing (MBT), which generates test cases from a model that represents the structure and behavior of the system to be developed. The generated test suite is easier to maintain and adapt to changes in requirements or to the evolution of the developed system. However, teaching and learning MBT techniques are not easy tasks; students need to know the different testing techniques to ensure that the requirements are fulfilled, as well as to identify any failure in the modeled software system. In this work, we present TesCaV, an MBT teaching tool for university students, which is based on a model-driven technology for automatic software generation from UML diagrams. TesCaV allows validating the test cases defined by students and graphically shows the level of testing coverage achieved over the modeled system. Preliminary results show TesCaV to be a promising approach for MBT teaching and learning processes.

Beatriz Marín, Sofía Alarcón, Giovanni Giachetti, Monique Snoeck

Machine Learning and Text Processing

Frontmatter
Automatic Classification Rules for Anomaly Detection in Time-Series

Anomaly detection in time-series is an important issue in many applications. It is particularly hard to accurately detect multiple anomalies in time-series. Pattern discovery and rule extraction are effective solutions for allowing multiple anomaly detection. In this paper, we define a Composition-based Decision Tree algorithm that automatically discovers and generates human-understandable classification rules for multiple anomaly detection in time-series. To evaluate our solution, our algorithm is compared to other anomaly detection algorithms on real datasets and benchmarks.

Ines Ben Kraiem, Faiza Ghozzi, Andre Peninou, Geoffrey Roman-Jimenez, Olivier Teste
Text Embeddings for Retrieval from a Large Knowledge Base

Text embeddings represent natural language documents in a semantic vector space and can be used for document retrieval using nearest-neighbor lookup. In order to study the feasibility of neural models specialized for retrieval in a semantically meaningful way, we suggest the use of the Stanford Question Answering Dataset (SQuAD) in an open-domain question answering context, where the first task is to find paragraphs useful for answering a given question. First, we compare the effect of various text-embedding methods on retrieval performance and give an extensive empirical comparison of various non-augmented base embeddings with and without IDF weighting. Our main result is that training deep residual neural models specifically for retrieval purposes can yield significant gains when they are used to augment existing embeddings. We also establish that deeper models are superior for this task. The best baseline embeddings, augmented by our learned neural approach, improve the top-1 paragraph recall of the system by 14%.

Tolgahan Cakaloglu, Christian Szegedy, Xiaowei Xu
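
A minimal sketch of the IDF-weighted embedding retrieval that the abstract uses as a baseline, with toy random word vectors standing in for real pretrained embeddings (vocabulary, weights, and data are placeholders, not the paper's models):

```python
import numpy as np

DIM = 4  # toy dimensionality; real word embeddings use hundreds of dims

def embed(tokens, word_vecs, idf):
    """IDF-weighted average of word vectors (zero vector if no token is known)."""
    vecs = [idf.get(t, 1.0) * word_vecs[t] for t in tokens if t in word_vecs]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

def retrieve(query, paragraphs, word_vecs, idf, top_k=1):
    """Rank paragraphs by cosine similarity to the query embedding."""
    q = embed(query.split(), word_vecs, idf)
    scored = []
    for p in paragraphs:
        v = embed(p.split(), word_vecs, idf)
        denom = (np.linalg.norm(q) * np.linalg.norm(v)) or 1.0
        scored.append((float(q @ v) / denom, p))
    return sorted(scored, reverse=True)[:top_k]

# Toy vocabulary; in practice vectors come from pretrained embeddings and
# IDF weights from statistics over the paragraph collection.
rng = np.random.default_rng(0)
word_vecs = {w: rng.normal(size=DIM) for w in ["cat", "pet", "stock", "market"]}
idf = {"cat": 2.0, "pet": 1.5, "stock": 3.0, "market": 2.5}
print(retrieve("cat pet", ["pet cat food", "stock market news"], word_vecs, idf))
```
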
Predicting Unemployment with Machine Learning Based on Registry Data

Many statistical models have been developed to understand the causes of unemployment, but predicting unemployment has received less attention. In this study, we develop a model to predict the labour market state of a person based on machine learning, trained with a large administrative unemployment registry. The model specifies individuals as Markov chains with person-specific transition rates. We evaluate the model on three tasks, where the goal is to predict who has the highest risk of escaping unemployment, becoming unemployed, and being unemployed at any given time. We obtain good performance (AUC: 0.80) for the machine learning model of lifetime unemployment, and very good performance (AUC: 0.90+) for near-future predictions when the recent labour market state of a person is known. We find that personal information affects the predictions in an intuitive way, but there are still significant differences that can be learned by utilizing labour market histories.

Markus Viljanen, Tapio Pahikkala
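
The basic mechanics of a person-specific two-state Markov chain, as the abstract describes, can be sketched as follows; the transition rates here are hypothetical, whereas in the paper's setting they are learned from registry features.

```python
import numpy as np

def unemployment_probability(p_eu, p_ue, initial_state, months):
    """Probability of being unemployed after `months` steps for a person
    modelled as a two-state Markov chain (state 0 = employed, 1 = unemployed).

    p_eu: monthly probability of moving employed -> unemployed
    p_ue: monthly probability of moving unemployed -> employed
    """
    P = np.array([[1 - p_eu, p_eu],
                  [p_ue, 1 - p_ue]])
    dist = np.zeros(2)
    dist[initial_state] = 1.0
    return (dist @ np.linalg.matrix_power(P, months))[1]

# Hypothetical person-specific rates, e.g. predicted from registry features
# such as age, education, and labour-market history:
print(unemployment_probability(p_eu=0.02, p_ue=0.15, initial_state=1, months=6))
```
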
Anomaly Detection on Data Streams – A LSTM’s Diary

In the past years, the importance of processing data streams has increased with the emergence of new technologies and application domains. The Internet of Things provides many examples in which processing and analyzing data streams are critical success factors. An important use case is to identify anomalies, i.e. something that is different or unexpected. We often have to cope with anomaly detection in sequences within data streams, for instance in network intrusion detection, predictive analytics or forecasting. Sequence analysis can be performed using recurrent neural nets; in particular, we use long short-term memory (LSTM) neural nets. An LSTM is capable not only of storing a sequence of data but also of deciding to forget certain parts of it. Unfortunately, the internal representation of learned data does not clearly illustrate what was learned. Moreover, like many neural net-based approaches, these nets tend to need a high volume of data in order to produce valuable insights. In this paper, we present an experimental setting comprising an architecture, a structured way of producing sample data, and end-to-end pipelines to store and evaluate the hidden state of an LSTM per training batch. The main purpose is to extract the hidden state and to analyze its changes during training, and thus to identify patterns as well as anomalies in the hidden state.

Christoph Augenstein, Bogdan Franczyk
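
A minimal PyTorch sketch of the core experimental idea, recording an LSTM's hidden state after every training batch so its evolution (the "diary") can be analyzed afterwards; the model, the synthetic data, and all sizes are illustrative, not the authors' pipeline.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))
hidden_log = []  # one snapshot per training batch

for step in range(100):
    x = torch.randn(32, 20, 8)                       # synthetic sequence batch
    y = x.sum(dim=(1, 2), keepdim=True).squeeze(1)   # toy regression target
    _, (h_n, _) = lstm(x)                            # h_n: (1, batch, hidden)
    loss = nn.functional.mse_loss(head(h_n[-1]), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    hidden_log.append(h_n.detach()[-1].mean(dim=0))  # batch-averaged state

diary = torch.stack(hidden_log)                      # (batches, hidden_size)
print(diary.shape, (diary[-1] - diary[0]).norm())    # drift during training
```
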

Process Mining, Discovery, and Simulation

Frontmatter
Discovering Business Process Simulation Models in the Presence of Multitasking

Business process simulation is a versatile technique for analyzing business processes from a quantitative perspective. A well-known limitation of process simulation is that the accuracy of the simulation results is limited by the faithfulness of the process model and simulation parameters given as input to the simulator. To tackle this limitation, several authors have proposed to discover simulation models from process execution logs so that the resulting simulation models more closely match reality. Existing techniques in this field assume that each resource in the process performs one task at a time. In reality, however, resources may engage in multitasking behavior. Traditional simulation approaches do not handle multitasking. Instead, they rely on a resource allocation approach wherein a task instance is only assigned to a resource when the resource is free. This inability to handle multitasking leads to an overestimation of execution times. This paper proposes an approach to discover multitasking in business process execution logs and to generate a simulation model that takes into account the discovered multitasking behavior. The key idea is to adjust the processing times of tasks in such a way that executing the multitasked tasks sequentially with the adjusted times is equivalent to executing them concurrently with the original processing times. The proposed approach is evaluated using a real-life dataset and synthetic datasets with different levels of multitasking. The results show that, in the presence of multitasking, the approach improves the accuracy of simulation models discovered from execution logs.

Bedilia Estrada-Torres, Manuel Camargo, Marlon Dumas, Maksym Yerokhin
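
The adjustment idea can be illustrated with a small sketch: every time slice in which a resource multitasks is shared among the task instances active in it, so that the adjusted sequential durations sum to the resource's actual busy time. Equal sharing of a slice is an assumption made here for illustration; the paper defines the actual adjustment.

```python
from collections import defaultdict

def adjusted_durations(intervals):
    """Given (task_id, start, end) intervals of one resource, spread each
    time slice equally over the task instances active in it."""
    points = sorted({t for _, s, e in intervals for t in (s, e)})
    adjusted = defaultdict(float)
    for a, b in zip(points, points[1:]):
        active = [tid for tid, s, e in intervals if s <= a and b <= e]
        for tid in active:
            adjusted[tid] += (b - a) / len(active)  # share the slice
    return dict(adjusted)

# Two tasks overlapping during [2, 4]: each gets half of the overlap, so the
# adjusted durations (3 + 3) equal the resource's busy time (6 time units).
print(adjusted_durations([("t1", 0, 4), ("t2", 2, 6)]))  # {'t1': 3.0, 't2': 3.0}
```
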
TLKC-Privacy Model for Process Mining

Process mining aims to provide insights into actual processes based on event data. These data are widely available and often contain private information about individuals. Consider, for example, health-care information systems recording highly sensitive data related to diagnosis and treatment activities. Process mining should reveal insights in the form of annotated models, yet, at the same time, should not reveal sensitive information about individuals. In this paper, we discuss the challenges of directly applying existing, well-known privacy-preserving techniques to event data. We introduce the TLKC-privacy model for process mining, which provides privacy guarantees in terms of group-based anonymization. It extends and customizes the LKC-privacy model, originally introduced to deal with high-dimensional, sparse, and sequential trajectory data. Experiments on real-life event data demonstrate that our privacy model maintains high utility for process discovery and performance analyses while preserving the privacy of the cases.

Majid Rafiei, Miriam Wagner, Wil M. P. van der Aalst
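
To convey the flavour of such group-based guarantees (this is not the TLKC model itself), the sketch below checks, over activity traces, that every subsequence of at most L activities is shared by at least K cases and does not reveal a sensitive value with confidence above C; the traces and sensitive values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def violations(traces, sensitive, L=2, K=2, C=0.5):
    """Report subsequences an adversary might know (length <= L) that occur
    in fewer than K traces or pinpoint a sensitive value above confidence C."""
    support = defaultdict(set)
    for i, trace in enumerate(traces):
        for n in range(1, L + 1):
            for sub in set(combinations(trace, n)):  # order-preserving subsequences
                support[sub].add(i)
    bad = []
    for sub, ids in support.items():
        values = [sensitive[i] for i in ids]
        conf = max(values.count(v) for v in set(values)) / len(ids)
        if len(ids) < K or conf > C:
            bad.append((sub, len(ids), round(conf, 2)))
    return bad

traces = [["register", "triage", "surgery"],
          ["register", "triage", "discharge"],
          ["register", "xray", "discharge"]]
sensitive = ["HIV", "flu", "flu"]  # hypothetical per-case sensitive attribute
print(violations(traces, sensitive))
```
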
Incremental Discovery of Hierarchical Process Models

Many of today’s information systems record the execution of (business) processes in great detail. Process mining utilizes such data and aims to extract valuable insights. Process discovery, a key research area in process mining, deals with the construction of process models based on recorded process behavior. Existing process discovery algorithms aim to provide a “push-button technology”, i.e., the algorithms discover a process model in a completely automated fashion. However, real data often contain noisy and/or infrequent complex behavioral patterns. As a result, the incorporation of all behavior leads to very imprecise or overly complex process models. At the same time, data pre-processing techniques have been shown to improve the precision of process models, i.e., without explicitly using domain knowledge. Yet, to obtain superior process discovery results, human input is still required. Therefore, we propose a discovery algorithm that allows a user to incrementally extend a process model with new behavior. The proposed algorithm is designed to localize and repair nonconforming process model parts by exploiting the hierarchical structure of the given process model. The evaluation shows that the process models obtained with our algorithm, which allows for incremental extension of a process model, have, in many cases, superior characteristics in comparison to process models obtained by existing process discovery and model repair techniques.

Daniel Schuster, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Security and Privacy

Frontmatter
Ontology Evolution in the Context of Model-Based Secure Software Engineering

Ontologies, as a means to formally specify the knowledge of a domain of interest, have made their way into information and communication technology. Most often, such knowledge is subject to continuous change, which demands consistent evolution of ontologies and dependent artifacts. In this paper, we study ontology evolution in the context of a model-based approach to the engineering of secure software, where ontologies are used to formalize the security context knowledge needed to build software systems that can be considered secure. In this application scenario, techniques for detecting ontology changes and determining their semantic impact face a couple of challenging requirements that are not met by existing solutions. To overcome these shortcomings, we adapt a state-based approach to model differencing to OWL ontologies. Our solution is capable of detecting semantic editing patterns, which may be custom-defined using graph transformation rules, but it does not depend on information about editing processes such as persistently managed change logs. We showcase how to leverage semantic editing patterns for the sake of system model co-evolution in response to changing security context knowledge, and demonstrate the feasibility of the approach using a realistic medical information system.

Jens Bürger, Timo Kehrer, Jan Jürjens
Blockchain-Based Personal Health Records for Patients’ Empowerment

With the current trend of patient-centric healthcare, blockchain-based Personal Health Records (PHRs) frameworks have been emerging. The adoption of these frameworks is still in its infancy and depends on a broad range of factors. In this paper, we look at some of the typical concerns raised about a centralized medical records solution, such as the one deployed in France. Based on the state-of-the-art literature on Electronic Health Records (EHRs) and PHRs, we discuss the main implementation bottlenecks that can be encountered when deploying a blockchain solution and how to avoid them. In particular, we explore these bottlenecks in the context of the French PHR system and suggest some recommendations for a paradigm shift towards patients’ empowerment.

Omar El Rifai, Maelle Biotteau, Xavier de Boissezon, Imen Megdiche, Franck Ravat, Olivier Teste
COPri - A Core Ontology for Privacy Requirements Engineering

In their daily practice, most enterprises collect, store, and manage personal information about customers in order to deliver their services. In such a setting, privacy has emerged as a key concern, as companies often neglect or even misuse personal data. In response, governments around the world have enacted laws and regulations for privacy protection. These laws dictate privacy requirements for any system that acquires and manages personal data. Unfortunately, these requirements are often incomplete and/or inaccurate, as many RE practitioners may be unsure of what exactly privacy requirements are and how they differ from other requirements, such as security. To tackle this problem, we developed a comprehensive ontology for privacy requirements. To make it comprehensive, we base our ontology on a systematic review of the literature on privacy requirements. The contributions of this work include the derivation of an ontology from a previously conducted systematic literature review, an implementation using an ontology definition tool (Protégé), a demonstration of its coverage through an extensive example on Ambient Assisted Living, and a validation through a competency questionnaire answered by lexical semantics experts as well as privacy and security researchers.

Mohamad Gharib, John Mylopoulos, Paolo Giorgini
Privacy Preserving Real-Time Video Stream Change Detection Based on the Orthogonal Tensor Decomposition Models

In this paper, a video change detection method that allows for data privacy protection is proposed. Signal change detection is based on tensor models constructed in orthogonal tensor subspaces. Tensor methods allow for the processing of any kind of multi-dimensional signal, since the computation of special features is not required. The proposed signal encoding method makes person identification in the processed signal very difficult or impossible for unauthorized personnel. It is demonstrated that, despite the input being distorted for encryption, the proposed tensor-based method can still correctly identify video shots in real time. Compared with non-distorted signals, the obtained accuracy is only slightly lower, while at the same time providing data privacy.

Bogusław Cyganek

Posters and Demos

Frontmatter
How the Anti-TrustRank Algorithm Can Help to Protect the Reputation of Financial Institutions

When financial institutions are found to have let their customers conduct money laundering through them, they are subjected to large fines. Moreover, the reputation of those institutions suffers greatly through public exposure. Consequently, financial institutions invest significant resources in building systems to automatically detect money laundering, in order to minimize the negative impact of money launderers on their reputation. This paper investigates a graph algorithm called Anti-TrustRank and demonstrates how it can be used to identify money launderers. Our approach to using Anti-TrustRank is not to replace money laundering detection systems, but rather to generate additional inputs to feed into such systems in order to improve their overall detection accuracy.

Irina Astrova
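
A minimal sketch of the underlying idea, assuming a directed transaction graph and a seed set of known-bad accounts: distrust is propagated with personalized PageRank restarted from the seeds. Here it flows along the direction of money transfers; the original web formulation of Anti-TrustRank propagates over reversed hyperlinks.

```python
import networkx as nx

# Hypothetical transaction graph: an edge u -> v means u sent money to v.
G = nx.DiGraph([("launderer", "mule"), ("mule", "shell_co"),
                ("alice", "bob"), ("mule", "bob")])
known_bad = {"launderer"}  # seed set, e.g. from past investigations

scores = nx.pagerank(
    G, alpha=0.85,
    personalization={n: (1.0 if n in known_bad else 0.0) for n in G},
)
# Accounts reachable from the seeds accumulate distrust; the scores can be
# fed as an extra input into an existing money laundering detection system.
for account, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{account}: {score:.3f}")
```
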
Punctuation Restoration System for Slovene Language

Punctuation restoration is the process of adding punctuation symbols to raw text. It is typically used as a post-processing task for Automatic Speech Recognition (ASR) systems. In this paper, we present an approach to punctuation restoration for texts in the Slovene language. The system is trained using bi-directional Recurrent Neural Networks fed with word embeddings only. The evaluation results show that our approach is capable of restoring punctuation with high recall and precision. The F1 score is especially high for commas and periods, which are considered the most important punctuation symbols for understanding ASR-based transcripts.

Marko Bajec, Marko Janković, Slavko Žitnik, Iztok Lebar Bajec
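
A minimal PyTorch sketch of the architecture class described, a bi-directional recurrent tagger over word embeddings that predicts the punctuation symbol following each word; the label set, sizes, and framework choice are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PunctuationTagger(nn.Module):
    """Bi-directional LSTM tagging each word with the punctuation symbol
    (possibly none) that should follow it."""
    def __init__(self, vocab_size, n_labels=3, emb=100, hidden=128):
        super().__init__()  # labels could be: no punctuation, comma, period
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, token_ids):           # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)                  # (batch, seq_len, n_labels)

model = PunctuationTagger(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (1, 12)))  # one 12-word "sentence"
print(logits.argmax(-1))  # predicted punctuation class after each word
```
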
Practice and Challenges of (De-)Anonymisation for Data Sharing

Personal data is a necessity in many fields for research and innovation purposes, and when such data is shared, the data controller carries the responsibility of protecting the privacy of the individuals contained in their dataset. The removal of direct identifiers, such as full name and address, is not enough to secure the privacy of individuals as shown by de-anonymisation methods in the scientific literature. Data controllers need to become aware of the risks of de-anonymisation and apply the appropriate anonymisation measures before sharing their datasets, in order to comply with privacy regulations. To address this need, we defined a procedure that makes data controllers aware of the de-anonymisation risks and helps them in deciding the anonymisation measures that need to be taken in order to comply with the General Data Protection Regulation (GDPR). We showcase this procedure with a customer relationship management (CRM) dataset provided by a telecommunications provider. Finally, we recount the challenges we identified during the definition of this procedure and by putting existing knowledge and tools into practice.

Alexandros Bampoulidis, Alessandro Bruni, Ioannis Markopoulos, Mihai Lupu
A Study of Text Summarization Techniques for Generating Meeting Minutes

A lot of research has been conducted all over the world in the domain of automatic text summarization, and more specifically using machine learning techniques. Many state-of-the-art prototypes partially solve this problem, so we decided to use some of them to build a tool for the automatic generation of meeting minutes. In fact, this was not easy work, and this paper presents the various experiments we carried out using Deep Learning, GANs and Transformers to achieve this goal, as well as the dead ends we encountered during this study. We think providing such feedback may be useful to other researchers who would like to undertake the same type of work, letting them know where to go and where not to go.

Tu My Doan, Francois Jacquenet, Christine Largeron, Marc Bernard
CCOnto: Towards an Ontology-Based Model for Character Computing

Our lives are rewritten by technology and data, making it crucial for machines to understand humans and their behavior and to react accordingly. Technology systems could adapt to different factors such as affect (Affective Computing), personality (Personality Computing), or character (Character Computing). Character consists of personality, affect, socio-cultural embedding, cognitive abilities, health, and all other attributes distinguishing one individual from another. Ontology-based conceptual models representing individuals, i.e. their character and resulting behavior in situations, are needed to provide a unified framework for building truly interactive and adaptive systems. We propose CCOnto, an ontology for Character Computing that models human character. The ontology is to be used by adaptive interactive systems to understand and predict an individual’s behavior in a given situation, more specifically their performance in different tasks. The developed ontology models the different character attributes, their building blocks, and their interactions with each other and with a person’s performance in different tasks.

Alia El Bolock, Cornelia Herbert, Slim Abdennadher
A Tool for the Verification of Decision Model and Notation (DMN) Models

The Decision Model and Notation (DMN) is a decision modelling standard consisting of two levels: the decision requirement diagram (DRD) level, which depicts the dependencies between elements involved in the decision model, and the decision logic level, which specifies the underlying decision logic, usually in the form of decision tables. As the decision tables and the DRD are modelled in conjunction, the need arises to verify the consistency of both levels. While there have been some works geared towards the verification of decision tables, the DRD level has been largely neglected. In this work, we therefore present a tool for the verification of DMN models at both the logic and the DRD level, along with a performance assessment of the tool.

Faruk Hasić, Carl Corea, Jonas Blatt, Patrick Delfmann, Estefanía Serral
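
One classic table-level check that such verification covers is detecting rules whose input conditions can fire on the same input (overlapping rules). A small sketch, under the simplifying assumption that every condition is a closed numeric interval:

```python
def overlapping_rules(rules):
    """Return index pairs of decision-table rules that can both fire,
    i.e. whose condition intervals overlap on every input column."""
    def overlap(c1, c2):
        return max(c1[0], c2[0]) <= min(c1[1], c2[1])
    return [(i, j)
            for i in range(len(rules))
            for j in range(i + 1, len(rules))
            if all(overlap(a, b) for a, b in zip(rules[i], rules[j]))]

# Hypothetical table: each rule constrains (age, income) intervals.
rules = [((0, 17), (0, 99_999)),    # rule 0: minors
         ((18, 120), (0, 30_000)),  # rule 1: low-income adults
         ((16, 25), (0, 99_999))]   # rule 2: young applicants
print(overlapping_rules(rules))     # [(0, 2), (1, 2)] -> rule 2 overlaps both
```
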
Text as Semantic Fields: Integration of an Enriched Language Conception in the Text Analysis Tool Evoq

Analysis of interview transcripts plays a key role in many human sciences research protocols. Numerous IT tools are already used to support this task. Most of them leave the interpretation task to the analyst, or involve an implicit conception of language which is rarely questioned. Developed in the context of the EFFaTA-MeM (Evocative Framework For Text Analysis - Mediality Models) trans-disciplinary research project, the Evoq software takes a radically innovative approach. It deliberately integrates concepts from post-structuralist theory, which are thus offered as a reading sieve at the analyst’s disposal. This demo paper briefly introduces the main concepts of post-structuralism, then presents how these concepts are modelled in a formal system. Finally, it shows how this approach is integrated into the Evoq software and how the human scientist can benefit from its functionalities.

Isabelle Linden, Anne Wallemacq, Bruno Dumas, Guy Deville, Antoine Clarinval, Maxime Cauz
MERLIN: An Intelligent Tool for Creating Domain Models

The complexity of modelling languages and the lack of intelligent tool support add unnecessary difficulties to the process of modelling, a process that is in itself already demanding, given the challenges associated with capturing user requirements and abstracting them in the correct way. In the past, the MERODE method was developed to address the problem of UML’s complexity and lack of formalization. In this paper, we demonstrate how the formalization of a multi-view modelling approach enables the creation of smart and user-friendly modelling support.

Monique Snoeck
Business Intelligence and Analytics: On-demand ETL over Document Stores

For many decades, Business Intelligence and Analytics (BI&A) has been associated with relational databases. In the era of big data and NoSQL stores, it is important to provide approaches and systems capable of analyzing this type of data for decision-making. In this paper, we present a new BI&A approach that: (i) extracts, transforms and loads the data required for OLAP analysis (on-demand ETL) from document stores, and (ii) provides the models and systems required for suitable OLAP analysis. We focus here on the on-demand ETL stage where, unlike existing works, we consider data dispersed over two or more collections.

Manel Souibgui, Faten Atigui, Sadok Ben Yahia, Samira Si-Said Cherfi
Towards an Academic Abstract Sentence Classification System

This research-in-progress paper introduces a novel academic abstract sentence classification system intended to improve the efficiency of researchers’ literature discovery. The system provides three key functions: 1) display of abstracts with visual identification of each sentence’s indicated literature characteristic class, 2) conversion of unstructured abstracts into structured variants, and 3) extraction of sentences by class, available for export to CSV alongside literature metadata. This functionality is made possible by a web application connected to a Python instance via PHP, integration with an open-access literature index via an API, and a deployed academic abstract sentence classification model. The contribution of the proposed system is its ability to enhance researchers’ literature discovery. This paper provides the context and motivation behind the development of the system, outlines its functionality, and provides an outlook on future research.

Connor Stead, Stephen Smith, Peter Busch, Savanid Vatanasakdakul
DiálogoP - A Language and a Graphical Tool for Formally Defining GDPR Purposes

The notion of processing purpose, as set out in the EU General Data Protection Regulation (GDPR), comprises a crucial part of a software system’s privacy policy. Processing purposes are meant to characterize the usage of personal data within a system. In this work, we propose a formal type language for defining purposes as the communication exchanges between a system’s entities, based on session types enhanced with privacy notions. In order to provide software engineers with the means to easily define processing purposes, we encode the formal language syntax into a UML-based domain model and present DiálogoP, a tool that supports graphical model definition and subsequently translates it into formal language definitions.

Evangelia Vanezi, Georgia M. Kapitsaki, Dimitrios Kouzapas, Anna Philippou, George A. Papadopoulos
Identifying the Challenges and Requirements of Enterprise Architecture Frameworks for IoT Systems

Enterprise Architecture Frameworks (EAFs) have been around since the last decades of the 20th century. They are a proven practice to analyze, describe, organize, implement and manage changes in the global architecture of an enterprise’s data, processes, applications and technology. Recently, new promising technologies, such as big data, machine learning, and the always-and-everywhere connected Internet of Things (IoT), have made their way into all sorts of business-generating activities. The vast number of possible connectable devices, with almost infinite useful applications throughout an enterprise, such as in operations, human resource management, communications, and customer service, demonstrates the holistic nature of IoT. Because of that, the use of IoT cannot be treated in isolation, but should be integrated into all aspects of Enterprise Architecture. Therefore, this paper identifies the main architectural challenges and derived requirements that IoT systems pose for an EAF. A literature study and a questionnaire aimed at industry EA experts were used as the main data sources.

Filip Vanhoorelbeke, Monique Snoeck, Estefanía Serral

Doctoral Consortium

Frontmatter
A Holistic Approach Towards Human Factors in Information Security and Risk

Businesses take various precautions and measures to protect their assets, and at the centre of their computer systems are users. Many data breaches originate from accidental human error, which can cause lasting financial or reputational damage. Although companies intend to change behaviour, one of the biggest problems with this approach is the lack of psychology-informed theories to understand why and how users are targeted. To understand why users defy compliance procedures and policy, despite warnings and training, we need to understand every internal and external factor that contributes to such behaviour. The literature proposes that users are the main cause of system dysfunction, and this is accentuated by media headlines that portray users as the source of the problem. One of the biggest problems is that research continues to evaluate surface-level problems, rather than explore or acknowledge more systemic factors that can have damaging results. In this paper, we discuss factors that could impact the way information is processed and how this is translated into action or inaction. We also identify how an environment can encourage or discourage desired behaviour.

Omolola Fagbule
A Framework for Privacy Policy Compliance in the Internet of Things

Internet of Things (IoT) structures are pervasive, incredibly complex, and heterogeneous, and are based on various architectures and infrastructures. IoT exposes users to a number of different privacy threats related to the leakage of personal information and loss of service. User privacy is the most important aspect of IoT environments, as user data are transmitted among connected devices without the user’s intervention. The challenge that IoT privacy and security analysts face is therefore to analyse and design such complex, heterogeneous systems while guaranteeing the protection of the exchanged user data. Accordingly, tools to support and guide the analyst are needed, in order to help them design IoT systems that are compliant with privacy policies. In this paper, preliminary results are provided for designing a tool-supported theoretical framework, including a privacy policy language and a model for the analysis of IoT systems, to enforce the protection of user data in IoT environments. In this work, a literature review is presented to identify the concepts and relationships needed for such a framework, and we outline our preliminary design of it and the included components.

Constantinos Ioannou
Social-Based Physical Reconstruction Planning in Case of Natural Disaster: A Machine Learning Approach

Natural disasters have several adverse effects on human lives. It is challenging for governments to tackle these events and to reconstruct damaged areas with minimal budget and time while still guaranteeing social benefits to the affected population. This article presents a decision-support approach for post-disaster reconstruction planning of buildings damaged by a natural disaster. The proposed framework determines a set of alternative plans which satisfy all constraints, accommodate political priorities, and guarantee social benefits for the affected population. The determined plans are then provided to public servants, who select the plan to implement. The approach is generic, and it can be applied to areas of any size as long as the decision makers share the same goals. We demonstrate the approach on the city of L’Aquila, destroyed by an earthquake in 2009.

Ghulam Mudassir
Explainability Design Patterns in Clinical Decision Support Systems

This paper reports on an ongoing PhD project in the field of explaining the recommendations of clinical decision support systems (CDSSs) to medical practitioners. Recently, explainability research in the medical domain has witnessed a surge of advances, with a focus on two main methods: the first focuses on developing models that are explainable and transparent in nature (e.g. rule-based algorithms); the second investigates the interpretability of black-box models without looking at the mechanism behind them (e.g. LIME), as post-hoc explanations. However, overlooking the human factors and the usability aspect of explanations has introduced new risks when following system recommendations, e.g. over-trust and under-trust. Due to this limitation, there is a growing demand for usable explanations for CDSSs, to enable trust calibration and informed decision-making in these systems by identifying when a recommendation is correct to follow. This research aims to develop explainability design patterns with the aim of calibrating medical practitioners’ trust in CDSSs. The paper also outlines the PhD methodology and discusses the literature around the research problem.

Mohammad Naiseh
Backmatter
Metadata
Title
Research Challenges in Information Science
Editors
Fabiano Dalpiaz
Jelena Zdravkovic
Pericles Loucopoulos
Copyright Year
2020
Electronic ISBN
978-3-030-50316-1
Print ISBN
978-3-030-50315-4
DOI
https://doi.org/10.1007/978-3-030-50316-1
