main-content

## Über dieses Buch

This book constitutes the refereed proceedings of the 44th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2018, held in Krems, Austria, in January/February 2018.
The 48 papers presented in this volume were carefully reviewed and selected from 97 submissions. They were organized in topical sections named: foundations of computer science; software engineering: advances methods, applications, and tools; data, information and knowledge engineering; network science and parameterized complexity; model-based software engineering; computational models and complexity; software quality assurance and transformation; graph structure and computation; business processes, protocols, and mobile networks; mobile robots and server systems; automata, complexity, completeness; recognition and generation; optimization, probabilistic analysis, and sorting; filters, configurations, and picture encoding; machine learning; text searching algorithms; and data model engineering.

## Inhaltsverzeichnis

### Swift Logic for Big Data and Knowledge Graphs

Overview of Requirements, Language, and System

Many modern companies wish to maintain knowledge in the form of a corporate knowledge graph and to use and manage this knowledge via a knowledge graph management system (KGMS). We formulate various requirements for a fully-fledged KGMS. In particular, such a system must be capable of performing complex reasoning tasks but, at the same time, achieve efficient and scalable reasoning over Big Data with an acceptable computational complexity. Moreover, a KGMS needs interfaces to corporate databases, the web, and machine-learning and analytics packages. We present KRR formalisms and a system achieving these goals. To this aim, we use specific suitable fragments from the Datalog$$^\pm$$± family of languages, and we introduce the vadalog system, which puts these swift logics into action. This system exploits the theoretical underpinning of relevant Datalog$$^\pm$$± languages and combines it with existing and novel techniques from database and AI practice.

Luigi Bellomarini, Georg Gottlob, Andreas Pieris, Emanuel Sallinger

### On Architecture Specification

The design, specification, and correct implementation of an architectural design are after the task of requirements specification the perhaps most important design decisions, when building large software or software based systems. Architectures are responsible for software quality, for a number of quality attributes such as maintainability, portability, changeability, reusability but also reliability, security, and safety. Therefore, the design of architectures is a key issue in system and software development. For highly distributed, networked systems and for cyber-physical systems we need a design concept which supports composition, parallelism, and concurrency and finally real time but keeps all of the general advantages of object-oriented programming. We describe an approach to specify and implement systems along the lines of some of the established concepts of object-orientation – such as inheritance and class instantiation. This leads to an approach that nevertheless provides an execution model which is parallel and concurrent in nature and supports real time and modular composition. This way, it lays the foundation of a software and systems engineering style where classical object-orientation can be extended to cyber-physical systems in straightforward way.

Manfred Broy

### The State of the Art in Dynamic Graph Algorithms

A dynamic graph algorithm is a data structure that supports operations on dynamically changing graphs.

Monika Henzinger

### Diversity in UML Modeling Explained: Observations, Classifications and Theorizations

Modeling is a common part of modern day software engineering practice. Little evidence exists about how models are used in software development and how they help in producing better software. In this talk we introduce a classification-matrix and a theoretical framework that helps explain the large variety of models and modeling styles found in industrial practice. As part of this explanation, we will explore empirical findings on the uses of UML modeling in practice. We intersperse this paper with some insights about modeling in software development that may be common to some, but certainly not generally accepted throughout the software engineering community.

Michel R. V. Chaudron, Ana Fernandes-Saez, Regina Hebig, Truong Ho-Quang, Rodi Jolak

### Self-managing Internet of Things

Internet of Things (IoT) are in full expansion. Applications range from factory floors to smart city environments. IoT applications consist of battery powered small computing devices (motes) that communicate wirelessly and interact with the environment through sensors and actuators. A key challenge that IoT engineers face is how to manage such systems that are subject to inherent uncertainties in their operation contexts, such as interferences and dynamic traffic in the network. Often these uncertainties are difficult to predict at development time. In practice, IoT applications are therefore typically over-provisioned at deployment; however, this leads to inefficiency. In this paper, we make a case for IoT applications that manage themselves at runtime to deal with uncertainties. We contribute: (1) a set of concerns that motivate the need for self-management for IoT systems, (2) three initial approaches that illustrate the potential of realising self-managing IoT systems, and (3) a set of open challenges for future research on self-adaptation in IoT.

Danny Weyns, Gowri Sankar Ramachandran, Ritesh Kumar Singh

### LARS: A Logic-Based Framework for Analytic Reasoning over Streams

(Extended Abstract)

Stream reasoning considers continuously deriving conclusions on streaming data. While traditional stream processing approaches focus on throughput and are often based on operational grounds, reasoning approaches aim at high expressiveness based on declarative semantics; yet according theoretical underpinning in the streaming area has been lacking. To fill this gap, we provide LARS, a Logic-based Framework for Analytic Reasoning over Streams. It provides generic window operators to limit reasoning to recent snapshots of data, and modalities to control the temporal information of data. Building on resulting formulas, a rule-based language is presented which can be seen as extension of Answer Set Programming (ASP) for streams. We study semantic properties and the computational complexity of LARS, its relation to other formalisms and mention various work that builds on it.

Harald Beck, Minh Dao-Tran, Thomas Eiter

### Network Analysis of the Science of Science: A Case Study in SOFSEM Conference

A rising issue in the scientific community entails the identification of temporal patterns in the evolution of the scientific enterprise and the emergence of trends that influence scholarly impact. In this direction, this paper investigates the mechanism with which citation accumulation occurs over time and how this affects the overall impact of scientific output. Utilizing data regarding the SOFSEM Conference (International Conference on Current Trends in Theory and Practice of Computer Science), we study a corpus of 1006 publications with their associated authors and affiliations to uncover the effects of collaboration network on the conference output. We proceed to group publications into clusters based on the trajectories they follow in their citation acquisition. Representative patterns are identified to characterize dominant trends of the conference, while exploring phenomena of early and late recognition by the scientific community and their correlation with impact.

Antonia Gogoglou, Theodora Tsikrika, Yannis Manolopoulos

### The Parameterized Complexity of Centrality Improvement in Networks

The centrality of a vertex v in a network intuitively captures how important v is for communication in the network. The task of improving the centrality of a vertex has many applications, as a higher centrality often implies a larger impact on the network or less transportation or administration cost. In this work we study the parameterized complexity of the NP-complete problems Closeness Improvement and Betweenness Improvement in which we ask to improve a given vertex’ closeness or betweenness centrality by a given amount through adding a given number of edges to the network. Herein, the closeness of a vertex v sums the multiplicative inverses of distances of other vertices to v and the betweenness sums for each pair of vertices the fraction of shortest paths going through v. Unfortunately, for the natural parameter “number of edges to add” we obtain hardness results, even in rather restricted cases. On the positive side, we also give an island of tractability for the parameter measuring the vertex deletion distance to cluster graphs.

Clemens Hoffmann, Hendrik Molter, Manuel Sorge

### Local Structure Theorems for Erdős–Rényi Graphs and Their Algorithmic Applications

We analyze local properties of sparse Erdős–Rényi graphs, where d(n)/n is the edge probability. In particular we study the behavior of very short paths. For $$d(n)=n^{o(1)}$$d(n)=no(1) we show that $$G(n,d(n)/n)$$G(n,d(n)/n) has asymptotically almost surely (a.a.s.) bounded local treewidth and therefore is a.a.s. nowhere dense. We also discover a new and simpler proof that $$G(n,d/n)$$G(n,d/n) has a.a.s. bounded expansion for constant d. The local structure of sparse Erdős–Rényi graphs is very special: The r-neighborhood of a vertex is a tree with some additional edges, where the probability that there are m additional edges decreases with m. This implies efficient algorithms for subgraph isomorphism, in particular for finding subgraphs with small diameter. Finally, experiments suggest that preferential attachment graphs might have similar properties after deleting a small number of vertices.

Jan Dreier, Philipp Kuinke, Ba Le Xuan, Peter Rossmanith

### Target Set Selection Parameterized by Clique-Width and Maximum Threshold

The Target Set Selection problem takes as an input a graph G and a non-negative integer threshold $$\mathsf {thr}(v)$$thr(v) for every vertex v. A vertex v can get active as soon as at least $$\mathsf {thr}(v)$$thr(v) of its neighbors have been activated. The objective is to select a smallest possible initial set of vertices, the target set, whose activation eventually leads to the activation of all vertices in the graph.We show that Target Set Selection is in FPT when parameterized with the combined parameters clique-width of the graph and the maximum threshold value. This generalizes all previous FPT-membership results for the parameterization by maximum threshold, and thereby solves an open question from the literature. We stress that the time complexity of our algorithm is surprisingly well-behaved and grows only single-exponentially in the parameters.

Tim A. Hartmann

### Combining Versioning and Metamodel Evolution in the ChronoSphere Model Repository

Model Driven Engineering (MDE) has gained a lot of popularity in recent years and is being applied in a wide variety of domains. As teams, models and applications grow in size, the need for faster and more scalable technology emerges, in particular in the crucial area of model repositories. These software components are responsible for persisting, querying and versioning the model content and act as central hubs for interaction with the model. However, existing repository solutions do not consider metamodel evolution, which is important in long-running projects. In this paper, we present ChronoSphere, a novel model repository, targeted specifically towards developers working with MDE technology in industry, with a focus on models-at-runtime scenarios. By utilizing the latest innovations in graph databases and version control, ChronoSphere provides transparent and efficient versioning as well as metamodel evolution capabilities. This paper focuses on the core concepts of ChronoSphere, in particular data management, versioning and metamodel evolution. Our open-source implementation serves as proof of concept.

Martin Haeusler, Thomas Trojer, Johannes Kessler, Matthias Farwick, Emmanuel Nowakowski, Ruth Breu

### Automated Change Propagation from Source Code to Sequence Diagrams

Sequence diagrams belong to three most frequently used UML diagrams and they are often an integral part of a software design. Designers utilize sequence diagrams to define and visualize designed software’s behavior. But during software development and maintenance, multiple vendor’s changes are implemented into a source code. These changes lead to inconsistencies between a software model and the source code, that are omitted due to lack of time. This paper is focused on problems with automated source code changes propagation into UML sequence diagrams. In the paper, we propose the architecture for synchronization of outdated designers’ sequence diagrams with current software behavior implemented in a source code. The proposed architecture is focused on updating and not on regenerating sequence diagrams, what helps designers to understand modified behavior and changes provided in it. We evaluated the proposed architecture via implemented extension for Eclipse Papyrus, which analyzes differences between sequence diagrams and source code model, and based on developers’ styles, it propagates differences to sequence diagrams.

Karol Rástočný, Andrej Mlynčár

### Multi-paradigm Architecture Constraint Specification and Configuration Based on Graphs and Feature Models

Currently, architecture constraints can be specified and checked in different paradigms of software development, the object-oriented, component-based and service-based one. But the current state of the art and practice do not consider their specification at a high level of abstraction, independently from any paradigm vocabulary. We propose in this paper a process combining graphs and feature modeling to specify multi-paradigm architecture constraints. These constraints are expressed with OCL on a particular meta-model of graphs. Then these constraints can be transformed to any chosen paradigm, after their configuration using a feature/variability model. This transformation allows later to handle these constraints in that (chosen) paradigm: to refine them, to generate source code from them, and to check them on models and on source code. A case study is presented in this paper; it concerns architecture constraint specification and configuration under software migration from the object-oriented to the component-based paradigm.

Sahar Kallel, Chouki Tibermacine, Ahmed Hadj Kacem, Christophe Dony

### Lower Bounds and Hierarchies for Quantum Memoryless Communication Protocols and Quantum Ordered Binary Decision Diagrams with Repeated Test

We explore multi-round quantum memoryless communication protocols. These are restricted version of multi-round quantum communication protocols. The “memoryless” term means that players forget history from previous rounds, and their behavior is obtained only by input and message from the opposite player. The model is interesting because this allows us to get lower bounds for models like automata, Ordered Binary Decision Diagrams and streaming algorithms. At the same time, we can prove stronger results with this restriction. We present a lower bound for quantum memoryless protocols. Additionally, we show a lower bound for Disjointness function for this model. As an application of communication complexity results, we consider Quantum Ordered Read-k-times Branching Programs (k-QOBDD). Our communication complexity result allows us to get lower bound for k-QOBDD and to prove hierarchies for sublinear width bounded error k-QOBDDs, where $$k=o(\sqrt{n})$$k=o(n). Furthermore, we prove a hierarchy for polynomial size bounded error k-QOBDDs for constant k. This result differs from the situation with an unbounded error where it is known that an increase of k does not give any advantage.

Farid Ablayev, Andris Ambainis, Kamil Khadiev, Aliya Khadieva

### Computational Complexity of Atomic Chemical Reaction Networks

Informally, a chemical reaction network is “atomic” if each reaction may be interpreted as the rearrangement of indivisible units of matter. There are several reasonable definitions formalizing this idea. We investigate the computational complexity of deciding whether a given network is atomic according to each of these definitions.Primitive atomic, which requires each reaction to preserve the total number of atoms, is shown to be equivalent to mass conservation. Since it is known that it can be decided in polynomial time whether a given chemical reaction network is mass-conserving [28], the equivalence we show gives an efficient algorithm to decide primitive atomicity.Subset atomic further requires all atoms be species. We show that deciding if a network is subset atomic is in $$\mathsf {NP}$$NP, and “whether a network is subset atomic with respect to a given atom set” is strongly $$\mathsf {NP}$$NP-$$\mathsf {complete}$$complete.Reachably atomic, studied by Adleman, Gopalkrishnan et al. [1, 22], further requires that each species has a sequence of reactions splitting it into its constituent atoms. Using a combinatorial argument, we show that there is a polynomial-time algorithm to decide whether a given network is reachably atomic, improving upon the result of Adleman et al. that the problem is decidable. We show that the reachability problem for reachably atomic networks is $$\mathsf {PSPACE}$$PSPACE-$$\mathsf {complete}$$complete.Finally, we demonstrate equivalence relationships between our definitions and some cases of an existing definition of atomicity due to Gnacadja [21].

David Doty, Shaopeng Zhu

### Conjugacy of One-Dimensional One-Sided Cellular Automata is Undecidable

Two cellular automata are strongly conjugate if there exists a shift-commuting conjugacy between them. We prove that the following two sets of pairs (F, G) of one-dimensional one-sided cellular automata over a full shift are recursively inseparable:(i)pairs where F has strictly larger topological entropy than G, and(ii)pairs that are strongly conjugate and have zero topological entropy.Because there is no factor map from a lower entropy system to a higher entropy one, and there is no embedding of a higher entropy system into a lower entropy system, we also get as corollaries that the following decision problems are undecidable: Given two one-dimensional one-sided cellular automata F and G over a full shift: Are F and G conjugate? Is F a factor of G? Is F a subsystem of G? All of these are undecidable in both strong and weak variants (whether the homomorphism is required to commute with the shift or not, respectively). It also immediately follows that these results hold for one-dimensional two-sided cellular automata.

Joonatan Jalonen, Jarkko Kari

### Formal Verification and Safety Assessment of a Hemodialysis Machine

Given the safety-critical nature of healthcare systems, their rigorous safety assessment, in terms of studying their behavior in the presence of potential faults and how the malfunctioning components cause system failures, is of paramount importance. Traditionally, the safety assessment of a system is done analytically or using simulation based tools. However, the former is prone to human error and the later does not provide a complete analysis, which makes them inappropriate for the safety assessment of healthcare systems. These limitations can be overcome by using formal methods based safety assessment. This paper presents our experience of applying model based safety assessment and system verification tools on a hemodialysis machine. In particular, we use the nuXmv model checker to formally verify a formal model of the given hemodialysis machine. The formal model of the given system is then extended with various fault modes of the system components and the eXtended Safety Assessment Platform is used to check various undesired behaviors of the system using invariant properties defined as Top Level Events. This way, we can automatically generate the FTA and FMEA to do the safety assessment of the given hemodialysis machine.

Shahid Khan, Osman Hasan, Atif Mashkoor

### Automatic Decomposition of Java Open Source Pull Requests: A Replication Study

The presence of large changesets containing several independent modifications (e.g. bug fixes, refactorings, features) can negatively affect the efficacy of code review. To cope with this problem, Barnett et al. developed ClusterChanges — a lightweight static analysis technique for decomposing changesets in different partitions that can be reviewed independently. They have found that ClusterChanges can indeed decompose such changesets and that developers agree with their decomposition. However, the authors’ restricted their analysis to software that is: (i) closed source, (ii) written in C# and (iii) developed by a single organization. To address this threat to validity, we implemented JClusterChanges, a free and open source (FOSS) implementation of ClusterChanges for Java software, and replicated the original Barnett et al. study using changesets from Java open source projects hosted on GitHub. We found that open source changesets are similar to the changesets of the original study. Thus, our research confirms that the problem is relevant to other environments and provides a FOSS implementation of ClusterChanges that not only shows the feasibility of the technique in other contexts but can also be used to help future research.

Victor da C. Luna Freire, João Brunet, Jorge C. A. de Figueiredo

### Transformation of OWL2 Property Axioms to Groovy

Ontology is a formal representation of domain knowledge. It may be effectively used in software development – large parts of the object-oriented code can be automatically generated from existing domain ontologies. The paper is related to transformations from OWL2 to Groovy. It proposes transformations of OWL2 properties together with object property axioms. Many axioms, e.g. asymmetry, irreflexivity have not been considered in the existing literature up to now. Mapping of some others is incomplete. Proposed transformations preserve the OWL2 semantics of axioms, assuring model consistency with the original definition. The implemented rules either guarantee consistency of the source code by performing additional actions ‘behind the scene’ or prohibit inconsistency by throwing exceptions. As a result, their application can speed up the development process and produce the source code of high quality at the same time. All defined transformation rules were implemented and verified by several examples. A bigger case study confirmed the usability of the rules. Both the tool as well as the case study are publicly available.

Bogumiła Hnatkowska, Paweł Woroniecki

### Simple Paths and Cycles Avoiding Forbidden Paths

A graph with forbidden paths is a pair (G, F) where G is a graph and F is a subset of the set of paths in G. A simple path avoiding forbidden paths in (G, F) is a simple path in G such that each subpath is not in F. It is shown in [S. Szeider, Finding paths in graphs avoiding forbidden transitions, DAM 126] that the problem of deciding the existence of a simple path avoiding forbidden paths in a graph with forbidden paths is Np-complete even when the forbidden paths are restricted to be transitions, i.e., of length two. We give an exact exponential time algorithm that decides in $$O(2^nn^{2k+O(1)})$$O(2nn2k+O(1)) time and $$O(n^{k+O(1)})$$O(nk+O(1)) space whether there exists a simple path between two vertices of a given n-vertex graph where k is the length of the longest forbidden path. We also obtain an exact $$O(2^nn^{2k+O(1)})$$O(2nn2k+O(1)) time and $$O(n^{k+O(1)})$$O(nk+O(1)) space algorithm to check the existence of a Hamiltonian path avoiding forbidden paths and for the graphs with forbidden transitions an exact $$O^*(2^n)$$O∗(2n) time and polynomial space algorithm to check the existence of a Hamiltonian cycle avoiding forbidden transitions. In the last section, we present a new sufficient condition for graphs to have a Hamiltonian cycle, which gives us some interesting corollaries for graphs with forbidden paths.

Benjamin Momège

### External Memory Algorithms for Finding Disjoint Paths in Undirected Graphs

Consider the following well-known combinatorial problem: given an undirected graph $$G = (V,E)$$ G=(V,E), terminals $$s, t \in V$$ s,t∈V, and an integer $$k \ge 1$$ k≥1, find k edge-disjoint s–t paths in G or report that such paths do not exist.We study this problem in the external memory (EM) model of Agrawal and Vitter, i.e. assume that only M words of random access memory (RAM) are available while the graph resides in EM, which enables reading and writing contiguous blocks of B words per single I/O. The latter external memory is also used for storing the output and some intermediate data.For $$k = 1$$ k=1, the problem consists in finding a single s–t path in an undirected graph and can be solved in $$Conn(V,E) = O\left( \frac{V+E}{V}\right. \left. Sort(V) \log \log \frac{VB}{E}\right)$$ Conn(V,E)=OV+EVSort(V)loglogVBE I/Os, where $$Sort(N) = O\left( \frac{N}{B}\log _{M/B} \frac{N}{B}\right)$$ Sort(N)=ONBlogM/BNB is the complexity of sorting N words in external memory.Our contribution is two novel EM algorithms that solve the problem for $$k \le \frac{M}{B}$$ k≤MB. The first takes $$O(k \cdot Conn(V,E))$$ O(k·Conn(V,E)) I/Os. The second one applies the ideas of Ibaraki–Nagamochi sparse connectivity certificates and takes $$O\left( (Sort(V+E) + k \cdot Conn(V,kV)) \cdot \log \frac{V}{M}\right)$$ O(Sort(V+E)+k·Conn(V,kV))·logVM I/Os, which improves upon the first bound for sufficiently dense graphs.Both algorithms outperform the naive approach based on successive BFS- or DFS-augmentations for a wide range of parameters $$| V |$$ |V|, $$| E |$$ |E|, M, B.

Maxim Babenko, Ignat Kolesnichenko

### On Range and Edge Capacity in the Congested Clique

The congested clique is a synchronous, message-passing model of distributed computing in which each computational unit (node) in each round can send message of $$O(\log n)$$O(logn) bits to each other node of the network, where n is the number of nodes.Following recent progress in design of algorithms for graph connectivity and minimum spanning tree (MST) in the congested clique, we study these problems in limited variants of the congested clique. We show that MST can be computed deterministically and connected components can be computed by a randomized algorithm with optimal edge capacity $$\varTheta (\log n)$$Θ(logn), while preserving the best known round complexity [6, 13]. Moreover, our algorithms work in the rcast model with range $$r=2$$r=2, the weakest model of the congested clique above the broadcast variant ($$r=1$$r=1) in the hierarchy with respect to the range [2].

Tomasz Jurdziński, Krzysztof Nowicki

### Global vs. Local Semantics of BPMN 2.0 OR-Join

Nowadays, BPMN 2.0 has acquired a clear predominance for modelling business processes. However, one of its drawback is the lack of a formal semantics, that leads to different interpretations, and hence implementations, of some of its features. This, as a matter of fact, results on process implementations using such features that do not fit with designers expectations, and that are not portable from one BPMN enactment tools to another. Among the BPMN elements particular ambiguous is the semantics of the OR-Join. Several formalisations of this element have been proposed in the literature, but none of them is derived from a direct and faithful translation of the current version of BPMN standard. In this work we instead provide direct, global and local, formalisations compliant with the OR-Join semantics reported in the BPMN 2.0 standard. In particular, the local semantics is devised to more efficiently determine the OR-Join enablement. The soundness of the approach is given by demonstrating the correspondence of the local semantics with respect to the global one.

Flavio Corradini, Chiara Muzi, Barbara Re, Lorenzo Rossi, Francesco Tiezzi

### AODVv2: Performance vs. Loop Freedom

We compare two evolutions of the Ad-hoc On-demand Distance Vector (AODV) routing protocol, i.e. DYMO and AODVv2-16. In particular, we apply statistical model checking to investigate the performance of these two protocols in terms of routes established and looping routes. Our modelling and analysis are carried out by the Uppaal Statistical Model Checker on 3$$\,\times \,$$×3 grids, with possibly lossy communication.

Mojgan Kamali, Massimo Merro, Alice Dal Corso

### Multivendor Deployment Integration for Future Mobile Networks

During the last few years, we have seen a tremendous explosion in the range of possibilities when speaking about software delivery. The web-scale IT capabilities have evolved drastically and complex web-based applications have adapted rapidly in an ever changing world where user experience is in the focus. Terms like Agile, Cloudification, microservices or DevOps are lately on the crest of the wave.The focus of this paper, however, is not within the managed services providers, but with the network providers, or operators. These companies are experiencing similar challenges as described above especially with the imminent arrival of 5th generation mobile networks, but with a different set of constraints that make the adoption of the new paradigms or best practices a tough process. Along the paper we will cover the main bottlenecks that operators face in terms of adapting to the ever increasing network needs, paying especial attention to the multivendor nature and the extreme high availability expected in this kind of services. We present our tool for scheduling deployments in this special environment with the related workflows and leveraged DevOps best practices. Finally, we measure how a proof of concept tool helps to improve the delivery process in multivendor environments.

Manuel Perez Martinez, Tímea László, Norbert Pataki, Csaba Rotter, Csaba Szalai

### Patrolling a Path Connecting a Set of Points with Unbalanced Frequencies of Visits

Patrolling consists of scheduling perpetual movements of a collection of mobile robots, so that each point of the environment is regularly revisited by any robot in the collection. In previous research, it was assumed that all points of the environment needed to be revisited with the same minimal frequency.In this paper we study efficient patrolling protocols for points located on a path, where each point may have a different constraint on frequency of visits. The problem of visiting such divergent points was recently posed by Gąsieniec et al. in [14], where the authors study protocols using a single robot patrolling a set of n points located in nodes of a complete graph and in Euclidean spaces.The focus in this paper is on patrolling with two robots. We adopt a scenario in which all points to be patrolled are located on a line. We provide several approximation algorithms concluding with the best currently known $$\sqrt{3}$$3-approximation.

Huda Chuangpishit, Jurek Czyzowicz, Leszek Gąsieniec, Konstantinos Georgiou, Tomasz Jurdziński, Evangelos Kranakis

### Exploring Graphs with Time Constraints by Unreliable Collections of Mobile Robots

A graph environment must be explored by a collection of mobile robots. Some of the robots, a priori unknown, may turn out to be unreliable. The graph is weighted and each node is assigned a deadline. The exploration is successful if each node of the graph is visited before its deadline by a reliable robot. The edge weight corresponds to the time needed by a robot to traverse the edge. Given the number of robots which may crash, is it possible to design an algorithm, which will always guarantee the exploration, independently of the choice of the subset of unreliable robots by the adversary? We find the optimal time, during which the graph may be explored. Our approach permits to find the maximal number of robots, which may turn out to be unreliable, and the graph is still guaranteed to be explored.We concentrate on line graphs and rings, for which we give positive results. We start with the case of the collections involving only reliable robots. We give algorithms finding optimal times needed for exploration when the robots are assigned to fixed initial positions as well as when such starting positions may be determined by the algorithm. We extend our consideration to the case when some number of robots may be unreliable. Our most surprising result is that solving the line exploration problem with robots at given positions, which may involve crash-faulty ones, is NP-hard. The same problem has polynomial solutions for a ring and for the case when the initial robots’ positions on the line are arbitrary. The exploration problem is shown to be NP-hard for star graphs, even when the team consists of only two reliable robots.

Jurek Czyzowicz, Maxime Godon, Evangelos Kranakis, Arnaud Labourel, Euripides Markou

### The k-Server Problem with Advice in d Dimensions and on the Sphere

We study the impact of additional information on the hardness of the k-server problem on different metric spaces. To this end, we consider the well-known model of computing with advice. In particular, we design an algorithm for the d-dimensional Euclidean space, which generalizes a known result for the Euclidean plane. As another relevant setting, we investigate a metric space with positive curvature; in particular, the sphere. Both algorithms have constant strict competitive ratios while reading a constant number of advice bits with every request, independent of the number k of servers, and solely depending on parameters of the underlying metric structure.

Elisabet Burjons, Dennis Komm, Marcel Schöngens

### Deciding Universality of ptNFAs is PSpace-Complete

An automaton is partially ordered if the only cycles in its transition diagram are self-loops. We study the universality problem for ptNFAs, a class of partially ordered NFAs recognizing piecewise testable languages. The universality problem asks if an automaton accepts all words over its alphabet. Deciding universality for both NFAs and partially ordered NFAs is PSpace-complete. For ptNFAs, the complexity drops to coNP-complete if the alphabet is fixed but is open if the alphabet may grow. We show, using a novel and nontrivial construction, that the problem is PSpace-complete if the alphabet may grow polynomially.

Tomáš Masopust, Markus Krötzsch

### Theoretical Aspects of Symbolic Automata

Symbolic finite automata extend classical automata by allowing infinite alphabets given by Boolean algebras and having transitions labeled by predicates over such algebras. Symbolic automata have been intensively studied recently and they have proven useful in several applications. We study some theoretical aspects of symbolic automata. Especially, we study minterms of symbolic automata, that is, the set of maximal satisfiable Boolean combinations of predicates of automata. We define canonical minterms of a language accepted by a symbolic automaton and show that these minterms can be used to define symbolic versions of some known classical automata. Also we show that canonical minterms have an important role in finding minimal nondeterministic symbolic automata. We show that Brzozowski’s double-reversal method for minimizing classical deterministic automata as well as its generalization is applicable for symbolic automata.

Hellis Tamm, Margus Veanes

### Complete Algorithms for Algebraic Strongest Postconditions and Weakest Preconditions in Polynomial ODE’S

A system of polynomial ordinary differential equations (ode’s) is specified via a vector of multivariate polynomials, or vector field, F. A safety assertion $$\psi \longrightarrow [F]\,\phi$$ψ⟶[F]ϕ means that the system’s trajectory will lie in a subset $$\phi$$ϕ (the postcondition) of the state-space, whenever the initial state belongs to a subset $$\psi$$ψ (the precondition). We consider the case when $$\phi$$ϕ and $$\psi$$ψ are algebraic varieties, that is, zero sets of polynomials. In particular, polynomials specifying the postcondition can be seen as conservation laws implied by $$\psi$$ψ. Checking the validity of algebraic safety assertions is a fundamental problem in, for instance, hybrid systems. We consider generalized versions of this problem, and offer algorithms to: (1) given a user specified polynomial set P and a precondition $$\psi$$ψ, find the smallest algebraic postcondition $$\phi$$ϕ including the variety determined by the valid conservation laws in P (relativized strongest postcondition); (2) given a user specified postcondition $$\phi$$ϕ, find the largest algebraic precondition $$\psi$$ψ (weakest precondition). The first algorithm can also be used to find the weakest algebraic invariant of the system implying all conservation laws in P valid under $$\psi$$ψ. The effectiveness of these algorithms is demonstrated on a challenging case study from the literature.

Michele Boreale

### Influence of Body Postures on Touch-Based Biometric User Authentication

Due to user mobility, many factors may influence a user’s touch screen behavior and consequently affect the accuracy of user authentication based on touch-screen behavioral biometrics. Existing studies have shown significant differences in typing behavior for different hand postures. In this paper we examine touch-based biometric user authentication when performing simple swipes under different body postures. Our proposed authentication method generates numerous features from the swipes and selects the most distinctive features using mutual information. Using an experimental dataset with 43 participants, we have concluded that performing swipes in different body postures has a negative impact on authentication accuracy and that several features based on finger touch size are not significantly affected by different body postures.

Kamil Burda, Daniela Chuda

### Michiko: Poem Models used in Automated Haiku Poetry Generation

Computational creativity is a part of artificial intelligence that sets its goal to determine what is creativity and what does it mean to be creative - not only in humans, but in general. It also aims to mimic creativity by computer programs. There are multiple approaches and systems that are imitating creativity in different artistic forms. The approach proposed in this article suggests a method for automated generation of written text, specifically haiku poetry. Experiments with system that implements the proposed approach are presented.

Miroslava Hrešková, Kristína Machová

### House Allocation Problems with Existing Tenants and Priorities for Teacher Recruitment

We study a variant of house allocation problems with application to a real-world job assignment problem where some applicants cannot result unmatched. Such applicants hold posts initially, which enter the market if they can get strictly better posts. All applicants are strictly ordered by priority in a single master list. Their preference lists may be incomplete and may contain ties. We seek a matching that assigns the best possible post to each applicant, taking into account their preferences, priorities and initial posts. We give algorithms for solving the problem in polynomial time for three different cases.

Ana Paula Tomás

### Runtime Distributions and Criteria for Restarts

Randomized algorithms sometimes employ a restart strategy. After a certain number of steps, the current computation is aborted and restarted with a new, independent random seed. In some cases, this results in an improved overall expected runtime. This work introduces properties of the underlying runtime distribution which determine whether restarts are advantageous. The most commonly used probability distributions admit the use of a scale and a location parameter. Location parameters shift the density function to the right, while scale parameters affect the spread of the distribution. It is shown that for all distributions scale parameters do not influence the usefulness of restarts and that location parameters only have a limited influence. This result simplifies the analysis of the usefulness of restarts. The most important runtime probability distributions are the log-normal, the Weibull, and the Pareto distribution. In this work, these distributions are analyzed for the usefulness of restarts. Secondly, a condition for the optimal restart time (if it exists) is provided. The log-normal, the Weibull, and the generalized Pareto distribution are analyzed in this respect. Moreover, it is shown that the optimal restart time is also not influenced by scale parameters and that the influence of location parameters is only linear.

Jan-Hendrik Lorenz

### Inversions from Sorting with Distance-Based Errors

We study the number of inversions after running the Insertion Sort or Quicksort algorithm, when errors in the comparisons occur with some probability. We investigate the case in which probabilities depend on the difference between the two numbers to be compared and only differences up to some threshold $$\tau$$τ are prone to errors. We give upper bounds for this model and show that for constant $$\tau$$τ, the expected number of inversions is linear in the number of elements to be sorted. For Insertion Sort, we also yield an upper bound on the expected number of runs, i.e., the number of consecutive increasing subsequences.

Barbara Geissmann, Paolo Penna

### An Optimization Problem Related to Bloom Filters with Bit Patterns

Bloom filters are hash-based data structures for membership queries without false negatives widely used across many application domains. They also have become a central data structure in bioinformatics. In genomics applications and DNA sequencing the number of items and number of queries are frequently measured in the hundreds of billions. Consequently, issues of cache behavior and hash function overhead become a pressing issue. Blocked Bloom filters with bit patterns offer a variant that can better cope with cache misses and reduce the amount of hashing. In this work we state an optimization problem concerning the minimum false positive rate for given numbers of memory bits, stored elements, and patterns. The aim is to initiate the study of pattern designs best suited for the use in Bloom filters. We provide partial results about the structure of optimal solutions and a link to two-stage group testing.

Peter Damaschke, Alexander Schliep

### Nivat’s Conjecture Holds for Sums of Two Periodic Configurations

Nivat’s conjecture is a long-standing open combinatorial problem. It concerns two-dimensional configurations, that is, maps $$\mathbb {Z}^2 \rightarrow \mathcal {A}$$ Z2→A where $$\mathcal {A}$$ A is a finite set of symbols. Such configurations are often understood as colorings of a two-dimensional square grid. Let $$P_c(m,n)$$ Pc(m,n) denote the number of distinct $$m \times n$$ m×n block patterns occurring in a configuration c. Configurations satisfying $$P_c(m,n) \le mn$$ Pc(m,n)≤mn for some $$m,n \in \mathbb {N}$$ m,n∈N are said to have low rectangular complexity. Nivat conjectured that such configurations are necessarily periodic.Recently, Kari and the author showed that low complexity configurations can be decomposed into a sum of periodic configurations. In this paper we show that if there are at most two components, Nivat’s conjecture holds. As a corollary we obtain an alternative proof of a result of Cyr and Kra: If there exist $$m,n \in \mathbb {N}$$ m,n∈N such that $$P_c(m,n) \le mn/2$$ Pc(m,n)≤mn/2, then c is periodic. The technique used in this paper combines the algebraic approach of Kari and the author with balanced sets of Cyr and Kra.

### Encoding Pictures with Maximal Codes of Pictures

A picture is a two-dimensional counterpart of a string and it is represented by a rectangular array of symbols over a finite alphabet $$\varSigma$$Σ. A set X of pictures over $$\varSigma$$Σ is a code if every picture over $$\varSigma$$Σ is tilable in at most one way with pictures in X. Recently, the definition of strong prefix code was introduced as a decidable family of picture codes, and a construction procedure for maximal strong prefix (MSP) codes was proposed. Unfortunately, the notion of completeness cannot be directly transposed from strings to pictures without loosing important properties. We generalize to pictures a special property satisfied by complete set of strings that allow to prove interesting characterization results for MSP codes. Moreover, we show an encoding algorithm for pictures using pictures from a MSP code. The algorithm is based on a new data structure for the representation of MSP codes.

Marcella Anselmo, Dora Giammarresi, Maria Madonia

### ARCID: A New Approach to Deal with Imbalanced Datasets Classification

Classification is one of the most fundamental and well-known tasks in data mining. Class imbalance is the most challenging issue encountered when performing classification, i.e. when the number of instances belonging to the class of interest (minor class) is much lower than that of other classes (major classes). The class imbalance problem has become more and more marked while applying machine learning algorithms to real-world applications such as medical diagnosis, text classification, fraud detection, etc. Standard classifiers may yield very good results regarding the majority classes. However, this kind of classifiers yields bad results regarding the minority classes since they assume a relatively balanced class distribution and equal misclassification costs. To overcome this problem, we propose, in this paper, a novel associative classification algorithm called Association Rule-based Classification for Imbalanced Datasets (ARCID). This algorithm aims to extract significant knowledge from imbalanced datasets by emphasizing on information extracted from minor classes without drastically impacting the predictive accuracy of the classifier. Experimentations, against five datasets obtained from the UCI repository, have been conducted with reference to four assessment measures. Results show that ARCID outperforms standard algorithms. Furthermore, it is very competitive to Fitcare which is a class imbalance insensitive algorithm.

Safa Abdellatif, Mohamed Ali Ben Hassine, Sadok Ben Yahia, Amel Bouzeghoub

### Fake Review Detection via Exploitation of Spam Indicators and Reviewer Behavior Characteristics

The rapid spread of Internet technologies has redefined E-commerce, since opinion sharing by product reviews is an inseparable part of online purchasing. However, e-commerce openness has attracted malicious behaviors often expressed by fake reviews targeting public opinion manipulation. To address this phenomenon, several approaches have been introduced to detect spam reviews and spammer activity. In this paper, we propose an approach which integrates content and usage information to detect fake product reviews. The proposed model exploits both product reviews and reviewers’ behavioral traits interlinked by specific spam indicators. In our proposed method, a fine-grained burst pattern detection is employed to better examine reviews generated over “suspicious” time intervals. Reviewer’s past reviewing history is also exploited to determine the reviewer’s overall “authorship” reputation as an indicator of their recent reviews’ authenticity level. The proposed approach is validated with a real-world Amazon review dataset. Experimentation results show that our method successfully detects spam reviews thanks to the complementary nature of the employed techniques and indicators.

Ioannis Dematis, Eirini Karapistoli, Athena Vakali

### Mining Spatial Gradual Patterns: Application to Measurement of Potentially Avoidable Hospitalizations

Gradual patterns aim at automatically extracting co-variations between variables of data sets in the form of “the more/the less” such as “the more experience, the higher salary”. This data mining method has been applied more and more in finding knowledge recently. However, gradual patterns are still not applicable on spatial data while such information have strong presence in many application domains. For instance, in our work we consider the issue of potentially avoidable hospitalizations. Their determinants have been studied to improve the quality, efficiency, and equity of health care delivery. Although the statistical methods such as regression method can find the associations between the increased potentially avoidable hospitalizations with its determinants such as lower density of ambulatory care nurses, there is still a challenge to identify how the geographical areas follow or not the tendencies. Therefore, in this paper, we propose to extend gradual patterns to the management of spatial data. Our work is twofold. First we propose a methodology for extracting gradual patterns at several hierarchical levels. In addition, we introduce a methodology for visualizing this knowledge. For this purpose, we rely on spatial maps for allowing decision makers to easily notice how the areas follow or not the gradual patterns. Our work is applied to the measure of the potentially avoidable hospitalizations to prove its interest.

Tu Ngo, Vera Georgescu, Anne Laurent, Thérèse Libourel, Grégoire Mercier

### New Variants of Pattern Matching with Constants and Variables

Given a text and a pattern over two types of symbols called constants and variables, the parameterized pattern matching problem is to find all occurrences of substrings of the text that the pattern matches by substituting a variable in the text for each variable in the pattern, where the substitution should be injective. The function matching problem is a variant of it that lifts the injection constraint. In this paper, we discuss variants of those problems, where one can substitute a constant or a variable for each variable of the pattern. We give two kinds of algorithms for both problems, a convolution-based method and an extended KMP-based method, and analyze their complexity.

Yuki Igarashi, Diptarama, Ryo Yoshinaka, Ayumi Shinohara

### Duel and Sweep Algorithm for Order-Preserving Pattern Matching

Given a text and a pattern over an alphabet, the classic exact matching problem searches for all occurrences of the pattern in the text. Unlike exact matching, order-preserving pattern matching (OPPM) considers the relative order of elements, rather t han their real values. In this paper, we propose an efficient algorithm for the OPPM problem using the “duel-and-sweep” paradigm. For a pattern of length m and a text of length n, our algorithm runs in $$O(n + m\log m)$$O(n+mlogm) time in general, and in $$O(n + m)$$O(n+m) time under an assumption that the characters in a string can be sorted in linear time with respect to the string size. We also perform experiments and show that our algorithm is faster than the KMP-based algorithm.

Davaajav Jargalsaikhan, Diptarama, Yohei Ueki, Ryo Yoshinaka, Ayumi Shinohara

### Longest Common Prefixes with k-Mismatches and Applications

We propose a new algorithm for computing the longest prefix of each suffix of a given string of length n over a constant-sized alphabet of size $$\sigma$$σ that occurs elsewhere in the string with Hamming distance at most k. Specifically, we show that the proposed algorithm requires time $$\mathcal {O}(n (\sigma R)^k \log \log n (\log k+ \log \log n))$$O(n(σR)kloglogn(logk+loglogn)) on average, where $$R=\lceil (k+2) (\log _{\sigma } n+1) \rceil$$R=⌈(k+2)(logσn+1)⌉, and space $$\mathcal {O}(n)$$O(n). This improves upon the state-of-the-art average-case time complexity for the case when $$k=1$$k=1 [23] by a factor of $$\log n / \log ^3 \log n$$logn/log3logn. In addition, we show how the proposed technique can be adapted and applied in order to compute the longest previous factors under the Hamming distance model within the same complexities. In terms of real-world applications, we show that our technique can be directly applied to the problem of genome mappability.

Hayam Alamro, Lorraine A. K. Ayad, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis

### Managing Reduction in Multidimensional Databases

Dealing with large amount of data has always been a key focus of the Multidimensional Database (MDB) community, especially in the current era when data volume increases more and more rapidly. In this paper, we outline a conceptual modeling solution allowing reducing data in MDBs. A MDB after reduction is modeled with multiple states. Each state is valid during a period of time and aggregates data from a more recent state. We propose three alternatives of reduced MDB modeling at the logical level: (i) the flat modeling integrates all states into one single table, (ii) the horizontal modeling converts each state into a fact table and some dimension tables associated with a temporal interval and (iii) the vertical modeling breaks down a reduced MDB into separate tables, each table includes data from one or several states. We evaluate query execution efficiency in MDBs with and without data reduction. The result shows data reduction is an interesting solution, since it significantly decreases execution costs by 98.96% during our experimental assessments.

Franck Ravat, Jiefu Song, Olivier Teste

### UML2PROV: Automating Provenance Capture in Software Engineering

In this paper we present UML2PROV, an approach addressing the gap between application design, through UML diagrams, and provenance design, using PROV-Template. PROV-Template is a declarative approach that enables software engineers to develop programs that generate provenance following the PROV standard. The main contributions of this paper are: (i) a mapping strategy from UML diagrams (UML State Machine and Sequence diagrams) to templates, (ii) a code generation technique that creates libraries, which can be deployed in an application by creating suitable artefacts for provenance generation, and (iii) a demonstration of the feasibility of UML2PROV implemented with Java, and a preliminary quantitative evaluation that shows benefits regarding aspects such as design, development and provenance capture.

Carlos Sáenz-Adán, Beatriz Pérez, Trung Dong Huynh, Luc Moreau

### Validating Data from Semantic Web Providers

As the Linked Open Data and the number of semantic web data providers hugely increase, so does the critical importance of the following question: how to get usable results, in particular for data mining and data analysis tasks? We propose a query framework equiped with integrity constraints that the user wants to be verified on the results coming from semantic web data providers. We precise the syntax and semantics of those user quality constraints. We give algorithms for their dynamic verification during the query computation, we evaluate their performance with experimental results, and discuss related works.

Jacques Chabin, Mirian Halfeld-Ferrari, Béatrice Markhoff, Thanh Binh Nguyen

### Backmatter

Weitere Informationen