main-content

## Über dieses Buch

This book constitutes the refereed proceedings of the 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, held in Limassol, Cyprus, in January 2020.
The 40 full papers presented together with 17 short papers and 3 invited papers were carefully reviewed and selected from 125 submissions. They presented new research results in the theory and practice of computer science in the each sub-area of SOFSEM 2020: foundations of computer science, foundations of data science and engineering, foundations of software engineering, and foundations of algorithmic computational biology.

## Inhaltsverzeichnis

### Certified Machine-Learning Models

The massive adoption of Machine Learning (ML) has deeply changed the internal structure, the design and the operation of software systems. ML has shifted the focus from code to data, especially in application areas where it is easier to collect samples that embody correct solutions to individual instances of a problem, than to design and code a deterministic algorithm solving it for all instances. There is an increasing awareness of the need to verify key non-functional properties of ML-based software applications like fairness and privacy. However, the traditional approach trying to verify these properties by code inspection is pointless, since ML models’ behavior mostly depends on the data and parameters used to train them. Classic software certification techniques cannot solve the issue as well. The Artificial Intelligence (AI) community has been working on the idea of preventing undesired behavior by controlling a priori the ML models’ training sets and parameters. In this paper, we take a different, online approach to ML verification, where novel behavioral monitoring techniques based on statistical testing are used to support a dynamic certification framework enforcing the desired properties on black-box ML models in operation. Our aim is to deliver a novel framework suitable for practical certification of distributed ML-powered applications in heavily regulated domains like transport, energy, healthcare, even when the certifying authority is not privy to the model training. To achieve this goal, we rely on three key ideas: (i) use test suites to define desired non-functional properties of ML models, (ii) Use statistical monitoring of ML models’ behavior at inference time to check that the desired behavioral properties are achieved, and (iii) compose monitors’ outcome within dynamic, virtual certificates for composite software applications.

Ernesto Damiani, Claudio A. Ardagna

### The Lost Recipes from the Four Schools of Amathus

Invited Talk Extended Abstract

This paper tells the story of the Four Schools of Amathus and the lost recipes for a legendary dish that attracted many people to the ancient royal city. Deciphering the recipes from snippets of the ancient scrolls that have been found recently seems to be an impossible task. Fortunately, the problem bears a strong resemblance to the haplotype phasing problem, which has been studied in more recent times. We point out the similarities between these problems, how they can be formulated as optimization problems and survey different solution strategies.

Gunnar W. Klau

### Sharing Energy for Optimal Edge Performance

Using the Energy Packet Network (EPN) model, we show how energy can be shared between heterogenous servers at the edge to minimize the overall average response time of jobs. The system is modeled as a probabilistic network where energy and jobs are being dispatched to the edge servers using G-Networks with a product-form solution for the equilibrium probability distribution of system state. The approach can also be used to design energy dispatching systems when renewable energy is used to improve the sustainability of edge computing.

Erol Gelenbe, Yunxiao Zhang

### A Characterization of the Context-Free Languages by Stateless Ordered Restart-Delete Automata

We consider stateless ordered restart-delete automata, which are actually just stateless ordered restarting automata (stl-ORWW-automata) that have an additional delete operation. While the stl-ORWW-automata just accept the regular languages, we show that the context-free languages are characterized by the swift stateless ordered restart-delete automaton, that is, by the stateless ordered restart-delete automaton that can move its window to any position after performing a restart.

Friedrich Otto

### A Constructive Arboricity Approximation Scheme

The arboricity $$\varGamma$$ of a graph is the minimum number of forests its edge set can be partitioned into. Previous approximation schemes were nonconstructive, i.e., they approximate the arboricity as a value without computing a corresponding forest partition. This is because they operate on pseudoforest partitions or the dual problem of finding dense subgraphs.We propose an algorithm for converting a partition of k pseudoforests into a partition of $$k+1$$ forests in $$\mathcal {O}(mk\log k + m \log n)$$ time with a data structure by Brodal and Fagerberg that stores graphs of arboricity k. A slightly better bound can be given if perfect hashing is used. When applied to a pseudoforest partition obtained from Kowalik’s approximation scheme, our conversion implies a constructive $$(1+\epsilon )$$-approximation algorithm for the arboricity with runtime $$\mathcal {O}(m \log n \log \varGamma \, \epsilon ^{-1})$$ for every $$\epsilon > 0$$. For fixed $$\epsilon$$, the runtime can be reduced to $$\mathcal {O}(m \log n)$$.Moreover, our conversion implies a near-exact algorithm that computes a partition into at most $$\varGamma +2$$ forests in $$\mathcal {O}(m\log n \,\varGamma \log ^* \varGamma )$$ time. It might also pave the way to faster exact arboricity algorithms.

Markus Blumenstock, Frank Fischer

### A Game of Cops and Robbers on Graphs with Periodic Edge-Connectivity

This paper considers a game in which a single cop and a single robber take turns moving along the edges of a given graph G. If there exists a strategy for the cop which enables it to be positioned at the same vertex as the robber eventually, then G is called cop-win, and robber-win otherwise. In contrast to previous work, we study this classical combinatorial game on edge-periodic graphs. These are graphs with an infinite lifetime comprised of discrete time steps such that each edge e is assigned a bit pattern of length $$l_e$$, with a 1 in the i-th position of the pattern indicating the presence of edge e in the i-th step of each consecutive block of $$l_e$$ steps. Utilising the known framework of reachability games, we obtain an $$O(\textsf {LCM}(L)\cdot n^3)$$ time algorithm to decide if a given n-vertex edge-periodic graph $$G^\tau$$ is cop-win or robber-win as well as compute a strategy for the winning player (here, L is the set of all edge pattern lengths $$l_e$$, and $$\textsf {LCM}(L)$$ denotes the least common multiple of the set L). For the special case of edge-periodic cycles, we prove an upper bound of $$2\cdot l \cdot \textsf {LCM}(L)$$ on the minimum length required of any edge-periodic cycle to ensure that it is robber-win, where $$l = 1$$ if $$\textsf {LCM}(L) \ge 2\cdot \max L$$, and $$l=2$$ otherwise. Furthermore, we provide constructions of edge-periodic cycles that are cop-win and have length $$1.5 \cdot \textsf {LCM}(L)$$ in the $$l=1$$ case and length $$3\cdot \textsf {LCM}(L)$$ in the $$l=2$$ case.

Thomas Erlebach, Jakob T. Spooner

### Approximating Shortest Connected Graph Transformation for Trees

Let G, H be two connected graphs with the same degree sequence. The aim of this paper is to find a transformation from G to H via a sequence of flips maintaining connectivity. A flip of G is an operation consisting in replacing two existing edges uv, xy of G by ux and vy.Taylor showed that there always exists a sequence of flips that transforms G into H maintaining connectivity. Bousquet and Mary proved that there exists a 4-approximation algorithm of a shortest transformation. In this paper, we show that there exists a 2.5-approximation algorithm running in polynomial time. We also discuss the tightness of the lower bound and show that, in order to drastically improve the approximation ratio, we need to improve the best known lower bounds.

Nicolas Bousquet, Alice Joffard

### Approximating Weighted Completion Time for Order Scheduling with Setup Times

Consider a scheduling problem in which jobs need to be processed on a single machine. Each job has a weight and is composed of several operations belonging to different families. The machine needs to perform a setup between the processing of operations of different families. A job is completed when its latest operation completes and the goal is to minimize the total weighted completion time of all jobs.We study this problem from the perspective of approximability and provide constant factor approximations as well as an inapproximability result. Prior to this work, only the NP-hardness of the unweighted case and the polynomial solvability of a certain special case were known.

Alexander Mäcker, Friedhelm Meyer auf der Heide, Simon Pukrop

### Bounds for the Number of Tests in Non-adaptive Randomized Algorithms for Group Testing

We study the group testing problem with non-adaptive randomized algorithms. Several models have been discussed in the literature to determine how to randomly choose the tests. For a model $$\mathcal{M}$$, let $$m_\mathcal{M}(n,d)$$ be the minimum number of tests required to detect at most d defectives within n items, with success probability at least $$1-\delta$$, for some constant $$\delta$$. In this paper, we study the measures $$c_\mathcal{M}(d)=\lim _{n\rightarrow \infty } \frac{m_\mathcal{M}(n,d)}{\ln n} \text{ and } \ c_\mathcal{M}=\lim _{d\rightarrow \infty } \frac{c_\mathcal{M}(d)}{d}.$$In the literature, the analyses of such models only give upper bounds for $$c_\mathcal{M}(d)$$ and $$c_\mathcal{M}$$, and for some of them, the bounds are not tight. We give new analyses that yield tight bounds for $$c_\mathcal{M}(d)$$ and $$c_\mathcal{M}$$ for all the known models $$\mathcal{M}$$.

### Burning Two Worlds

Algorithms for Burning Dense and Tree-Like Graphs

Graph burning is a model for the spread of social influence in networks. The objective is to measure how quickly a fire (e.g., a piece of fake news) can be spread in a network. The burning process takes place in discrete rounds. In each round, a new fire breaks out at a selected vertex and burns it. Meanwhile, the old fires extend to their adjacent vertices and burn them. A burning schedule selects where the new fire breaks out in each round, and the burning problem asks for a schedule that burns all vertices in a minimum number of rounds, termed the burning number of the graph. The burning problem is known to be NP-hard even when the graph is a tree or a disjoint set of paths. For connected graphs, it has been conjectured [3] that burning takes at most $$\lceil \sqrt{n} \ \rceil$$ rounds.In this paper, we approach the algorithmic study of graph burning from two directions. First, we consider connected n-vertex graphs with minimum degree $$\delta$$. We present an algorithm that burns any such graph in at most $$\sqrt{\frac{24n}{\delta +1}}$$ rounds. In particular, for graphs with $$\delta \in \varTheta (n)$$, all vertices are burned in a constant number of rounds. More interestingly, even when $$\delta$$ is a constant that is independent of n, our algorithm answers the graph-burning conjecture in the affirmative by burning the graph in at most $$\lceil \sqrt{n} \rceil$$ rounds. Then, we consider burning connected graphs with bounded pathlength or treelength. This includes many graph families, e.g., interval graphs (pathlength 1) and chordal graphs (treelength 1). We show that any connected graph with pathlength pl and diameter d can be burned in $$\lceil \sqrt{d-1} \rceil + pl$$ rounds. Our algorithm ensures an approximation ratio of $$1+o(1)$$ for graphs of bounded pathlength. We also give an algorithm that achieves an approximation ratio of $$2+o(1)$$ for burning connected graphs of bounded treelength. Our approximation factors are better than the best known approximation factor of 3 for burning general graphs.

Shahin Kamali, Avery Miller, Kenny Zhang

### Faster STR-EC-LCS Computation

The longest common subsequence (LCS) problem is a central problem in stringology that finds the longest common subsequence of given two strings A and B. More recently, a set of four constrained LCS problems (called generalized constrained LCS problem) were proposed by Chen and Chao [J. Comb. Optim, 2011]. In this paper, we consider the substring-excluding constrained LCS (STR-EC-LCS) problem. A string Z is said to be an STR-EC-LCS of two given strings A and B excluding P if, Z is one of the longest common subsequences of A and B that does not contain P as a substring. Wang et al. proposed a dynamic programming solution which computes an STR-EC-LCS in O(mnr) time and space where $$m = |A|, n = |B|, r = |P|$$ [Inf. Process. Lett., 2013]. In this paper, we show a new solution for the STR-EC-LCS problem. Our algorithm computes an STR-EC-LCS in $$O(n|\varSigma | + (L+1)(m-L+1)r)$$ time where $$|\varSigma | \le \min \{m, n\}$$ denotes the set of distinct characters occurring in both A and B, and L is the length of the STR-EC-LCS. This algorithm is faster than the O(mnr)-time algorithm for short/long STR-EC-LCS (namely, $$L \in O(1)$$ or $$m-L \in O(1)$$), and is at least as efficient as the O(mnr)-time algorithm for all cases.

Kohei Yamada, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

### Kernels of Sub-classes of Context-Free Languages

While the closure of a language family $$\mathscr {L}$$ under certain language operations is the least family of languages which contains all members of $$\mathscr {L}$$ and is closed under all of the operations, a kernel of $$\mathscr {L}$$ is a greatest family of languages which is a subfamily of $$\mathscr {L}$$ and is closed under all of the operations. Here we investigate properties of kernels of general language families and operations defined thereon as well as kernels of (deterministic) (linear) context-free languages with a focus on Boolean operations. While the closures of language families usually are unique, this uniqueness is not obvious for kernels. We consider properties of language families and operations that yield unique and non-unique, that is a set, of kernels. For the latter case, the question whether the union of all kernels coincides with the language family, or whether there are languages that do not belong to any kernel is addressed. Furthermore, the intersection of all kernels with respect to certain operations is studied in order to identify sets of languages that belong to all of these kernels.

Martin Kutrib

### Minimal Unique Substrings and Minimal Absent Words in a Sliding Window

A substring u of a string T is called a minimal unique substring (MUS) of T if u occurs exactly once in T and any proper substring of u occurs at least twice in T. A string w is called a minimal absent word (MAW) of T if w does not occur in T and any proper substring of w occurs in T. In this paper, we study the problems of computing MUSs and MAWs in a sliding window over a given string T. We first show how the set of MUSs can change in a sliding window over T, and present an $$O(n\log \sigma )$$-time and O(d)-space algorithm to compute MUSs in a sliding window of width d over T, where $$\sigma$$ is the maximum number of distinct characters in every window. We then give tight upper and lower bounds on the maximum number of changes in the set of MAWs in a sliding window over T. Our bounds improve on the previous results in Crochemore et al. (2017).

Takuya Mieno, Yuki Kuhara, Tooru Akagi, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

### On Synthesis of Specifications with Arithmetic

Variable automata with arithmetic enable the specification of reactive systems with variables over an infinite domain of numeric values and whose operation involves arithmetic manipulation of these values [9]. We study the synthesis problem for such specifications. While the problem is in general undecidable, we define a fragment, namely semantically deterministic variable automata with arithmetic, for which the problem is decidable. Essentially, an automaton is semantically deterministic if the restrictions on the possible assignments to the variables that are accumulated along its runs resolve its nondeterministic choices. We show that semantically deterministic automata can specify many interesting behaviors – many more than deterministic ones, and that the synthesis problem for them can be reduced to a solution of a two-player game. For automata with simple guards, the game has a finite state space, and the synthesis problem can be solved in time polynomial in the automaton and exponential in the number of its variables.

Rachel Faran, Orna Kupferman

### On the Average State Complexity of Partial Derivative Transducers

2D regular expressions represent rational relations over two alphabets. In this paper we study the average state complexity of partial derivative standard transducers ($$\mathcal {T}_{\text {PD}}$$) that can be defined for (general) 2D expressions where basic terms are pairs of ordinary regular expressions (1D). While in the worst case the number of states of $$\mathcal {T}_{\text {PD}}$$ can be $$O(n^2)$$, where n is the size of the expression, asymptotically and on average that value is bounded from above by $$O(n^{\frac{3}{2}})$$. Moreover, asymptotically and on average the alphabetic size of a 2D expression is half of the size of that expression. All results are obtained in the framework of analytic combinatorics considering generating functions of parametrised combinatorial classes defined implicitly by algebraic curves. In particular, we generalise the methods developed in previous work to a broad class of analytic functions.

Stavros Konstantinidis, António Machiavelo, Nelma Moreira, Rogério Reis

### On the Difference Between Finite-State and Pushdown Depth

This paper expands upon existing and introduces new formulations of Bennett’s logical depth. A new notion based on pushdown compressors is developed. A pushdown deep sequence is constructed. The separation of (previously published) finite-state based and pushdown based depth is shown. The previously published finite state depth notion is extended to an almost everywhere (a.e.) version. An a.e. finite-state deep sequence is shown to exist along with a sequence that is infinitely often (i.o.) but not a.e. finite-state deep. For both finite-state and pushdown, easy and random sequences with respect to each notion are shown to be non-deep, and that a slow growth law holds for pushdown depth.

Liam Jordon, Philippe Moser

### Online Scheduling with Machine Cost and a Quadratic Objective Function

We will consider a quadratic variant of online scheduling with machine cost. Here, we have a sequence of independent jobs with positive sizes. Jobs come one by one and we have to assign them irrevocably to a machine without any knowledge about additional jobs that may follow later on. Owing to this, the algorithm has no machine at first. When a job arrives, we have the option to purchase a new machine and the cost of purchasing a machine is a fixed constant. In previous studies, the objective was to minimize the sum of the makespan and the cost of the purchased machines. Now, we minimize the sum of squares of loads of the machines and the cost paid to purchase them and we will prove that 4/3 is a general lower bound. After this, we will present a 4/3-competitive algorithm with a detailed competitive analysis.

J. Csirik, Gy. Dósa, D. Kószó

### Parallel Duel-and-Sweep Algorithm for the Order-Preserving Pattern Matching

Given a text and a pattern over an alphabet, the classic exact matching problem searches for all occurrences of pattern P in text T. Unlike the exact matching problem, order-preserving pattern matching considers the relative order of elements, rather than their exact values. In this paper, we propose the first parallel algorithm for the OPPM problem. Our algorithm is based on the “duel-and-sweep” algorithm. For a pattern of length m and a text of length n, our algorithm runs in $$O(\log ^ 3 m)$$ time and $$O(n \log ^ 3 m)$$ work on the Priority CRCW PRAM.

Davaajav Jargalsaikhan, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara

### Parameterized Complexity of Synthesizing b-Bounded (m, n)-T-Systems

Let $$b\in \mathbb {N}^+$$. Synthesis of pure b-bounded (m, n)-T-systems ((m,n)-Synthesis, for short) consists in deciding whether there exists for an input (A, m, n) of transition system A and integers $$m,n\in \mathbb {N}$$ a pure b-bounded Petri net N as follows: N’s reachability graph is isomorphic to A, and each of N’s places has at most m incoming and at most n outgoing transitions. In the event of a positive decision, N should be constructed. The problem is known to be NP-complete, and (m,n)-Synthesis parameterized by $$m+n$$ is in XP [14]. In this paper, we enhance our understanding of (m,n)-Synthesis from the viewpoint of parameterized complexity by showing that it is W[1]-hard when parameterized by $$m+n$$.

Ronny Tredup

### Parameterized Dynamic Variants of Red-Blue Dominating Set

We introduce a parameterized dynamic version of the Red-Blue Dominating Set problem and its partial version. We prove the fixed-parameter tractability of the dynamic versions with respect to the (so called) edit-parameter while they remain $$\mathcal{W}[2]$$-hard with respect to the increment-parameter. We provide a complete study of the complexity of the problem with respect to combinations of the various parameters.

Faisal N. Abu-Khzam, Cristina Bazgan, Henning Fernau

### Refined Parameterizations for Computing Colored Cuts in Edge-Colored Graphs

In the Colored $$(s,t)\hbox {-}{\textsc {cut}}$$ problem, the input is a graph $$G=(V,E)$$ together with an edge-coloring $$\ell :E\rightarrow C$$, two vertices s and t, and a number k. The question is whether there is a set $$S\subseteq C$$ of at most k colors, such that deleting every edge with a color from S destroys all paths between s and t in G. We continue the study of the parameterized complexity of Colored $$(s,t)\hbox {-}{\textsc {cut}}$$. First, we consider parameters related to the structure of G. For example, we study parameterization by the number $$\xi _i$$ of edge deletions that are needed to transform G into a graph with maximum degree i. We show that Colored $$(s,t)\hbox {-}{\textsc {cut}}$$ is $$\mathrm {W}[2]$$-hard when parameterized by $$\xi _3$$, but fixed-parameter tractable when parameterized by $$\xi _2$$. Second, we consider parameters related to the coloring $$\ell$$. We show fixed-parameter tractability for three parameters that are potentially smaller than the total number of colors |C| and provide a linear-size problem kernel for a parameter related to the number of edges with a rare edge color.

Nils Morawietz, Niels Grüttemeier, Christian Komusiewicz, Frank Sommer

### Simple Distributed Spanners in Dense Congest Networks

The problem of computing a sparse spanning subgraph is a well-studied problem in the distributed setting, and a lot of research was done in the direction of computing spanners or solving the more relaxed problem of connectivity. Still, efficiently constructing a linear-size spanner deterministically remains a challenging open problem even in specific topologies.In this paper we provide several simple spanner constructions of linear size, for various graph families. Our first result shows that the connectivity problem can be solved deterministically using a linear size spanner within constant running time on graphs with bounded neighborhood independence. This is a very wide family of graphs that includes unit-disk graphs, unit-ball graphs, line graphs, claw-free graphs and many others. Moreover, our algorithm works in the $$\mathcal {CONGEST}$$ model. It also immediately leads to a constant time deterministic solution for the connectivity problem in the Congested-Clique.Our second result provides a linear size spanner in the $$\mathcal {CONGEST}$$ model for graphs with bounded diversity. This is a subtype of graphs with bounded neighborhood independence that captures various types of networks, such as wireless networks and social networks. Here too our result has constant running time and is deterministic. Moreover, the latter result has an additional desired property of a small stretch.

Leonid Barenboim, Tzalik Maimon

### The Order Type of Scattered Context-Free Orderings of Rank One Is Computable

A linear ordering is called context-free if it is the lexicographic ordering of some context-free language and is called scattered if it has no dense subordering. Each scattered ordering has an associated ordinal, called its rank. It is known that the isomorphism problem of context-free orderings is undecidable in general. In this paper we show that it is decidable whether a context-free ordering is scattered with rank at most one, and if so, its order type is effectively computable.

Kitti Gelle, Szabolcs Iván

### Up-to Techniques for Branching Bisimilarity

Ever since the introduction of behavioral equivalences on processes one has been searching for efficient proof techniques that accompany those equivalences. Both strong bisimilarity and weak bisimilarity are accompanied by an arsenal of up-to techniques: enhancements of their proof methods. For branching bisimilarity, these results have not been established yet. We show that a powerful proof technique is sound for branching bisimilarity by combining the three techniques of up to union, up to expansion and up to context for Bloom’s BB cool format. We then make an initial proposal for casting the correctness proof of the up to context technique in an abstract coalgebraic setting, covering branching but also $$\eta$$, delay and weak bisimilarity.

Rick Erkens, Jurriaan Rot, Bas Luttik

### Crowd Detection for Drone Safe Landing Through Fully-Convolutional Neural Networks

In this paper, we propose a novel crowd detection method for drone safe landing, based on an extremely light and fast fully convolutional neural network. Such a computer vision application takes advantage of the technical tools some commercial drones are equipped with. The proposed architecture is based on a two-loss model in which the main classification task, aimed at distinguishing between crowded and non-crowded scenes, is simultaneously assisted by a regression task, aimed at people counting. In addition, the proposed method provides class activation heatmaps, useful to semantically augment the flight maps. To evaluate the effectiveness of the proposed approach, we used the challenging VisDrone dataset, characterized by a very large variety of locations, environments, lighting conditions, and so on. The model developed by the proposed two-loss deep architecture achieves good values of prediction accuracy and average precision, outperforming models developed by a similar one-loss architecture and a more classic scheme based on MobileNet. Moreover, by lowering the confidence threshold, the network achieves very high recall, without sacrificing too much precision. The method also compares favorably with the state-of-the-art, providing an effective and efficient tool for several safe drone applications.

Giovanna Castellano, Ciro Castiello, Corrado Mencar, Gennaro Vessio

### Explaining Single Predictions: A Faster Method

Machine learning has proven increasingly essential in many fields. Yet, a lot obstacles still hinder its use by non-experts. The lack of trust in the results obtained is foremost among them, and has inspired several explanatory approaches in the literature. In this paper, we are investigating the domain of single prediction explanation. This is performed by providing the user a detailed explanation of the attribute’s influence on each single predicted instance, related to a particular machine learning model. A lot of possible explanation methods have been developed recently. Although, these approaches often require an important computation time in order to be efficient. That is why we are investigating about new proposals of explanation methods, aiming to increase time performances, for a small loss in accuracy.

Gabriel Ferrettini, Julien Aligon, Chantal Soulé-Dupuy

### Inferring Deterministic Regular Expression with Unorder

Schema inference has been an essential task in database management, and can be reduced to learning regular expressions from sets of positive finite-sample. In this paper, we extend the single-occurrence regular expressions (SOREs) to single-occurrence regular expressions with unorder (uSOREs), and give an inference algorithm for uSOREs. First, we present an unorder-countable finite automaton (uCFA). Then, we construct an uCFA for recognizing the given finite sample. Next, the uCFA runs on the given finite sample to count the number of occurrences of the subexpressions (connectable via unorder) for every possibly repeated matching. Finally we transform the uCFA to an uSORE according to the above results of counting. Experimental results demonstrate that, for larger samples, our algorithm can efficiently infer an uSORE with better generalization ability.

Xiaofan Wang, Haiming Chen

### POI Recommendation Based on Locality-Specific Seasonality and Long-Term Trends

This work deals with time-aware recommender systems in a domain of location-based social networks, such as Yelp or Foursquare. We propose a novel method to recommend Points of Interest (POIs) which considers their yearly seasonality and long-term trends. In contrast to the existing methods, we model these temporal aspects specifically for individual geographical localities instead of globally. According to the results achieved by the experimental evaluation on Yelp dataset, locality-specific seasonality can significantly improve the recommendation performance in comparison to its global alternative. We found out that it is helpful mostly within recommendations for highly-active users (it has a smaller influence for the novice users) and as expected, in localities with a strong seasonal weather variation. Another interesting finding is that in contrast to seasonality, we did not observe an improvement in case of locality-specific long-term trends.

Elena Stefancova, Ivan Srba

### Selection of a Green Logical Data Warehouse Schema by Anti-monotonicity Constraint

In the era of social media and big data, many organizations and countries are devoting considerable effort and money to reduce energy consumption. Despite that, current research mainly focuses on improving performance without taking into account energy consumption. Recently, great importance has been attached to finding a good compromise between energy efficiency and performance in data warehouse (DW) applications. For a given DW, multiple logical schemes may exist due to the presence of dependencies and hierarchies among the attributes. In this respect, it has been shown that varying the logical schema has an impact on energy saving. In this paper, we introduce a new approach for efficient exploration of the different logical schemes of a DW. To do so, we prune the search space by relying on anti-monotonicity based constraint to swiftly find the most energy-efficient logical schema. The carried out experiments show the sharp impact of the logical design on energy saving.

### The HyperBagGraph DataEdron: An Enriched Browsing Experience of Datasets

Track: Foundation of Data Science and Engineering

Traditional verbatim browsers give back information linearly according to a ranking performed by a search engine that may not be optimal for the surfer. The latter may need to assess the pertinence of the information retrieved, particularly when s$$\cdot$$he wants to explore other facets of a multi-facetted information space. Simultaneous facet visualisation can help to gain insights into the information retrieved and call for further refined searches. Facets are potentially heterogeneous co-occurrence networks, built choosing at least one reference type, and modeled by HyperBag-Graphs—families of multisets on a given universe. References allow to navigate inside the dataset and perform visual queries. The approach is illustrated on Arxiv scientific pre-prints searches.

Xavier Ouvrard, Jean-Marie Le Goff, Stéphane Marchand-Maillet

### Towards the Named Entity Recognition Methods in Biomedical Field

Natural Language Processing (NLP) is very important in modern data processing taking into consideration different sources, forms and purpose of data as well as information in different areas our industry, administration, public and private life. Our studies concern Natural Language Processing techniques in biomedical field. The increasing volume of information stored in medical health record databases both in natural language and in structured forms is creating increasing challenges for information retrieval (IR) technologies. The paper presents the comparison study of chosen Named Entity Recognition techniques for biomedical field.

Anna Śniegula, Aneta Poniszewska-Marańda, Łukasz Chomątek

### Vietnamese Punctuation Prediction Using Deep Neural Networks

Adding appropriate punctuation marks into text is an essential step in speech-to-text where such information is usually not available. While this has been extensively studied for English, there is no large-scale dataset and comprehensive study in the punctuation prediction problem for the Vietnamese language. In this paper, we collect two massive datasets and conduct a benchmark with both traditional methods and deep neural networks. We aim to publish both our data and all implementation codes to facilitate further research, not only in Vietnamese punctuation prediction but also in other related fields. Our project, including datasets and implementation details, is publicly available at https://github.com/BinhMisfit/vietnamese-punctuation-prediction .

Thuy Pham, Nhu Nguyen, Quang Pham, Han Cao, Binh Nguyen

### A Light-Weight Tool for the Self-assessment of Security Compliance in Software Development – An Industry Case

Companies are often challenged to modify and improve their software development processes in order to make them compliant with security standards. The complexity of these processes renders it difficult for practitioners to validate and foresee the effort required for compliance assessments. Further, performing gap analyses when processes are not yet mature enough is costly and involving auditors in early stages is, in our experience, often inefficient. An easier and more productive approach is conducting a self-assessment. However, practitioners, in particular developers, quality engineers, and product owners face difficulties to identify security-relevant process artifacts as required by standards. They would benefit from a proper and light-weight tool to perform early compliance assessments of their processes w.r.t. security standards before entering an in-depth audit. In this paper, we report on our current effort at Siemens Corporate Technology to develop such a light-weight assessment tool to assess the security compliance of software development processes with the IEC 62443-4-1 standard, and we discuss first results from an interview-based evaluation.

Fabiola Moyón, Christoph Bayr, Daniel Mendez, Sebastian Dännart, Kristian Beckers

### A Novel Hybrid Genetic Algorithm for the Two-Stage Transportation Problem with Fixed Charges Associated to the Routes

This paper concerns the two-stage transportation problem with fixed charges associated to the routes and proposes an efficient hybrid metaheuristic for distribution optimization. Our proposed hybrid algorithm incorporates a linear programming optimization problem into a genetic algorithm. Computational experiments were performed on a recent set of benchmark instances available from literature. The achieved computational results prove that our proposed solution approach is highly competitive in comparison with the existing approaches from the literature.

Ovidiu Cosma, Petrica C. Pop, Cosmin Sabo

### Do People Use Naming Conventions in SQL Programming?

In this paper, we investigate the usage of naming conventions in SQL programming. To this end, we define a reference style, consisting of naming conventions that have been proposed in the literature. Then, we perform an empirical study that involves the database schemas of 21 open source projects. In our study, we evaluate the adherence of the names that are used in the schemas to the reference style. Moreover, we study how the adherence of the names to the reference style evolves, during the lifetime of the schemas. Our study reveals that many conventions are followed in all schemas. The adherence to these conventions is typically stable, during the lifetime of the schemas. However, there are also conventions that are partially followed, or even not followed. Over time, the adherence of the schemas to these conventions may improve, decay or remain stable.

Aggelos Papamichail, Apostolos V. Zarras, Panos Vassiliadis

### Employing Costs in Multiagent Systems with Timed Migration and Timed Communication

We use a process calculus to describe easily multiagent systems with timeouts for mobility and communication, and with assigned costs for agents actions and for the locations of a distributed network. After presenting an operational semantics and some results regarding this calculus, we provide a translation of the multiagent systems to weighted timed automata having a bisimilar behaviour. Such a translation allows the use of an existing software tool for verification of various properties of the multiagent systems, and for optimizing the costs involved in the distributed networks of mobile agents.

Bogdan Aman, Gabriel Ciobanu

### Maintainability of Automatic Acceptance Tests for Web Applications—A Case Study Comparing Two Approaches to Organizing Code of Test Cases

[Context] Agile software development calls for test automation since it is critical for continuous development and delivery. However, automation is a challenging task especially for tests of user interface, which can be very expensive. [Problem] There are two extreme approaches of structuring the code of test duties for web-applicating, i.e., linear scripting and keyword-driven scripting technique employing the page object pattern. The goal of this research is to compare them focusing on the maintainability aspect. [Method] We develop and maintain two automatic test suites implementing the same test cases for a mature open-source system using these two approaches. For each approach, we measure the size of the testing codebase and the number of lines of code that need to be modified to keep the test suites passing and valid through five releases of the system. [Results] We observed that the total number of physical lines was higher for the keyword-driven approach than for the linear scripting one. However, the number of programmatical lines of code was smaller for the former. The number of lines of code that had to be modified to maintain the tests was lower for the keyword-driven scripting test suite than for the linear-scripting one. We found the linear-scripting technique was more difficult to maintain because the scripts consist only of low-level code directly interacting with a web browser making it hard to understand the purpose and broader context of the interaction they implement. [Conclusions] We conclude that test suites created using the keyword-driven approach are easier to maintain and more suitable for most of the projects. However, the results show that the linear scripting approach could be considered as a less expensive alternative for small projects that are not likely to be frequently modified in the future.

Aleksander Sadaj, Mirosław Ochodek, Sylwia Kopczyńska, Jerzy Nawrocki

### Recommending Trips in the Archipelago of Refactorings

The essence of refactoring is to improve source code quality, in a principled, behavior preserving, one step at the time, process. To this end, the developer has to figure out the refactoring steps, while working on a specific source code fragment. To facilitate this task, the documentation that explains each primitive refactoring typically provides guidelines and tips on how to combine it with further refactorings. However, the developer has to cope with many refactorings and lots of guidelines.To deal with this problem, we propose a graph-based model that formally specifies refactoring guidelines and tips in terms of nodes that correspond to refactorings and edges that denote part-of, instead-of and succession relations. We refer to this model as the Map of the Archipelago of Refactorings and we use it as the premise of the Refactoring Trip Advisor, a refactoring recommendation tool that facilitates the combination of refactorings. A first assessment of the tool in a practical scenario that involves 16 developers and a limited set of refactorings for composing and moving methods brought out positive results that motivate further studies of a larger scale and scope.

Theofanis Vartziotis, Apostolos V. Zarras, Anastasios Tsimakis, Panos Vassiliadis

### String Representations of Java Objects: An Empirical Study

String representations of objects are used for many purposes during software development, including debugging and logging. In Java, each class can define its own string representation by overriding the toString method. Despite their usefulness, these methods have been neglected by researchers so far. In this paper, we describe an empirical study of toString methods performed on a corpus of Java files. We are asking what portion of classes defines toString, how are these methods called, and what do they look like. We found that the majority of classes do not override the default (not very useful) implementation. A large portion of the toString method calls is implicit (using a concatenation operator). The calls to toString are used for nested string representation building, exception handling, in introspection libraries, for type conversion, and in test code. A typical toString implementation consists of literals, field reads, and string concatenation. Around one third of the string representation definitions is schematic. Half of such schematic implementations do not include all member variables in the printout. This fact motivates the future research direction – fully automated generation of succinct toString methods.

Matúš Sulír

### Fast Indexes for Gapped Pattern Matching

We describe indexes for searching large data sets for variable-length-gapped (VLG) patterns. VLG patterns are composed of two or more subpatterns, between each adjacent pair of which is a gap-constraint specifying upper and lower bounds on the distance allowed between subpatterns. VLG patterns have numerous applications in computational biology (motif search), information retrieval (e.g., for language models, snippet generation, machine translation) and capture a useful subclass of the regular expressions commonly used in practice for searching source code. Our best approach provides search speeds several times faster than prior art across a broad range of patterns and texts.

Manuel Cáceres, Simon J. Puglisi, Bella Zhukova

### Linearizing Genomes: Exact Methods and Local Search

In this article, we address the problem of genome linearization from the perspective of Polynomial Local Search, a complexity class related to finding local optima. We prove that the linearization problem, with a neighborhood structure, the neighbor slide, is PLS-complete. On the positive side, we develop two exact methods, one using tree decompositions with an efficient dynamic programming, the other using an integer linear programming. Finally, we compare them on real instances.

Tom Davot, Annie Chateau, Rodolphe Giroudeau, Mathias Weller

### Scanning Phylogenetic Networks Is NP-hard

Phylogenetic networks are rooted directed acyclic graphs used to depict the evolution of a set of species in the presence of reticulate events. Reconstructing these networks from molecular data is challenging and current algorithms fail to scale up to genome-wide data. In this paper, we introduce a new width measure intended to help design faster parameterized algorithms for this task. We study its relation with other width measures and problems in graph theory and finally prove that deciding it is NP-complete, even for very restricted classes of networks.

Vincent Berry, Celine Scornavacca, Mathias Weller

### The Maximum Equality-Free String Factorization Problem: Gaps vs. No Gaps

A factorization of a string w is a partition of w into substrings $$u_1,\dots ,u_k$$ such that $$w=u_1 u_2 \cdots u_k$$. Such a partition is called equality-free if no two factors are equal: $$u_i \ne u_j, \forall i,j$$ with $$i \ne j$$. The maximum equality-free factorization problem is to decide, for a given string w and integer k, whether w admits an equality-free factorization with k factors.Equality-free factorizations have lately received attention because of their application in DNA self-assembly. Condon et al. (CPM 2012) study a version of the problem and show that it is $$\mathcal {NP}$$-complete to decide if there exists an equality-free factorization with an upper bound on the length of the factors. At STACS 2015, Fernau et al. show that the maximum equality-free factorization problem with a lower bound on the number of factors is $$\mathcal {NP}$$-complete. Shortly after, Schmid (CiE 2015) presents results concerning the Fixed Parameter Tractability of the problems.In this paper we approach equality free factorizations from a practical point of view i.e. we wish to obtain good solutions on given instances. To this end, we provide approximation algorithms, heuristics, Integer Programming models, an improved FPT algorithm and we also conduct experiments to analyze the performance of our proposed algorithms.Additionally, we study a relaxed version of the problem where gaps are allowed between factors and we design a constant factor approximation algorithm for this case. Surprisingly, after extensive experiments we conjecture that the relaxed problem has the same optimum as the original.

### A Calculus for Language Transformations

In this paper we propose a calculus for expressing algorithms for programming languages transformations. We present the type system and operational semantics of the calculus, and we prove that it is type sound. We have implemented our calculus, and we demonstrate its applicability with common examples in programming languages. As our calculus manipulates inference systems, our work can, in principle, be applied to logical systems.

### Computing Directed Steiner Path Covers for Directed Co-graphs (Extended Abstract)

We consider the Directed Steiner Path Cover problem on directed co-graphs. Given a directed graph $$G=(V(G),E(G))$$ and a set $$T \subseteq V(G)$$ of so-called terminal vertices, the problem is to find a minimum number of directed vertex-disjoint paths, which contain all terminal vertices and a minimum number of non-terminal vertices (Steiner vertices). The primary minimization criteria is the number of paths. We show how to compute a minimum Steiner path cover for directed co-graphs in linear time. For $$T = V(G)$$, the algorithm computes a directed Hamiltonian path if such a path exists.

Frank Gurski, Stefan Hoffmann, Dominique Komander, Carolin Rehs, Jochen Rethmann, Egon Wanke

### Counting Infinitely by Oritatami Co-transcriptional Folding

A fixed bit-width counter was proposed as a proof-of-concept demonstration of an oritatami model of cotranscriptional folding [Geary et al., Proc. MFCS 2016, LIPIcs 58, 43:1–43:14], and it was embedded into another oritatami system that self-assembles a finite portion of Heighway dragon fractal. In order to expand its applications, we endow this counter with capability to widen bit-width at every encounter with overflow.

Kohei Maruyama, Shinnosuke Seki

### On Synchronizing Tree Automata and Their Work–Optimal Parallel Run, Usable for Parallel Tree Pattern Matching

We present a way of synchronizing finite tree automata: We define a synchronizing term and a k-local deterministic finite bottom–up tree automaton. Furthermore, we present a work–optimal parallel algorithm for parallel run of the deterministic k-local tree automaton in $$\mathcal {O}(\log {n})$$ time with $$\lceil \frac{n}{\log {n}}\rceil$$ processors, for $$k \le \log {n}$$, or in $$\mathcal {O}(k)$$ time with $$\lceil \frac{n}{k}\rceil$$ processors, for $$k \ge \log {n}$$, where n is the number of nodes of an input tree, on EREW PRAM. Finally, we prove that the deterministic finite bottom–up tree automaton that is used as a standard tree pattern matcher is k-local with respect to the height of a tree pattern.

Štěpán Plachý, Jan Janoušek

### On the Hardness of Energy Minimisation for Crystal Structure Prediction

Crystal Structure Prediction (csp) is one of the central and most challenging problems in materials science and computational chemistry. In csp, the goal is to find a configuration of ions in 3D space that yields the lowest potential energy. Finding an efficient procedure to solve this complex optimisation question is a well known open problem in computational chemistry. Due to the exponentially large search space, the problem has been referred in several materials-science papers as “NP-Hard” without any formal proof. This paper fills a gap in the literature providing the first set of formally proven NP-Hardness results for a variant of csp with various realistic constraints. In particular, this work focuses on the problem of removal: the goal is to find a substructure with minimal energy, by removing a subset of the ions from a given initial structure. The main contributions are NP-Hardness results for the csp removal problem, new embeddings of combinatorial graph problems into geometrical settings, and a more systematic exploration of the energy function to reveal the complexity of csp. These results contribute to the wider context of the analysis of computational problems for weighted graphs embedded into the 3-dimensional Euclidean space, where our NP-Hardness results holds for complete graphs with edges which are weighted proportional to the distance between the vertices.

### Practical Implementation of a Quantum Backtracking Algorithm

In previous work, Montanaro presented a method to obtain quantum speedups for backtracking algorithms, a general meta-algorithm to solve constraint satisfaction problems (CSPs). In this work, we derive a space efficient implementation of this method. Assume that we want to solve a CSP with m constraints on n variables and that the domain in which these variables take their value is of cardinality d. Then, we show that the implementation of Montanaro’s backtracking algorithm can be done by using $$\mathcal {O}(n\log {d})$$ data qubits. We detail an implementation of the predicate associated to the CSP with an additional register of $$\mathcal {O}(\log {m})$$ qubits. We explicit our implementation for graph coloring and SAT problems, and present simulation results. Finally, we discuss the impact of the usage of static and dynamic variable ordering heuristics in the quantum setting.

Simon Martiel, Maxime Remaud

### Simplified Emanation Graphs: A Sparse Plane Spanner with Steiner Points

Emanation graphs of grade k, introduced by Hamedmohseni, Rahmati, and Mondal, are plane spanners made by shooting $$2^{k+1}$$ rays from each given point, where the shorter rays stop the longer ones upon collision. The collision points are the Steiner points of the spanner.We introduce a method of simplification for emanation graphs of grade $$k=2$$, which makes it a competent spanner for many possible use cases such as network visualization and geometric routing. In particular, the simplification reduces the number of Steiner points by half and also significantly decreases the total number of edges, without increasing the spanning ratio. Exact methods of simplification is provided along with comparisons of simplified emanation graphs against Shewchuk’s constrained Delaunay triangulations on both synthetic and real-life datasets. Our experimental results reveal that the simplified emanation graphs outperform constrained Delaunay triangulations in common quality measures.

Bardia Hamedmohseni, Zahed Rahmati, Debajyoti Mondal

### Simultaneous FPQ-Ordering and Hybrid Planarity Testing

We study the interplay between embedding constrained planarity and hybrid planarity testing. We consider a constrained planarity testing problem, called 1-Fixed Constrained Planarity, and prove that this problem can be solved in quadratic time for biconnected graphs. Our solution is based on a new definition of fixedness that makes it possible to simplify and extend known techniques about Simultaneous PQ-Ordering. We apply these results to different variants of hybrid planarity testing, including a relaxation of NodeTrix Planarity with fixed sides, that allows rows and columns to be independently permuted.

Giuseppe Liotta, Ignaz Rutter, Alessandra Tappini

### Two-Player Competitive Diffusion Game: Graph Classes and the Existence of a Nash Equilibrium

The competitive diffusion game is a game-theoretic model of information spreading on a graph proposed by Alon et al. (2010). In the model, a player chooses an initial vertex of the graph, from which information by the player spreads through the edges connected with the initial vertex. If a vertex that is not yet influenced by any information receives information by a player, it is influenced by the information and it diffuses it to adjacent vertices. A vertex that simultaneously receives two or more types of information does not diffuse any type of information from then on. The objective of a player is to maximize the number of vertices influenced by the player’s information. In this paper, we investigate the existence of a pure Nash equilibrium of the two-player competitive diffusion game on chordal and its related graphs. We show that a pure Nash equilibrium always exists on block graphs, split graphs and interval graphs, all of which are well-known subclasses of chordal graphs. On the other hand, we show that there is an instance with no pure Nash equilibrium on (strongly) chordal graphs; the boundary of the existence of a pure Nash equilibrium is found.

Naoka Fukuzono, Tesshu Hanaka, Hironori Kiya, Hirotaka Ono, Ryogo Yamaguchi

### Automatic Text Generation in Slovak Language

Automatic text generation can significantly help to ease human effort in many every-day tasks. Recent advancements in neural networks supported further research in this area and also brought significant improvement in quality of text generation. Unfortunately, most of the research deals with English language and possibilities of text generation of Slavic languages was not fully explored yet. Our work is concerned with automatic text generation and language modeling for Slovak language. Since Slovak language has more complicated grammatical structure and morphology, the task of text generation is also more challenging. We experimented with the neural approaches in natural language generation and performed several experiments with text generation in both Slovak and English language for two different domains. Additionally, we performed an experiment with human annotators to assess the quality of generated texts. Our experiments showed promising results and we can consider using neural networks for text generation as sufficient also for text generation in Slovak language.

Dominik Vasko, Samuel Pecar, Marian Simko

### Connecting Galaxies: Bridging the Gap Between Databases and Applications

An incompatibility of object-oriented application code and relational database engine often causes performance problems, known as Impedance Mismatch, which negatively affect business-critical application functions. The incompatibility can also over-complicate application design and increase the costs of development.We address these issues, applying a concept of the API contracts to the interaction between the application and the database. We introduce a new technique providing for the transfer of complex objects rather than low-level records. We describe the implementation of the proposed solution in industrial settings and show how suggested techniques streamline the application development, at the same time providing significant performance gains.

Henrietta Dombrovskaya, Jeff Czaplewski, Boris Novikov

### GRaCe: A Relaxed Approach for Graph Query Caching

SPARQL query optimization is an important issue for RDF data stores that can benefit from the usage of caching frameworks. Most caching approaches rely on a precise match semantics, that limits the number of cache hits and, as a consequence, the potential benefit. Others propose relaxed matches for the entire query, which is precisely executed over the cached result set. In this paper, to overcome these limitations we propose GRaCe, a Graph Relaxed Caching approach for RDF data stores. GRaCe supports relaxed cache matches and a relaxed query semantics, thus increasing the number of cache hits. Experimental results show that a relaxed cache can significantly reduce query execution time in all the scenarios where a relaxed query result is tolerated.

Francesco De Fino, Barbara Catania, Giovanna Guerrini

### Modelling of the Fake Posting Recognition in On-Line Media Using Machine Learning

Discuss content in the online web space has a significant impact on social life in recent years, especially in the political world. The impact of social networks has its advantages and disadvantages. An important disadvantage is a rising of the antisocial content in online communities. The antisocial content represents a serious and actual problem that is reinforced by a simplifying the process of creating and disseminating of antisocial posts. A typical example is a spreading of fake reviews. Detection of fake reviews is becoming one of the most important areas of research in last years. It is easier to track the impact of fake reviews than to detect them. The aim of this paper is to create suitable models for the fake reviews recognition using machine learning algorithms particularly decision tree, random forests, support vector machine and naïve Bayes classifier. Using a confusion matrix, several indicators of binary classification efficiency were quantified in the process of these models testing.

Kristína Machová, Marián Mach, Gabriela Demková

### Two-Step Memory Networks for Deep Semantic Parsing of Geometry Word Problems

Semantic parsing of geometry word problems (GWPs) is the first step towards automated geometry problem solvers. Existing systems for this task heavily depend on language-specific NLP tools, and use hard-coded parsing rules. Moreover, these systems produce a static set of facts and record low precision scores. In this paper, we present the two-step memory network, a novel neural network architecture for deep semantic parsing of GWPs. Our model is language independent and optimized for low-resource domains. Without using any language-specific NLP tools, our system performs as good as existing systems. We also introduce on-demand fact extraction, where a solver can query the model about entities during the solving stage that alleviates the problem of imperfect recalls.

### A Case Study on a Hybrid Approach to Assessing the Maturity of Requirements Engineering Practices in Agile Projects (REMMA)

Context: Requirements Engineering (RE) is one of the key processes in software development. With the advent of agile software development methods, new challenges have emerged for traditional, prescriptive maturity models aiming to support the improvement of RE process. One of the main problems is that frequently the guidelines prescribed by agile approaches have to be adapted to a project’s context to provide benefits. Therefore, it might be naive to believe that it is possible to propose a prescriptive method of RE process improvement that will suit all agile projects without any alteration. Objective: The aim of the paper is to evaluate a hybrid approach to assessing the maturity of agile RE (REMMA), which combines elements of prescriptive and problem-oriented improvement methods. Method: The usefulness, ease of use, and cost-effectiveness of REMMA were investigated through a case study performed in one of the biggest software houses in Central Europe. Results: The results of the case study suggest that the method seems easy to use, affordable, and is perceived as a useful tool to support the process of improving RE practices in agile projects. Its feature of taking into account the dependencies between practices and the necessity to adapt them to a certain project context was regarded as well suited for the agile context. Conclusions: REMMA, which includes two main components: a maturity model for agile RE (a set of state-of-the-art agile RE practices) and an assessment method that makes it possible to evaluate how well the agile RE practices are implemented, seems to be a useful tool supporting improvement of RE in agile projects.

Mirosław Ochodek, Sylwia Kopczyńska, Jerzy Nawrocki

### Does Live Regression Testing Help?

Regression testing is an expensive, yet crucial part of the software development process. As regression test suites grow in size, the time required for their execution increases proportionally, and their execution is often either delegated to a specialized testing environment out of developers reach, or they are omitted completely. This could have a variety of negative effects on the developers’ productivity, including interruptions and slowdown of developers’ workflow. We propose a method of live regression unit testing to address these issues via incorporating Regression Test Selection and Test Case Prioritization techniques and an automatized change detection mechanism to run the regression testing in the background automatically. By combining the test results with source code changes and code coverage information, we are able to precisely identify source code changes responsible for test failures. By the paired two-sample t-test we proved, that our method is able to increase the speed of fault detection and to fix changes responsible for incorrect behaviour almost 2 times (p-value = 0.001, $$\alpha$$ = 0.05).

Marek Bruchatý, Karol Rástočný

### Dense Subgraphs in Biological Networks

A fundamental problem in analysing biological networks is the identification of dense subgraphs, since they are considered to be related to relevant parts of networks, like communities. Many contributions have been focused mainly in computing a single dense subgraph, but in many applications we are interested in finding a set of dense, possibly overlapping, subgraphs. In this paper we consider the Top-k-Overlapping Densest Subgraphs problem, that aims at finding a set of k dense subgraphs, for some integer $$k \ge 1$$, that maximize an objective function that consists of the density of the subgraphs and the distance among them. We design a new heuristic for the Top-k-Overlapping Densest Subgraphs and we present an experimental analysis that compares our heuristic with an approximation algorithm developed for Top-k-Overlapping Densest Subgraphs (called DOS) on biological networks. The experimental result shows that our heuristic provides solutions that are denser than those computed by DOS, while the solutions computed by DOS have a greater distance. As for time-complexity, the DOS algorithm is much faster than our method.