
2021 | Book

SOFSEM 2021: Theory and Practice of Computer Science

47th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2021, Bolzano-Bozen, Italy, January 25–29, 2021, Proceedings

Editors: Tomáš Bureš, Prof. Riccardo Dondi, Johann Gamper, Prof. Giovanna Guerrini, Tomasz Jurdziński, Prof. Dr. Claus Pahl, Dr. Florian Sikora, Dr. Prudence W.H. Wong

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This book contains the invited and contributed papers selected for presentation at SOFSEM 2021, the 47th International Conference on Current Trends in Theory and Practice of Computer Science, which was held online during January 25–28, 2021, hosted by the Free University of Bozen-Bolzano, Italy.

The 33 full and 7 short papers included in the volume were carefully reviewed and selected from 100 submissions. They were organized in topical sections on: foundations of computer science; foundations of software engineering; foundations of data science and engineering; and foundations of algorithmic computational biology. The book also contains 5 invited papers.

Table of Contents

Frontmatter

Invited Papers

Frontmatter
Algorithms that Access the Input via Queries

Problems where an algorithm cannot simply access the whole input but needs to obtain information about it using queries arise naturally in many settings. We discuss different aspects of models where an algorithm needs to query the input, and of how the performance of algorithms for such models can be measured. After that, we give some concrete examples of algorithmic settings and results for scenarios where algorithms access the input via queries. Finally, we discuss recent results for the setting of computing with explorable uncertainty with parallel queries and with untrusted predictions.

Thomas Erlebach
Towards Knowledge Exchange: State-of-the-Art and Open Problems

We discuss our experience in bringing data exchange to knowledge graphs. This experience includes the development of Kensho, a tool for generating mapping rules and performing knowledge exchange between two Knowledge Bases (KBs). We highlight the challenges addressed in Kensho, including managing the rich structural complexity of KBs and the need to handle incomplete correspondences between property paths. We use Kensho to highlight many open problems related to knowledge exchange including how knowledge translation can inform the task of KB integration and population.

Bahar Ghadiri Bashardoost, Kelly Lyons, Renée J. Miller
Invited Talk: Resilient Distributed Algorithms

Following the immense recent advances in distributed networks, the explosive growth of the Internet, and our increased dependence on these infrastructures, guaranteeing the uninterrupted operation of communication networks has become a major objective in network algorithms. Modern instantiations of distributed networks, such as the Bitcoin network and cloud computing, additionally introduce new security challenges that deserve urgent attention in both theory and practice. This extended abstract describes our initial steps towards developing a unified framework for obtaining fast, resilient and secure distributed algorithms for fundamental graph problems. We focus on two main objectives. The first is initiating and establishing the theoretical exploration of security in distributed graph algorithms; such a notion has been addressed before mainly in the context of secure multi-party computation (MPC). The heart of our approach is to develop new graph-theoretical infrastructures that provide secure channels between nodes in a communication network of arbitrary topology. The second is developing efficient distributed algorithms against various adversarial settings, such as node crashes and Byzantine attacks. The main novelty in addressing these objectives lies in our approach, which takes a graph-theoretic perspective in which common resilience requirements are translated into suitably tailored combinatorial graph structures. We believe that the proposed perspective will deepen the theoretical foundations of resilient distributed computation, strengthen the connections with the areas of fault-tolerant network design and information-theoretic security, and provide a refreshing perspective on extensively studied graph-theoretical concepts.

Merav Parter
Towards Minimally Conscious Cyber-Physical Systems: A Manifesto

Incidents like the crash of Lion Air Flight 610 in 2018 challenge the design of reliable and secure cyber-physical systems that operate in the real world and have to cope with unpredictable external phenomena and error-prone technology. We argue that their design needs to guarantee minimal machine consciousness, which expresses that these systems must operate with full awareness of (the state of) their components and the environment. The concept emerged from our recent effort to develop a computational model for conscious behavior in robots, based on the theory of automata. Making systems ‘minimally machine conscious’ leads to more trustworthy systems, as it strengthens their behavioral flexibility in varying environments and their resilience to operation and cooperation failures, both of their components and as a whole. The notion of minimal machine consciousness has the potential to become one of the defining attributes of Industry 4.0.

Jiří Wiedermann, Jan van Leeuwen

Foundations of Computer Science – Full Papers

Frontmatter
Amnesiac Flooding: Synchronous Stateless Information Dissemination

A recently introduced stateless variant of network flooding for synchronous systems is called amnesiac flooding. Stateless protocols are advantageous in high-volume applications, increasing performance by removing the load caused by the retention of session information. In this paper we analyze the termination time of multi-source amnesiac flooding and provide tight upper and lower bounds on its time complexity.

Volker Turau
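The forwarding rule itself is not restated in the abstract; in the literature, amnesiac flooding is usually defined so that a node which receives the message in some round forwards it in the next round to all neighbours except those it just received it from, keeping no other state. A minimal simulation sketch assuming that standard definition (the function name and the termination-time convention are ours):

```python
from collections import defaultdict

def amnesiac_flooding(adj, sources):
    """Simulate synchronous amnesiac flooding: each round, every node that
    just received the message forwards it to all neighbours except the ones
    it received it from; no other state is kept. Returns the number of
    rounds until no message is in transit."""
    in_transit = {(s, v) for s in sources for v in adj[s]}  # round-1 messages
    rounds = 0
    while in_transit:
        rounds += 1
        received_from = defaultdict(set)
        for sender, receiver in in_transit:
            received_from[receiver].add(sender)
        in_transit = {(v, w)
                      for v, senders in received_from.items()
                      for w in adj[v] if w not in senders}
    return rounds

# single source on a 4-cycle: terminates after 2 rounds
print(amnesiac_flooding({0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}, {0}))  # → 2
```

On the (non-bipartite) triangle the same source takes 3 rounds, illustrating that the termination time depends on the topology and not only on the eccentricity of the sources.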
Asymptotic Approximation by Regular Languages

This paper investigates a new property of formal languages called $\mathrm{REG}$-measurability, where $\mathrm{REG}$ is the class of regular languages. Intuitively, a language $L$ is $\mathrm{REG}$-measurable if there exists an infinite sequence of regular languages that “converges” to $L$. A language without $\mathrm{REG}$-measurability has, in some sense, so complex a shape that it cannot be (asymptotically) approximated by regular languages. We show that several context-free languages are $\mathrm{REG}$-measurable (including, in particular, languages with transcendental generating function and transcendental density), while a certain simple deterministic context-free language and the set of primitive words are $\mathrm{REG}$-immeasurable in a strong sense.

Ryoma Sin’ya
Balanced Independent and Dominating Sets on Colored Interval Graphs

We study two new versions of independent and dominating set problems on vertex-colored interval graphs, namely f-Balanced Independent Set (f-BIS) and f-Balanced Dominating Set (f-BDS). Let $G=(V,E)$ be an interval graph with a color assignment function $\gamma: V \rightarrow \{1,\ldots,k\}$ that maps all vertices in G onto k colors. A subset of vertices $S \subseteq V$ is called f-balanced if S contains f vertices from each color class. In the f-BIS and f-BDS problems, the objective is to compute an independent set or a dominating set that is f-balanced. We show that both problems are NP-complete even on proper interval graphs. For the BIS problem on interval graphs, we design two FPT algorithms, one parameterized by (f, k) and the other by the vertex cover number of G. Moreover, for an optimization variant of BIS on interval graphs, we present a polynomial-time approximation scheme (PTAS) and an $O(n \log n)$ time 2-approximation algorithm.

Sujoy Bhore, Jan-Henrik Haunert, Fabian Klute, Guangping Li, Martin Nöllenburg
Bike Assisted Evacuation on a Line

Two hikers and a bike are initially placed at the origin of an infinite line. When walking, the hikers keep a constant speed of 1, but when riding the bike they can reach a constant speed $v>1$ (the same for both hikers). The hikers are modelled as autonomous mobile robots with communication capabilities (either in the wireless or the face-to-face model), while the bike is not autonomous in that it cannot move on its own but must be picked up by a hiker. An exit is placed on the line at distance d from the origin; the distance and direction of the exit from the origin are unknown to the hikers. The hikers may either walk or ride the bike, but only one hiker may ride the bike at a time. The goal of the hikers is to evacuate through the exit in the minimum possible time, as measured by the time it takes the last hiker to exit. We develop algorithms for this “bike-assisted” evacuation of the two hikers from an unknown exit on a line and analyze their evacuation time. In the wireless model we present three algorithms: in the first the robots move in opposite directions at maximum speed, in the second at a specially selected “optimal” speed, and in the third the hiker imitates the biker. We also give three algorithms in the face-to-face model: in the first the hiker pursues the biker, in the second the hiker and the biker use zig-zag algorithms with specially chosen expansion factors, and the third establishes a sequence of specially constructed meeting points near the exit. In either case, the optimality of these algorithms depends on $v>1$. We also discuss lower bounds.

Khaled Jawhar, Evangelos Kranakis
Blocksequences of k-local Words

The locality of words is a relatively young structural complexity measure, introduced by Day et al. in 2017 in order to define classes of patterns with variables that can be matched in polynomial time. The main tool used to compute the locality of a word is the marking sequence: an ordering of the distinct letters occurring in the respective word. Once a marking sequence is fixed, the letters of the word are marked in steps: in the $i^{\text{th}}$ marking step, all occurrences of the $i^{\text{th}}$ letter of the marking sequence are marked. Thus, after each marking step, the word can be seen as a sequence of blocks of marked letters separated by blocks of non-marked letters. By keeping track of the evolution of the marked blocks of the word through the marking defined by a marking sequence, one defines the blocksequence of the respective marking sequence. We first show that words sharing the same blocksequence are only loosely connected, so we consider the stronger notion of extended blocksequence, which stores additional information on the form of each single marked block. In this context, we present a series of combinatorial results for words sharing the extended blocksequence.

Pamela Fleischmann, Lukas Haschke, Florin Manea, Dirk Nowotka, Cedric Tsatia Tsida, Judith Wiedenbeck
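To make the marking process concrete, the following sketch records, after each marking step, the number of maximal marked blocks. This is a simplified projection of the blocksequence described above (the full blocksequence tracks the evolution of the blocks themselves, not just their count; the function name is ours):

```python
def marked_block_counts(word, marking_sequence):
    """Run the marking process: in step i, mark all occurrences of the
    i-th letter of `marking_sequence`; after each step, count the maximal
    blocks (runs) of marked positions."""
    marked = [False] * len(word)
    counts = []
    for letter in marking_sequence:
        marked = [m or c == letter for m, c in zip(marked, word)]
        counts.append(sum(1 for i, m in enumerate(marked)
                          if m and (i == 0 or not marked[i - 1])))
    return counts

# marking 'a' first in "abaab" yields 2 marked blocks; marking 'b' merges all into 1
print(marked_block_counts("abaab", "ab"))  # → [2, 1]
```

The locality of a word is then the smallest k such that some marking sequence never creates more than k marked blocks at any step.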
Complexity of Limit-Cycle Problems in Boolean Networks

Boolean networks are a general model of interacting entities, with applications to biological phenomena such as gene regulation. Attractors play a central role, and the schedule of entity updates is a priori unknown. This article presents results on the computational complexity of problems related to the existence of update schedules such that some limit-cycle lengths are possible or not. We first prove that, given a Boolean network updated in parallel, knowing whether it has at least one limit-cycle of length k is $\mathsf{NP}$-complete. Adding an existential quantification on the block-sequential update schedule does not change the complexity class of the problem, but the following alternation brings us one level up in the polynomial hierarchy: given a Boolean network, knowing whether there exists a block-sequential update schedule such that it has no limit-cycle of length k is $\mathsf{NP}^{\mathsf{NP}}$-complete.

Florian Bridoux, Caroline Gaze-Maillot, Kévin Perrot, Sylvain Sené
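For small networks, the limit-cycle lengths under the parallel update schedule can be enumerated by brute force over all $2^n$ states; a sketch for intuition only (the paper's point is that even deciding the existence of a length-k limit cycle is NP-complete, so such a blow-up is, in a sense, expected):

```python
from itertools import product

def limit_cycle_lengths(fs):
    """Iterate the parallel update map x -> (f_0(x), ..., f_{n-1}(x)) from
    every state and record the length of the cycle each trajectory enters.
    Exponential in n -- for small examples only."""
    def step(x):
        return tuple(f(x) for f in fs)
    lengths = set()
    for x in product((0, 1), repeat=len(fs)):
        seen, t = {}, 0
        while x not in seen:
            seen[x] = t
            x, t = step(x), t + 1
        lengths.add(t - seen[x])
    return lengths

# swap network x0' = x1, x1' = x0: two fixed points and one 2-cycle
print(limit_cycle_lengths([lambda x: x[1], lambda x: x[0]]))  # → {1, 2}
```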
Concatenation Operations and Restricted Variants of Two-Dimensional Automata

A two-dimensional automaton operates on arrays of symbols. While a standard (four-way) two-dimensional automaton can move its input head in four directions, restricted two-dimensional automata are only permitted to move their input heads in three or two directions; these models are called three-way and two-way two-dimensional automata, respectively. In two dimensions, we may extend the notion of concatenation in multiple ways, depending on the words to be concatenated. We may row-concatenate (resp., column-concatenate) a pair of two-dimensional words when they have the same number of columns (resp., rows). In addition, the diagonal concatenation operation combines two words at their lower-right and upper-left corners, and is not dimension-dependent. In this paper, we investigate closure properties of restricted models of two-dimensional automata under three concatenation operations. We give non-closure results for two-way two-dimensional automata under row and column concatenation in both the deterministic and nondeterministic cases. We further give positive closure results for the same concatenation operations on unary nondeterministic two-way two-dimensional automata. Finally, we study closure properties of diagonal concatenation on both two-way and three-way two-dimensional automata.

Taylor J. Smith, Kai Salomaa
Distance Hedonic Games

In this paper we consider Distance Hedonic Games (DHGs), a class of non-transferable-utility coalition formation games that properly generalizes previously existing models such as Social Distance Games (SDGs) and unweighted Fractional Hedonic Games (FHGs). In particular, in DHGs we assume the existence of a scoring vector $\alpha$, in which the i-th coefficient $\alpha_i$ expresses the extent to which an agent x contributes to the utility of an agent y if they are at distance i. We focus on Nash stable outcomes of the arising games, i.e., on coalition structures in which no agent can unilaterally improve her gain by deviating. We consider two natural scenarios for the scoring vector, with monotonically increasing and monotonically decreasing coefficients. In both cases we give NP-hardness and inapproximability results for the problems of finding a social optimum and a best Nash stable outcome. Moreover, we characterize the topologies of coalitions that provide high social welfare and consequently give suitable bounds on the Price of Anarchy and the Price of Stability.

Michele Flammini, Bojana Kodric, Martin Olsen, Giovanna Varricchio
Distributed Independent Sets in Interval and Segment Intersection Graphs

The Maximal Independent Set problem is a well-studied problem in the distributed computing community. We study the Maximum and Maximal Independent Set problems on two classes of geometric intersection graphs, interval graphs and axis-parallel segment intersection graphs, and present deterministic distributed algorithms in the local communication model. We compute a maximum independent set on interval graphs in O(k) rounds and O(n) messages, where k is the size of the maximum independent set and n is the number of nodes in the graph. We provide a matching lower bound of $\Omega(k)$ on the number of rounds, whereas $\Omega(n)$ is a trivial lower bound on the message complexity. Thus our algorithm is both time and message optimal. We also study the maximal independent set problem on bi-interval graphs, a special case of interval graphs where the intervals have two different lengths. We prove that a maximal independent set that is a $\frac{1}{6}$-approximation of the maximum can be computed on bi-interval graphs in a constant number of rounds. For axis-parallel segment intersection graphs, we design an algorithm that finds a maximal independent set in O(D) rounds, where D is the diameter of the graph. We further show that this independent set is a $\frac{1}{2}$-approximation. The results in this paper extend the results of Molla et al. [J. Parallel Distrib. Comput. 2019].

Barun Gorain, Kaushik Mondal, Supantha Pandit
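For intuition on why interval graphs are amenable here, recall the classical sequential greedy for maximum independent set on intervals, which scans by right endpoint. This is the textbook sequential baseline, not the paper's distributed algorithm:

```python
def max_independent_set_intervals(intervals):
    """Greedy by right endpoint: take every interval that starts strictly
    after the last chosen one ends (intervals treated as closed; adjust
    the comparison for open intervals). Yields a maximum independent set
    of the corresponding interval graph."""
    chosen, last_end = [], float("-inf")
    for l, r in sorted(intervals, key=lambda iv: iv[1]):
        if l > last_end:
            chosen.append((l, r))
            last_end = r
    return chosen

print(max_independent_set_intervals([(1, 3), (2, 5), (4, 7), (6, 9)]))  # → [(1, 3), (4, 7)]
```

The inherently sequential left-to-right scan is what makes the O(k)-round distributed bound, with its matching lower bound, interesting.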
Hierarchical b-Matching

A matching of a graph is a subset of edges no two of which share a common vertex, and a maximum matching is a matching of maximum cardinality. In a b-matching every vertex v has an associated bound $b_v$, and a maximum b-matching is a maximum set of edges such that every vertex v appears in at most $b_v$ of them. We study an extension of this problem, termed Hierarchical b-Matching. In this extension, the vertices are arranged hierarchically: at the first level the vertices are partitioned into disjoint subsets, with a given bound for each subset; at the second level the set of these subsets is again partitioned into disjoint subsets, with a given bound for each subset; and so on. We seek a maximum set of edges that obeys all bounds (that is, no vertex v participates in more than $b_v$ edges, the vertices in any one subset together participate in no more than that subset's bound of edges, and so on up the hierarchy). This is a sub-problem of the matroid matching problem, which is NP-hard in general; it corresponds to the special case where the matroid is restricted to be laminar and the weights are unity. A pseudo-polynomial algorithm for the weighted laminar matroid matching problem is presented in [8]. We propose a polynomial-time algorithm for Hierarchical b-Matching, i.e., the unweighted laminar matroid matching problem, and discuss how our techniques can possibly be generalized to the weighted case.

Yuval Emek, Shay Kutten, Mordechai Shalom, Shmuel Zaks
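As an illustration of the constraints involved, the following sketch checks whether a given edge set obeys a laminar family of bounds, reading the "participation" of a subset as the number of edge endpoints inside it (one plausible reading of the definition above; the paper's actual contribution is an algorithm for finding a maximum such matching, not this checker):

```python
def is_hierarchical_b_matching(edges, bounds):
    """Check whether an edge set obeys hierarchical degree bounds.
    `bounds` is a list of (vertex_set, bound) pairs forming a laminar
    family; a set's load is counted as the number of edge endpoints
    falling inside it (so for a singleton {v} the load is v's degree)."""
    for group, b in bounds:
        load = sum((u in group) + (v in group) for u, v in edges)
        if load > b:
            return False
    return True

edges = [(1, 2), (3, 4), (1, 3)]
print(is_hierarchical_b_matching(edges, [({1}, 2), ({1, 2, 3, 4}, 6)]),
      is_hierarchical_b_matching(edges, [({1, 2}, 2)]))
```

Feasibility is easy to verify; the algorithmic difficulty lies entirely in maximizing the number of edges subject to all laminar bounds simultaneously.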
Improved Algorithms for Online Load Balancing

We consider an online load balancing problem and its extensions in the framework of repeated games. On each round, the player chooses a distribution (task allocation) over K servers, and then the environment reveals the load of each server, which determines the computation time of each server for processing the task assigned. After all rounds, the cost of the player is measured by some norm of the cumulative computation-time vector; when the norm is the $L_\infty$-norm, the cost is the makespan. The goal is to minimize the regret, i.e., to minimize the player's cost relative to the cost of the best fixed distribution in hindsight. We propose algorithms for general norms and prove their regret bounds. In particular, for the $L_\infty$-norm, our regret bound matches the best known bound, and the proposed algorithm runs in polynomial time per trial, involving linear programming and second-order programming, whereas no polynomial-time algorithm was previously known to achieve this bound.

Yaxiong Liu, Kohei Hatano, Eiji Takimoto
Iterated Uniform Finite-State Transducers on Unary Languages

We consider the model of an iterated uniform finite-state transducer, which executes the same length-preserving transduction in iterative sweeps. The first sweep takes place on the input string, while any subsequent sweep works on the output of the previous one. We focus on unary languages. We show that any unary regular language can be accepted by a deterministic iterated uniform finite-state transducer with at most $\max\{2\cdot\varrho, p\}+1$ states, where $\varrho$ and $p$ are the greatest primes in the factorization of the pre-periodic and the periodic part of the language, respectively. Such a state cost cannot be improved by using nondeterminism, and it turns out to be exponentially lower in the worst case than the state costs of equivalent classical models of finite-state automata. Next, we give a characterization of the classes of unary languages accepted by iterated uniform finite-state transducers with non-constant sweep bounds in terms of time-bounded one-way cellular automata. This characterization enables us both to exhibit interesting families of unary non-regular languages accepted by iterated uniform finite-state transducers, and to prove the undecidability of several questions related to iterated uniform finite-state transducers accepting unary languages with an amount of sweeps that is at least logarithmic.

Martin Kutrib, Andreas Malcher, Carlo Mereghetti, Beatrice Palano
New Bounds on the Half-Duplex Communication Complexity

In this work, we continue the research started in [6], where the authors suggested studying half-duplex communication complexity. Unlike the classical model of communication complexity introduced by Yao, in the half-duplex model Alice and Bob can speak or listen simultaneously, as if they were talking over a walkie-talkie. The motivation for such a communication model comes from the study of the KRW conjecture. Following the open questions formulated in [6], we prove lower bounds for the disjointness function in all variants of half-duplex models, and an upper bound in the half-duplex model with zero that separates disjointness from the inner product function in this setting. Next, we prove lower and upper bounds on the half-duplex complexity of the Karchmer-Wigderson games for the counting functions and for the recursive majority function, adapting ideas used in classical communication complexity. Finally, we define nondeterministic half-duplex complexity and establish bounds connecting it with nondeterministic complexity in the classical model.

Yuriy Dementiev, Artur Ignatiev, Vyacheslav Sidelnik, Alexander Smal, Mikhail Ushakov
Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

The Burrows-Wheeler transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter runs (usually referred to as r) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the parameter r as a measure of repetitiveness, especially for evaluating the performance, in terms of both space and time, of compressed indexing data structures. In this paper, we investigate $\rho(v)$, the ratio between r and the number of runs of the BWT of the reverse of v. Kempa and Kociumaka [FOCS 2020] gave the first non-trivial upper bound, $\rho(v) = O(\log^2 n)$, for any string v of length n. However, nothing is known about the tightness of this upper bound. We present infinite families of binary strings for which $\rho(v) = \Theta(\log n)$ holds, thus giving the first non-trivial lower bound on $\rho(n)$, the maximum over all strings of length n. Our results suggest that r is not an ideal measure of the repetitiveness of a string, since the number of repeated factors is invariant between the string and its reverse. We believe that there is a more intricate relationship between the number of runs of the BWT and the string's combinatorial properties.

Sara Giuliani, Shunsuke Inenaga, Zsuzsanna Lipták, Nicola Prezza, Marinella Sciortino, Anna Toffanello
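The quantities r and ρ(v) are easy to experiment with; a naive sketch, using the usual convention of a sentinel '$' assumed smaller than every other character:

```python
def bwt(s):
    """Naive Burrows-Wheeler transform of s + sentinel '$': sort all
    rotations, take the last column."""
    s += "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def runs(s):
    """Number of maximal equal-letter runs of s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])

v = "banana"
print(runs(bwt(v)), runs(bwt(v[::-1])))  # → 5 4
```

With this implementation, "banana" gives a BWT of "annb$aa" (5 runs) while its reverse gives "bnn$aaa" (4 runs), so ρ("banana") = 5/4 > 1 already on a tiny example.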
On the Redundancy of D-Ary Fano Codes

We study the redundancy of D-ary Fano source codes. We show that a novel splitting criterion allows us to prove a bound on the redundancy of the resulting code which sharpens the guarantee provided by Shannon's classical result for the case of an optimal code. In particular we show that, for any $D \ge 2$ and for every source distribution $\mathbf{p} = p_1, \dots, p_n$, there is a D-ary Fano code that satisfies the redundancy bound
$$\overline{L} - H_D(\mathbf{p}) \le 1 - p_{\min}, \qquad (1)$$
where $\overline{L}$ denotes the average codeword length, $p_{\min} = \min_i p_i$, and $H_D(\mathbf{p}) = -\sum_{i=1}^n p_i \log_D(p_i)$ is the D-ary entropy of the source. The existence of D-ary Fano codes achieving such a bound had been conjectured in [ISIT2015], where, however, the construction proposed achieves the bound only for $D = 2, 3, 4$. In [ISIT2020], a novel construction was proposed, leading to a proof that the redundancy bound (1) also holds for $D = 5$ (and some other special cases). This result was attained by a dynamic-programming-based algorithm with time complexity O(Dn) per node of the code tree. Here, besides proving that the redundancy bound (1) can be achieved, unconditionally, for every $D > 3$, we also significantly improve the time complexity of the algorithm building a D-ary Fano code tree achieving such a bound: we show that, for every $D \ge 4$, a D-ary Fano code tree satisfying (1) can be constructed by an efficient greedy procedure with complexity $O(D \log_2 n)$ per node of the code tree (i.e., improving from linear to logarithmic time in n).

Ferdinando Cicalese, Massimiliano Rossi
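For the binary case, the classical Fano construction and the redundancy bound (1) can be checked numerically. This sketch uses the textbook balanced-probability split, not the paper's novel D-ary splitting criterion:

```python
import math

def fano_code(probs):
    """Binary Fano construction: sort probabilities, recursively split at
    the point that best balances the two halves' total probability.
    Returns a codeword length per symbol (binary case only)."""
    lengths = {}

    def build(items, prefix):
        if len(items) == 1:
            lengths[items[0][0]] = max(len(prefix), 1)
            return
        total = sum(p for _, p in items)
        best, split, acc = float("inf"), 1, 0.0
        for i in range(1, len(items)):
            acc += items[i - 1][1]
            diff = abs(2 * acc - total)
            if diff < best:
                best, split = diff, i
        build(items[:split], prefix + "0")
        build(items[split:], prefix + "1")

    build(sorted(enumerate(probs), key=lambda t: -t[1]), "")
    return [lengths[i] for i in range(len(probs))]

# redundancy check against bound (1) for a sample distribution
p = [0.4, 0.3, 0.2, 0.1]
L = sum(pi * li for pi, li in zip(p, fano_code(p)))
H = -sum(pi * math.log2(pi) for pi in p)
print(fano_code(p), L - H <= 1 - min(p))  # → [1, 2, 3, 3] True
```

Here the redundancy is about 0.05, well within the bound $1 - p_{\min} = 0.9$; the paper's contribution is a splitting criterion making the analogous guarantee provable for all $D > 3$, computable in $O(D \log_2 n)$ time per node.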
On the Terminal Connection Problem

A connection tree of a graph G for a terminal set W is a tree subgraph T of G such that $\mathrm{leaves}(T) \subseteq W \subseteq V(T)$. A non-terminal vertex of a connection tree T is called a linker if its degree in T is exactly 2, and a router if its degree in T is at least 3. The Terminal Connection problem (TCP) asks whether G admits a connection tree for W with at most $\ell$ linkers and at most r routers, while the Steiner Tree problem asks whether G admits a connection tree for W with at most k non-terminal vertices. We prove that TCP is NP-complete even when restricted to strongly chordal graphs and $r \ge 0$ is fixed. This result separates the complexity of TCP from the complexity of Steiner Tree, which is known to be polynomial-time solvable on strongly chordal graphs. In contrast, when restricted to cographs, we prove that TCP is polynomial-time solvable, agreeing with the complexity of Steiner Tree. Finally, we prove that TCP remains NP-complete on graphs of maximum degree 3 even if either $\ell \ge 0$ or $r \ge 0$ is fixed.

Alexsander A. de Melo, Celina M. H. de Figueiredo, Uéverton S. Souza
Parameterized Complexity of d-Hitting Set with Quotas

In this paper we study a variant of the classic d-Hitting Set problem with lower and upper capacity constraints, say A and B, respectively. The input to the problem consists of a universe U; a set family $\mathscr{S}$ of sets over U, where each set in the family has size at most d; a non-negative integer k; and additionally two functions $\alpha: \mathscr{S} \rightarrow \{1,\ldots,A\}$ and $\beta: \mathscr{S} \rightarrow \{1,\ldots,B\}$. The goal is to decide whether there exists a hitting set of size at most k such that for every set S in the family $\mathscr{S}$, the solution contains at least $\alpha(S)$ and at most $\beta(S)$ elements from S. We call this the (A, B)-Multi d-Hitting Set problem. We study the problem in the realm of parameterized complexity. We show that (A, B)-Multi d-Hitting Set can be solved in $\mathcal{O}^{\star}(d^{k})$ time. For the special cases $d=3$ and $d=4$, we have improved bounds of $\mathcal{O}^{\star}(2.2738^k)$ and $\mathcal{O}^{\star}(3.562^{k})$, respectively; the former matches the running time of the classical 3-Hitting Set problem. Furthermore, we show that if there is no upper-bound constraint and the lower-bound constraint is the same $A > 1$ for all sets in the family, then the problem can be solved even faster than d-Hitting Set. We next investigate some graph-theoretic problems which can be thought of as implicit d-Hitting Set problems, in particular (A, B)-Multi Vertex Cover and (A, B)-Multi Feedback Vertex Set in Tournaments.
In (A, B)-Multi Vertex Cover, we are given a graph G and a non-negative integer k, and the goal is to find a subset $S \subseteq V(G)$ of size at most k such that for every edge in G, S contains at least A and at most B of its endpoints. Analogously, we can define (A, B)-Multi Feedback Vertex Set in Tournaments. We show that, unlike Vertex Cover, which is the same as (1, 2)-Multi Vertex Cover, (1, 1)-Multi Vertex Cover is polynomial-time solvable. Furthermore, unlike Feedback Vertex Set in Tournaments, (A, B)-Multi Feedback Vertex Set in Tournaments can be solved in polynomial time.

Sushmita Gupta, Pallavi Jain, Aditya Petety, Sagar Singh
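The (A, B)-Multi Vertex Cover condition is easy to state in code; a brute-force sketch for tiny instances (exponential, so for intuition only; the paper's parameterized algorithms are the real content):

```python
from itertools import combinations

def multi_vertex_cover(n, edges, A, B, k):
    """Smallest set S (|S| <= k) with at least A and at most B endpoints
    of every edge inside S, found by exhaustive search; returns None if
    no such set exists."""
    for size in range(k + 1):
        for S in combinations(range(n), size):
            s = set(S)
            if all(A <= (u in s) + (v in s) <= B for u, v in edges):
                return s
    return None

# path 0-1-2: {1} is a (1,1)-multi vertex cover; a triangle admits none
print(multi_vertex_cover(3, [(0, 1), (1, 2)], 1, 1, 1),
      multi_vertex_cover(3, [(0, 1), (1, 2), (0, 2)], 1, 1, 3))
```

The triangle example hints at why (1, 1)-Multi Vertex Cover is tractable: demanding exactly one covered endpoint per edge forces a bipartition-like structure on the graph.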
Parameterizing Role Coloring on Forests

A role coloring of a graph is an assignment of colors to the vertices such that if any two vertices are assigned the same color, then their neighborhoods are assigned the same set of colors. In k-Role Coloring, we ask whether a given graph is role colorable using exactly k colors. Determining whether a graph admits a k-role coloring is a notoriously hard problem, even for fixed $k \ge 2$: it is known to be NP-complete for $k \ge 2$ on general graphs, and for many hereditary graph classes, such as chordal graphs, planar graphs and split graphs, k-Role Coloring is NP-complete even when k is a constant. A recent result shows that k-Role Coloring is NP-complete on bipartite graphs for $k \ge 3$. The only known classes of graphs for which k-Role Coloring is polynomial-time solvable for every fixed k are cographs and proper interval graphs. We consider the parameterized complexity of $(n-k)$-role coloring of n-vertex graphs, parameterized by k. This parameterization has admitted interesting fixed-parameter tractable algorithms for the standard proper coloring [Chor et al. WG 2004] and list-coloring variants [Banik et al. IWOCA 2019, Gutin et al. STACS 2020]. As our main results, we show that this parameterization of role coloring has an $n^{O(k)}$ algorithm on general graphs (putting the problem in the parameterized complexity class XP), and an $f(k) \cdot n^{O(1)}$ algorithm (placing the problem in the class FPT) on forests (here f is a computable exponential function).

Sukanya Pandey, Venkatesh Raman, Vibha Sahlot
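The defining condition of a role coloring is simple to verify for a given coloring (the hardness lies in finding one); a minimal checker:

```python
def is_role_coloring(adj, coloring):
    """Check the defining condition: any two vertices with the same color
    must see the same *set* of colors in their neighbourhoods."""
    palette_of = {}
    for v, nbrs in adj.items():
        palette = frozenset(coloring[u] for u in nbrs)
        if palette_of.setdefault(coloring[v], palette) != palette:
            return False
    return True

# path 0-1-2-3 with colors 1,2,2,1: a valid role coloring
print(is_role_coloring({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},
                       {0: 1, 1: 2, 2: 2, 3: 1}))  # → True
```

On this path, color class 1 sees exactly the palette {2} and color class 2 sees exactly {1, 2} from every member, which is precisely the condition being checked.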
The Balanced Satisfactory Partition Problem

The Satisfactory Partition problem asks whether it is possible to partition the vertex set of a given undirected graph into two parts such that each vertex has at least as many neighbours in its own part as in the other part. The Balanced Satisfactory Partition problem is the variant in which the two parts are additionally required to have the same cardinality. Both problems are known to be NP-complete, but their parameterized complexity has remained open until now. We enhance our understanding of the problems from the viewpoint of parameterized complexity. The two main results of the paper are the following: (1) the Satisfactory Partition problem and its balanced version are fixed-parameter tractable (FPT) when parameterized by neighbourhood diversity; (2) the Balanced Satisfactory Partition problem is W[1]-hard when parameterized by treewidth.

Ajinkya Gaikwad, Soumen Maity, Shuvam Kant Tripathi
The Multiple Traveling Salesman Problem on Spiders

Given (i) a set of $N+1$ vertices that corresponds to N clients and 1 depot, (ii) the travel time between each pair of vertices, and (iii) a number m of salespersons, the multiple traveling salesman problem consists in finding m tours such that, starting from the depot, all clients are visited in such a way that some objective function is minimized. The objective function we consider in this paper is the makespan: the goal is to find m tours (one for each salesperson) that minimize the time elapsed from the beginning of the operation until the last salesperson comes back to the depot. We take into account handling times, i.e., the time spent visiting each client, which we assume to be the same for all clients. We address the problem in the particular case where the depot-clients network is a spider, with the depot located at its center. We show that this case is NP-hard even for 2 salespersons. We also show structural properties of the optimal solutions. These properties allow us to devise a PTAS for minimizing the makespan; more precisely, a $(1+\varepsilon)$-approximation algorithm with running time $N^{O(m/\varepsilon)}$.

Pedro Pérez-Escalona, Ivan Rapaport, José Soto, Ian Vidal
Tightness of Sensitivity and Proximity Bounds for Integer Linear Programs

We consider Integer Linear Programs (ILPs) where each variable corresponds to an integral point within a polytope $$\mathcal{P}\subseteq\mathbb{R}^{d}$$, i.e., ILPs of the form $$\min\{c^{\top}x \mid \sum_{p\in\mathcal{P}\cap\mathbb{Z}^d} x_p p = b,\ x\in\mathbb{Z}^{|\mathcal{P}\cap\mathbb{Z}^d|}_{\ge 0}\}$$. The distance between an optimal fractional solution and an optimal integral solution (called the proximity) is an important measure. A classical result by Cook et al. (Math. Program., 1986) shows that it is at most $$\varDelta^{\varTheta(d)}$$, where $$\varDelta=\Vert\mathcal{P}\cap\mathbb{Z}^{d}\Vert_{\infty}$$ is the largest coefficient in the constraint matrix. Another important measure studies the change in an optimal solution if the right-hand side b is replaced by another right-hand side $$b'$$. The distance between an optimal solution x w.r.t. b and an optimal solution $$x'$$ w.r.t. $$b'$$ (called the sensitivity) is similarly bounded by $$\Vert b-b'\Vert_{1}\cdot\varDelta^{\varTheta(d)}$$, as also shown by Cook et al. Even after more than thirty years, these bounds are essentially the best known for these measures. While some lower bounds are known, they either only work for very small values of $$\varDelta$$, require negative entries in the constraint matrix, or have fractional right-hand sides; hence, they often do not correspond to instances from algorithmic problems. This work presents, for each $$\varDelta > 0$$ and each $$d > 0$$, ILPs of the above type with non-negative constraint matrices whose proximity and sensitivity are at least $$\varDelta^{\varTheta(d)}$$.
Furthermore, these instances are closely related to instances of the Bin Packing problem, as they form a subset of columns of the configuration ILP. We thereby show that the results of Cook et al. are indeed tight, even for instances arising naturally from problems in combinatorial optimization.

Sebastian Berndt, Klaus Jansen, Alexandra Lassota
Using the Metro-Map Metaphor for Drawing Hypergraphs

For a planar graph G and a set $$\varPi$$ of simple paths in G, we define a metro-map embedding to be a planar embedding of G together with an ordering of the paths of $$\varPi$$ along each edge of G. This definition of a metro-map embedding is motivated by visual representations of hypergraphs using the metro-map metaphor. In a metro-map embedding, two paths cross in a so-called vertex crossing if they pass through the vertex and alternate in the circular ordering around it. We study the problem of constructing metro-map embeddings with the minimum number of crossing vertices, that is, vertices where paths cross. We show that the corresponding decision problem is NP-complete for general planar graphs but can be solved efficiently for trees or if the number of crossing vertices is constant. All our results hold in both the fixed and the variable embedding settings.
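The vertex-crossing condition defined above (two paths alternating in the circular order around a shared vertex) can be checked directly. A sketch under the assumption that each path meets the vertex via exactly two incident edges; the circular order and edge names are made up:

```python
# Two paths passing through a vertex cross there iff their incident edges
# alternate in the circular order around the vertex.

def paths_cross_at_vertex(circular_order, path1_edges, path2_edges):
    """circular_order: list of edge ids around the vertex (clockwise).
    path*_edges: the pair of edges with which each path enters/leaves."""
    pos = {e: i for i, e in enumerate(circular_order)}
    a, b = sorted(pos[e] for e in path1_edges)
    c, d = sorted(pos[e] for e in path2_edges)
    # Alternation means exactly one of path2's edges lies strictly
    # between path1's two edges in the circular order.
    return (a < c < b) != (a < d < b)

order = ["e1", "e2", "e3", "e4"]  # edges around a degree-4 vertex
print(paths_cross_at_vertex(order, ("e1", "e3"), ("e2", "e4")))  # True
print(paths_cross_at_vertex(order, ("e1", "e2"), ("e3", "e4")))  # False
```

The paper's problem is then to choose the embedding and edge orderings so that as few vertices as possible satisfy this predicate for some pair of paths.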

Fabian Frank, Michael Kaufmann, Stephen Kobourov, Tamara Mchedlidze, Sergey Pupyrev, Torsten Ueckerdt, Alexander Wolff
Weighted Microscopic Image Reconstruction

Assume that we inspect a specimen represented as a collection of points. The points are typically organized on a grid structure in 2D- or 3D-space, and each point has an associated physical value. The goal of the inspection is to determine these values. Yet, measuring these values directly (by surgical probes) may damage the specimen or is simply impossible. The alternative is to employ aggregate measuring techniques (e.g., CT or MRI), whereby measurements are taken over subsets of points, and the exact values at each point are subsequently extracted by computational methods. In the Minimum Surgical Probing problem (MSP) the inspected specimen is represented by a graph G and a vector $$\ell\in\mathbb{R}^n$$ that assigns a value $$\ell_i$$ to each vertex i. An aggregate measurement (called probe) centered at vertex i captures its entire neighborhood, i.e., the outcome of a probe centered at i is $$\mathcal{P}_i=\sum_{j\in N(i)\cup\{i\}}\ell_j$$, where N(i) is the open neighborhood of vertex i. Bar-Noy et al. [4] gave a criterion for whether the vector $$\ell$$ can be recovered from the collection of probes $$\mathcal{P}=\{\mathcal{P}_v \mid v\in V(G)\}$$ alone. However, there are graphs where $$\mathcal{P}$$ is inconclusive, i.e., there are several vectors $$\ell$$ that are consistent with $$\mathcal{P}$$. In these cases, we are allowed to use surgical probes. A surgical probe at vertex i returns $$\ell_i$$. The objective of MSP is to recover $$\ell$$ from $$\mathcal{P}$$ and G using as few surgical probes as possible. In this work, we introduce the Weighted Minimum Surgical Probing (WMSP) problem, in which a vertex i may have an aggregation coefficient $$w_i$$, namely $$\mathcal{P}_i=\sum_{j\in N(i)}\ell_j + w_i\ell_i$$. We show that WMSP can be solved in polynomial time.
Moreover, we analyze the number of required surgical probes depending on the weight vector $$w$$. For any graph, we give two boundaries outside of which no surgical probes are needed to recover the vector $$\ell$$; the boundaries are connected to the (signless) Laplacian matrix. In addition, we focus on the special case, where . We explore the range of possible behaviors of WMSP by determining the number of surgical probes necessary in certain graph families, such as trees and various grid graphs. Finally, we analyze higher-dimensional grid graphs. For the hypercube, when , we only need surgical probes if the dimension is odd, and when , we only need surgical probes if the dimension is even. The number of surgical probes follows the binomial coefficients.
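The probe outcomes described in this abstract are linear in the hidden values: collecting them gives $$\mathcal{P} = (A + \mathrm{diag}(w))\,\ell$$ for the adjacency matrix A of G, so whenever that matrix is invertible, $$\ell$$ is recoverable with no surgical probes at all. A small illustrative sketch (graph, weights, and values invented here):

```python
# Recover hidden vertex values from aggregate probes. Each probe at
# vertex i returns the sum of ell over i's neighbours plus w_i * ell_i,
# i.e. P = (A + diag(w)) @ ell. If that matrix is invertible, no
# surgical probes are needed; np.linalg.solve fails iff it is singular.
import numpy as np

def recover(adjacency, w, probes):
    M = np.array(adjacency, dtype=float) + np.diag(w)
    return np.linalg.solve(M, probes)

# Path on 3 vertices with unit self-weights (the classical MSP setting).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
w = [1.0, 1.0, 1.0]
ell = np.array([2.0, -1.0, 3.0])           # hidden ground truth
P = (np.array(A) + np.diag(w)) @ ell       # simulated probe outcomes
print(recover(A, w, P))                    # [ 2. -1.  3.]
```

The interesting cases in the paper are exactly those weight vectors for which the matrix is singular, where some coordinates must be fixed by surgical probes before the rest can be solved for.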

Amotz Bar-Noy, Toni Böhnlein, Zvi Lotker, David Peleg, Dror Rawitz

Foundations of Computer Science – Short Papers

Frontmatter
A Normal Sequence Compressed by PPM* But Not by Lempel-Ziv 78

This paper compares the difference in performance of the Prediction by Partial Matching family of compressors ($$\mathrm{PPM}^*$$ and the original $$\mathrm{PPM}_{k}$$) and the Lempel-Ziv 78 (LZ) algorithm. We construct an infinite binary sequence whose worst-case compression ratio for $$\mathrm{PPM}^*$$ is 0, while $$\mathrm{PPM}_{k}$$'s and LZ's best-case compression ratios are at least 1/2 and 1, respectively. This sequence is normal and is an enumeration of all binary strings in order of length, i.e., all strings of length 1 followed by all strings of length 2, and so on. The sequence is built using repetitions of de Bruijn strings of increasing order.
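For context on the building blocks mentioned above: a binary de Bruijn string of order n contains every length-n binary string exactly once as a cyclic substring. A standard Lyndon-word generator for such strings (this is the classical construction, not the authors' own assembly of the full sequence):

```python
# Generate a binary de Bruijn sequence of order n (length 2**n) via the
# classical Lyndon-word ("FKM") algorithm.

def de_bruijn(n):
    sequence, a = [], [0] * (2 * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, 2):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(map(str, sequence))

s = de_bruijn(3)
print(s)  # "00010111": all 8 binary strings of length 3 appear cyclically
```

Because every fixed-length block appears equally often, repetitions of such strings are a natural source of normality.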

Liam Jordon, Philippe Moser
Clusters of Repetition Roots: Single Chains

This work proposes a new approach towards solving an over 20-year-old conjecture regarding the maximum number of distinct squares that a word can contain. To this end, we look at clusters of repetition roots, that is, the sets of positions where the root u of a repetition $$u^\ell$$ occurs. We lay the foundation of this theory by proving basic properties of these clusters and establishing upper bounds on the number of distinct squares when their roots form a chain with respect to the prefix order.

Szilárd Zsolt Fazekas, Robert Mercaş
Drawing Two Posets

We investigate the problem of drawing two posets over the same ground set so that one is drawn from left to right and the other one is drawn from the bottom up. The input to this problem is a directed graph $$G=(V,E)$$ and two sets X, Y with $$X\cup Y=E$$, each of which can be interpreted as a partial order on V. The task is to find a planar drawing of G such that each directed edge in X is drawn as an x-monotone edge and each directed edge in Y is drawn as a y-monotone edge. Such a drawing is called an xy-planar drawing. Testing whether a graph admits an xy-planar drawing is NP-complete in general. We consider the case that the planar embedding of G is fixed and the subgraph of G induced by the edges in Y is a connected spanning subgraph of G whose upward embedding is fixed. For this case we present a linear-time algorithm that determines whether G admits an xy-planar drawing and, if so, produces an xy-planar polyline drawing with at most three bends per edge.

Guido Brückner, Vera Chekan
Fair Division Is Hard Even for Amicable Agents

We consider the problem of distributing a collection of indivisible objects among agents in a manner that satisfies some desirable notions of fairness and efficiency. We allow agents to “share” goods in order to achieve efficiency and fairness goals which may otherwise be impossible to attain. In this context, our goal is to find allocations that minimize the “amount of sharing”. We follow up on recent work demonstrating that finding fair allocations with minimum sharing is tractable when valuations are non-degenerate, a notion which captures scenarios that are “far from identical”. This result holds for any fixed number of agents. We show that the usefulness of non-degeneracy does not scale to the setting of many agents. In particular, we demonstrate that the problem of finding fractionally Pareto optimal and envy-free allocations is NP-complete even for instances with constant degeneracy and no sharing. We also demonstrate an alternate approach to enumerating distinct consumption graphs for allocations with a small number of sharings.
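Envy-freeness, one of the fairness notions above, is a direct pairwise check under additive valuations: no agent may value another agent's bundle more than their own. A minimal sketch with an invented instance (the hardness in the paper lies in finding such allocations, not in verifying them):

```python
# Check envy-freeness of an allocation of indivisible goods under
# additive valuations: agent i must value their own bundle at least as
# much as every other agent's bundle.

def is_envy_free(valuations, allocation):
    """valuations[i][g]: agent i's value for good g.
    allocation[i]: set of goods held by agent i."""
    def value(i, bundle):
        return sum(valuations[i][g] for g in bundle)
    n = len(valuations)
    return all(value(i, allocation[i]) >= value(i, allocation[j])
               for i in range(n) for j in range(n))

vals = [[5, 1, 3], [2, 4, 3]]              # 2 agents, 3 goods
print(is_envy_free(vals, [{0}, {1, 2}]))   # True
print(is_envy_free(vals, [{1}, {0, 2}]))   # False: agent 0 envies agent 1
```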

Neeldhara Misra, Aditi Sethia
The Complexity of Flow Expansion and Electrical Flow Expansion

FlowExpansion is a network design problem in which the input consists of a flow network and a set of candidate edges that may be added to the network. Adding a candidate incurs given costs. The goal is to determine the cheapest set of candidate edges that, if added, allows the demands to be satisfied. FlowExpansion is a variant of the Minimum-Cost Flow problem with non-linear edge costs. We study FlowExpansion for both graph-theoretical and electrical flow networks; in the latter case this problem is also known as the Transmission Network Expansion Planning problem. We give a structured view of the complexity of the variants of FlowExpansion that arise from restricting, e.g., the graph classes, the capacities, or the number of sources and sinks. Our goal is to determine which restrictions have a crucial impact on the computational complexity. The results in this paper range from polynomial-time algorithms for the more restricted variants to $$\mathcal{NP}$$-hardness proofs and proofs that certain variants are $$\mathcal{NP}$$-hard to approximate even within a logarithmic factor of the optimal solution.

Dorothea Wagner, Matthias Wolf

Foundations of Software Engineering – Full papers

Frontmatter
An Infrastructure for Platform-Independent Experimentation of Software Changes

Current experimentation platforms for online controlled experimentation focus on the technical execution of an experiment. This makes them specific to the application domain, the expected infrastructure, and the used technology. Moreover, the experiment definitions include numerous implicit assumptions about the platform’s implementation. As a result, experiments are difficult to replicate or compare across platforms or even platform versions. This paper presents an experimentation infrastructure based on platform-independent experimentation of software changes. Experiments are defined technology-independently, and the experimentation platform’s role is reduced to execution. The explicit definition of experiments in an independent artifact and the modular architecture of the services make experimentation replicable and the architecture open to change. Additionally, a lightweight approach to including the knowledge of past experiments is demonstrated. The infrastructure is presented through a running example experiment and its prototypical implementation.

Florian Auer, Michael Felderer
Using Process Models to Understand Security Standards

Many industrial software development processes today have to comply with security standards such as the IEC 62443-4-1. These standards, written in natural language, are ambiguous and complex to understand. This is especially true for non-security experts. Security practitioners thus invest much effort into comprehending standards and, later, into introducing them to development teams. However, our experience in industry shows that development practitioners may well read such standards themselves, yet still end up inviting experts for interpretation (or confirmation). Such a scenario is not in tune with the current trend towards increasing velocity in continuous software engineering. In this paper, we propose a tool-supported approach that applies process models to make security standards more precise and easier to understand for security and non-security experts alike. This approach emerges from a large industrial company and so far encompasses the IEC 62443-4-1 standard. We further present a case study with 16 industry practitioners showing how the approach improves communication between development and security compliance practitioners.

Fabiola Moyón, Daniel Méndez, Kristian Beckers, Sebastian Klepper
Web Test Automation: Insights from the Grey Literature

This paper provides the results of a survey of the grey literature concerning best practices for end-to-end web test automation. We analyzed more than 2,400 sources (e.g., blog posts, white papers, user manuals, GitHub repositories) looking for guidelines by IT professionals on how to develop and maintain web test code. Ultimately, we filtered 142 relevant documents from which we extracted a taxonomy of guidelines divided into technical tips (i.e., concerning the development, maintenance, and execution of web tests) and business-level tips (i.e., concerning the planning and management of testing teams, design, and process). The paper concludes by distilling the ten most cited best practices for developing good-quality automated web tests.

Filippo Ricca, Andrea Stocco

Foundations of Data Science and Engineering – Full Papers

Frontmatter
A Pipeline for Measuring Brand Loyalty Through Social Media Mining

Enhancing customer relationships through social media is an area of high relevance for companies. To this aim, Social Business Intelligence (SBI) plays a crucial role by supporting companies in combining corporate data with user-generated content, usually available as textual clips on social media. Unfortunately, SBI research is often constrained by the lack of publicly available, real-world data for experimental activities. In this paper, we describe our experience in extracting social data and processing them through an enrichment pipeline for brand analysis. As a first step, we collect texts from social media and annotate them based on predefined metrics for brand analysis, using features such as sentiment and geolocation. Annotations rely on various learning and natural language processing approaches, including deep learning and geographical ontologies. Structured data obtained from the annotation process are then stored in a distributed data warehouse for further analysis. Preliminary results, obtained from the analysis of three well-known ICT brands using data gathered from Twitter, news portals, and Amazon product reviews, show that different evaluation metrics can lead to different outcomes, indicating that no single metric is dominant for all brand analysis use cases.

Hazem Samoaa, Barbara Catania
Predicting Tennis Match Outcomes with Network Analysis and Machine Learning

Singles tennis is one of the most popular individual sports in the world. Many researchers have embarked on a wide range of approaches to model a tennis match, using probabilistic modeling, or applying machine learning models to predict the outcome of matches. In this paper, we propose a novel approach based on network analysis to infer a surface-specific and time-varying score for professional tennis players and use it in addition to players’ statistics of previous matches to represent tennis match data. Using the resulting features, we apply advanced machine learning paradigms such as Multi-Output Regression and Learning Using Privileged Information, and compare the results with standard machine learning approaches. The models are trained and tested on more than 83,000 men’s singles tennis matches between the years 1991 and 2020. Evaluating the results shows the proposed methods provide more accurate predictions of tennis match outcome than classical approaches and outperform the existing methods in the literature and the current state-of-the-art models in tennis.

Firas Bayram, Davide Garbarino, Annalisa Barla
Role-Based Access Control on Graph Databases

We propose a novel access control system for graph-based models, which supports schema constraints and constraint rules to protect the data, as well as user context rules. We consider systems with huge volumes of data, where the efficient processing of aggregation operations is of paramount importance. To comply with this goal, we introduce an architecture with modules for rewriting, planning and executing queries in parallel, respecting the access constraints. Performance tests show the efficiency of our distributed query processing mechanism. Compared to a centralized approach, it reduces execution time from 25% to 68% for conjunctive queries and from 12% to 59% for queries involving aggregation.

Jacques Chabin, Cristina D. A. Ciferri, Mirian Halfeld-Ferrari, Carmem S. Hara, Raqueline R. M. Penteado
Semi-automatic Column Type Inference for CSV Table Understanding

Spreadsheets are often used as a simple way of representing tabular data. However, since they do not impose any restriction on their table structures and contents, their automatic processing and their integration with other information sources are particularly hard problems to solve. Many table understanding approaches have been proposed for extracting data from tables and transforming it into meaningful information; however, they require some regularity in the table contents. Starting from CSV spreadsheets that present values of different types and errors, in this paper we introduce an approach for inferring the types of columns in CSV tables by exploiting a multi-label classification approach. By means of our approach, each column of the table can be associated with a simple datatype (such as integer, float, or text), a domain-specific one (such as the name of a municipality or an address), or a “union” of types (which takes into account the frequency of the corresponding values). Since the automatically inferred types might not be accurate, graphical interfaces have been developed to support the user in fixing mistakes. Experimental results are finally reported on real spreadsheets obtained from a debt collection agency.
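A frequency-based sketch of the column-typing idea described above: each cell votes for a type, and a column gets either a single type or a "union" reflecting the observed frequencies. This is a toy illustration only; the paper's domain-specific types (municipality names, addresses) and its multi-label classifier are omitted:

```python
# Infer a simple type label for each column of a CSV-like table by
# letting each non-empty cell vote for a basic datatype.
from collections import Counter

def cell_type(value):
    value = value.strip()
    if value == "":
        return "empty"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"

def infer_column_type(column):
    types = [cell_type(v) for v in column]
    counts = Counter(t for t in types if t != "empty")
    total = sum(counts.values())
    if len(counts) == 1:                    # unanimous: a single type
        return next(iter(counts))
    return "union(" + ", ".join(            # otherwise: a frequency union
        f"{t}:{c / total:.0%}" for t, c in counts.most_common()) + ")"

print(infer_column_type(["1", "2", "3"]))       # integer
print(infer_column_type(["1", "2.5", "oops"]))  # a union of types
```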

Sara Bonfitto, Luca Cappelletti, Fabrizio Trovato, Giorgio Valentini, Marco Mesiti

Foundations of Data Science and Engineering – Short Papers

Frontmatter
Metadata Management on Data Processing in Data Lakes

A Data Lake (DL) is known as a Big Data analysis solution. A data lake stores not only data but also the processes that were carried out on these data. It is commonly agreed that data preparation/transformation takes up most of the data analyst’s time. To improve the efficiency of data processing in a DL, we propose a framework which includes a metadata model and algebraic transformation operations. The metadata model ensures the findability, accessibility, interoperability, and reusability of data processes, as well as the data lineage of processes. Moreover, each process is described through a set of coarse-grained data transformation operations which can be applied to different types of datasets. We illustrate and validate our proposal with a real medical use case implementation.

Imen Megdiche, Franck Ravat, Yan Zhao
S2CFT: A New Approach for Paper Submission Recommendation

There is a massive number of conferences and journals in computer science, which creates a lot of difficulty for scientists, especially early-stage researchers, in finding the most suitable venue for their scientific submission. In this paper, we present a novel approach to building a paper submission recommendation system by using two different embedding methods, GloVe and FastText, as well as a Convolutional Neural Network (CNN) and an LSTM, to extract useful features for a paper submission recommendation model. We consider seven combinations of initial attributes from a given submission: title, abstract, keywords, title + keyword, title + abstract, keyword + abstract, and title + keyword + abstract. We measure these approaches’ performance on one dataset, presented by Wang et al., in terms of top-K accuracy and compare our methods with the S2RSCS model, the state-of-the-art algorithm on this dataset. The experimental results show that CNN + FastText outperforms the other approaches (CNN + GloVe, LSTM + GloVe, LSTM + FastText, S2RSCS) in terms of top-1 accuracy for all seven types of input data. Without using a list of keywords in the input data, CNN + GloVe/FastText surpasses the other techniques; it performs slightly worse than S2RSCS in terms of top-3 and top-5 accuracy when using the keyword information. Finally, the combination of S2RSCS and CNN + FastText, named S2CFT, yields a better model that surpasses all other methods in top-K accuracy (K = 1, 3, 5, 10).

Dac Nguyen, Son Huynh, Phong Huynh, Cuong V. Dinh, Binh T. Nguyen

Foundations of Algorithmic Computational Biology – Full Papers

Frontmatter
Adding Matrix Control: Insertion-Deletion Systems with Substitutions III

We discuss substitutions as a further type of operation added to matrix insertion-deletion systems. For such systems, we additionally discuss the effect of appearance checking. This way, we obtain new characterizations of the family of context-sensitive and the family of recursively enumerable languages. To reach computational completeness, not much context is needed for systems with appearance checking.

Martin Vu, Henning Fernau
Sorting by Multi-cut Rearrangements

Let S be a string built on some alphabet $$\varSigma$$. A multi-cut rearrangement of S is a string $$S'$$ obtained from S by an operation called a k-cut rearrangement, which consists in (1) cutting S at a given number k of places, making S the concatenated string $$X_1\cdot X_2\cdot X_3\ldots X_k\cdot X_{k+1}$$, where $$X_1$$ and $$X_{k+1}$$ are possibly empty, and (2) rearranging the $$X_i$$s so as to obtain $$S'=X_{\pi(1)}\cdot X_{\pi(2)}\cdot X_{\pi(3)}\ldots X_{\pi(k+1)}$$, $$\pi$$ being a permutation on $$1,2\ldots k+1$$ satisfying $$\pi(1)=1$$ and $$\pi(k+1)=k+1$$. Given two strings S and T built on the same multiset of characters from $$\varSigma$$, the Sorting by Multi-cut Rearrangements (SMCR) problem asks whether a given number $$\ell$$ of k-cut rearrangements suffices to transform S into T. The SMCR problem generalizes and thus encompasses several classical genomic rearrangement problems, such as Sorting by Transpositions and Sorting by Block Interchanges. It may also model chromoanagenesis, a recently discovered phenomenon consisting in massive simultaneous rearrangements. In this paper, we study the SMCR problem from an algorithmic complexity viewpoint, and provide, depending on the respective values of k and $$\ell$$, polynomial-time algorithms as well as NP-hardness, FPT algorithms, W[1]-hardness and approximation results, either in the general case or when S and T are permutations.
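A single k-cut rearrangement, as defined in this abstract, is straightforward to implement: cut the string at k positions and permute the resulting blocks, keeping the first and last fixed. A sketch with arbitrary example cuts and permutation (the hardness lies in choosing a short sequence of such operations, not in applying one):

```python
# Apply one k-cut rearrangement: cut s at the given positions into
# blocks X_1 .. X_{k+1}, then reorder them by perm, which must fix the
# first and last blocks (perm[0] == 0 and perm[-1] == k, 0-indexed).

def k_cut_rearrangement(s, cuts, perm):
    bounds = [0] + list(cuts) + [len(s)]
    blocks = [s[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]
    assert perm[0] == 0 and perm[-1] == len(blocks) - 1
    return "".join(blocks[i] for i in perm)

# A 3-cut rearrangement swapping the two middle blocks: this is exactly
# a block interchange, one of the classical operations SMCR generalizes.
print(k_cut_rearrangement("abcdef", (1, 3, 5), (0, 2, 1, 3)))  # "adebcf"
```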

Laurent Bulteau, Guillaume Fertin, Géraldine Jean, Christian Komusiewicz
Graphs Cannot Be Indexed in Polynomial Time for Sub-quadratic Time String Matching, Unless SETH Fails

The string matching problem on a node-labeled graph $$G=(V,E)$$ asks whether a given pattern string P has an occurrence in G, in the form of a path whose concatenation of node labels equals P. This is a basic primitive in various problems in bioinformatics, graph databases, and networks, but it was only recently proven to have an O(|E||P|)-time lower bound, under the Orthogonal Vectors Hypothesis (OVH). We consider here its indexed version, in which we can index the graph in order to support time-efficient string queries. We show that, under OVH, no polynomial-time indexing scheme of the graph can support querying P in time $$O(|P|+|E|^\delta|P|^\beta)$$ with either $$\delta<1$$ or $$\beta<1$$. As a side contribution, we introduce the notion of linear independent-components (lic) reduction, allowing for a simple proof of our result. As another illustration that hardness of indexing follows as a corollary of a lic reduction, we also translate the quadratic conditional lower bound of Backurs and Indyk (STOC 2015) for the problem of matching a query string inside a text, under edit distance. We obtain an analogous tight quadratic lower bound for its indexed version, improving the recent result of Cohen-Addad, Feuilloley and Starikovskaya (SODA 2019), but with a slightly different boundary condition.
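The non-indexed problem above admits a simple O(|E||P|)-time dynamic program over positions in P, which is exactly what the conditional lower bound shows to be essentially optimal. A sketch on an invented labelled graph:

```python
# Does pattern occur in the node-labelled directed graph as a path whose
# labels spell the pattern? One frontier-update per pattern character,
# each scanning all edges: O(|E| * |P|) overall.

def occurs(labels, edges, pattern):
    """labels[v]: label of node v; edges: list of directed (u, v) pairs."""
    n = len(labels)
    # After processing pattern[:i+1], `frontier` holds every node v such
    # that some path ending at v spells that prefix.
    frontier = {v for v in range(n) if labels[v] == pattern[0]}
    for ch in pattern[1:]:
        frontier = {v for (u, v) in edges
                    if u in frontier and labels[v] == ch}
    return bool(frontier)

labels = ["a", "b", "a", "c"]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(occurs(labels, edges, "aba"))   # True: the path 0 -> 1 -> 2
print(occurs(labels, edges, "abb"))   # False
```

The paper's result says that no polynomial-time preprocessing of the graph can push the per-query cost substantially below this bound.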

Massimo Equi, Veli Mäkinen, Alexandru I. Tomescu
Backmatter
Metadata
Title
SOFSEM 2021: Theory and Practice of Computer Science
Editors
Tomáš Bureš
Prof. Riccardo Dondi
Johann Gamper
Prof. Giovanna Guerrini
Tomasz Jurdziński
Prof. Dr. Claus Pahl
Dr. Florian Sikora
Dr. Prudence W.H. Wong
Copyright Year
2021
Electronic ISBN
978-3-030-67731-2
Print ISBN
978-3-030-67730-5
DOI
https://doi.org/10.1007/978-3-030-67731-2
