Skip to main content

2016 | Buch

SOFSEM 2016: Theory and Practice of Computer Science

42nd International Conference on Current Trends in Theory and Practice of Computer Science, Harrachov, Czech Republic, January 23-28, 2016, Proceedings

herausgegeben von: Rūsiņš Mārtiņš Freivalds, Gregor Engels, Barbara Catania

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

insite
SUCHEN

Über dieses Buch

This book constitutes the proceedings of the 42nd International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2016, held in Harrachov, Czech Republic, in January 2016.
The 43 full papers presented in this volume were carefully reviewed and selected from 116 submissions. They are organized in topical sections named: foundations of computer science; software engineering: methods, tools, applications; and data, information, and knowledge engineering. The volume also contains 7 invited talks in full paper length.

Inhaltsverzeichnis

Frontmatter

Foundations of Computer Science (Invited Talks)

Frontmatter
Cryptography in a Quantum World

Although practised as an art and science for ages, cryptography had to wait until the mid-twentieth century before Claude Shannon gave it a strong mathematical foundation. However, Shannon’s approach was rooted is his own information theory, itself inspired by the classical physics of Newton and Einstein. But our world is ruled by the laws of quantum mechanics. When quantum-mechanical phenomena are taken into account, new vistas open up both for codemakers and codebreakers. Is quantum mechanics a blessing or a curse for the protection of privacy? As we shall see, the jury is still out!

Gilles Brassard
Relating Sublinear Space Computability Among Graph Connectivity and Related Problems

We investigate sublinear-space computability relation among the directed graph vertex connectivity problem and its related problems, where by “sublinear-space computability” we mean in this paper $$O(n^{1-\varepsilon })$$-space and polynomial-time computability w.r.t. the number n of vertices. We demonstrate algorithmic techniques to relate the sublinear-space computability of directed graph connectivity and undirected graph length bounded connectivity.

Tatsuya Imai, Osamu Watanabe
Learning Automatic Families of Languages

A class of languages is automatic if it is uniformly regular using some regular index set for the languages. In this survey we report on work about the learnability in the limit of automatic classes of languages, with some special emphasis to automatic learners.

Sanjay Jain, Frank Stephan

Software Engineering: Methods, Tools, Applications (Invited Talks)

Frontmatter
From ESSENCE to Theory Oriented Software Engineering

The Essence standard combines a kernel and a modelling language for software engineering. It defines dynamic semantics of Essence by a mixture of formal and informal means. This paper presents a uniform formalization of the dynamic semantics based on a graph grammar and discusses various applications of this grammar. It is shown that solid formal foundation is useful for research towards theory oriented software engineering.

Sebastian Holtappels, Michael Striewe, Michael Goedicke
Incremental Queries and Transformations: From Concepts to Industrial Applications

Model-driven engineering (MDE) is widely used nowadays in the design of embedded systems, especially in the automotive, avionics or telecommunication domain. Behind the scenes, design and verification tools in these domains frequently exploit advanced model query and transformation techniques to support various rich tool features. The rapid increase in the size and complexity of system models has drawn significant attention to incremental model query and transformation approaches, which enable fast and incremental reactions to model changes caused by systems engineers or automated design steps. In this paper, I overview two open source Eclipse projects, EMF-IncQuery and Viatra, which have been actively used as a basis for developing various academic and industrial tools for critical systems.

Dániel Varró

Data, Information, and Knowledge Engineering (Invited Talks)

Frontmatter
Big Sequence Management: A glimpse of the Past, the Present, and the Future

There is an increasingly pressing need, by several applications in diverse domains, for developing techniques able to index and mine very large collections of sequences, or data series. Examples of such applications come from biology, astronomy, entomology, the web, and other domains. It is not unusual for these applications to involve numbers of data series in the order of hundreds of millions to billions, which are often times not analyzed in their full detail due to their sheer size. In this work, we describe recent efforts in designing techniques for indexing and mining truly massive collections of data series that will enable scientists to easily analyze their data. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we thus introduce solutions to this problem. Furthermore, we discuss novel techniques that adaptively create data series indexes, allowing users to correctly answer queries before the indexing task is finished. We also show how our methods allow mining on datasets that would otherwise be completely untenable, including the first published experiments using one billion data series. Finally, we present our vision for the future in big sequence management research.

Themis Palpanas
Pay-as-you-go Data Integration: Experiences and Recurring Themes

Data integration typically seeks to provide the illusion that data from multiple distributed sources comes from a single, well managed source. Providing this illusion in practice tends to involve the design of a global schema that captures the users data requirements, followed by manual (with tool support) construction of mappings between sources and the global schema. This overall approach can provide high quality integrations but at high cost, and tends to be unsuitable for areas with large numbers of rapidly changing sources, where users may be willing to cope with a less than perfect integration. Pay-as-you-go data integration has been proposed to overcome the need for costly manual data integration. Pay-as-you-go data integration tends to involve two steps. Initialisation: automatic creation of mappings (generally of poor quality) between sources. Improvement: the obtaining of feedback on some aspect of the integration, and the application of this feedback to revise the integration. There has been considerable research in this area over a ten year period. This paper reviews some experiences with pay-as-you-go data integration, providing a framework that can be used to compare or develop pay-as-you-go data integration techniques.

Norman W. Paton, Khalid Belhajjame, Suzanne M. Embury, Alvaro A. A. Fernandes, Ruhaila Maskat

Foundations of Computer Science (Regular Papers)

Frontmatter
Robust Recoverable Path Using Backup Nodes

We consider routing in networks in the presence of node failures. The focus is specifically on the single-node failure model, which captures the resilience of networks in a realistic fault setting. We introduce a model of recoverable routing, where we ask for an s-t-path that can be repaired easily and locally by assigning ‘backup nodes:’ when a node on the path fails, it is replaced by its backup node. We resolve the basic algorithmic and complexity questions for finding paths in this model, depending on the properties we require of the backup assignment. For some cases we provide polynomial-time algorithms, and for the others we prove $$\mathcal {N}\!\mathcal {P}$$-completeness and provide exponential-time algorithms. Lastly, we consider a problem variant where the path is given and ask for a backup assignment.

Marjan van den Akker, Hans L. Bodlaender, Thomas C. van Dijk, Han Hoogeveen, Erik van Ommeren
On Contact Graphs with Cubes and Proportional Boxes

We study two variants of the problem of contact representation of planar graphs with axis-aligned boxes. In a cube-contact representation we realize each vertex with a cube, while in a proportional box-contact representation each vertex is an axis-aligned box with a prespecified volume. We show how to construct such representations representation for some classes of planar graphs.

M. Jawaherul Alam, Michael Kaufmann, Stephen G. Kobourov
Orthogonal Layout with Optimal Face Complexity

We study a problem motivated by rectilinear schematization of geographic maps. Given a biconnected plane graph G and an integer $$k\ge 0$$, does G have a strict-orthogonal drawing with at most k reflex angles per face? For $$k=0$$ the problem is equivalent to realizing each face as a rectangle. The problem can be reduced to a max-flow problem in some linear-size nonplanar network, but the best solutions require $$\varOmega (n^{1.5} \log n\log k)$$ time. We describe a graph matching approach that can decide strict-orthogonal drawability for arbitrary reflex complexity k in $$O((nk)^{1.5})$$ time, which is faster for constant values of k. In contrast, if the embedding is not fixed, we prove that it is NP-complete to decide whether a planar graph admits a strict-orthogonal drawing with reflex face complexity 4.

M. Jawaherul Alam, Stephen G. Kobourov, Debajyoti Mondal
L-Drawings of Directed Graphs

We introduce L-drawings, a novel paradigm for representing directed graphs aiming at combining the readability features of orthogonal drawings with the expressive power of matrix representations. In an L-drawing, vertices have exclusive x- and y-coordinates and edges consist of two segments, one exiting the source vertically and one entering the destination horizontally.We study the problem of computing L-drawings using minimum ink. We prove its NP-completeness and provide a heuristic based on a polynomial-time algorithm that adds a vertex to a drawing using the minimum additional ink. We performed an experimental analysis of the heuristic which confirms its effectiveness.

Patrizio Angelini, Giordano Da Lozzo, Marco Di Bartolomeo, Valentino Di Donato, Maurizio Patrignani, Vincenzo Roselli, Ioannis G. Tollis
A Combinatorial Model of Two-Sided Search

We study a new model of combinatorial group testing in a network. An object (the target) occupies an unknown node in the network. At each time instant, we can test (or query) a subset of the nodes to learn whether the target occupies any of such nodes. Unlike the case of conventional group testing problems on graphs, the target in our model can move immediately after each test to any node adjacent to each present location. The search finishes when we are able to locate the object with some predefined accuracy s (a parameter fixed beforehand), i.e., to indicate a set of s nodes that include the location of the object.In this paper we study two types of problems related to the above model: (i) what is the minimum value of the accuracy parameter for which a search strategy in the above sense exists; (ii) given the accuracy, what is the minimum number of tests that allow to locate the target. We study these questions on paths, cycles, and trees as underlying graphs and provide tight answer for the above questions. We also considered a restricted variant of the problem, where the number of moves of the target is bounded.

Harout Aydinian, Ferdinando Cicalese, Christian Deppe, Vladimir Lebedev
On the Power of Laconic Advice in Communication Complexity

We continue the study of a recently introduced model of communication complexity with advice, focusing on the power of advice in the context of equality of bitstrings and divisibility of natural numbers. First, we establish that the equality problem admits a protocol of polylogarithmic communication, provided a laconic advice of just one bit. For the divisibility problem, we design a protocol with sublinear communication and advice of roughly $${\tilde{O}}(\sqrt{n})$$. We complement our result on divisibility with a matching lower bound in a restricted setting using a recent result of Chattopadhyay et al. and a reduction from set-disjointness to divisibility.

Kfir Barhum, Juraj Hromkovič
Using Attribute Grammars to Model Nested Workflows with Extra Constraints

Workflow is a formal description of a process. Nested workflows were proposed to model processes with a hierarchical structure and they support extra logical and temporal constraints to express relations beyond the hierarchical structure. This workflow model supports scheduling applications with a known number of activities in the process, but it cannot be used to model planning problems, where the number of activities is unknown beforehand. In this paper we propose to model nested workflows using a modified version of attribute grammars. In particular we show that nested workflows with extra constraints can be fully translated to attribute grammars. The major advantage of this novel modeling framework is a support for recursive tasks that can model planning problems in the style of hierarchical task networks.

Roman Barták
A Natural Counting of Lambda Terms

We study the sequence of numbers corresponding to $$\lambda $$-terms of given size in the model based on de Bruijn indices. It turns out that the sequence enumerates also two families of binary trees, i.e. black-white and zigzag-free ones. We provide a constructive proof of this fact by exhibiting appropriate bijections. Moreover, we investigate the asymptotic density of $$\lambda $$-terms containing an arbitrary fixed subterm, showing that strongly normalizing terms are of density 0 among all $$\lambda $$-terms.

Maciej Bendkowski, Katarzyna Grygiel, Pierre Lescanne, Marek Zaionc
Online Minimum Spanning Tree with Advice
(Extended Abstract)

In the online minimum spanning tree problem, a graph is revealed vertex by vertex; together with every vertex, all edges to vertices that are already known are given, and an online algorithm must irrevocably choose a subset of them as a part of its solution. The advice complexity of an online problem is a means to quantify the information that needs to be extracted from the input to achieve good results. For a graph of size n, we show an asymptotically tight bound of $$\varTheta (n\log n)$$ on the number of advice bits to produce an optimal solution for any given graph. For particular graph classes, e.g., with bounded degree or a restricted edge weight function, we prove that the upper bound can be drastically reduced; e.g., $$5(n-1)$$ advice bits allow to compute an optimal result if the weight function is the Euclidean distance; if the graph is complete, even a logarithmic number suffices. Some of these results make use of the optimality of Kruskal’s algorithm for the offline setting. We also study the trade-off between the number of advice bits and the achievable competitive ratio. To this end, we perform a reduction from another online problem to obtain a linear lower bound on the advice complexity for any near-optimal solution. Using our results from the advice complexity finally allows us to give a lower bound on the expected competitive ratio of any randomized online algorithm for the problem.

Maria Paola Bianchi, Hans-Joachim Böckenhauer, Tatjana Brülisauer, Dennis Komm, Beatrice Palano
Subsequence Automata with Default Transitions

Let S be a string of length n with characters from an alphabet of size $$\sigma $$. The subsequence automaton of S (often called the directed acyclic subsequence graph) is the minimal deterministic finite automaton accepting all subsequences of S. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $$O(n\sigma )$$ and that this bound is asymptotically optimal.In this paper, we consider subsequence automata with default transitions, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the delay, i.e., the maximum number of consecutive default transitions followed before consuming a character.Specifically, given any integer parameter k, $$1 < k \le \sigma $$, we present a subsequence automaton with default transitions of size $$O(nk\log _{k}\sigma )$$ and delay $$O(\log _k \sigma )$$. Hence, with $$k = 2$$ we obtain an automaton of size $$O(n \log \sigma )$$ and delay $$O(\log \sigma )$$. On the other extreme, with $$k = \sigma $$, we obtain an automaton of size $$O(n \sigma )$$ and delay O(1), thus matching the bound for the standard subsequence automaton construction. The key component of our result is a novel hierarchical automata construction of independent interest.

Philip Bille, Inge Li Gørtz, Frederik Rye Skjoldjensen
Run-Time Checking Multi-threaded Java Programs

Assertion checking traditionally focused on state-based properties. In a multi-threaded environment, approaches based on shared-state require complex locking mechanisms to ensure that specifications are checked atomically (in the same state). In addition to this increased complexity, locks also negatively affect performance.In this paper, we extend both the underlying theory and the practical implementation of SAGA, a run-time checker for single-threaded Java programs, to multi-threading, while avoiding locks.

Frank S. de Boer, Stijn de Gouw
Online Graph Coloring with Advice and Randomized Adversary
(Extended Abstract)

We generalize the model of online computation with three players (algorithm, adversary and an oracle called advisor) by strengthening the power of the adversary by randomization. In our generalized model, the advisor knows everything about the adversary except the random bits the adversary may use.We examine the expected competitive ratio of online algorithms within this model in order to measure the hardness of online problems in a new way. We start our investigation by proving upper and lower bounds on the competitive ratio for the online graph coloring problem.

Elisabet Burjons, Juraj Hromkovič, Xavier Muñoz, Walter Unger
Pseudoknot-Generating Operation

A pseudoknot is an intra-molecular structure formed primarily in RNA strands and much research has been done to predict efficiently pseudoknot structures in RNA. We define an operation that generates all pseudoknots from a given sequence and consider algorithmic and language theoretic properties of the operation. We give an efficient algorithm to decide whether a given string is a pseudoknot of a regular language L—the runtime is linear if L is given by a deterministic finite automaton. We consider closure and decision properties of the pseudoknot-generating operation. For DNA encoding applications, pseudoknot structures are undesirable. We give polynomial-time algorithms to decide whether a regular language L contains a pseudoknot or a pseudoknot generated by some string of L. Furthermore, we show that the corresponding questions for context-free languages are undecidable.

Da-Jung Cho, Yo-Sub Han, Timothy Ng, Kai Salomaa
Capabilities of Ultrametric Automata with One, Two, and Three States

Ultrametric automata use p-adic numbers to describe the random branching of the process of computation. Previous research has shown that ultrametric automata can have a significant decrease in computing complexity. In this paper we consider the languages that can be recognized by one-way ultrametric automata with one, two, and three states. We also show an example of a promise problem that can be solved by ultrametric integral automaton with three states.

Maksims Dimitrijevs
The Complexity of Paging Against a Probabilistic Adversary

We consider deterministic online algorithms for paging. The offline version of the paging problem, in which the whole input is given in advance, is known to be easily solvable. If the input is random, chosen according to some known probability distribution, an $$\mathcal {O}\mathopen {}\left( \log k\right) $$-competitive algorithm exists. Moreover, there are distributions, where no algorithm can be better than $$\mathrm {\Omega }\mathopen {}\left( \log k\right) $$-competitive.In this paper, we ask the question of what happens if it is known that the input is one from a set of $$\ell $$ potential candidates, chosen according to some probability distribution. We present an $$\mathcal {O}\mathopen {}\left( \log \ell \right) $$-competitive algorithm, and show a matching lower bound.

Stefan Dobrev, Juraj Hromkovič, Dennis Komm, Richard Královič, Rastislav Královič, Tobias Mömke
On Parity Game Preorders and the Logic of Matching Plays

Parity games can be used to solve satisfiability, verification and controller synthesis problems. As part of an effort to better understand their nature, or the nature of the problems they solve, preorders on parity games have been studied. Defining these relations, and in particular proving their transitivity, has proven quite difficult on occasion. We propose a uniform way of lifting certain preorders on Kripke structures to parity games and study the resulting preorders. We explore their relation with parity game preorders from the literature and we study new relations. Finally, we investigate whether these preorders can also be obtained via modal characterisations of the preorders.

M. W. Gazda, T. A. C. Willemse
A PTAS for Scheduling Unrelated Machines of Few Different Types

Scheduling on Unrelated Machines is a classical optimization problem where n jobs have to be distributed to m machines. Each of the jobs $$j \in \{1, \ldots , n\}$$ has on machine $$i \in \{1, \ldots , m\}$$ a processing time $$p_{ij} \ge 0$$. The goal is to minimize the makespan, i.e. the maximum completion time of the longest-running machine. Unless $$\mathrm {P}= \mathrm {NP}$$, this problem does not allow for a polynomial-time approximation algorithm with a ratio better than $$\frac{3}{2}$$. A natural scenario is however that many machines are of the same type, like a CPU and GPU cluster: for each of the K machine types, the machines $$i \ne i'$$ of the same type k satisfy $$p_{ij} = p_{i'j}$$ for all jobs j. For the case where the number K of machine types is constant, this paper presents an approximation scheme, i.e. an algorithm of approximation ratio $$1+\varepsilon $$ for $$\varepsilon > 0$$, with an improved running time only single exponential in $$\frac{1}{\varepsilon }$$.

Jan Clemens Gehrke, Klaus Jansen, Stefan E. J. Kraft, Jakob Schikowski
Compacting a Dynamic Edit Distance Table by RLE Compression

Kim and Park [A dynamic edit distance table, J. Disc. Algo., 2:302–312, 2004] proposed a method (KP) based on a “dynamic edit distance table” that allows one to efficiently maintain edit distance information between two strings A of length m and B of length n when the strings can be modified by single-character edits to their left or right ends. This type of computation is useful e.g. in cyclic string comparison. KP uses linear time, $$O(m+n)$$, to update the distance representation after each single edit. As noted in a recent extension of KP by Hyyrö et al. [Incremental string comparison, J. Disc. Algo., 34:2-17, 2015], a practical bottleneck is that the method needs $$\varTheta (mn)$$ space to store a representation of a complete $$m \times n$$ edit distance table. In this paper we take the first steps towards reducing the space usage by RLE compressing A and B. Let M and N be the lengths of RLE compressed versions of A and B, respectively. We propose how to store the edit distance table using $$\varTheta (mN + Mn)$$ space while maintaining the same time complexity as the original method that does not use compression.

Heikki Hyyrö, Shunsuke Inenaga
Walking Automata in Free Inverse Monoids

Walking automata, be they running over words, trees or even graphs, possibly extended with pebbles that can be dropped and lifted on vertices, have long been defined and studied in Computer Science. However, questions concerning walking automata are surprisingly complex to solve. In this paper, we study a generic notion of walking automata over graphs whose semantics naturally lays within inverse semigroup theory. Then, from the simplest notion of walking automata on birooted trees, that is, elements of free inverse monoids, to the more general cases of walking automata on birooted finite subgraphs of Cayley’s graphs of groups, that is, elements of free E-unitary inverse monoids, we provide a robust algebraic framework in which various classes of recognizable or regular languages of birooted graphs can uniformly be defined and related one with the other.

David Janin
Precedence Scheduling with Unit Execution Time is Equivalent to Parametrized Biclique

We consider the following scheduling problem. Given m machines and n jobs with unit execution times and a precedence relation between the jobs, the problem is to assign each job to one of the machines. The objective is to find a schedule that minimizes the makespan (i.e. the length of the schedule).We reduce $$3\text {-CNF-SAT}$$ to this problem and obtain a new lower bound for the running time of $$2^{o(\sqrt{n\log n})}$$ assuming the Expontential Time Hypothesis ($$\text {ETH}$$). This improves the previous lower bound of $$2^{o(\sqrt{n})}$$ also due to the $$\text {ETH}$$ and a reduction by Ullman [13] or, alternatively, a reduction from the k-Clique problem by Lenstra and Rinnooy Kan [10].For the corresponding decision problem of whether there is a schedule with target makespan $$\mathbf{T }=3$$ or not, we further show the equivalence to a classical graph problem, the parametrized Biclique problem. The equivalence also holds for the same scheduling problem with the additional restriction that no job has both a predecessor and a successor. By this we show that an improved lower bound for the running time for the Biclique problem will lead to an improved lower bound for the running time for our scheduling problem and vice versa. Moreover a transfered lower bound for the running time from the Biclique problem would also hold for the running time of approximation algorithms with ratio better than $$\frac{4}{3}\text {OPT}$$. That is, if for example there was no algorithm solving Biclique in $$2^{o(n)}$$ and U was the set of vertices in the Biclique problem, then there would be no approximation algorithm finding a solution for the introduced scheduling problem with $$\varTheta (|U|)$$ jobs, that finds a solution with a target makespan smaller than $$\frac{4}{3}$$ times the optimal makespan in time $$2^{o(n)}$$.

Klaus Jansen, Felix Land, Maren Kaluza
Grover’s Search with Faults on Some Marked Elements

Grover’s algorithm is a quantum query algorithm solving the unstructured search problem of size N using $$O(\sqrt{N})$$ queries. It provides a significant speed-up over any classical algorithm [2].The running time of the algorithm, however, is very sensitive to errors in queries. Multiple authors have analysed the algorithm using different models of query errors and showed the loss of quantum speed-up [1, 4].We study the behavior of Grover’s algorithm in the model where the search space contains both faulty and non-faulty marked elements. We show that in this setting it is indeed possible to find one of marked elements in $$O(\sqrt{N})$$ queries.

Dmitry Kravchenko, Nikolajs Nahimovs, Alexander Rivosh
Reachability Problems for PAMs

Piecewise affine maps (PAMs) are frequently used as a reference model to show the openness of the reachability questions in other systems. The reachability problem for one-dimensional PAM is still open even if we define it with only two intervals. As the main contribution of this paper we introduce new techniques for solving reachability problems based on p-adic norms and weights as well as showing decidability for two classes of maps. Then we show the connections between topological properties for PAM’s orbits, reachability problems and representation of numbers in a rational base system. Finally we show a particular instance where the uniform distribution of the original orbit may not remain uniform or even dense after making regular shifts and taking a fractional part in that sequence.

Oleksiy Kurganskyy, Igor Potapov
On the Effects of Nondeterminism on Ordered Restarting Automata

While (stateless) deterministic ordered restarting automata accept exactly the regular languages, it is known that nondeterministic ordered restarting automata accept some languages that are not context-free. Here we show that, in fact, the class of languages accepted by these automata is an abstract family of languages that is incomparable to the linear languages, the context-free languages, and the growing context-sensitive languages with respect to inclusion, and that the emptiness problem is decidable for these languages. In addition, it is shown that stateless ordered restarting automata just accept regular languages, and we present an infinite family of regular languages $$C_n$$ such that $$C_n$$ is accepted by a stateless ordered restarting automaton with an alphabet of size O(n), but each stateless deterministic ordered restarting automaton for $$C_n$$ needs $$2^{O(n)}$$ letters.

Kent Kwee, Friedrich Otto
Quantum Walks on Two-Dimensional Grids with Multiple Marked Locations

The running time of a quantum walk search algorithm depends on both the structure of the search space (graph) and the configuration of marked locations. While the first dependence has been studied in a number of papers, the second dependence remains mostly unstudied. We study search by quantum walks on the two-dimensional grid using the algorithm of Ambainis, Kempe and Rivosh [AKR05]. The original paper analyses one and two marked locations only. We move beyond two marked locations and study the behaviour of the algorithm for an arbitrary configuration of marked locations.In this paper, we prove two results showing the importance of how the marked locations are arranged. First, we present two placements of k marked locations for which the number of steps of the algorithm differs by a factor of $$\varOmega (\sqrt{k})$$. Second, we present two configurations of k and $$\sqrt{k}$$ marked locations having the same number of steps and probability to find a marked location.

Nikolajs Nahimovs, Alexander Rivosh
How to Smooth Entropy?

Smooth entropy of X is defined as possibly biggest entropy of a distribution Y close to X. It has found many applications including privacy amplification, information reconciliation, quantum information theory and even constructing random number generators. However the basic question about the optimal shape for the distribution Y has not been answered yet. In this paper we solve this problem for Renyi entropies in non-quantum settings, giving a formal treatment to an approach suggested at TCC’05 and ASIACRYPT’05. The main difference is that we use a threshold cut instead of a quantile cut to rearrange probability masses of X. As an example of application, we derive tight lower bounds on the number of bits extractable from Shannon memoryless sources.

Maciej Skorski
Bounded TSO-to-SC Linearizability Is Decidable

TSO-to-SC linearizability is a variant of linearizability for concurrent libraries on the Total Store Order (TSO) memory model. In this paper we propose the notion of k-bounded TSO-to-SC linearizability, a subclass of TSO-to-SC linearizability that concerns only bounded histories. This subclass is non-trivial in that it does not restrict the number of write, flush and cas (compare-and-swap) actions, nor the size of a store buffer, to be bounded. We prove that the decision problem of k-bounded TSO-to-SC linearizability is decidable for a bounded number of processes. We first reduce this decision problem to a marked violation problem of k-bounded TSO-to-SC linearizability, where specific $$\textit{cas}$$ actions are introduced to mark call and return actions. Then, we further reduce the marked violation problem to a control state reachability problem of a lossy channel machine, which is already known to be decidable. Moreover, we can show that the decision problem of k-bounded TSO-to-SC linearizability has non-primitive recursive complexity.

Chao Wang, Yi Lv, Peng Wu
Probabilistic Autoreductions

We consider autoreducibility of complete sets for the two common types of probabilistic polynomial-time reductions: RP reductions containing one-sided errors on positive input instances only, and BPP reductions containing two-sided errors. Specifically, we focus on the probabilistic counterparts of the deterministic many-one and truth-table autoreductions. We prove that non-trivial complete sets of NP are autoreducible for the RP many-one reduction. This extends the result by Glaßer et al. [9] that complete sets of NP are autoreducible for the deterministic many-one reduction. We also prove that complete sets of classes in the truth-table Polynomial Hierarchy, which is the polynomial hierarchy defined using the truth-table reduction instead of the general Turing reduction, are autoreducible with respect to the BPP truth-table reductions. This generalizes the result by Buhrman et al. [3] that truth-table-complete sets for NP are probabilistically truth-table autoreducible to multiple classes of higher complexity although for a weaker reduction.

Liyu Zhang, Chen Yuan, Haibin Kan

Software Engineering: Methods, Tools, Applications (Regular Papers)

Frontmatter
ABS: A High-Level Modeling Language for Cloud-Aware Programming

Cloud technology has become an invaluable tool to the IT business, because of its attractive economic model. Yet, from the programmers’ perspective, the development of cloud applications remains a major challenge. In this paper we introduce a programming language that allows Cloud applications to monitor and control their own deployment. Our language originates from the Abstract Behavioral Specification (ABS) language: a high-level object-oriented language for modeling concurrent systems. We extend the ABS language with Deployment Components which abstract over Virtual Machines of the Cloud and which enable any ABS application to distribute itself among multiple Cloud-machines. ABS models are executed by transforming them to distributed-object Haskell code. As a result, we obtain a Cloud-aware programming language which supports a full development cycle including modeling, resource analysis and code generation.

Nikolaos Bezirgiannis, Frank de Boer
Aspect, Rich, and Anemic Domain Models in Enterprise Information Systems

The research shows that maintenance of enterprise information systems consumes about 65–75 % of the software development time and about 40–60 % of maintenance efforts are devoted to software understanding. This paper compares the Anemic Domain Model used by the three-layered architecture followed by Java EE and .NET platforms and the Rich Domain Model often deployed into many conventional MVC-like web frameworks to a novel Aspect Domain Model followed by the Aspect-driven design. While all these models strive to avoid information restatement, they greatly differ in the underlying idea and resulting efficiency. This research compares considered models based on development efficacy, maintainability and their impact on the rest of the system. We evaluate qualities such as information cohesion, coupling and restatement, and discuss related maintenance efforts of the novel approach in the context of existing approaches.

Karel Cemus, Tomas Cerny, Lubos Matl, Michael J. Donahoo
Finding Optimal Compatible Set of Software Components Using Integer Linear Programming

Reusable components and libraries reduce costs in software development but also bring challenges like ensuring that application’s components form a consistent and working set. While dependency management and build tools provide assistance in creating the set, they can’t guarantee its correctness in terms of interoperability. On the other hand, the methods which detect component interoperability issues do not provide guidance in finding the proper set of components to fix any uncovered inconsistencies. In this work we present a method for finding such set of components which provides the required functionality, is free from type-level inconsistencies, and at the same time is optimal according to a given criterion. The method is based on pre-computed compatibility data and integer linear programming and allows to optimize the found solution set with respect to an arbitrary cost function.

Jakub Danek, Premek Brada
Effective Parallel Multicore-Optimized K-mers Counting Algorithm

For many bioinformatics applications it is crucial to know frequencies of all subsequences of length k (k-mers) constructed from reads (short-reads) that are obtained in process of DNA sequencing. We present an effective parallel algorithm for k-mers counting that is based on nested bucket sort algorithm, whereby sizes of partitions and number of buckets per partition are precomputed. The proposed algorithm is designed for multicore architecture and properly combines MPI framework (OpenMPI) with POSIX threads achieving very good performance. According to our experiments it overcomes existing solutions in running time when compared on the genome of Drosophila melanogaster (SRX040485).

Tomáš Farkaš, Peter Kubán, Mária Lucká
Meta-Evolution Style for Software Architecture Evolution

Changes over time are commonplace and inevitable for any software system if it is to remain effective. Since the system changes fairly frequently, it is essential that its architecture is restructured to keep abreast of these changes. Recently the term ’evolution style’ has emerged in some studies as a technique for modeling potential architecture evolution scenarios in a particular domain that can provide reusable knowledge that encapsulates the best practices in this domain. Analysis and comparison of these alternatives assists architects in planning and thinking about architecture evolution. Our approach endeavors to unify the solutions and standardize the modeling concepts in order to develop evolution styles library that exploits the best methods and elements in the existing approaches. To this end, the main contribution of this paper is a Meta-Evolution Style (MES) for software architecture evolution, which promotes mapping and comparing of evolution styles, as well as it will help in approaching issues like reuse and interchange elements among evolution styles.

Adel Hassan, Mourad Oussalah
The Simulation Relation for Formal E-Contracts

Relationships between entities in today’s increasingly interconnected context have grown in complexity and evolved from simple communication processes to more complicated distributed systems. Electronics contracts (e-contracts) are of general purpose and aimed to specify relationships in a wide variety of scenario, including web and cloud services, inter and intra organization, electronic banking, etc. It is in this context that we aim to develop a consistent definition for these relationships together with a set of techniques to check their proper use. In this paper we present a process algebra to describe these contract relationships and a set of formal machinery to determine whether an implementation follows the rules established by these contracts. The main formal technique used is a simulation relation where an implementation is checked step by step against a given contract. Several toy examples are provided to facilitate understanding of the formal definitions.

Luis Llana, María-Emilia Cambronero, Gregorio Díaz

Data, Information, and Knowledge Engineering (Regular Papers)

Frontmatter
Solving the Problem of Selecting Suitable Objective Measures by Clustering Association Rules Through the Measures Themselves

Many objective measures (OMs) were proposed since they are frequently used to discover interesting association rules. Therefore, an important challenge is to decide which OM to use. For that, one can: (a) reduce the number of OMs to be chosen; (b) aggregate OMs’ values in only one importance value as a mean of not selecting a suitable OM. The problem with (a) is that many OMs can remain. Regarding (b), the problem is that the obtained values cannot be well understandable. This work proposes a process to solve the problem related to the identification of a suitable OM to direct the users towards the interesting patterns. The goal is to find the same interesting patterns, as if the most suitable OM had been used, also trying to reduce the exploration space to minimize the user’s effort.

Veronica Oliveira de Carvalho, Renan de Padua, Solange Oliveira Rezende
Survey on Concern Separation in Service Integration

Ever-changing business processes in large software systems, integration of heterogeneous data sources as well as the desire for legacy service integration drive software design towards reusable, platform-independent, web-accessible microservices. Such independently deployable services provide an interface for retrieval and data manipulation in machine-readable formats. While this approach brings many advantages from the perspective of service integration aiming to separate data manipulation from business processing, the standard approaches provide only limited structural semantics and constraints provided through the interface. This leads to considerable information restatement and repeated decisions in integrating components, which considerably impacts development and maintenance efforts. Integration component operability becomes highly sensitive to interaction with underlying services, which are possibly composed of other services. The sensitivity is especially apparent in the structural semantics of produced and consumed information that must correlate on both sides of the interaction. This paper surveys service integration from the perspective of separation of concerns. In order to reduce the coupling and information restatement on the integration component side, it suggests introducing multiple communication channels with additional information that apply in the service interaction, extending the integration component’s ability to derive service expected information structural semantics, constraints or business rules. Finally, we consider the impact of this new approach from the development and maintenance perspectives.

Tomas Cerny, Michael J. Donahoo
Utilizing Vector Models for Automatic Text Lemmatization

In this paper we tackle the problem of lemmatization of inflectional languages. We introduce a new algorithm which utilizes vector models of words. Current approaches in this area are limited to knowing either full grammar rules or the translation matrix between the word and its basic form. However, this information is encoded in natural text. Our solution uses text corpora to build vector models of words and a small amount of user input to infer lemmas. We have evaluated our approach on the Slovak language and present interesting findings on its feasibility for real-world utilization.

Ladislav Gallay, Marián Šimko
Improving Keyword Extraction from Movie Subtitles by Utilizing Temporal Properties

In our work we aim at keyword extraction from movie subtitles. Keywords and key phrases although missing the context can be found very helpful in finding, understanding, organizing and recommending the media content. Generally, they are used by search engines to help find the relevant information. Movies and video content are becoming massively available and widespread. The ability to automatically describe and classify videos has a vast domain of application. In our work we select movie subtitles as a source of information to process. We proposed a method for keyword extraction from movie subtitles by analysing their temporal properties and detecting conversations. We evaluated our method by conducting two experiments (a priori synthetic experiment and a posteriori user experiment) involving 200 movies and show that conversation analysis can improve traditional approaches based on automatic term extraction algorithms.

Matúš Košút, Marián Šimko
Identification of Navigation Lead Candidates Using Citation and Co-Citation Analysis

Query refinement is an integral part of search, especially for the exploratory search scenarios, which assume that the users start with ill-defined information needs that change over time. In order to support exploratory search and navigation, we have proposed an approach of exploratory navigation in digital libraries using navigation leads. In this paper, we focus specifically on the identification of the navigation lead candidates using keyword extraction. For this purpose, we utilize the citation sentences as well as the co-citations. We hypothesize that they can improve the quality of the extracted keywords in terms of finding new keywords (that would not be otherwise discovered) as well as promoting the important keywords by increasing their relevance. We have quantitatively evaluated our method in the domain of digital libraries using experts’ judgement on the relevance of the extracted keywords. Based on our results, we can conclude that using the citations and the co-citations improves the results of extraction of the most relevant terms over the TF-IDF baseline.

Robert Moro, Mate Vangel, Maria Bielikova
Summarizing Online User Reviews Using Bicliques

With vast amounts of text being available in electronic format, such as news and social media, automatic multi-document summarization can help extract the most important information. We present and evaluate a novel method for automatic extractive multi-document summarization. The method is purely combinatorial, based on bicliques in the bipartite word-sentence occurrence graph. It is particularly suited for collections of very short, independently written texts (often single sentences) with many repeated phrases, such as customer reviews of products. The method can run in subquadratic time in the number of documents, which is relevant for the application to large collections of documents.

Azam Sheikh Muhammad, Peter Damaschke, Olof Mogren
Post-processing Association Rules: A Network Based Label Propagation Approach

Association rules are widely used to find relations among items in a given database. However, the amount of generated rules is too large to be manually explored. Traditionally, this task is done by post-processing approaches that explore and direct the user to the interesting rules. Recently, the user’s knowledge has been considered to post-process the rules, directing the exploration to the knowledge he considers interesting. However, sometimes the user wants to explore the rule set without adding his prior knowledge BIAS, exploring the rule set according to its features. Aiming to solve this problem, this paper presents an approach, named $$PAR_{LP}$$ (Post-processing Association Rules using Label Propagation), that explores the entire rule set, suggesting rules to be classified by the user as “Interesting” or “Non-Interesting”. In this way, the user is directed to analyze the rules that have some importance on the rule set, so the user does not need to explore the entire rule set. Moreover, the user’s classification is propagated to all the rules using label propagation approaches, so the most similar rules will likely be on the same class. The results show that the $$PAR_{LP}$$ succeeds to direct the exploration to a set of rules considered interesting, reducing the amount of association rules to be explored.

Renan de Padua, Veronica Oliveira de Carvalho, Solange Oliveira Rezende
Application of Multiple Sound Representations in Multipitch Estimation Using Shift-Invariant Probabilistic Latent Component Analysis

Probabilistic analysis has become one of the most important directions for development of new methods in Music Information Retrieval (MIR) field. Its ability to correctly find necessary information in the music audio recordings is especially useful in multipitch estimation, a vital task belonging to the MIR field. Since the multipitch estimation is still far from being resolved, it is important to enhance the existing state-of-the-art methods. Usually, a spectrogram, generated from the Constant-Q transform (CQT) is used as a basis for the SI-PLCA method. The new approach involves application of more than one method (cepstrum and CQT) in association of the shift-invariant probabilistic latent component analysis approach and additional processing of all the sound representations, in order to achieve better results.

Krzysztof Rychlicki-Kicior, Bartłomiej Stasiak, Mykhaylo Yatsymirskyy
Projection for Nested Word Automata Speeds up XPath Evaluation on XML Streams

We present an evaluator for navigational XPath on Xml streams with projection. The idea is to project away those parts of an Xml stream that are irrelevant for evaluating a given XPath query. This task is relevant for processing Xml streams in general since all Xml standard languages are based on XPath. The best existing streaming algorithm for navigational XPath queries runs nested word automata. Therefore, we develop a projection algorithm for nested word automata, for the first time to the best of our knowledge. It turns out that projection can speed up the evaluation of navigational XPath queries on Xml streams by a factor of 4 in average on the usual XPath benchmarks.

Tom Sebastian, Joachim Niehren
Evaluation of Static/Dynamic Cache for Similarity Search Engines

In large scale search systems, where it is important to achieve a high query throughput, cache strategies are a feasible tool to achieve this goal. A number of efficient cache strategies devised for exact query search in different application domains have been proposed so far. In similarity query search on metric spaces it is necessary to consider additional design requirements devised to produce good quality approximate results from the cache content. In this paper, we propose a Static/Dynamic cache strategy for metric spaces which takes advantage of results of static cache miss operations and their associated distance evaluations for increasing the overall performance of the cache. We present an experimental evaluation of the performance obtained with our strategy for different query selection/replacement strategies.

R. Solar, V. Gil-Costa, M. Marín
Backmatter
Metadaten
Titel
SOFSEM 2016: Theory and Practice of Computer Science
herausgegeben von
Rūsiņš Mārtiņš Freivalds
Gregor Engels
Barbara Catania
Copyright-Jahr
2016
Verlag
Springer Berlin Heidelberg
Electronic ISBN
978-3-662-49192-8
Print ISBN
978-3-662-49191-1
DOI
https://doi.org/10.1007/978-3-662-49192-8