Zum Inhalt

Implementation and Application of Automata

26th International Conference, CIAA 2022, Rouen, France, June 28 – July 1, 2022, Proceedings

  • 2022
  • Buch
insite
SUCHEN

Über dieses Buch

Dieses Buch stellt die Protokolle der 26. Internationalen Konferenz zur Implementierung und Anwendung von Automaten CIAA 2022 dar, die im Juni / Juli 2022 in Rouen, Frankreich, stattfand. Die 16 regelmäßigen Vorträge, die zusammen mit 3 eingeladenen Vorträgen in diesem Buch präsentiert wurden, wurden sorgfältig überprüft und aus 26 Einreichungen ausgewählt. Die Themen der Vorträge decken verschiedene Bereiche der Anwendung, Implementierung und Theorie von Automaten und verwandten Strukturen ab.

Inhaltsverzeichnis

Frontmatter

Invited Lectures

Frontmatter
On 25 Years of CIAA Through the Lens of Data Science
Abstract
We investigate the structure of the co-authorship graph for the Conference on Implementation and Application of Automata (CIAA) with techniques from network sciences. This allows us to answer a broad variety of questions on collaboration patterns. Our findings are in line with (statistical) properties of other co-authorship networks from biology, physics and mathematics as conducted earlier by pioneers of network sciences.
Hermann Gruber, Markus Holzer, Christian Rauch
Manipulation of Regular Expressions Using Derivatives: An Overview
Abstract
The notions of derivative and partial derivative of regular expressions revealed themselves to be very powerful and have been successfully extended to many other formal language classes and algebraic structures. Although the undisputed elegance of this formalism, its efficient practical use is still a challenging research topic. Here we give a brief historical overview and summarise some of these aspects.
Nelma Moreira, Rogério Reis
How to Settle the ReDoS Problem: Back to the Classical Automata Theory
Abstract
Most regular-expression matching engines in practice are based on the Thompson construction and the Spencer matching algorithm. While these engines work fast and efficiently, a serious problem, the regular expression denial-of-service (ReDoS), has been reported recently. ReDoS is an algorithm complexity attack, which exploits the backtracking feature of the engine, and makes the service unresponsive indefinitely. Researchers suggested a few remedies to cope with the ReDoS problem, yet they are often ad-hoc or undesirable in practice. We instead propose a hybrid matching scheme that selects between the Thompson and the Spencer matching algorithms depending on the needed features. We also suggest to use the position construction for its intrinsic characteristics for fast matching. We evaluate the proposed approach using a benchmark dataset collected from various open-source projects, and compare the performance with the current approach. The experimental results show that a hybrid matcher reduces the ReDoS-vulnerability by 96% and 99.98% in full and partial matching, respectively. Moreover, 55% of the most problematic regular expressions become invulnerable to ReDoS by the position construction.
Sicheol Sung, Hyunjoon Cheon, Yo-Sub Han

Conference Papers

Frontmatter
Ordered Context-Free Grammars
Abstract
We propose a new unambiguous grammar formalism, referred to as ordered context-free grammars, which is identical to context-free grammars, apart from the property that it also places an order on parse trees. Since only a minor modification to ordered context-free grammars is required to obtain parsing expression grammars, the relationship between context-free grammars and parsing expression grammars becomes more evident. By preserving how ordered context-free grammars support left-recursion, parsing expression grammars is modified to support left recursion in ways much more natural than current approaches.
Brink van der Merwe, Martin Berglund
Symbolic Weighted Language Models, Quantitative Parsing and Automated Music Transcription
Abstract
We study several classes of symbolic weighted formalisms: automata (swA), transducers (swT) and visibly pushdown extensions (swVPA, swVPT). They combine the respective extensions of their symbolic and weighted counterparts, allowing a quantitative evaluation of words over a large or infinite input alphabet.
We present properties of closure by composition, the computation of transducer-defined distances between nested words and languages, as well as a PTIME 1-best search algorithm for swVPA. These results are applied to solve in PTIME a variant of parsing over infinite alphabets. We illustrate this approach with a motivating use case in automated music transcription.
Florent Jacquemard, Lydia Rodriguez de la Nava
A Similarity Measure for Formal Languages Based on Convergent Geometric Series
Abstract
We present a distance metric on formal languages based on the accumulated weight of words in their symmetric difference. The contribution of an individual word to this weight decreases exponentially in its length, guaranteeing the distance between languages to be a real value between 0 and 1. We show that this distance is computable for regular languages. As an application, we show how the similarity measure derived from a modification of this metric can be used in automatic grading of particular standard exercises in formal language theory classes.
Florian Bruse, Maurice Herwig, Martin Lange
Hybrid Tree Automata and the Yield Theorem for Constituent Tree Automata
Abstract
We introduce an automaton model for recognizing sets of hybrid trees, the hybrid tree automaton (HTA). Special cases of hybrid trees are constituent trees and dependency trees, as they occur in natural language processing. This includes the cases of discontinuous constituent trees and non-projective dependency trees. In general, a hybrid tree is a tree over a ranked alphabet in which symbols can additionally be equipped with an index, i.e., a natural number which indicates the position of that symbol in the yield of the hybrid tree. As a special case of HTA, we define constituent tree automata (CTA) which recognize sets of constituent trees. We show that the set of yields of a CTA-recognizable set of constituent trees is an LCFRS language, and vice versa.
Frank Drewes, Richard Mörbitz, Heiko Vogler
Some Results Concerning Careful Synchronization of Partial Automata and Subset Synchronization of DFA’s
Abstract
The goal of this paper is to present a family of partial automata that achieve length \(\varTheta (3^{\frac{n}{3}})\) of the shortest carefully synchronizing words, but using \(\frac{2}{9}n + 2\) letters, thus substantially improving the result obtained in [19], which is \(\frac{1}{3}n + 1\) letters. Additionally, modifying our idea we obtain a family of automata over a three letter alphabet and a subexponential length of the shortest carefully synchronizing words and, as a corollary of that construction, a series of binary automata with a subexponential length of word reducing set of states to a particular subset.
Jakub Ruszil
A Toolkit for Parikh Matrices
Abstract
The Parikh matrix mapping is a concept that provides information on the number of occurrences of certain (scattered) subwords in a word. Although Parikh matrices have been thoroughly studied, many of their basic properties remain open. In the present paper, we describe a toolkit that has been developed to support research in this field. Its functionality includes elementary and advanced operations related to Parikh matrices and the recently introduced variants of \(\mathbb {P}\)-Parikh matrices and \(\mathbb {L}\)-Parikh matrices.
Laura K. Hutchinson, Robert Mercaş, Daniel Reidenbach
Syntax Checking Either Way
Abstract
We consider parsers of deterministic context-free languages and study the sizes of their syntax checking components. More precisely, we allow the input processing from left to right or, alternatively, from right to left, whatever is best for the given language. We establish an infinite sequence of deterministic context-free languages \(L_k\), for \(k\ge 1\), such that there is an exponential size trade-off between a deterministic pushdown automaton that reads its input from right to left and another one that reads its input from left to right. Concerning the constructibility of such a parser out of a given deterministic context-free language, it is shown that it is undecidable whether the reversal of a given deterministic context-free language is again deterministic context free. Furthermore, we study the expressive capacity of the family of languages whose reversals are deterministic context free. Finally, we turn to the family of deterministic context-free languages whose reversals are also deterministic context free and collect several of their closure properties.
Martin Kutrib, Uwe Meyer
On the Power of Pushing or Stationary Moves for Input-Driven Pushdown Automata
Abstract
Input-driven pushdown automata (\(\text {IDPDA}\)s) are pushdown automata where the next action on the pushdown store (push, pop, nothing) is solely governed by the input symbol. Nowadays such devices are usually defined such that every push operation pushes exactly one additional symbol on the pushdown store and, in addition, the devices work in real time so that stationary moves are not allowed. Here, we relax this strong definition and consider \(\text {IDPDA}\)s that may push more than one symbol in one step (\(\text {push-IDPDA}\)) or may perform stationary moves (\(\text {stat-IDPDA}\)). We study the computational power of the extended variants both in the deterministic and nondeterministic case, we investigate several decidability questions for the new automata classes, and we obtain useful interesting representations by inverse homomorphisms.
Martin Kutrib, Andreas Malcher, Matthias Wendlandt
The Cut Operation in Subclasses of Convex Languages (Extended Abstract)
Abstract
The cut of two languages is a subset of their concatenation given by the leftmost maximal substring match. We study the state complexity of the cut operation assuming that both operands belong to some, possibly different, subclasses of convex languages, namely, right, left, two-sided, and all-sided ideal, prefix-, suffix-, factor-, and subword-closed, and -free languages. For all considered pairs of classes, we get the exact state complexity of cut. We show that it is m whenever the first language is a right ideal, and it is \(m+n-1\) or \(m+n-2\) if the first language is prefix-closed or prefix-free. In the other cases, the state complexity of cut is between \(mn-2n-m+4\) and \(mn-n+m\), the latter being the known state complexity of cut on regular languages. All our witnesses are described over a fixed alphabet of size at most three, except for three cases when the witness languages are described over an alphabet of size m or \(m-1\).
Michal Hospodár, Viktor Olejár
Variations of the Separating Words Problem
Abstract
The separating words problem seeks to determine the asymptotic growth of the minimum number of states of a deterministic finite automaton that accepts x but rejects y, where x and y are given strings. We study three natural variants of this problem which impose additional constraints on the start and/or end states: \(\forall \)-separation requires different end states for every common start state; \(\forall ^2\)-separation requires different end states for every pair of start states; and \(\forall ^201\)-separation requires fixed different end states for every pair of start states.
For distinct strings of the same length, we establish exact bounds on the number of states for \(\forall ^2\)- and \(\forall ^201\)-separation, as well as a logarithmic lower bound and a linear upper bound for \(\forall \)-separation.
Nicholas Tran
Homomorphisms on Graph-Walking Automata
Abstract
Graph-walking automata (GWA) analyze an input graph by moving between its nodes, following the edges. This paper investigates the effect of node-replacement graph homomorphisms on recognizability by these automata. The family of graph languages recognized by GWA is closed under inverse homomorphisms. The main result of this paper is that, for n-state automata operating on graphs with k labels of edge end-points, the inverse homomorphic images require GWA with \(kn+O(1)\) states in the worst case. The second result is that already for tree-walking automata, the family they recognize is not closed under injective homomorphisms; here the proof is based on a homomorphic characterization of regular tree languages.
Olga Martynova, Alexander Okhotin
Nondeterministic State Complexity of Site-Directed Deletion
Abstract
Site-directed deletion is a biologically inspired operation that removes a contiguous substring from the host string guided by a template string. The template string must match the prefix and suffix of a substring. When this occurs the middle section of the substring not contained in the prefix or suffix is removed. We consider the nondeterministic state complexity of the site-directed deletion operation. For regular languages recognized by nondeterministic finite automata with N and M states, respectively, we establish a new upper bound of \(2NM + N\) and a new worst case lower bound of 2NM. The upper bound improves a previously established upper bound, and no non-trivial lower bound was previously known for the nondeterministic state complexity of site-directed deletion.
Oliver A. S. Lyon, Kai Salomaa
Energy Complexity of Regular Language Recognition
Abstract
The erasure of each bit of information by a computing device has an intrinsic energy cost. Although any Turing machine can be rewritten to be thermodynamically reversible without changing the recognized language, finite automata that are restricted to scan their input once in “real-time” fashion can only recognize the members of a proper subset of the class of regular languages in this reversible manner. We use a general quantum finite automaton model to study the thermodynamic cost per step associated with the recognition of different regular languages. We show that zero-error quantum finite automata have no energy cost advantage over their classical deterministic counterparts, and prove an upper bound for the cost that holds for all regular languages. We also demonstrate languages for which “error can be traded for energy”, i.e. whose zero-error recognition is associated with provably bigger energy cost per step when compared to their bounded-error recognition by real-time finite-memory quantum devices.
Öykü Yılmaz, Fırat Kıyak, Meriç Üngör, A. C. Cem Say
Real-Time, Constant-Space, Constant-Randomness Verifiers
Abstract
We study the class of languages that have membership proofs which can be verified by real-time finite-state machines using only a constant number of random bits, regardless of the size of their inputs. Since any further restriction on the verifiers would preclude the verification of nonregular languages, this is the tightest computational budget which allows the checking of externally provided proofs to have meaningful use. We show that all languages that can be recognized by two-head one-way deterministic finite automata have such membership proofs. For any \(k>0\), there exist languages that cannot be recognized by any k-head one-way nondeterministic finite automaton, but that are nonetheless real-time verifiable in this sense. The set of nonpalindromes, which cannot be recognized by any one-way multihead deterministic finite automaton, is also demonstrated to be verifiable within these restrictions.
Özdeniz Dolu, Nevzat Ersoy, M. Utkan Gezer, A. C. Cem Say
Constrained Synchronization for Monotonic and Solvable Automata and Automata with Simple Idempotents
Abstract
For general input automata, there exist regular constraint languages such that asking if a given input automaton admits a synchronizing word in the constraint language is PSPACE-complete or NP-complete. Here, we investigate this problem for the following classes of input automata: monotonic automata, solvable automata and automata with simple idempotents over a binary alphabet. The latter class contains, for example, the Černý family of automata, an infinite family of n-state automata whose shortest synchronizing words have length \((n-1)^2\). Solvable automata generalize both commutative automata and weakly acyclic automata. We find that for monotonic automata, the problem is solvable in nondeterministic logarithmic space, for every regular constraint language. We identify a subclass of solvable automata for which the problem is in NP. This subclass strictly contains the weakly acyclic automata. In the course of our investigation we derive a sharp linear upper bound for the length of a shortest synchronizing word in a given constraint language, improving previous quadratic bounds for weakly acyclic and commutative automata. We also give structural characterizations of solvable automata and their recognized languages that imply that they recognize languages for which partial commutation closures give regular languages. Lastly, we show that for input automata with simple idempotents over a binary alphabet and with a constraint language given by a partial automaton with up to three states, the constrained synchronization problem is in P.
Stefan Hoffmann
An Ambiguity Hierarchy of Weighted Context-Free Grammars
Abstract
Weighted context-free grammar (WCFG) is a quantitative extension of context-free grammar (CFG). It is known that unambiguous weighted automata (WA), finitely-ambiguous WA, polynomially-ambiguous WA and general WA over the tropical semiring have different expressive powers. We prove that there exists a similar ambiguity hierarchy of WCFG over the tropical semiring, using an extended Ogden’s lemma. Furthermore, we show that the hierarchy we proved is different from the known ambiguity hierarchy of unweighted CFG.
Yusuke Inoue, Kenji Hashimoto, Hiroyuki Seki
Backmatter
Titel
Implementation and Application of Automata
Herausgegeben von
Pascal Caron
Ludovic Mignot
Copyright-Jahr
2022
Electronic ISBN
978-3-031-07469-1
Print ISBN
978-3-031-07468-4
DOI
https://doi.org/10.1007/978-3-031-07469-1

Informationen zur Barrierefreiheit für dieses Buch folgen in Kürze. Wir arbeiten daran, sie so schnell wie möglich verfügbar zu machen. Vielen Dank für Ihre Geduld.

    Bildnachweise
    AvePoint Deutschland GmbH/© AvePoint Deutschland GmbH, NTT Data/© NTT Data, Wildix/© Wildix, arvato Systems GmbH/© arvato Systems GmbH, Ninox Software GmbH/© Ninox Software GmbH, Nagarro GmbH/© Nagarro GmbH, GWS mbH/© GWS mbH, CELONIS Labs GmbH, USU GmbH/© USU GmbH, G Data CyberDefense/© G Data CyberDefense, Vendosoft/© Vendosoft, Kumavision/© Kumavision, Noriis Network AG/© Noriis Network AG, WSW Software GmbH/© WSW Software GmbH, tts GmbH/© tts GmbH, Asseco Solutions AG/© Asseco Solutions AG, AFB Gemeinnützige GmbH/© AFB Gemeinnützige GmbH, Ferrari electronic AG/© Ferrari electronic AG