New Perspectives on Classical Automata Constructions

<< Volver atrás

Tesis:

New Perspectives on Classical Automata Constructions

Autor: GUTIÉRREZ VIEDMA, Elena

Título: New Perspectives on Classical Automata Constructions

Fecha: 2020

Materia: Sin materia definida

Escuela: E.T.S DE INGENIEROS INFORMÁTICOS

Departamentos: LENGUAJES Y SISTEMAS INFORMATICOS E INGENIERIA DE SOFTWARE

Acceso electrónico: http://oa.upm.es/64479/

Director/a 1º: GANTY, Pierre

Resumen: The Theory of Automata is one of the most fundamental and longstanding mathematical theories, that deals with the study of abstract machines, or automata, and their computational capabilities. In this thesis, we focus on the automata models of finite-state automata and pushdown automata, as well as context-free grammars. These models have been used in numerous applications: from the synthesis and formal verification of hardware and software systems to natural language processing applications or digital-image compression techniques. These applications strongly rely on language-theoretical notions where, still today, many questions are open. The goal of this thesis is to give new theoretical perspective on a collection of open problems, that are at the core of classical automata constructions and well-established algorithms. The underlying mathematical tool to approach these questions are equivalence relations on words as abstractions of languages. First, we focus on algorithms for obtaining the deterministic automaton with the minimal number of states for some given regular language. This is an essential question that arises in many applications such as text processing, image analysis and program verification. While most minimization techniques rely on either the fusion of equivalent states or an iterative refinement of an initial partition of the set of states, like Hopcroft’s or Moore’s algorithm, the double-reversal method, proposed by Brzozowski, stands isolated from these methods as it simply combines two classical automata constructions twice in order to obtain the minimal deterministic automaton. In this thesis, we aim to understand the language-theoretical basis of the double-reversal method and its connection with the partition-based techniques, a longstanding question that, still today, attracts interest. As a result, we provide a uniform framework of deterministic automata constructions based on finite-index equivalences on words that allows us to give a new simple proof of the double-reversal method and shed light on the relation between this algorithm and the partition-based techniques. Second, we address the question of comparing the descriptional complexity of pushdown automata against context-free grammars and finite-state automata for Parikh equivalence, a weaker notion than the standard language equivalence under which the ordering of symbols in the words is not important. The interest on this weaker notion of equivalence is in good part due to the celebrated Parikh’s Theorem, which shows that every context-free language is Parikh-equivalent to some effectively computable regular language. Our main contribution is to provide an infinite family of pushdown automata defined over a singleton alphabet that allows us to give lower bounds on the number of grammar variables (resp. states) of the smallest context-free grammar (resp. finite-state automaton) that is language-equivalent. Since Parikh equivalence and language equivalence coincide when the alphabet is a singleton, we achieve our goal by deriving lower bounds for Parikh equivalence as well. As these lower bounds match existing upper bounds, we conclude the optimality of known translation procedures in the unary case and for Parikh equivalence, such as the textbook translation procedure that converts any pushdown automaton into a language-equivalent context-free grammar. Finally, we address the question of extending Parikh’s Theorem to the more general model of weighted automata. It is well-known that Parikh’s Theorem does not hold for weighted languages. Thus, we study sufficient conditions under which the so-called Parikh property holds for weighted automata. We show that every nonexpansive weighted context-free grammar over a commutative semiring satisfies the Parikh property. Furthermore, we give a decision procedure for the Parikh property for weighted context-free grammars over the rational numbers that relies on the use of Groebner bases. ----------RESUMEN---------- La Teoría de Autómatas es una de las teorías matemáticas más fundamentales, que se encarga del estudio de máquinas abstractas, o autómatas, y sus capacidades computacionales. En esta tesis nos centramos en los modelos de autómatas de estados finitos y de pila, así como en las gramáticas libres de contexto. Estos modelos encuentran aplicación, por ejemplo, en el diseño y verificación formal de software y hardware, en técnicas de proceso de lenguaje natural o en compresión de imágenes digitales. Todas estas aplicaciones están basadas en nociones de Teoría de Lenguajes y Autómatas sobre las que existen numerosas cuestiones sin resolver a día de hoy. El objetivo de esta tesis es dar una nueva perspectiva teórica a un conjunto de cuestiones abiertas que tratan fundamentalmente sobre construcciones clásicas de autómatas y algoritmos conocidos. La herramienta matemática subyacente para resolver estas cuestiones son las relaciones de equivalencia sobre palabras, interpretadas como abstracciones del lenguaje. Primero, estudiamos algoritmos de minimización de autómatas de estados finitos, esto es, métodos para obtener el autómata determinista con el mínimo número de estados dado un lenguaje regular. Estos algoritmos juegan un papel crucial en aplicaciones de procesado de texto y diálogo, análisis de imágenes y verificación de programas. Mientras que la mayoría de las técnicas de minimización se basan en fusionar estados equivalentes o retinar iterativamente una partición inicial del conjunto de estados del autómata, como los algoritmos de Hopcroft o Moore, el conocido algoritmo de Brzozowski se aleja de estas técnicas, ya que simplemente combina dos conocidas operaciones sobre autómatas para obtener el autómata mínimo. En esta tesis, buscamos entender la base teórica a nivel del lenguaje del algoritmo de Brzozowski y su conexión con los algoritmos que se basan en retinar una partición inicial de estados, una cuestión que a día de hoy sigue despertando interés. Nuestra contribución principal es ofrecer un marco uniforme de construcciones de autómatas deterministas definidas a partir de equivalencias sobre palabras que permite dar una prueba más simple del algoritmo de Brzozowski y clarificar la relación entre este método y las otras técnicas de minimización. En segundo lugar, comparamos los autómatas de pila con las gramáticas libres de contexto y los autómatas de estados finitos en cuanto a su complejidad descriptiva cuando todos estos formalismos describen lenguajes Parikhequivalentes. La equivalencia de Parikh es una noción más débil que la usual equivalencia de lenguajes bajo la cual el orden de los símbolos en las palabras no es importante. Su interés se debe al célebre Teorema de Parikh que establece que todo lenguaje libre de contexto es Parikh-equivalente a un lenguaje regular. Nuestra contribución principal es dar una familia infinita de autómatas de pila definidos sobre un alfabeto con un único símbolo que permite dar cotas inferiores en el número de variables (resp. de estados) de la gramática (resp. autómata) más pequeña con el mismo lenguaje. Como la equivalencia de Parikh coincide con la de lenguajes cuando el alfabeto sólo tiene un símbolo, cumplimos el objetivo planteado obteniendo cotas inferiores para equivalencia de Parikh también. Al comparar estas cotas con cotas superiores conocidas, concluimos que algoritmos ya existentes de conversión entre estos formalismos son óptimos. Por último, buscamos extender el Teorema de Parikh al modelo de autómatas con pesos. Es bien conocido que este teorema no se cumple para lenguajes con pesos. Por ello, estudiamos condiciones suficientes bajo las cuales la propiedad de Parikh se cumpla. Demostramos que toda gramática libre de contexto con pesos sobre un semianillo conmutativo que sea no-expansiva satisface la propiedad de Parikh. Además, damos un procedimiento de decisión para dicha propiedad sobre el semianillo de los racionales que se basa en el uso de bases de Groebner.