Doctoral Thesis Measuring Defects in Finite Automata Katja Meckel 1st Reviewer (Supervisor): Prof. Dr. Martin Kutrib 2nd Reviewer: Prof. Dr. Markus Holzer Fachbereich 07 - Mathematik und Informatik, Physik, Geographie Institut für Informatik Justus-Liebig-Universität Gießen Disputation: 12.05.2016 Danksagung An dieser Stelle möchte ich mich bei meinen Betreuern, den Herren Prof. Dr. Mar- tin Kutrib und Prof. Dr. Markus Holzer, bedanken. Ihre Ratschläge und Anregun- gen waren eine unschätzbare Hilfe bei der Erarbeitung der vorliegenden Ergebnisse. Zusätzlich möchte ich auch noch einen Dank an Dr. Matthias Wendlandt und Dr. Se- bastian Jakobi aussprechen. Die beiden haben mich in meinem Forschungsvorhaben ebenfalls sehr unterstützt und bei Problemen, die während der Forschung auftraten, mit ihren Ideen neue Perspektiven und Blickwinkel auf das Problem eröffnet. Selbstverständlich möchte ich mich hiermit ebenfalls bei all jenen bedanken, die mit kritischen Bemerkungen zu einer Verbesserung dieser Arbeit beigetragen haben. Ich bedanke mich auch für die Geduld und das Verständnis, das meine Familie und Freunde während der letzten fünf Jahre mir gegenüber aufgebracht und damit eben- falls einen Beitrag zum Gelingen dieser Arbeit geleistet haben. Contents 1 Introduction 1 2 Definitions 8 2.1 Words and Languages . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Regular Languages and Regular Expressions . . . . . . . . . . . . . . 9 2.3 Finite Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.4 Defects in Finite Automata . . . . . . . . . . . . . . . . . . . . . . . 15 3 State Complexity of DFA for Defects 19 3.1 Defects Occurring on the Transitions . . . . . . . . . . . . . . . . . . 19 3.2 Defects Occurring on the States . . . . . . . . . . . . . . . . . . . . . 34 3.3 Defects in DFA Accepting Finite Languages . . . . . . . . . . . . . . 40 4 Recognition and Correction of Defects in DFA – By Example 57 4.1 Recognition of Defects in a Finite Automaton . . . . . . . . . . . . . 58 4.2 Correction of Defects in a Finite Automaton . . . . . . . . . . . . . . 60 4.3 Languages Related to Defective Automata . . . . . . . . . . . . . . . 69 4.3.1 Construction of DFA for L+, L−, and L? . . . . . . . . . . . 70 4.3.2 Analysis of the Accepting DFA for the Three Languages . . . 71 5 Distances 75 5.1 Introduction to Distances . . . . . . . . . . . . . . . . . . . . . . . . 76 5.2 The Paramerized Prefix Distance . . . . . . . . . . . . . . . . . . . . 80 5.2.1 Upper and Lower Bounds for the Prefix Distance . . . . . . . 80 5.2.2 Distances Below the Upper Bound . . . . . . . . . . . . . . . 84 5.2.3 Decidability of the Order of the Distances . . . . . . . . . . . 87 5.2.4 Computation of the Precise Distance . . . . . . . . . . . . . . 93 5.2.5 Distinguishing Different Computations of the Prefix Distance 95 5.2.6 Criteria for the Cases . . . . . . . . . . . . . . . . . . . . . . 99 5.2.7 Construction of an Automaton to Determine the Prefix Distance104 5.2.8 Detailed Differentiation of the Cases Based on Automaton H 114 5.2.9 Calculation of the Prefix Distance Based on Automaton H . 120 5.2.10 Analysis of the Sum for the Prefix Distance on Base of H . . 134 5.3 The Parameterized Suffix Distance . . . . . . . . . . . . . . . . . . . 143 5.3.1 Upper and Lower Bounds . . . . . . . . . . . . . . . . . . . . 144 5.3.2 Distances Below the Upper Bound . . . . . . . . . . . . . . . 144 5.3.3 Decidability of the Order of the Distances . . . . . . . . . . . 145 i ii Contents 5.3.4 Calculation of the Suffix Distance . . . . . . . . . . . . . . . . 145 6 Concluding Remarks 146 6.1 Parameterized Distances and Defects . . . . . . . . . . . . . . . . . . 146 6.2 A Different Definition of L+, L− and L? . . . . . . . . . . . . . . . . 148 6.3 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Measuring Defects in Finite Automata List of Figures 2.1 NFA that is converted into a DFA in Example 2.3.1. . . . . . . . . . 13 2.2 DFA that is constructed by the power set construction from the NFA depicted in Figure 2.1. . . . . . . . . . . . . . . . . . . . . . . . 13 2.3 Minimal DFA accepting the language {a, b}2{a, b}∗ ∪ {λ}. . . . . . . 16 2.4 Defective finite automaton derived from the DFA depicted in Fig- ure 2.3 for the defect exchange symbol affecting the transition on a for state q0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.5 Defective finite automaton derived from the DFA depicted in Fig- ure 2.3 for the defect flip transition affecting the transition on b for state q0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.6 Defective finite automaton derived from the DFA depicted in Fig- ure 2.3 for the defect delete transition affecting the transition on b for state q0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.7 Defective finite automaton derived from the DFA depicted in Fig- ure 2.3 for the defect insert transition that inserts the transition δ(q1, b) = q0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.8 Defective finite automaton derived from the DFA depicted in Fig- ure 2.3 by the defect delete accepting state affecting state q2. . . . . 17 2.9 Defective finite automaton derived from the DFA depicted in Fig- ure 2.3 by the defect delete non-accepting state affecting state q1. . . 18 2.10 Defective finite automaton derived from the DFA depicted in Fig- ure 2.3 for the defect remove acceptance affecting state q0. . . . . . . 18 2.11 Defective finite automaton derived from the DFA depicted in Fig- ure 2.3 for the defect add acceptance affecting state q1. . . . . . . . . 18 3.1 DFA that proves the optimality of the upper bound for the defect exchange symbol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.2 NFA of Moore whose equivalent minimal DFA needs precisely 2n states for n ≥ 2 [54]. . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.3 DFA that proves the tightness of the upper bound for the defect flip transition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.4 NFA of Jirásková and Šebej for which the equivalent minimal DFA has exactly 2n states for n ≥ 2. . . . . . . . . . . . . . . . . . . . . . . . 22 3.5 DFA A for the defect flip transition for unary alphabets. . . . . . . . 22 3.6 DFA for which the defect flip transition can result in any number of states between 2 and n− 1 for the equivalent minimal DFA. . . . . . 24 iii iv List of Figures 3.7 An incomplete DFA whose equivalent minimal DFA needs exactly n+ 1 states. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.8 DFA to prove the existence of minimal DFA equivalent to defective automata for the defect delete transition with a number of states between 1 and n− 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.9 Incomplete DFA whose equivalent minimal DFA needs exactly i states. 27 3.10 DFA that proves the existence of a minimal DFA with n states, that is equivalent to a defective automaton received for the defect delete transition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.11 Incomplete DFA whose equivalent minimal DFA needs exactly n states. 27 3.12 NFA whose equivalent minimal DFA needs exactly 2n − 1 states for n ≥ 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.13 DFA for which the equivalent minimal DFA of the defective automa- ton with inserted transition from q1 to q2 on a needs 2n − 1 states. . 30 3.14 General structure of a DFA that accepts a unary regular language. The acceptance property is omitted. . . . . . . . . . . . . . . . . . . 30 3.15 DFA for which the defect insert transition inserting a transition on a from q2 to qn results in a minimal DFA with n2 − 2n+ 2 states. . . . 30 3.16 DFA for the proof of the optimality of the upper bound for the defect delete accepting state. . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.17 DFA used in the proof for the upper bound for the defect of inserting the accepting property to a non-accepting state. . . . . . . . . . . . 38 3.18 Minimal DFA with unary alphabet and i states. . . . . . . . . . . . . 39 3.19 DFA for which the defect exchange symbol affecting the transition of q1 on b by exchanging it into an a results in a minimal DFA with O(n2) states. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.20 DFA for which the defect flip transition affecting the transition of q1 to q2 for a results in a minimal DFA with Ω(2n) states. . . . . . . . 46 3.21 DFA used in the proofs of several defects. . . . . . . . . . . . . . . . 48 3.22 DFA for which the defect insert transition inserts a transition from q1 to q2 on a results in a minimal DFA with Ω(2n) states. . . . . . . 50 3.23 DFA for which the defect delete non-accepting state affecting state q2 results in a minimal DFA with n− 1 states. . . . . . . . . . . . . . . 52 3.24 DFA for which the defect of insertion of the accepting property for state qn results in a minimal DFA with i states, where 1 ≤ i ≤ n. . . 55 4.1 Minimal DFA used in Example 4.2.1 for the defect exchange symbol. 61 4.2 Defective automaton used in Example 4.2.2 for the defect exchange symbol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.3 Not minimal DFA used in Example 4.2.2 for the defect exchange symbol. 61 4.4 Minimal DFA used in Example 4.2.5 for the defect delete transition. 63 4.5 Minimal DFA used in Example 4.2.6 for the defect insert transition. 64 4.6 Minimal DFA used in Example 4.2.7 for the defect insert transition. 64 Measuring Defects in Finite Automata List of Figures v 4.7 Minimal DFA used in Example 4.2.8 for the defect delete accepting state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.8 Minimal DFA used in Example 4.2.8 for the defect delete accepting state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.9 Minimal DFA used in Example 4.2.9 for the defect delete non-accepting state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.10 Minimal DFA used in Example 4.2.9 for the defect delete non-accepting state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.11 Minimal DFA used in Example 4.2.13 for the defect remove acceptance. 68 5.1 Structure of minimal DFA used in the proof of Theorem 5.2.2. . . . 91 5.2 Two minimal DFA that accept the languages L(A1) = {a, b}{a}∗ respectively L(A2) = {a}{a, b}∗. . . . . . . . . . . . . . . . . . . . . 108 5.3 An example for a history automatonH, based on the minimal DFA de- picted in Figure 5.2. Here only some part of the automaton is depicted for words starting with a. The missing transitions and states can be added like described in the definition of H. . . . . . . . . . . . . . . 109 5.4 An example for a history automatonH, based on the minimal DFA de- picted in Figure 5.2. Here only some part of the automaton is depicted for words starting with b. The missing transitions and states can be added like described in the definition of H. . . . . . . . . . . . . . . 110 5.5 One possible path leading into a state s = (s1, s2, reads, hists, accept2), where s2 is productive and hists consists of k triples. Then, the in- formation in position i of hists was inserted by state rk−i+1. . . . . 127 5.6 One possible path leading into a state s = (s1, s2, reads, hists, accept2), where s2 is non-productive, and the only state on this path storing a productive state of A2 is s′. Then, the information in position 1 of hists was inserted by s′, and the information on position i ≥ 2 is already stored in position i− 1 in the history of s′. . . . . . . . . . 128 Measuring Defects in Finite Automata List of Tables 3.1 Bounds for defects affecting the transitions of a minimal DFA. For unary alphabets, these bounds are tight. For arbitrary regular lan- guages, these bound are also tight already for binary alphabets, except for the defect insert transition. For this defect, the alphabet is at least ternary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.2 Tight bounds for defects affecting the states of a minimal DFA with n states. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.3 Bounds for minimal DFA that are equivalent to a defective automa- ton received from a minimal DFA that accepts a finite language. The bounds for the defects flip or insert transition, and for the defect ex- change symbol are tight in the order of magnitude for at least ternary alphabets. All of the other bounds are tight even for unary alphabets. 56 4.1 Characteristic properties of a defective automaton resulting from a minimal DFA for precisely one defect of one of the considered types. 59 4.2 Upper bounds for the numbers of states of minimal DFA accepting the languages L+, L−, respectively L? for the different defects. Here, n is the number of states of the original automaton. . . . . . . . . . 74 6.1 Upper bounds for the number of states of minimal DFA accepting the languages L+, L−, and L? for the different defects, where n is the number of states of the original DFA, Σ is its alphabet, and F is the set of accepting states of the original DFA. . . . . . . . . . . . . . . 150 vi 1 Introduction Back in the 1950s, Chomsky started to research for formalisms to decribe the English language and its grammar. In the papers [14] and [15], he characterised special types of grammars, where a grammar is given by rules that describe and build words and sentences of the described language. Nowadays, the introduced hierarchy for these grammars is well known as the Chomsky hierarchy. Each level of this hierachy has to obey certain restrictions for the structure of the grammar rules. The grammars with less and weaker restrictions have a greater power of description than the ones with stronger restrictions. In the theory of formal languages, the grammars describing a certain class of languages in this hierarchy are of special interest. The grammars obeying the weakest restrictions are the type 0 grammars. They describe the class of recursively enumerable languages. The next level is given by the type 1 grammars. They describe the context-sensitive languages. This type of grammar needs to fulfill stronger restrictions than the type 0 grammars. The context-free languages are generated by the so called type 2 grammars. The lowest level of this hierarchy describes the regular languages. The grammars of this level are also called type 3 grammars. These grammars underly the strongest restrictions in the whole hierarchy defined by Chomsky. The language classes defined in terms of grammars in the Chomsky hierarchy can also be described by different formal models. For example, there exist automata models for each of the classes. Already in the 1930s, Turing [66] introduced a model of an abstract machine, long before there were computers. These machines are called Turing machines. The aim of Turing was to find a model that describes all that is computable, which comprises all computable numbers, functions, and so on. The Turing machines can be considered to be an abstract model of todays computers, so the Turing machines also describe the computational power of computers. A Turing machine can also be regarded as a recognition model for formal lan- guages. Chomsky proved that this automaton model describes precisely the class of recursively enumerable languages [15]. This means, the class of languages that are described by the type 0 grammars is the same one that is recognised by the Turing machines. In general, a Turing machine is a device having a finite central processing unit that converts an input into an output using some internal memory. Its input is given on a tape. The memory is also considered to be a tape that the machine may use to store intermediate results needed for the computation of the output. Usually, this tape is called working tape. Genarally, these two tapes are considered to be unbounded by any size to the left and to the right. In a standard Turing 1 2 machine, the transformation of an input into an output is done deterministically. This means each step of the conversion is uniquely determined by the currently read input symbol, the content of the working tape, and the current state of the machine. The output may be given on a separate tape, but, more conveniently, all tapes, the input tape, the working tape, and the output tape are considered to be only one tape. Due to the unbounded size of the tapes, this is possible. Also modifications of Turing machines were considered over the past decades, like Turing machines with more than only one working tape, or with a working tape that is bounded on one side, or non-deterministic Turing machines. All of these modified Turing machines can be simulated by a deterministic Turing machine with only one unbounded tape. This means, the mentioned modified Turing machines also recognise the class of recursively enumerable languages (see for example [29]). Also for the other classes of languages described in the Chomsky hierarchy, there exist recognising automata models. Just like for the grammars, the restrictions of the automata models grow for lower levels in the hierarchy. All of these automata models may be considered to be restricted Turing machines. The following three automata models are restricted Turing machines describing the other language classes of the Chomsky hierarchy. For example, a linear bounded automaton (LBA) is a non-deterministic Turing machine, that may only access a limited number of cells of the tape. This number is a linear function in the size of the machines initial input. This automaton model was shown to recognise precisely the class of context-sensitive languages [43]. An- other type of linear bounded automata are the deterministic ones (DLBA). They were introduced by Myhill [56], and were shown to recognise a proper subset of the context-sensitive languages [46]. When starting from a non-deterministic Turing machine with a one-way and read- only input tape, and one working tape that is bounded on one side, another possible restriction concerns the accessability of the content on the working tape. If the last inserted symbol needs to be processed first, that is the working tape is restricted to a last-in-first-out (LIFO) mode, the class of languages recognised by this automaton model is the class of context-free languages. This new type of automaton is called pushdown automaton (PDA) [62]. Also the finite automata (FA) that recognise the regular languages (see for ex- ample [29]) can be interpreted as restricted Turing machines. A finite automaton only consists of the finite processing unit of the Turing machine, and a read-only tape. The input of the automaton is contained on this tape in the beginning of a computation. While processing this word, the finite automaton may read the input only from left to right, and may not access any other cell. This especially means, every symbol of the input is processed only once. Like for the Turing machines, modifications of the other automata models were introduced and investigated. Some of those new models recognise language classes that are different from the ones defined in the Chomsky hierarchy. For example the deterministic pushdown automata were shown to be strictly weaker than non- deterministic PDA [22, 23]. Also the language class accepted by DLBA mentioned Measuring Defects in Finite Automata 3 before differs from the ones described in the Chomsky hierarchy. From all of the classes of languages yet known, this thesis concentrates on the regular languages. This language class is perhaps the best researched of all yet known classes of languages. This is no surprise, since this type of language and its representations have several applications. The regular languages are used, for example, in the lexical analysis in programming language compilation [1, 2], circuit design [10], or text editing and pattern matching [42]. The class of regular languages is still one of the most intensely researched lan- guage classes in the theory of formal languages. This is particularly interesting since already in the late seventies, it was a wide spread belief that all important and everything of interest has already been known about regular languages. The only exceptance were some very hard problems, that were summarised by Brzozowski in 1979 at the International Symposium on Formal Language Theory [9]. Over the last few decades, this belief was proven to be wrong. First of all, at least three of the six very hard open problems have been solved: the restricted star height problem [24], the regularity of noncounting classes problem [18], and the problem of the optimality of prefix codes [63]. Additionally, new problems have been stated and solved for this language class. For example, the complexities for operations on regular languages respectively finite automata were introduced [26, 39, 59, 60, 68]. Some recent surveys about descriptional complexities are given by Gao, Moreira, Reis and Yu [19] and by Holzer and Kutrib [27, 28]. Since the beginnigs of the theory of formal languages and automata theory, a lot of different representations for the regular languages have been defined. Besides the type 3 grammars and finite automata, the regular languages were shown to be representable also by so called regular expressions. This model was presented by Kleene in 1956 [41]. A regular expression consists of symbols from a given alphabet, that are combined by union, concatenation, and Kleene star. Additionally, also ∅ and λ are regular expressions. Over the years, also different models of automata were introduced, like two-way finite automata, alternating automata, and many more (see, for example, [29]). All of these models were shown to be equivalent to finite automata, and, therefore, they also describe the class of regular languages. Two somehow different types of finite automata were introduced by Mealy [50] and Moore [53]. In their definitions, the automata produce an output when processing some sequence of symbols. This is why they can be interpreted to be translaters or transducers. Most of the equivalence results are proven by simulating one model by another one. This naturally leads to the question for a possibility to compare the different models. One such possiblity for automata recognising the regular languages is to compare the numbers of states. This number is called the size of such an automaton. The size of finite automata is also called their state complexity. For example, the size of a DFA that is equivalent to an NFA with n states can at most be 2n. This was first shown by Myhill [55] and Nerode [57]. Their famous algorithm for this conversion is called powerset construction. There exist several examples that prove the tightness of this bound, which means there exist NFA with n states that cannot Measuring Defects in Finite Automata 4 be transformed into a DFA having strictly less than 2n states [48, 51, 54]. Other costs for conversions are for example the one between alternating finite automata and deterministic finite automata. For an AFA with n states there exist DFA with 22n states. There exist examples such that any DFA needs not less than this many states [12]. The costs for the conversion of a 2NFA with n states into a DFA are shown in [40] to be n(nn − (n− 1)n). The mentioned numbers for the costs of conversions are so called trade offs. In general, a trade off is a number, that enables the comparison of the size of one automaton model with another model simulating the first model. Trade offs between models describing the same family of languages can normally be expressed by a function depending on the size of the simulated model. Such a trade off is called recursive. If there does not exist such a function, the trade off is non-recursive. This type of trade off often occurs when simulating one model by another model, where the models do not recognise the same class of languages. This phenomenon is discussed in detail by Kutrib in [44]. One such non-recursive trade off occurs when simulating a pushdown automaton by a deterministic finite automaton. This was already shown by Meyer and Fischer [51]. The trade offs for the conversions of automata recognising the regular languages often differ for unary and at least binary alphabets. Chrobak showed in [16, 17] that the trade off between non-deterministic and deterministic finite automata for unary alphabets is given by eΘ( √ n·lnn). Also finite languages often differ in the state complexity. For finite languages over arbitrary alphabets Salomaa and Yu showed tight bounds in [61]. Due to this, also in this thesis finite and unary regular languages are considered separately for some of the considered problems. But not only for language classes belonging to the hierarchy defined by Chomsky trade offs were considered. Also for several classes of subregular languages trade offs for the conversion of non-deterministic to deterministic finite automata were considered for example by Bordihn, Holzer, and Kutrib [3]. In addition to trade offs for subregular languages, also the state complexity for operations on subregular languages and the finite automata recognising these types of languages were subject of research. Results on star-free languages were shown by Brzozowski and Liu in 2012 [8]. In this work, another definition for the size of deterministic finite automata was used. The used quotient complexity is not defined on the states of the automata but is related to the quotients of an automaton. Even though the approach differs, the quotient and the state complexities are equivalent. For more details on the definition of quotient complexity, the reader is referred to [5]. Other result on quotient complexities are found for example in [6] and [7]. Besides the state complexity for regular languages respectively their recognising finite automata, there exist several other measures. The state complexity often only compares the size of different automata recognising the same fixed regular language. A measure for the comparison of different regular languages is the distance. Intro- duced by Choffrut and Pighizzini [13], the already existing measures between words were extended to measures between regular languages. Examples for such word dis- tances are the prefix, suffix, subword, Hamming, and the edit distance. All of these Measuring Defects in Finite Automata 5 distances can be extended in the way Choffrut and Pighizzini did. They defined the distance between two regular languages to be the maximum of the suprema of the minimal distances of all words of one language to all words of the other language. This may be difficult to use in pratice, since, in general, a regular language consists of infinitely many words. In practice, there exist a lot of applications using finite automata. Especially in language processing, finite automata with millions of states are used [52]. Like for circuits, also in finite automata defects may occur. In such big automata, it is more likely that a defect occurs, especially when changing between different representation models. Such defects may affect the states, or the transitions, or both. For a defective automaton with that many states it may be of interest to know, if the defect results in a large or small blow up in the size of an equivalent minimal deterministic automaton. If may be uselful, if the size does not change too much. Another question concerns the accepted languages. Do the languages of the defective respectivley the original automaton differ much? A practical example for such a comparison is given in the production of flat screens. A limited number of defective pixels is acceptable, in contrast to many such defects. Another example in practice is the comparison of reproduced parts with a sample. In case that there exist too many differences, the reproduced parts will not be used or selled. For this, there needs to exist a sample. In our case for defective automata, this means the original automaton is reproducible in some way. One possibility is by its accepted language. Another question arises naturally for defective automata: is the defect recognis- able? And if it can be recognised, is possible to repair the automaton? Then the whole problem concerning blow ups for determinising and minimising such automata does not bother at all. Unfortunately, only one of the considered defects in this the- sis is possible to be repaired. But for all of the defects, we will consider languages related to the languages to the defective and the original finite automaton. Depend- ing on the sizes of the these languages, the language of the defective automaton may not differ too much from the one of the original deterministic finite automaton. But also the opposite may happen. The second chapter of this thesis gives a short introduction to the fundamentals of formal languages and automata theory used in the subsequent chapters. This introduction focusses on the regular languages and their representation by finite au- tomata. Also the defects investigated in most of the chapters are defined. Examples are given for the possible defects and their consequences. In the end of this chapter we explain by example the reason for the decision to only investigate such defective automata that are received by only one defect of one of the defined types. In Chapter 3, for all defined defects, bounds for the size of minimal deterministic automata equivalent to the defective automata are given. For most of the defects, the bounds for at least binary alphabets differ from the one for unary alphabets. The results for the unary case are stated separately in this chapter. For some defects, we also consider the possibility of the existence of defective finite automata of size n such that the equivalent minimal deterministic finite automaton Measuring Defects in Finite Automata 6 has a fixed size less than or equal to the upper size bound. This question is related to the magic number problem. This problem asks for the existence of minimal deterministic automata equivalent to non-deterministic automata of size n, having a fixed size i between n and the upper bound of 2n. If there exists a number i such that there exists no non-deterministic finite automaton having an equivalent deterministic finite automaton of size i, then this number is called magic [32]. The last section of the third chapter covers the question for the state complexity of deterministic finite automata equivalent to defective automata for finite languages. It is shown that most of the bounds for arbitrary regular languages and the finite languages differ massively. Chapter 4 covers the questions for the recognisability of defects and the possibility of correction. Assuming that only one occurrence of one type of defect results in a given defective automaton, properties of these automata to recognise the type of defect are summarised. It is proven by example that for most of the defects it is im- possible to fix the defective automaton to receive the original minimal deterministic automaton. In this chapter, also three types of languages related to the defective and the orig- inal automaton are defined and researched. For the definition of these languages, the defective automaton, the type of defect occurring in this automaton, and the affected state is assumed to be provided. One of these languages collects all words that are accepted by the defective automaton without using the defect. Another language collects all those words that are rejected by the automaton that do not use the defect while being processed by the defective automaton. These two languages are subsets of the languages accepted respectively rejected by the original automa- ton. The last considered language collects all words that use the defect while being processed by the defective automaton. This includes all such accepted or rejected words. The union of all the three languages is Σ∗, where Σ denotes the underlying alphabet. For these three defined languages, upper bounds for the state complexity of minimal deterministic finite automata are considered. In Chapter 5, a distance between two regular languages is introduced. In the first part of this chapter, this distance is based on the prefix distance between two words, and in the second part on the suffix distance. The introduced distances are parameterised. Since they sum up the word distances for words in one language to the other language and vice versa, only words up to a certain length are considered in these sums. This length is the parameter in the language distance. Properties of this distance are investigated. It is proven that the defined distances can only be constant, polynomial, or exponential. A construction of an automaton assisting to compute the precise distance between two languages is given. Different cases are separated that may occur when calculating the distance between a word and a regular language. Based on these criteria and the constructed automaton, the precise distance for two languages can be computed. This computation is also analysed in detail which gives criteria for the decision if the distance is constant, polynomial, or exponential. All these results are proven for the parameterized prefix distance and are inherited for the parameterized suffix distance in the last part of Measuring Defects in Finite Automata 7 this chapter. For this, the relation between the two distances is also proven. The last chapter concludes this thesis. In one section, the languages related to the defective automata and the distances for defective automata are linked. It is shortly summarised that the combination of two possibilities to measure the influence of a defect may give better criteria for the decision if a defective automaton differs much or not much from the original automaton. Another section covers another definition for the languages related to the defective automaton. This definition is based only on the given defective automaton. It does not presume the knowledge of the affected state or transition. Upper bounds for minimal deterministic finite automata recognising these languages are given. The last part of Chapter 6 gives perspectives of possible future research. Measuring Defects in Finite Automata 2 Definitions In this chapter, nearly all of the formalisms used in the following chapters will be defined. Most of them were already introduced by other researchers. One source for these definitions is [29], others for example are [67] or [47]. 2.1 Words and Languages A non-empty and finite set of symbols is called an alphabet. This is the base for the definition of languages. The symbols belonging to an alphabet are also called letters. A word over an alphabet Σ is a finite sequence of symbols from Σ. For example, the sequence ababba is a word over the alphabet Σ = {a, b}. The sequenc abc is not a word over Σ, since c does not belong to Σ, but it is a word over the alphabet {a, b, c, d}. This also shows, that it is not necessary for a word to contain all the symbols of the alphabet. The length of a word w is the number of its not necessarily different symbols and is denoted by |w|. Let w = a1a2 . . . an be a word that consists of symbols a1, a2, . . . , an from a fixed alphabet. Then the length of w is |w| = n. The word of length zero, that does not contain any symbols, is called empty word and is denoted by λ. The reversal of a word w is denoted by wR. If w = a1a2 · · · ak for some k ≥ 1, its reversal is wR = ak · · · a2a1. Let u = a1a2 · · · an and v = b1b2 · · · bm be two words, where all the symbols a1, a2, . . . , an and b1, b2, . . . , bm belong to some alphabet. The concatenation of the words u and v is defined to be u · v = a1a2 · · · anb1b2 · · · bm. This means, the concatenation of two words is received by stringing them together. The concatenation symbol · between the two words is often omitted to simplify the expression. For every word w, its concatenation with the empty string is given by w · λ = λ · w = w. For two words u, v over some alphabet Σ, v is called a prefix of u, if there exists a word w over Σ such that u = vw. If v 6= u, and v 6= λ, v is said to be non-trivial, otherwise it is a trivial prefix. In the same way, the suffix of a word is defined to be a word v such that the word u can be split up into u = wv, where w is another word over Σ. The suffix v is also called non-trivial, if v 6= u and v 6= λ. The set of all words over a fixed alphabet Σ is denoted by Σ∗. This set also includes the empty word λ. The set Σ∗ \ {λ} is denoted by Σ+. It consists of all words of length at least one over Σ. The set containing all words over Σ up to a fixed 8 2.2. Regular Languages and Regular Expressions 9 length n ≥ 0 is denoted by Σ≤n, and the set of words of length precisely n is Σ=n. The size or cardinality of a set A is denoted by |A|. For finite sets, this number is the number of elements belonging to A. For infinite sets, this number is ∞. The power set of a given finite set A is denoted by 2A, and consists of ∣∣2A∣∣ = 2|A| many elements. Each of its elements is a subset of A. This also includes the empty set ∅, since this always is a subset. The set A itself also belongs to its power set. A subset L of Σ∗ is a collection of certain words. Such a set is called a language. The trivial languages are the empty set ∅ and the whole set Σ∗. For languages L,L1, L2 ⊆ Σ∗, the union L1 ∪ L2 = {w ∈ Σ∗ | w ∈ L1 or w ∈ L2}, intersection L1 ∩ L2 = {w ∈ Σ∗ | w ∈ L1 and w ∈ L2}, difference L1 \ L2 = {w ∈ Σ∗ | w ∈ L1 and w 6∈ L2}, and symmetric difference L14L2 = (L1 \ L2) ∪ (L2 \ L1) are defined in the usual way. Also the Kleene star of a language L∗ = {w ∈ Σ∗ | w = w1 · w2 · · · · · wk for some k ≥ 0 and some w1, w2, . . . , wk ∈ L} is defined as usual, where k = 0 means w = λ, even though λ may not belong to L. Notice that Kleene star applied to the empty language ∅ yields the language {λ}. The Kleene Star for a language can also be expressed by L∗ = ⋃ i≥0 Li, where the languages Li are inductively defined by L0 = {λ}, Li+1 = L · Li. 2.2 Regular Languages and Regular Expressions A lot of different models are existent that represent the regular languages. This class of languages, denoted by REG, is important and well studied. In the Chomsky hierarchy, described by Chomsky [14, 15], the regular languages represent the lowest level. A common and quite natural way to describe this class of languages are the regular expressions. This model was introduced by Kleene [41]. A regular expression is a finite combination of concatenation, union, and Kleene star over the symbols of an alphabet and the empty language ∅. Formally, a regular expression over an alphabet Σ is defined as follows: Measuring Defects in Finite Automata 10 2.3. Finite Automata • For each a ∈ Σ, λ, ∅, and a are regular expressions. • If r and s are regular expressions, so are (r · s), (r + s), and r∗. The language described by regular expressions r and s is given by • L(λ) = {λ}, L(∅) = ∅, and L(a) = {a} for all a ∈ Σ. • L(r · s) = L(r) · L(s), L((r + s)) = L(r) ∪ L(s), and L(r∗) = (L(r))∗. To simplify the notation, it is common to give highest priority to the star op- erator ∗, followed by ·, and then +. Due to this priority and the associativity of concatenation and union, superfluous parentheses can be omitted. Usually, also the operator · is omitted. It is conventional to write ri instead of r · r · . . . · r (i times), and r+ for r · r∗. Example 2.2.1. Let r = (( (a+ b) + c )∗ · ((((a+ c) · (a) ) · ( a+ (b+ c) )∗) · (a · b))) · (c) be a regular expression. This expression can be simplified by omitting unnecessary braces, and the · for the concatenation. So, r can also be denoted by the expression (a + b + c)∗(a + c)a(a + b + c)∗abc. If Σ = {a, b, c}, the expression can be further simplified to r = Σ∗(a + c)aΣ∗abc. The language L(r) described by the regular expression r consists of all words over {a, b, c} that end on abc and contain at least one of the sequences aa, or ca before the last sequence of abc. This is precisely the language L(r) = Σ∗{aa, ca}Σ∗{abc}. _ ^ ^ _ Regular expressions can also be used to prove some closure properties for REG. A language class L is said to be closed under some language operation, if all possible ap- plications of this operation to a language belonging to L leads to a language that also belongs to L. From the definition of regular expressions it is derived immediately, that the class REG is closed under the operations concatenation, union, and Kleene star. Also under the operation of reversal, REG is closed. The language L(r)R for some regular expression r can be described by its reversal rR. 2.3 Finite Automata Another possibility to describe the regular languages is given in form of finite au- tomata. A finite automaton can be considered to be a device with a finite input tape, that either accepts or rejects an input. The device changes its state while reading symbols from left to right provided on the input tape, but it does not produce any output. The only information obtained from finite automata after processing some word, is the type of the state that is entered at the end of the computation. The type of a state can be accepting or non-accepting. This means, finite automata can be regarded to be some kind of recognition device for regular languages. Measuring Defects in Finite Automata 2.3. Finite Automata 11 The following definition for non-deterministic finite automata was first introduced by Rabin and Scott [58]. They were the first to introduce non-determinism for finite automata. A non-deterministic finite automaton (NFA) is a quintuple A = 〈Q,Σ, δ, q0, F 〉, where • Q is a finite set of states, • Σ is an alphabet, • δ : Q×Σ→ 2Q is the state transition function (or simply transition function), • q0 ∈ Q is the initial state, and • F ⊆ Q is the set of accepting states. The rules defined by the state transition function determine the behaviour of the automaton. If the NFA is in some state q ∈ Q, and reads a symbol a ∈ Σ on its input tape, the NFA can change its state into some state p ∈ δ(q, a). Since there may exist more than only one such state p, the automaton is called non-deterministic. It is also possible, that such a state p does not exist at all. In this case, the transition for q on a is said to be undefined. In each step of the automaton, such an NFA consumes one symbol of the in- put and changes its state according to its transition function. If more than one state is existent, into which the automaton may change, one of them is chosen non- deterministically. If |δ(q, a)| ≤ 1 for all q ∈ Q and all a ∈ Σ, the automaton is called deterministic. In such a deterministic finite automaton (DFA), there exists no possibility for the choice of the next state for each combination of a state and a symbol in the input. In this thesis, all DFA are considered to be complete, which means there exist no undefined transitions. Every incomplete DFA can be converted into a complete DFA. This is done by letting each undefined transition lead into a rejecting sink state, which is a non-leaving state. The state transition function δ of a finite automaton 〈Q,Σ, δ, q0, F 〉 is defined only for single symbols. For all states q ∈ Q, it can be extended to a transition function for words w ∈ Σ∗ by δ(q, w) = ⋃ p∈δ(q,a) δ(p, w), if w = av for some symbol a ∈ Σ and a word v ∈ Σ∗, respectively by δ(q, w) = {q}, if w = λ. To simplify the notation of the transition function for DFA, the set braces are omitted for the successing state. While processing a word, a finite automaton enters several states. Such a sequence of states is called a path of the automaton. In this sequence, it is possible that one state appears several times. The length of a path is defined to be the number of states contained in the path. If a state occurs several times, each occurrence is counted separately. Let for example s1s2s1s3 denote a path. Then its length is four, even though s1 appears twice. A subsequence of the path that begins with a state s, ends with the same state s, with s not appearing elsewhere in this subsequence, is Measuring Defects in Finite Automata 12 2.3. Finite Automata called a cycle or ring of the DFA. The length of a cycle is defined as the number of states within the cycle minus one, since state s builds the beginning and the end of the cycle. If a cycle is of length one, it is also called a loop. A loop is nothing else but a single transition that does not change the state. For every word over the alphabet of the automaton there exists a path that begins with the initial state. Let s0s1 . . . sk be a path for some word, where s0 = q0 and k ≥ 0, and let si be one of these states. All the states, that are visited before entering si are called its predecessors, and all the states that are visited after this state are the successors of si. In the path, this means thart all states sj , where j < i are predecessors, and that all states sl with l > i are the successors. Let A be a finite automaton, either deterministic or non-deterministic. The ac- cepted language of A, in terms L(A), is the set of words that lead the automa- ton into acceptance, when starting the processing in the initial state. Formally, for a finite automaton A = 〈Q,Σ, δ, q0, F 〉, its accepted language is defined to be L(A) = {w ∈ Σ∗ | δ(q0, w) ∩ F 6= ∅}. Two finite automata A and B are said to be equivalent, in terms A ≡ B, if they accept the same language, which means L(A) = L(B). Kleene showed that the language class that is recognised by finite automata is precisely the class REG [41]. Theorem 2.3.1 (Kleene [41]). A language L is regular if and only if there exists a finite automaton A with L(A) = L. Conversions of NFA into DFA Since NFA and DFA are both models that describe the class REG, it should be possible to convert one model into the other. Indeed, this is possible. It was shown by Rabin and Scott in 1959 [58]. The conversion is done by the so-called power set construction. Let A = 〈Q,Σ, δ, q0, F 〉 be an arbitrary NFA. An equivalent DFA is given by B = 〈2Q,Σ, δ′, {q0}, F ′〉, where δ′(P, a) = ⋃ p∈P δ(p, a) for all P ⊆ Q and a ∈ Σ, and F ′ = {P ∈ 2Q | P ∩ F 6= ∅}. The following example gives an idea of how the power set construction works. It also shows how finite automata are represented by state transition diagrams. Example 2.3.1. Let A = 〈{q0, q1, q2, q3}, {a, b}, δ, q0, {q3}〉 be the NFA with state transition function δ(q0, a) = {q1, q2}, δ(q0, b) = {q2}, δ(q1, a) = δ(q1, b) = {q2}, δ(q2, a) = {q3}, δ(q3, a) = δ(q3, b) = {q3} that is depicted in Figure 2.1. An equivalent DFA is given by B = 〈Q, {a, b}, δ′, {q0}, F 〉, where its state set is Q = {∅, {q0}, {q2}, {q3}, {q1, q2}, {q2, q3}}, its set of accepting states is given by Measuring Defects in Finite Automata 2.3. Finite Automata 13 q0 q1 q2 q3 a a, b a a, b a, b Figure 2.1: NFA that is converted into a DFA in Example 2.3.1. {q0} {q1, q2} {q2, q3} {q3} {q2}∅ a a a, b b b a b a, b a, b Figure 2.2: DFA that is constructed by the power set construction from the NFA de- picted in Figure 2.1. F = {{q3}, {q2, q3}}, and its transition function is defined by δ′({q0}, a) = {q1, q2}, δ′({q0}, b) = {q2}, δ′({q2}, a) = {q3}, δ′({q2}, a) = ∅, δ′({q3}, a) = δ′({q3}, b) = {q3}, δ′({q1, q2}, a) = {q2, q3}, δ′({q1, q2}, b) = {q2}, δ′({q2, q3}, a) = δ′({q2, q3}, b) = {q3}, δ′(∅, a) = δ′(∅, b) = ∅. This automaton is depicted in Figure 2.2 _ ^ ^ _ The formal description of the DFA constructed by the power set construction contains all possible states. The states that are not depicted in Figure 2.2 cannot be reached from the initial state {q0} by any word over {a, b}. It is a general convention to omit such states for finite automata. Minimal Deterministic Finite Automata Taking a closer look at the constructed DFA from Example 2.3.1, one can recognise that by the states {q2, q3} and {q3} the same words are accepted. This leads to the formal definition of equivalent states: Measuring Defects in Finite Automata 14 2.3. Finite Automata Given a DFA with state transition function δ. Two states p and q are said to be equivalent, if for any word w, δ(p, w) is accepting if and only if δ(q, w) is accepting. Otherwise, the states p and q can be distinguished and are inequivalent. Equivalent states can be merged. This means, for a DFA A = 〈Q,Σ, δ, q0, F 〉, and two equivalent states p, q ∈ Q, a new DFA A′ = 〈Q \ {p},Σ, δ′, q′0, F \ {p}〉 is constructed, where δ′(s, a) = { q if δ(p, a) = p, δ(s, a) else , and q′0 = { q if q0 = p, q0 else . Merging equivalent states always leads to a new DFA. If a DFA only contains states that are not equivalent, it is called minimal. There exist several algorithms to minimise a DFA, for example one by Brzo- zowski [4], one by Hopcroft and Ullman [29], one by Huffman [30], or one by Moore [53]. One well-known algorithm to minimise a DFA is the so called table-filling algo- rithm, see for example [29]. This algorithm determines recursively all inequivalent states by marking pairs of states. For a given DFA A = 〈Q,Σ, δ, q0, F 〉, at first all state pairs of states are marked, where only one of them is accepting. Then, for all non-marked pairs of states p and q, and symbols a ∈ Σ, this pair is marked, if and only if the state pair δ(p, a) and δ(q, a) is already marked. This algorithm stops, if there either exist no more unmarked pairs of states, or if there exists no more possibility to mark the remaining unmarked pairs. The unmarked pairs are exactly the equivalent states. Theorem 2.3.2. If two states are not distinguished by the table-filling algorithm, then the states are equivalent. The following important property of deterministic finite automata results from this. Theorem 2.3.3. If A is a DFA, and B the DFA constructed from A by the table- filling algorithm, then B has as few states as any DFA equivalent to A. From all these properties of deterministic finite automata, the following theorem can be derived, that gives a characterisation of minimal DFA. Theorem 2.3.4. A DFA is minimal if and only if all states are reachable from its initial state, and if all of its states are pairwise inequivalent. This especially means, that for any regular language, there exists a DFA that accepts this language, and has a minimal number of states. This also means, all minimal DFA are the same except for a renaming of the states. Therefore, this Measuring Defects in Finite Automata 2.4. Defects in Finite Automata 15 minimal number of states can be used to measure the size of DFA and indirectly the size of regular languages. This measure is called state complexity. This measure can also be extended to non-deterministic finite automata. A min- imal NFA accepting a certain language is defined to be an NFA, for which there exists no equivalent NFA that has fewer states. Now it is possible to use this measure to express the compactness of the repre- sentation of regular languages by NFA instead of DFA. It was shown by Rabin and Scott [58] that an upper bound for the conversion of NFA into DFA is exponen- tial. The tightness of this bound was shown later independently by Lupanov [48], Moore [54], and Meyer and Fischer [51]. Theorem 2.3.5. There exist regular languages, that are accepted by NFA with n states, and by minimal DFA with 2n states. 2.4 Defects in Finite Automata Finite automata can be manipulated. Such a manipulation can affect either the transitions or the states. In the following, the manipulations are called defects. A finite automaton that contains a defect will be called defective. The automaton, from which the defective automaton is received by some defect, is called original automaton. The investigations in this thesis concentrate on the following defects. Definition 2.4.1. The following defects can affect a deterministic finite automaton. Their definitions will be reduced to only one occurrence of the defect. Exchange Symbol: The symbol of a transition is exchanged by another one. Flip Transition: The source and target of a transition are interchanged. Delete Transition: A transition is deleted. Insert Transition: A new transition is inserted. Delete Accepting State: An accepting state is deleted, including all transitions having this state as source or target. Delete Non-Accepting State: A non-accepting state is deleted, including all tran- sitions having this state as source or target. Remove Acceptance: The property of acceptance is removed for a state. Add Acceptance: The property of acceptance is added for a state. Example 2.4.1. Let A = 〈Q,Σ, δ, q0, F 〉 be the DFA depicted in Figure 2.3, where Q = {q0, q1, q2}, Σ = {a, b}, F = {q0, q2}, and δ(q0, a) = δ(q0, b) = q1, δ(q1, a) = δ(q1, b) = q2, δ(q2, a) = δ(q2, b) = q2. Measuring Defects in Finite Automata 16 2.4. Defects in Finite Automata q0 q1 q2 a, b a, b a, b Figure 2.3: Minimal DFA accepting the language {a, b}2{a, b}∗ ∪ {λ}. q0 q1 q2 b, b a, b a, b Figure 2.4: Defective finite automaton derived from the DFA depicted in Figure 2.3 for the defect exchange symbol affecting the transition on a for state q0. This automaton accepts the language {a, b}2{a, b}∗ ∪ {λ}. The defect exchange symbol affecting the transition from state q0 to state q1 by replacing a by the symbol b in DFA A results in an incomplete non-deterministic finite automaton. This defective automaton is depicted in Figure 2.4, and accepts the language {b}{a, b}+ ∪ {λ}. If the defect flip transition affects the transition from state q0 to q1 on symbol b for DFA A, the resulting defective DFA is the one given in Figure 2.5 which accepts the language {a}{ba}∗{a, b}+ ∪ {λ}. The automaton depicted in Figure 2.6 accepts the language {a}{a, b}+ ∪{λ}, but is the result of the defect delete the transition affecting the transition on b for state q0 in DFA A. If the defect insert transition affects the DFA A by inserting a transition on b from state q1 to state q0, the accepted language does not change. The resulting defective automaton is depicted in Figure 2.7. q0 q1 q2 a a, b a, b b Figure 2.5: Defective finite automaton derived from the DFA depicted in Figure 2.3 for the defect flip transition affecting the transition on b for state q0. Measuring Defects in Finite Automata 2.4. Defects in Finite Automata 17 q0 q1 q2 a a, b a, b Figure 2.6: Defective finite automaton derived from the DFA depicted in Figure 2.3 for the defect delete transition affecting the transition on b for state q0. q0 q1 q2 a, b a, b a, b b Figure 2.7: Defective finite automaton derived from the DFA depicted in Figure 2.3 for the defect insert transition that inserts the transition δ(q1, b) = q0. The deletion of the accepting state q2, or the non-accepting state q1 both result in defective finite automata, that are both incomplete DFA. These two automata are depicted in Figures 2.8 respectively 2.9. Both automata only accept the lan- guage {λ}. If the acceptance property of state q0 for DFA A is removed, the language accepted by the defective automaton only differs by the empty word from language L(A). The defective automaton is depicted in Figure 2.10. The last remaining defect of adding the acceptance for some state of A results in the automaton from Figure 2.11. This automaton accepts the language {a, b}∗. _ ^ ^ _ In general, all defects change the accepted language of the affected automaton. If the DFA is affected by more than only one defect, it is possible that the accepted language does not change at all, which means in detail that the defective DFA is equivalent to the original DFA. This is the case if, for example, the defects delete transition and insert transition affect the same transitions. If is also possible that one defect applied several times results in a massive change q0 q1 a, b Figure 2.8: Defective finite automaton derived from the DFA depicted in Figure 2.3 by the defect delete accepting state affecting state q2. Measuring Defects in Finite Automata 18 2.4. Defects in Finite Automata q0 q2 a, b Figure 2.9: Defective finite automaton derived from the DFA depicted in Figure 2.3 by the defect delete non-accepting state affecting state q1. q0 q1 q2 a, b a, b a, b Figure 2.10: Defective finite automaton derived from the DFA depicted in Figure 2.3 for the defect remove acceptance affecting state q0. q0 q1 q2 a, b a, b a, b Figure 2.11: Defective finite automaton derived from the DFA depicted in Figure 2.3 for the defect add acceptance affecting state q1. of the accepted language. For example, if the defect of removing acceptance for some states affects a minimal DFA several times, the accepted language may change massively. This can be seen by removing the acceptance of the states q0 and q2 of the minimal DFA depicted in Figure 2.3. The resulting defective automaton does not accept any word, which means that infinitely many words are not accepted any- more. Similar things may happen when combining different defects. Combinations of defects may also lead to other types of defects. For example, the combination of inserting and deleting one transition each can result in the defect flip transition. If the transition of DFA A depicted in Figure 2.3 on b is deleted for state q0, but a transition on b is added to q1 that leads into state q0, the resulting defective finite automaton is the one we received in Example 2.4.1 for the defect flip transition. This is the reason why, in this thesis, only one occurrence of a single defect will be considered. Measuring Defects in Finite Automata 3 State Complexity of DFA for Defects This chapter deals with the state complexity of finite automata resulting from de- fects. These automata are retrieved from minimal and complete DFA which are af- fected by a defect. Those defects always lead to at least incomplete DFA or even to non-deterministic finite automata. These automata will be converted into complete and minimal DFA again. The size of these automata will be given in dependency of the size of the original, non-defective DFA. 3.1 Defects Occurring on the Transitions In this section, we concentrate on defects on the transitions of a DFA. The possible defects on transitions are exchange symbol, flip transition, delete transition, and insert transition. The first defect to be examined is exchange symbol. The following theorem gives a tight upper bound for the state complexity of a complete and minimal DFA accepting the language of the defective automaton. Its size will be determined in terms of the size of the original DFA. Theorem 3.1.1. Let A be a minimal DFA with n ≥ 2 states and let the NFA, which is received by the defect exchange symbol affecting DFA A be denoted by A′. Then the equivalent minimal DFA of A′ needs at most 2n states. This bound is tight even for binary alphabets. Proof. In the worst case the considered defect implements ambiguity in the automa- ton and leads to undefined transitions. An equivalent minimal DFA may need all states generated by the power set construction. This means a minimal DFA has at most 2n states. We now show that this bound is reachable by a defect on a single transition in a DFA with binary alphabet. Let A = 〈Q, {a, b}, δ, q1, {qn}〉 be the DFA depicted in Figure 3.1 with Q = {q1, q2, . . . , qn} and δ(qj , a) = { qj+1 if j ∈ {1, 2, . . . , n− 1}, q2 if j = n , δ(qj , b) = { q1 if j = 1 or j = n, qj+1 if j ∈ {2, 3, . . . , n− 1} . This DFA is complete. It is also minimal since for x, y ∈ {1, 2, . . . , n − 1}, x < y, we have that δ(qx, a n−y) 6= qn but δ(qy, a n−y) = qn. 19 20 3.1. Defects Occurring on the Transitions q1 q2 q3 qn−1 qn a a, b a, b a, b a, b b b a Figure 3.1: DFA that proves the optimality of the upper bound for the defect ex- change symbol. q1 q2 q3 qn−1 qn a a, b a, b a, b a, b b a a Figure 3.2: NFA of Moore whose equivalent minimal DFA needs precisely 2n states for n ≥ 2 [54]. Modifying the transition δ(qn, b) = q1 by exchanging the letter b by a results in the NFA given in Figure 3.2. For this automaton it was already shown by Moore in [54] that the equivalent minimal DFA needs precisely 2n states for n ≥ 2. This defect has no effect for unary alphabets since in such an alphabet there do not exist enough symbols to exchange one by another. This gives a tight bound of n states for unary alphabets. The theorem below provides the upper bound for the state complexity of a DFA af- feted by the defect flip transition. Theorem 3.1.2. Let A be a minimal DFA with n ≥ 2 states and let the NFA A′ be the automaton that is received from A by the defect flip transition. Then the equivalent minimal and complete DFA of NFA A′ needs at most 2n states. This bound is tight even for binary alphabets. Proof. In the worst case, the modification implements ambiguity in the automaton and leads to undefined transitions. An equivalent complete and minimal DFA may need all states generated by the power set construction. This means such a DFA needs at most 2n states. In the following we show that this bound can be reached by manipulating a single transition in a DFA with a binary alphabet. Let A = 〈Q, {a, b}, δ, q1, {qn}〉 be the Measuring Defects in Finite Automata 3.1. Defects Occurring on the Transitions 21 q1 q2 qn−5 qn−4 qn−3 qn−2 qn−1 qn a a a a a a a, b b b b b b a b a b Figure 3.3: DFA that proves the tightness of the upper bound for the defect flip transition. DFA from Figure 3.3 with state set Q = {q1, q2, . . . , qn} and δ(qj , a) =  qj+1 if j ∈ {1, 2, . . . , n− 4, n− 2, n− 1} q1 if j = n− 3, qn−2 if j = n , δ(qj , b) =  qj if j ∈ {1, 2, . . . , n− 4, n}, qj+1 if j ∈ {n− 3, n− 1}, qn−3 if j = n− 2 . This DFA is complete. The following considerations prove the minimality of this automaton. In case state qi belongs to the set {q1, q2, . . . , qn−3} it is reachable from state q1 by the word ai−1. If qi is one of the states qn−2, qn−1, or qn it can be reached from the initial state by the word an−4bai−(n−2). For indices x, y ∈ {1, 2, . . . , n − 3}, x < y, we have that δ(qx, a n−3−xba2) = qn but δ(qy, a n−3−xba2) 6= qn. It also holds that δ(qn−2, ab) = qn but δ(qx, ab) 6= qn, δ(qn−1, a) = qn but δ(qx, a) 6= qn and δ(qn−2, a) 6= qn. This proves that all the states of automaton A are reachable and inequivalent. In case the considered defect flip transition affects the transition δ(qn−1, b) = qn it is replaced by the transition δ(qn, b) = qn−1. This leads to the NFA given in Figure 3.4. For this automaton Jirásková and Šebej have already shown in [38] that the equivalent minimal DFA needs exactly 2n states for n ≥ 2. Theorem 3.1.2 gives a tight bound for at least binary alphabets. For unary alpha- bets, this bound differs. The following theorem covers the unary case. Theorem 3.1.3. Let A be a minimal DFA with n ≥ 2 states and unary alphabet, and let A′ be the NFA that is received by the defect flip transition affecting A. Then the equivalent minimal DFA for A′ needs at most n+ 1 states. This bound is tight. Proof. It is well known since Chrobak [16, 17] that a complete and minimal DFA over a unary alphabet consists of a connected chain of states followed by a ring of states. Measuring Defects in Finite Automata 22 3.1. Defects Occurring on the Transitions q1 q2 qn−5 qn−4 qn−3 qn−2 qn−1 qn a a a a a a b b b b b a b b a b a Figure 3.4: NFA of Jirásková and Šebej for which the equivalent minimal DFA has exactly 2n states for n ≥ 2. q1 q2 qn−1 qn a a a a a Figure 3.5: DFA A for the defect flip transition for unary alphabets. This ring may be a single (accepting or non-accepting) state having a transition to itself on the only symbol of the alphabet. It can also contain more than one state. It is also possible that the chain of states does not exist at all and the automaton only consists of a ring of states. In a DFA, where the ring is a single accepting state, or consists of more than only one state, a rejecting sink state does not exist. In the following, only DFA of this structure will be considered. The structure of such DFA especially means that there exists only one transition leaving each state and for most states only one transition enters this state. The only exception to this is the last state of the chain, which is the end of the chain and the beginning and ending of the ring at the same time. This state q is the only state that is the target of two transitions. If the defect flip transition affects a unary minimal DFA, the resulting defective automaton always accepts a finite language. The reason for this is, that the in- terchanging of the source state s and target state t 6= q of a transition deletes the connection between s and t. Due to the structure of unary DFA, this means at the same time the loss of reachability of all states that are reached from s in the original DFA. The equivalent minimal DFA to this defective automaton needs at most as many states as the original one, since the insertion of a rejecting sink state suffices. If the target state is q, all states may still be reachable, since there exist two transitions leading into q. This happens, if the transition leading from the last state of the ring r into the first state of the ring q is flipped. In this case, also a finite language is accepted by the defective automaton. The longest word of this language has at most n − 1 symbols, if the number of states of the original DFA is n. It is well known, that a minimal DFA accepting a (unary) finite language has a number Measuring Defects in Finite Automata 3.1. Defects Occurring on the Transitions 23 of states, that is given by the length of the longest word plus two. In this case, this gives the upper bound of n+1. To show the optimality of this bound, let A = 〈Q, {a}, δ, q1, {qn}〉 be the minimal DFA depicted in Figure 3.5, where Q = {q1, q2, . . . , qn} is the set of states and δ(qj , a) = { qj+1 if j ∈ {1, 2, . . . , n− 1}, q1 if j = n is its state transition function. This DFA is complete. The following considerations provide the minimality of the chosen automaton. Every state qi ∈ {q1, q2, . . . , qn} is reachable from the initial state by the word ai−1. For indices x, y ∈ {1, 2, . . . , n − 1}, x < y, we have that δ(qx, a n−y) 6= qn but δ(qy, a n−y) = qn. Thus, all the states of the considered automaton are reachable and inequivalent. If the defect interchanges the transition from state qn to q1, we receive an unde- fined transition for state qn and an ambiguity for state q1. The accepted language is modified to the finite language {a, an−1}. It is well known that the minimal DFA accepting this language needs exactly n+ 1 states, n states to count up to the length n− 1 and one more state to reject all words longer than n− 1. This proves the optimality of the upper bound. For the defect flip transition in the unary case, the next theorem proves that it is possible for the minimal DFA that is equivalent to the defective automaton to have any number of states below the upper bound. This question is closely related to the one asking for magic numbers. This question was stated by Iwama, Kambayashi, and Takaki [31]. It asks whether there always exists a nondeterministic finite automaton with n states whose equivalent minimal DFA has α states for all integers n and α, with n ≤ α ≤ 2n. Iwama, Matsuura, and Paterson [32] called a number α magic, if there exists no NFA with n states having an equivalent DFA with α states. In [31] it was shown, that α = 2n − 2k or α = 2n − 2k − 1, for 0 ≤ k ≤ n/2− 2 is non-magic, and in [32] this was shown for α = 2n−k, with 5 ≤ k ≤ 2n−2, and some coprimality condition for k. These results were shown for binary NFA. The results for binary alphabets were improved in [34], [35], [20], and [49]. The question asking for magic numbers turned out to become easier, if the size of the alphabet is increased. In [34] it was even shown that for exponentially growing alphabets, no magic numbers exist at all. In [20], this result was improved to smaller growing alphabets. In [33] it was shown for four-letter alphabets that there exist no magic numbers, and in [37], this was even proven for ternary alphabets. The magic number problem for unary alphabets was studied in [21]. Further results on magic numbers can be found, for example, in [35], [36], and [25]. Theorem 3.1.4. Let n ≥ 4 be a number. Then a minimal DFA A exists with n states and unary alphabet, such that by the defect flip transition, the equivalent Measuring Defects in Finite Automata 24 3.1. Defects Occurring on the Transitions q1 q2 qi−2 qi−1 qi qi+1 qi+2 qn a a a a a a a a a a Figure 3.6: DFA for which the defect flip transition can result in any number of states between 2 and n− 1 for the equivalent minimal DFA. minimal DFA for the defective automaton needs precisely a number of i states, for all 1 ≤ i ≤ n+ 1. Proof. The existence of a minimal DFA with n + 1 states was already shown in Theorem 3.1.3. To prove the existence of a minimal DFA with a number of states i between 1 and n, that is equivalent to a defective automaton, we use the DFA A = 〈Q,Σ, δ, q1, F 〉 depicted in Figure 3.6, where Q = {q1, q2, . . . , qn} is the set of states, its set of accepting states is given by F = {qi−1, qi, qi+1}, and the transition function δ is defined by δ(qj , a) = { qj+1 if j ∈ {1, 2, . . . , n− 1}, q1 if j = n . This DFA is complete. It is also minimal, since each state qj , 1 ≤ j ≤ n, is reachable from the initial state q1 by the word aj−1. The accepting states can be differentiated by the words a and a2 since then we have δ(qi−1, a 2) = qi+1, δ(qi, a 2), δ(qi+1, a 2) 6∈ {qi−1, qi, qi+1}, and δ(qi, a) = qi+1, δ(qi+1, a 2) 6∈ {qi−1, qi, qi+1}. For the states with indices 1 ≤ j < l ≤ i − 2, we have δ(ql, a i−l−1) = qi−1, but δ(qj , a i−l−1) = qk, where k < i − 1. Thus, the states from the set {q1, q2, . . . , qi−2} are pairwise inequivalent. The same is true for the states with indices j and l where i + 1 ≤ j < l ≤ n, since δ(ql, a n+i−l−1) = qi−1 is accepting, but on the other hand we have δ(qj , a n+i−l−1) 6∈ {qi−1, qi, qi+1}, and, therefore, non-accepting. Any state qj ∈ {q1, q2, . . . , qi−2} is inequivalent to any state ql ∈ {qi+2, qi+3, . . . , qn} since δ(qj , a i−j−1) = qi−1, but δ(ql, a i−j−1) 6∈ {qi−1, qi, qi+1}. In the following, we distinguish the two cases 2 ≤ i ≤ n− 1, and i = 1 or i = n. To receive a number of states i between 2 and n−1 for the minimal DFA equivalent to the defective automaton, we assume that the considered defect affects the transi- tion from state qi−1 to qi of DFA A. Then all states of the set {qi, qi+1, . . . , qn} are not reachable anymore. The remaining accepted language is the language {ai−2}, which is finite and can be accepted by a minimal DFA with i states. For the case i = 1 or i = n, we modify the automaton from above. Let qn−2, qn−1, and qn be the accepting states in this case. If the transition from q1 to q2 is defective, Measuring Defects in Finite Automata 3.1. Defects Occurring on the Transitions 25 q1 q2 qn−2 qn−1 qn a a a a a Figure 3.7: An incomplete DFA whose equivalent minimal DFA needs exactly n+ 1 states. the resulting automaton accepts the empty language. Thus, one state is sufficient for an equivalent minimal DFA. This proves i = 1. In case the defect affects the transition from qn−1 to qn, the automaton accepts the finite language {an−3, an−2}, which can be accepted by a minimal DFA with n states. This concludes the proof. The three theorems from above considering the defect flip transition show that this defect can result in a very huge minimal DFA for arbitrary alphabets. For unary alphabets, this number is much smaller. This, again, proves the difference between unary alphabet sizes, and at least binary alphabets concerning the state complexity. In the proof of Theorem 3.1.4, we used the DFA depicted in Figure 3.6. This DFA accepts an infinite language. The defective automata constructed in the proof accept finite languages. This means that the defect can result in a loss of a lot of information about the accepted language. This is motivates the analysis of defects. The next theorem proves an upper bound for the number of states of an equivalent minimal DFA that is received by the defect delete transition affecting a given DFA. Theorem 3.1.5. Let A be a complete and minimal DFA with n ≥ 4 states and let the NFA A′ be received from A by the defect delete transition. Then the equivalent minimal DFA of A′ needs at most n + 1 states. This bound is tight even for unary alphabets. Proof. Even in the worst case, the deletion of a transition in a complete and minimal DFA cannot implement ambiguity in the automaton. It can only lead to undefined transitions. Thus, an equivalent minimal DFA may need at most one more state that represents a rejecting sink which will be the target of every undefined transition. Therefore, a minimal DFA needs at most a number of n+ 1 states. To prove the optimality of the bound for this defect affecting only a single tran- sition in a minimal DFA with unary alphabet, we will use the DFA depicted in Figure 3.5. We assume the set of accepting states to be {qn−2, qn−1, qn}. In the proof of Theorem 3.1.3 it was already shown, that this DFA is complete and mini- mal. Let this DFA be defected by deleting the transition δ(qn, a) = q1. We receive the incomplete DFA given in Figure 3.7. This automaton accepts the finite language {an−3, an−2, an−1}, which is known to be accepted by a minimal DFA with n + 1 states. Measuring Defects in Finite Automata 26 3.1. Defects Occurring on the Transitions q1 q2 qi−2 qi−1 qi qn−1 qn a a a a a a a a a Figure 3.8: DFA to prove the existence of minimal DFA equivalent to defective au- tomata for the defect delete transition with a number of states between 1 and n− 1. We now know that a minimal DFA equivalent to the defective automaton needs at most n+ 1 states when deleting one transition of a given DFA with n ≥ 2 states. In this context the question naturally arises if there exists an equivlant minimal DFA for any number of states between 1 and n+ 1 for the defect delete transition. The following theorem provides the answer. Theorem 3.1.6. Let n ≥ 3 be a number. Then there exist minimal DFA A with n states, such that by the defect delete transition, the equivalent minimal DFA for the defective automaton needs precisely a number of i states, for all 1 ≤ i ≤ n+ 1. This is true even for unary alphabets. Proof. In the proof of Theorem 3.1.5 it has already been shown that there exists a DFA with n ≥ 1 states that meets the upper bound of n + 1 states. For the remaining numbers of states, we first consider i ≤ n − 1. Let A = 〈Q, {a}, δ, q1, F 〉 be the DFA from Figure 3.8 with Q = {q1, q2, . . . , qn}, F = {qi−1, qn} where i−1 = 0 means that qn is the only accepting state of the automaton, and δ(qj , a) = { qj+1 if j ∈ {1, 2, . . . , n− 1}, qn if j = n . This DFA is complete. The following considerations provide the minimality of automaton A. Every state qj ∈ {q1, q2, . . . , qn} is reachable by aj−1. For indices x, y ∈ {1, 2, . . . , n − 1} \ {i − 1}, x < y, either δ(qx, a n−y) 6= qi−1 or δ(qx, a n−y) = qi−1 but δ(qy, a n−y) = qn. In the first case, qx and qy are inequivalent. In the other case, qx and qy can be distinguished by reading an−ya since then we have δ(qx, a n−ya) = qi 6∈ F and still δ(qy, a n−ya) = qn. To show that states qi−1 and qn are inequivalent, we use the word a and obtain δ(qi−1, a) = qi, i < n and δ(qn, a) = qn. Thus, all the states are reachable and inequivalent. For 2 ≤ i ≤ n−1, we modify the DFA A by deleting the transition δ(qi−1, a) = qi. We receive the incomplete DFA depicted in Figure 3.9. Since this automaton accepts the finite language {ai−2}, an equivalent minimal DFA needs precisely i states. For i = 1, the transition connecting states q1 and q2 is deleted. The result- ing incomplete DFA accepts the empty language, which is accepted by a minimal DFA with only one state. Measuring Defects in Finite Automata 3.1. Defects Occurring on the Transitions 27 q1 q2 qi−2 qi−1 a a a a Figure 3.9: Incomplete DFA whose equivalent minimal DFA needs exactly i states. q1 q2 qn−2 qn−1 qn a a a a a a Figure 3.10: DFA that proves the existence of a minimal DFA with n states, that is equivalent to a defective automaton received for the defect delete transition. For a state complexity of n let A = 〈Q, {a}, δ, q1, F 〉 be the DFA depicted in Figure 3.10, where Q = {q1, q2, . . . , qn}, F = {qn−1} and δ(qj , a) = { qj+1 if j ∈ {1, 2, . . . , n− 1}, qn if j = n . This DFA is complete. The following considerations provide the minimality of the chosen automaton. Every state qj ∈ {q1, q2, . . . , qn} is reachable by aj−1. For indices x, y ∈ {1, 2, . . . , n} \ {n− 1}, x < y, we have δ(qx, a n−1−y) 6= qn−1 but δ(qy, a n−1−y) = qn−1. Thus, all the states are reachable and inequivalent. We now modify the DFA A by deleting the transition δ(qn−1, a) = qn. We receive the incomplete DFA from Figure 3.11. Since this automaton accepts the finite language {an−2}, the equivalent minimal DFA has n states. Before we turn to the next defect, we prove that the minimal DFA for the NFA de- picted in Figure 3.12 needs exactly 2n − 1 states. We use this automaton to prove the optimality of Theorem 3.1.7. Lemma 3.1.1. The minimal DFA equivalent to the NFA depicted in Figure 3.12 has precisely 2n − 1 states for n ≥ 2. q1 q2 qn−2 qn−1 a a a a Figure 3.11: Incomplete DFA whose equivalent minimal DFA needs exactly n states. Measuring Defects in Finite Automata 28 3.1. Defects Occurring on the Transitions q1 q2 q3 qn−1 qn a, c a, b, c a, b, c a, b, c a, b, c a, b a, b, c Figure 3.12: NFA whose equivalent minimal DFA needs exactly 2n − 1 states for n ≥ 2. Proof. Let A = 〈Q, {a, b, c}, δ, q1, {qn}〉 be the NFA with n ≥ 2 states depicted in Figure 3.12 with state set Q = {q1, q2, . . . , qn}. The state transition function δ can be determined from the picture. At first we consider the reachability of the states of 2Q. The singletons {qi} for all qi ∈ Q are reachable by reading the words ci−1 from the initial state {q1}. The sets {q1, qx}, where x ∈ {2, 3, . . . , n} can be reached by the words abx−2. By induc- tion, the sets {q1, qx1 , qx2 , . . . , qxk}, where k ≥ 2, x1, x2, . . . , xk ∈ {2, 3, . . . , n}, and x1 < x2 < · · · < xk, are reachable from the set {q1, qx2−x1+1, qx3−x1+1, . . . , qxk−x1+1} of size k − 2 by reading the word abx1−2. Therefore, all subsets of Q containing state q1 are reachable. The sets {qx1 , qx2 , . . . , qxk}, where k ≥ 2, and where x1, x2, . . . , xk ∈ {2, 3, . . . , n} are indices with x1 < x2 < · · · < xk, can also be shown to be reachable. Since all states containing q1 are reachable we can also reach the set {qx1 , qx2 , . . . , qxk} by reading the word cx1−1 from the set {q1, qx2−(x1−1), qx3−(x1−1), . . . , qxk−(x1−1)}. Thus, every non-empty set can be reached from the initial state {q1}. The only non-reachable set from 2Q is the empty set. This cannot be reached by any word read from the initial state {q1} since there exists no undefined transition in the given NFA. In the following we show that these 2n − 1 states are inequivalent. Let x and y denote two arbitrary indices of states of A belonging to {1, 2, . . . , n − 1}, where x < y. In the DFA built by the power set construction the sets {qx} and {qy} can be distinguished by the word an−y. Therefore, also all the sets S, T ⊆ {q1, q2, . . . , qn−1}, where S 6= T , and S 6= ∅, T 6= ∅, can be distinguished. This leaves the inequivalence of the accepting states containing qn. Let S and T be the sets selected as above. The first case we consider is qn−1 6∈ S. Then we have δ(S ∪ {qn}, a) = S′ ⊆ {q1, q2, . . . , qn−1} and δ(T ∪ {qn}, a) = T ′ where S′, T ′ are not empty. If qn−1 ∈ T then T ′ contains qn and the sets S ∪ {qn} and T ∪ {qn} are distinguishable. In case that qn−1 6∈ T , T ′ is a non-empty subset of {q1, q2, . . . , qn−1} and we already know that S′ and T ′ can be distinguished and, thus, also S ∪ {qn} and T ∪ {qn}. Measuring Defects in Finite Automata 3.1. Defects Occurring on the Transitions 29 If qn−1 ∈ S we have that δ(S ∪{qn}, a) = S′ ∪{qn}, where S′ ⊂ {q1, q2, . . . , qn−1}, and δ(T ∪ {qn}, a) = T ′. If qn−1 6∈ T , the considered sets are inequivalent. In case that qn−1 ∈ T , we choose j = max{i | qi ∈ S \T ∪T \S}. Without loss of generality, let qj ∈ S. We then have δ(S ∪{qn}, an−j) = S′∪{qn} where S′ ⊆ {q1, q2, . . . , qn−1} and δ(T ∪ {qn}, an−j) = T ′ ⊆ {q1, q2, . . . , qn−1}. This proves all sets S ∪ {qn} and T ∪ {qn} to be inequivalent. Summarising, we have 2n − 1 inequivalent and reachable states in the minimal DFA that accepts the same language as the NFA given in Figure 3.12. With the help of Lemma 3.1.1 we can show the following theorem. Theorem 3.1.7. Let A be a minimal DFA with n ≥ 2 states and let the NFA A′ be received by the defect insert transition. Then the minimal DFA equivalent to A′ needs at most 2n − 1 states. This bound is tight for at least ternary alphabets. Proof. In the worst case the defect insert transition implements ambiguity in the automaton but does not lead to undefined transitions since the DFA affected by the defect is complete. Thus, an equivalent minimal DFA may need all states generated by the power set construction except the empty set which can only be reached in the DFA if there exists an undefined transition in the NFA. This means a complete and minimal DFA needs at most 2n − 1 states. We now show that this bound can be reached by inserting a single transition in a minimal DFA with ternary alphabet. Let A = 〈Q, {a, b, c}, δ, q1, {qn}〉 be the DFA given in Figure 3.13, where Q = {q1, q2, . . . , qn} and δ(qj , a) = { q1 if j = 1 or j = n, qj+1 if j ∈ {2, 3, . . . , n− 1} , δ(qj , b) = { q1 if j = 1 or j = n, qj+1 if j ∈ {2, 3, . . . , n− 1} , and δ(qj , c) = { qj+1 if j ∈ {1, 2, . . . n− 1}, q1 if j = n . This DFA is complete. It is also minimal since every state qi ∈ {q2, q3, . . . , qn} can be reached from the initial state q1 by reading can−i−1 and state q1 is reached by reading symbol a. The states are also inequivalent, since for states qk, ql ∈ {q1, q2, . . . , qn−1} with k < l, we have that δ(qk, a n−l) 6= qn but δ(ql, a n−l) = qn. We modify this DFA by inserting the transition δ(q1, a) = q2. We receive the NFA depicted in Figure 3.12. For this automaton we in Lemma 3.1.1 it was already shown that the equivalent minimal DFA needs precisely 2n − 1 states. Since the upper bound stated in the former theorem needs a ternary alphabet, the following theorem gives the upper bound for unary alphabet. For binary alphabets, we were not able to prove a tight upper bound, yet. In this thesis, this is left over for future work. Measuring Defects in Finite Automata 30 3.1. Defects Occurring on the Transitions q1 q2 q3 qn−1 qn c a, b, c a, b, c a, b, c a, b, c a, b a, b, c Figure 3.13: DFA for which the equivalent minimal DFA of the defective automaton with inserted transition from q1 to q2 on a needs 2n − 1 states. q1 q2 qk−1 qk qk+1 qk+2 qn−1qn a a a a a a a a a a Figure 3.14: General structure of a DFA that accepts a unary regular language. The acceptance property is omitted. Theorem 3.1.8. Let A be a complete and minimal DFA with n ≥ 2 states and unary alphabet, and let A′ be the NFA that results from the defect insert transition. Then the equivalent minimal DFA for A′ has at most n2−2n+ 2 states. This bound is tight. Proof. It is well known since Chrobak [16, 17], that minimal DFA for unary alphabets consist of a chain of states followed by a ring of states. There exists only one state belonging to both of these parts. We want to call this state the connecting state, and denote it by qk, like in the DFA depicted in Figure 3.14. q1 q2 qn−1 qn a a a a a Figure 3.15: DFA for which the defect insert transition inserting a transition on a from q2 to qn results in a minimal DFA with n2 − 2n+ 2 states. Measuring Defects in Finite Automata 3.1. Defects Occurring on the Transitions 31 The insertion of one additional transition to one of the states causes an ambiguity for precisely this one state. Let qa denote this state. The defective automaton consists of as many states as the original DFA since the defect does not lead to a loss of any states. Therefore, the number of states for the DFA equivalent to the defective automaton is denoted in terms of the number of states of the original DFA. In case qa is not reachable from all of its successors, the ambiguity can be used only once on every path leading through the automaton. For an equivalent minimal DFA constructed by the power set construction this means that the predecessors of qa are represented by singletons and the successors of this state are represented by sets consisting of at most two states of the original DFA. The restriction to at most two states within such a set is due to that the ambiguity leads into at most two different states, from which state qa is not reachable anymore. Let A denote a unary DFA like depicted in Figure 3.14. The insertion of a transition for any state in {q1, q2, . . . , qk−1} to a state with a higher index, leads to the restriction that the ambiguity can only be used once. For example, this is the case when inserting a transition from q2 to q4 in A. In the power set automaton for this defective automaton, only the singletons {q1} and {q2} are reachable, and the pairs of states of the form {qi, qi+1}, where 3 ≤ i ≤ n−1, and {qn, qk}. Once entering state {qn, qk} in the power set automaton, only the pairs {qi, qi+1} for k ≤ i ≤ n−1, and {qn, qk} are reachable. For such an inserted transition, an upper bound for the number of states for the power set automaton is given by 2n. This includes the possibility to reach all singletons and all pairs of two consecutive states of the DFA. For pairs of non- consecutive states, the number may only decrease. The inserted ambiguity may also be used more than only once on some paths in the automaton. In this case, the state qa needs to be reachable from at least one of its successors. This means, there exists at least one path that starts and ends with state qa. Inserting another transition on a in the DFA depicted in Figure 3.14 for state q4 leading into state q2, a path from q4 to itself is inserted. The shortes such path is given by q4q2q3q4. In this example, only the singletons {q1}, {q2}, {q3}, and {q4} are reachable in the power set automaton for k ≥ 5. This is because all states in the power set automaton that are reachable from {q4} are represented by sets of size at least two. These sets are {q2, q5}, {q3, q6}, and {q4, q7}. The reachable triples are {q2, q5, q8}, {q3, q6, q9}, and {q4, q7, q10}, if n = 10. The indices differ for different numbers n, and also the number of reachable sets of a fixed size. From this, we can deduce that the more states, also the more sets of states are existent and reachable of a fixed size between 2 and n − 1. But since n is fixed to some number, the number of reachable sets is maximisedby minimising the length of the path connecting qa to itself. Then the maximal number of sets of states of a fixed size is reachable in the power set automaton. This means, the insertion of δ(q4, a) = q3 would fulfill these requirements. If the additional transition is inserted for a state within the cycle qkqk+1 . . . qnqk, then there exist two different cycles that begin and end with state qa. If, for example, Measuring Defects in Finite Automata 32 3.1. Defects Occurring on the Transitions the additional transition is inserted for state qk+1 leading into state qk+4, there exist the cycles qk+1qk+2 . . . qnqkqk+1, and qk+1qk+4qk+5 . . . qnqkqk+1 for state qk+1. For the states within the power set automaton this means that all singletons up to {qk+1} are reachable. Also all pairs of the form {qi, qi+2} for k + 2 ≤ i ≤ n − 2, and the pairs {qn−1, qk}, {qn, qk+1} are reachable. These are n − k − 2 many pairs. The reachable sets of three states are {qi, qi+2, qi+4} for k ≤ i ≤ n − 4, and the triples {qn−3, qn−1, qk}, and {qn−2, qn, qk+1}. These are also n− k − 2 sets. From this, we can deduce that there exist n − k − 2 numbers of reachable state sets for each size between 2 and n − 1. To maximise this number, k needs to be minimised. This is the case, if k is equal to one. Since most of the reachable sets consist of states with indices that have the same fixed distance greater than or equal to one, the number of such sets is maximised, if this distance is fixed to one. Summarising all the maximising conditions, this means the original DFA consists only of a ring of n states and the inserted transition inserts a second cycle from qa to itself of length n − 1. All n singletons also need to be reachable, wherefore the state qa needs to be the state qn. In the power set automaton for such a defective automaton, the rejecting sink state cannot be reached, since there do not exist any undefined transitions. Combining the results from above, there exist precisely n − 1 reachable sets of states in the power set automaton of the defective automaton for each size i between 2 and n−1. All in all this gives the following sum for the number of reachable state sets: n+ n−1∑ i=2 n− 1 = n2 − 2n+ 2. Since it is possible that these state sets are all inequivalent, this gives an upper bound for the number of states of a complete and minimal DFA accepting the language of the defective automaton. The upper bound is met when the defect insert transition affects the DFA depicted in Figure 3.15. Let A = 〈Q, {a}, δ, q1, {qn}〉 denote this DFA, where its set of states is Q = {q1, q2, . . . , qn}, and its state transition function is given by δ(qj , a) = { qj+1 if j ∈ {1, 2, . . . , n− 1}, q1 if j = n . This DFA is complete and each state qi ∈ Q is reachable from the initial state q1 by the word ai−1, where 1 ≤ i ≤ n. The states are all inequivalent since for two states qi, qj ∈ Q, 1 ≤ i < j ≤ n − 1, the word an−j distinguishes the two states. Thus, the depicted DFA is minimal. If the defect adds the transition δ(qn, a) = q2 to this DFA the result is an NFA con- sisting of two cycles, one of length n and one of length n− 1. The states of the equivalent DFA resulting from the power set construction have a special structure due to the structure of A. In this special case, for the sets consisting of more than one state, the states are consecutive. That means if the state with the Measuring Defects in Finite Automata 3.1. Defects Occurring on the Transitions 33 smallest index within the set is qi, where 1 ≤ i ≤ n − 1, then the rest of the states of the set are qi+1, qi+2, . . . , qi+l. This also includes the accepting state consisting of all the states of the NFA. The indices are to be considered modulo n plus one. The only state containing state q1 and not q2 is the singleton {q1}. All of the other state sets contain either both, only q2 or none of these two states. The accepting states of the power set automaton are the ones containing state qn, the other state sets are non-accepting. For this DFA we have that all of these sets are reachable by a certain number of a’s to be read. Not all of the reachable states are inequivalent. The two states represented by the sets {q2, q3, . . . , qn} and {q1, q2, . . . , qn} are equivalent, since both states are accepting and by any number of a’s read from each of the states the resulting state is {q1, q2, . . . , qn}. Thus, they are equivalent. To prove the inequality of the other states, let X and Y be two arbitrary subsets of {q1, q2, . . . , qn} representing a state of the power set automaton, where X 6= Y , qn belongs to both of the sets, and none of the sets X and Y is equal to {q2, q3, . . . , qn}. Let qx denote the state of the symmetric difference X4Y , where we define the number x = max{m | qm ∈ X4Y } to be the biggest index of all states belonging to only one of the sets X and Y . This especially means 1 ≤ x < n. Without loss of generality, we assume qx ∈ X. Then the word an−x shows the inequality of the sets X and Y . If this word is processed from set X, the power set automaton is driven into a state represented by a set containing state qn, and if it is processed from state set Y into a state set without qn. This proves the inequality of all accepting states of the power set DFA. The only exception are the two accepting state sets {q2, q3, . . . , qn} and {q1, q2, . . . , qn}, which were already proven to be equivalent. This leaves the proof of the inequality of the reachable non-accepting states of the power set automaton. For this, let X,Y ⊆ {q1, q2, . . . , qn−1} now be two ar- bitrary, non-accepting state sets, where X 6= Y . These sets consist of consecutive states since they do not contain state qn. Let X = {qi, qi+1, . . . , qi+k} and let Y = {qj , qj+1, . . . , qj+l}, where 1 ≤ i, j ≤ n − 2, k, l ≥ 0, and qi+k 6= qn, qj+l 6= qn. Since these two sets are not equal and consist of consecutive states, their intersec- tion X ∩ Y is either empty or contains states, that belong to both of the sets. In the first case, there exists a state qx in one of the sets X or Y , for which x = max{m | qm ∈ X ∪ Y } is the biggest index of all states belonging to either X or Y . Without loss of generality, we can assume qx ∈ X. Then the word an−x proves the inequality of the sets X and Y . If this word is read from set X the automaton is driven into a state containing state qn, and read from state set Y into a state without qn. In case that X ∩ Y is not empty, there exists a state qx in this intersection, where x = max{m | qm ∈ X ∩ Y } is the biggest index belonging to both sets. Processing the word an−x from X and Y , the power set automaton is driven into two different accepting states, since X 6= Y . These accepting states cannot be the two equivalent accepting sets {q2, q3, . . . , qn} and {q1, q2, . . . , qn}, since the latter one is only reachable from the former one. The inequivalence of the accepting states also proves the inequivalence of X and Y . Measuring Defects in Finite Automata 34 3.2. Defects Occurring on the States The total number of states for a minimal DFA accepting the same language as the defective automaton constructed above, is given by summing up the numbers of all the reachable state sets of a fixed size between 1 and n that were proven to be inequivalent. This minimal DFA has two state less than the DFA received from the power set construction, since {q2, q3, . . . , qn} and {q1, q2, . . . , qn} are the only equivalent states, and the empty set does not represent one of the reachable states. The rest of the unreachable states were already omitted. This DFA consists of precisely n different singletons. For each size 2 ≤ i ≤ n− 1, the minimal DFA contains a state represented by a state set of exactly size i. For each of these sizes there exist exactly n − 1 such sets. The set {q2, q3, . . . , qn} is equivalent to the last state also belonging to the power set automaton, which is represented by the state set Q. The following sum then gives the number of all states of the minimal DFA for the considered defective automaton: n+ 1 + (( n−1∑ i=2 n− 1 ) − 1 ) = n+ (n− 2)(n− 1) = n2 − 2n+ 2. This concludes the proof. 3.2 Defects Occurring on the States This section deals with defects occurring on the states of deterministic finite au- tomata. The following theorem covers the defect of deletions of accepting states of a given DFA and of all transitions leading into these states and the ones leaving these states. Theorem 3.2.1. Let A be a minimal DFA with n ≥ 2 states and let the NFA A′ be received by the defect delete accepting state affecting A. Then the equivalent minimal DFA for A′ needs at most n states. This bound is tight even for unary alphabets. Proof. For this defect the upper bound is n since the deletion of an accepting state and all its incoming and outgoing transitions only leads to undefined transitions. It does not insert any ambiguity to the automaton. Thus, to receive an equivalent minimal DFA it is sufficient to add a rejecting sink state if it does not already exist. We show that the automaton given in Figure 3.16 meets the upper bound when deleting the accepting state n. Let A = 〈Q, {a}, δ, q1, {qn−1, qn}〉 denote the depicted DFA, where Q = {q1, q2, . . . , qn} is the set of states, and δ(qj , a) = { qj+1 if j ∈ {1, 2, . . . , n− 1}, q1 if j = n defines its state transition function. Deleting the accepting state qn and all transitions leading into or going out of this state results in an NFA that accepts the finite language {an−2}. The complete and minimal DFA accepting this language needs n states. Measuring Defects in Finite Automata 3.2. Defects Occurring on the States 35 q1 q2 q3 qn−1 qn a a a a a a Figure 3.16: DFA for the proof of the optimality of the upper bound for the defect delete accepting state. The theorem below deals with the question for magic numbers for the defect delet accepting state. Theorem 3.2.2. Let n ≥ 2 be a number. Then there exist minimal DFA A with n states, such that by the defect delete accepting state, the equivalent DFA for the defective automaton needs precisely a number of i states, for all 1 ≤ i ≤ n. This is true even for unary alphabets. Proof. For i = n we have already shown in the proof of Theorem 3.2.1 that there exists a DFA that meets the upper bound of n states. Let 1 ≤ i ≤ n − 1 and A = 〈Q, {a}, δ, q1, F 〉 be the DFA depicted in Figure 3.8 with state set Q = {q1, q2, . . . , qn}, set of accepting states F = {qi−1, qn} where i− 1 = 0 means that qn is the only accepting state of this automaton, and δ(qj , a) = { qj+1 if j ∈ {1, 2, . . . , n− 1}, qn if j = n . This DFA is complete and minimal, which was alredy used in the proof of Theo- rem 3.1.6. If the considered defect affecting DFA A deletes the accepting state qn and all transitions having qn as source or target, we receive an incomplete DFA that accepts the finite language {ai−2}. For this language it is well known that the accepting minimal DFA consists of precisely i states. The next possible defect on states we want to consider is the defect delete non- accepting state. In the following theorem an upper bound for the number of states of an equivalent minimal DFA for the defective automaton is presented. Theorem 3.2.3. Let A be a minimal DFA with n ≥ 2 states and let A′ denote the NFA that is received by the defect delete non-accepting state affecting A. Then the equivalent minimal DFA of A′ has at most n states. This bound is tight even for unary alphabet. Proof. Deletion of one non-accepting state and all transitions leading into and leav- ing this state in a minimal DFA with n ≥ 2 states does not lead to ambiguity, only to undefined transitions. Thus, this defect results in an incomplete DFA. By adding Measuring Defects in Finite Automata 36 3.2. Defects Occurring on the States a rejecting sink state if it does not already exist, the upper bound for the number of states of an equivalent minimal DFA is given by n. For the proof of tightnes