Simplifying regular expressions : A quantitative perspective
In this work, we consider the efficient simplification of regular expressions. We suggest a quantitative comparison of heuristics for simplifying regular expressions. We propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. We apply this normal form to determine an ... exact bound for the relation between the two most common size measures for regular expressions, namely alphabetic width and reverse polish notation length. Then we proceed to show that every regular expression of alphabetic with n can be converted into a nondeterministic finite automaton with e-transitions of size at most 42 5n + 1, and that this bound is optimal. This provides an exact resolution of a research problem posed by Ilie and Yu, who had obtained lower and upper bounds of 4n -1 and 9n - 1 2, respectively [L. Ilie, S. Yu: Follow automata. Inform. Comput. 186, 2003]. For reverse polish notation length as input size measure, an optimal bound was recently determined [S. Gulan, H. Fernau: An optimal construction of finite automata from regular expressions. In: Proc. FST & TCS, 2008]. We prove that, under mild restrictions, their construction is also optimal when taking alphabetic width as input size measure.