# Lecture: Finishing NFAs to Regexps, Pumping Lemma and Proving Languages Non-Regular

Continuing from where we left off last time, with the definition of GNFAs, we needed to show that we can take a GNFA with our peculiar restrictions and turn it into a RegExp. Again, we follow Sipser extremely closely. In part, because all of this is tedious enough I didn’t feel like trying to be original in my presentation. We start off by taking our DFA $M$ and turning it into a GNFA $N$ as follows:

• Add a new start state with an $\epsilon$ transition from it to the old start state
• Add a new accept state with an $\epsilon$ transition to it from each of the old accept states
• Where there are multiple transitions between states of the DFA, we combine them using $\cup$ into a regular expression that matches the “or” of the individual transitions.
• Whenever there are no transitions where the requirements of our GNFA force there to be one, add a transition for $\emptyset$

Alright, from here hopefully it’s obvious that $M$ and $N$ recognize the same language given all this graph-surgery. From here, though, we need to progressively construct a GNFA that keeps recognizing the same language until we get one that can obviously be interpreted as a RegExp. What does that mean, you might be wondering? Well the basic plan is that we’ll keep simplifying the structure of the GNFA until there are only two states: the start and the accept state, and there will be one transition between them which is labeled with the regular expression that matches the language decided by our original $M$.

We describe the iterative process as follows:

• if there are only two states, then we return the RegExp that labels the solitary transition in the graph
• if there are more than two states, we arbitrarily choose one of them that isn’t the accept or start state and “rip” it out. We’ll call this state, again following Sipser, $q_{rip}$. Now, we “repair” the GNFA by, for all states $q_i$ and $q_j$ which are not the accept or start states respectively, we make the new transition from $q_i$ to $q_j$ be $(R_1 R_2^* R_3) \cup R_4$ where $R_1 = \delta(q_i,q_{rip})$ , $R_2 = \delta(q_{rip},q_{rip})$, $R_3 = \delta(q_{rip},q_j)$, and $R_4$ is the original transition between $q_i$ and $q_j$. So what does this mean in words? It means that we are taking into account that there are two ways, now, that we can use to get from $q_i$ to $q_j$: the original path or the path that went through $q_{rip}$.Since our process removes a state every time, we know that this recursion is well-founded and that we’ll eventually terminate. Each step in the algorithm keeps the same meaning in terms of how the regular expression can expand, so the final regular expression returned will correspond to the original NFA.It’s a bit of a goofy construction, I know, but there’s something to be said for going through it in detail so that we have reason to believe that the regular languages match up exactly with the regular expressions.

Now that we have all these different examples of how to define the regular languages, let’s talk about what languages aren’t regular. Awhile back, we asked if we could define a DFA for the language $\{0^n1^n | n \geq 0\}$. Of course, we couldn’t actually do this but the absence of evidence isn’t evidence of absence. We wanted to prove that we couldn’t ever build a DFA or NFA for this language.

In order to do that, however, we need a tool called the pumping lemma for regular languages. The pumping lemma states that

• For any regular language $L$, there exists a constant $p$ that we’ll call the pumping constant.
• For all strings $w$ such that $|w| \geq p$, then there exists strings $x$,$y$, and $z$ such that $w = xyz$ and $|xy| < p$ and $|y| \geq 0$ and such that for all numbers $i \geq 0$ then $xy^iz$ is in $L$.Now what does the pumping lemma actually mean? It tells us that for every regular language there must exist some size $p$ such that all strings of size $p$ or larger must have some kind of “loop” that can be repeated an arbitrary many times. We can use this to prove that a language isn’t regular, by showing that the pumping lemma does not hold. If the pumping lemma doesn’t hold for a language, and yet the pumping lemma holds for all regular languages, then the language cannot be regular.We need to prove this lemma in order to actually use it that way, though. We start by noting that since we want to prove this lemma about regular languages, that means we’re proving it about languages that can be represented as DFAs. So now we assume that $L$ is a regular language. $L$ thus has some DFA $M$ that decides it. $M$, being a DFA, has a finite number of states $n$. We will now prove the pumping lemma with $n$ as the pumping length.This argument, essentially, proceeds based off of the “pigeonhole principle”. Assume we have a string $w$, accepted by $M$, of length $l$ greater than $n$. Then we know that, since this is a DFA, there must exist a length $l$ sequence of states $q_1 \ldots q_l$ that the DFA passes through. Now, since there are more states in this sequence than there are states in the DFA. This means that, by the pigeonhole principle, that some of these states must be repeated. Since the sequence of states follows transitions, this means that there must be some cycle in the graph. If there’s a cyle in the graph, then we should be able to repeat that cycle as many times as we want. This cycle corresponds to $y$ in the pumping lemma and the chunk of the string before the start of the cycle is $x$ and the piece of the string after the cycle is done is $z$. Now, let’s check and make sure that we actually are satisfying the pumping lemma:

For every string with a length greater than $n$, we know that a cycle occurs in the first $n$ characters because in $n$ characters we must pass through $n+1$ states, which means that we hit our cycle. As describe above, the part before the cycle, if it exists, will be our $x$ and then the cycle will be $y$. Everything after the cycle will be $z$. We have that $|xy| \leq p$, that $|y| > 0$, and thus we can repeat the cycle so that for all $i \geq 0, xy^iz \in L$.

Neat!

Now we come back to how we should use the pumping lemma. Let’s consider the following example that we’ve done in class before: $\{0^n1^n | n \geq 0\}$. So the pumping lemma says that for all strings, then there exists a way to break them up into $xyz$, such that for all $i$ $xy^iz \in L$. Now, in order to prove a language isn’t regular, we start by assuming the language is regular and then show that it fails to obey the pumping lemma as follows

• we assume that the pumping length is $p$
• we pick a string $s$ such that $|s| > p$
• in order to show that there exists no way to break the string into $xyz$ such that $xy^iz$ is always in the language then we have to consider all possible ways $s$ can be broken into $xyz$ such that $|xy| \le p$ and $|y| > 0$ and then show that no matter how the string is broken up we can pick an $i$ such that $xy^iz$ is not in $L$

for this particular example let’s pick

• $s = 0^p1^p$
• then the way we break up this string must be $x=0^l$, $y=0^m$, $z=0^n1^p$ such that $m > 0$ and $l + m + n = p$. No matter what exactly $l,m,n$ are then we have that $xy^0z = 0^{l+n}1^p$ which is not in the languageWe’ll leave this here for now and continue next time with expanding the languages we can cover to a larger set: the context free languages