Continuing from where we left off last time, with the definition of GNFAs, we needed to show that we can take a GNFA with our peculiar restrictions and turn it into a RegExp. Again, we follow Sipser extremely closely, in part because all of this is tedious enough that I didn't feel like trying to be original in my presentation. We start off by taking our DFA and turning it into a GNFA as follows:

- Add a new start state with an ε-transition from it to the old start state
- Add a new accept state with an ε-transition *to* it *from* each of the old accept states
- Where there are multiple transitions between two states of the DFA, we combine them into a single regular expression that matches the "or" of the individual transitions
- Whenever there are no transitions where the requirements of our GNFA force there to be one, add a transition labeled with ∅, the regular expression that matches nothing

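Those four steps can be sketched in code. This is a minimal sketch, assuming a DFA encoded as a dict-based transition table; the names, the use of `""` for the ε label, and `None` for the ∅ label are my own illustrative choices, not Sipser's:

```python
def dfa_to_gnfa(states, alphabet, delta, start, accepts):
    """Return (gnfa_states, R, new_start, new_accept), where R maps a
    pair of states to the regex string labeling that transition."""
    new_start, new_accept = "q_start", "q_accept"
    gnfa_states = list(states) + [new_start, new_accept]
    # The GNFA requires a transition between every allowed pair of states;
    # None plays the role of the empty-set regex (matches nothing).
    R = {(p, q): None for p in gnfa_states for q in gnfa_states}
    R[(new_start, start)] = ""       # "" plays the role of epsilon
    for f in accepts:                # epsilon from each old accept state
        R[(f, new_accept)] = ""
    # Fold parallel DFA arrows into a single "or" of their labels.
    for p in states:
        for a in alphabet:
            q = delta[(p, a)]
            R[(p, q)] = a if R[(p, q)] is None else R[(p, q)] + "|" + a
    return gnfa_states, R, new_start, new_accept
```
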
Alright, from here hopefully it's obvious that, given all this graph-surgery, the original DFA and the new GNFA recognize the same language. From here, though, we need to progressively construct GNFAs that keep recognizing the same language until we get one that can obviously be interpreted as a RegExp. What does that mean, you might be wondering? Well, the basic plan is that we'll keep simplifying the structure of the GNFA until there are only two states, the start and the accept state, and there will be one transition between them which is labeled with *the* regular expression that matches the language decided by our original DFA.

We describe the iterative process as follows:

- if there are only two states, then we return the RegExp that labels the solitary transition in the graph
- if there are more than two states, we arbitrarily choose one of them that isn't the accept or start state and "rip" it out. We'll call this state, again following Sipser, q_rip. Now, we "repair" the GNFA: for all states q_i and q_j, which are not the accept or start states respectively, we make the new transition from q_i to q_j be labeled (R1)(R2)*(R3) ∪ R4, where R1 is the old transition from q_i to q_rip, R2 is the self-loop on q_rip, R3 is the old transition from q_rip to q_j, and R4 is the original transition between q_i and q_j. So what does this mean in words? It means that we are taking into account that there are two ways, now, that we can use to get from q_i to q_j: the original path, or the path that went through q_rip.

Since our process removes a state every time, we know that this recursion is well-founded and that we'll eventually terminate. Each step in the algorithm preserves the language the GNFA recognizes, so the final regular expression returned will correspond to the original DFA.

It's a bit of a goofy construction, I know, but there's something to be said for going through it in detail so that we have reason to believe that *the* regular languages match up exactly with *the* regular expressions.

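The rip-and-repair loop can be sketched as well. This is a hedged sketch, not Sipser's pseudocode: it assumes the GNFA comes as a map `R` from state pairs to regex strings, with `None` standing for the ∅ label and `""` for ε, and it assumes any alternation inside a label is already parenthesized:

```python
def _concat(*parts):
    # Concatenation with the empty-set regex yields the empty set.
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def _union(r1, r2):
    # The empty set is the identity for union; over-parenthesize for safety.
    if r1 is None:
        return r2
    if r2 is None:
        return r1
    return f"({r1}|{r2})"

def _star(r):
    # Both (empty set)* and (epsilon)* match only the empty string.
    if r is None or r == "":
        return ""
    return f"({r})*"

def gnfa_to_regex(states, R, start, accept):
    inner = [s for s in states if s not in (start, accept)]
    while inner:
        rip = inner.pop()  # the choice of state to rip is arbitrary
        for qi in inner + [start]:
            for qj in inner + [accept]:
                # (R1)(R2)*(R3) for the detour through the ripped state...
                detour = _concat(R.get((qi, rip)),
                                 _star(R.get((rip, rip))),
                                 R.get((rip, qj)))
                # ...unioned with R4, the direct transition.
                R[(qi, qj)] = _union(R.get((qi, qj)), detour)
    return R[(start, accept)]
```
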
Now that we have all these different examples of how to define the regular languages, let's talk about what languages *aren't* regular. A while back, we asked if we could define a DFA for the language {0^n 1^n | n ≥ 0}. Of course, we couldn't actually do this, but the absence of evidence isn't evidence of absence. We wanted to *prove* that we couldn't ever build a DFA or NFA for this language.

In order to do that, however, we need a tool called the pumping lemma for regular languages. The pumping lemma states that:

- For any regular language A, there exists a constant p that we'll call the pumping length.
- For all strings s in A such that |s| ≥ p, *there exist* strings x, y, and z such that s = xyz *and* |y| > 0 *and* |xy| ≤ p, and such that for all numbers i ≥ 0, xy^i z is in A.

Now what does the pumping lemma actually mean? It tells us that for every regular language there must exist *some* size p such that all strings of size p or larger must have some kind of "loop" that can be repeated an arbitrary number of times. We can use this to prove that a language isn't regular, by showing that the pumping lemma does *not* hold. If the pumping lemma doesn't hold for a language, and yet the pumping lemma holds for all regular languages, then the language cannot be regular.

We need to *prove* this lemma in order to actually use it that way, though. We start by noting that since we want to prove this lemma about regular languages, that means we're proving it about languages that can be represented as DFAs. So now we assume that A is a regular language. A thus has some DFA M that decides it. M, being a DFA, has a finite number of states p. We will now prove the pumping lemma with p as the pumping length.

This argument, essentially, proceeds based off of the "pigeonhole principle". Assume we have a string s, accepted by M, of length at least p. Then we know that, since this is a DFA, there must exist a sequence of |s| + 1 states that the DFA passes through. Now, there are more states in this sequence than there are states in the DFA. This means, by the pigeonhole principle, that some of these states must be repeated. Since the sequence of states follows transitions, this means that there must be *some cycle* in the graph. If there's a cycle in the graph, then we should be able to repeat that cycle as many times as we want. This cycle corresponds to y in the pumping lemma, the chunk of the string before the start of the cycle is x, and the piece of the string after the cycle is done is z.

Now, let's check and make sure that we actually are satisfying the pumping lemma: for every string with a length of at least p, we know that a cycle occurs within the first p characters, because in reading p characters we must pass through p + 1 states, which means that we hit our cycle. As described above, the part before the cycle, if it exists, will be our x, and then the cycle will be y. Everything after the cycle will be z. We have that |xy| ≤ p, that |y| ≥ 1, and thus we can repeat the cycle so that xy^i z is in A for all i ≥ 0.

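The pigeonhole argument is easy enough to act out in code: run a DFA on a long-enough string, record when a state first repeats, and read off x, y, and z. The dict-based DFA encoding here is my own illustrative choice:

```python
def pump_decomposition(delta, start, w):
    """Return (x, y, z) with w == x + y + z, |xy| <= #states, |y| >= 1,
    by finding the first repeated state in the DFA's run on w."""
    seen = {start: 0}          # state -> position in w where first visited
    state = start
    for i, ch in enumerate(w):
        state = delta[(state, ch)]
        if state in seen:      # pigeonhole: a repeated state means a cycle
            j = seen[state]
            return w[:j], w[j:i + 1], w[i + 1:]
        seen[state] = i + 1
    raise ValueError("w is shorter than the pumping length")
```

Since the first repeat must happen within the first p transitions (we visit p + 1 states while reading p characters), the returned split automatically satisfies |xy| ≤ p and |y| ≥ 1.
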
Neat!

Now we come back to how we should *use* the pumping lemma. Let's consider the following example that we've done in class before: {0^n 1^n | n ≥ 0}. So the pumping lemma says that *for all* strings of length at least p, *there exists* a way to break them up into xyz, such that xy^i z is in the language *for all* i ≥ 0. Now, in order to prove a language *isn't* regular, we start by assuming the language *is* regular and then show that it fails to obey the pumping lemma as follows

- we assume that the pumping length is p
- *we* pick a string s in the language such that |s| ≥ p
- in order to show that there exists *no* way to break the string into xyz such that xy^i z is always in the language, we have to consider *all* possible ways s can be broken into xyz such that |xy| ≤ p and |y| ≥ 1, and then show that no matter how the string is broken up we can pick an i such that xy^i z is *not* in the language

For this particular example, let's pick s = 0^p 1^p:

- then the way we break up this string *must* be x = 0^a, y = 0^b, z = 0^(p-a-b) 1^p such that |xy| = a + b ≤ p and |y| = b ≥ 1. No matter what exactly a and b are, we then have that xy^2 z = 0^(p+b) 1^p, which is *not* in the language

We'll leave this here for now and continue next time with expanding the languages we can cover to a larger set: the context-free languages.