# Lecture: NFAs and Proving Equivalence of DFAs and NFAs

Meta-Commentary: So I’ve taken off the numbers from these posts because as I’m going I’m realizing that

1. I have no idea how long these are going to actually take once I flesh out my examples
2. I NEED MORE EXAMPLES, very baby-steps “let’s trace through how this works” examples
3. I’m not allowing for questions or confusion, and I really want to encourage questions in each lecture.

So, yeah, we’ll have to have more examples when I actually do the NFA and DFA lectures. For now, though, I’m going to put up what I have on this post, which ended up being the longest yet, and just keep posting two of these a week and hoping that I’m staying ahead of my real lectures. Those real lectures start next week, by the way, which is a little scary.

End meta-commentary

Given where we ended last time, we wanted something like a DFA but where an informal description such as “try $M$ or $N$ and if one of them works, accept” might make sense as an implementation of the union for regular languages. Now, what we really want here is the ability to make a non-deterministic choice of which branch we take: $M$ or $N$. We can think of a non-deterministic choice as essentially meaning that we are trying all possible moves simultaneously, and if one of them leads to an accept state then the entire process accepts.

Now, if you look at the picture we want here there’s something that might seem a little odd: we don’t actually want to consume input as we make this branching move to try running either $M$ or $N$. This implies that we might want some kind of new move that allows us to move to a state without consuming input. If we combine both of these ideas, non-deterministic choice and transitions that do not consume input we get the definition of a non-deterministic finite automata (NFA).

More formally, we can say that an NFA is a tuple the same as a DFA except that the type of the transition function $\delta$ is different. Instead of $\delta : Q \times \Sigma \to Q$ we have $\delta : Q \times \Sigma_{\epsilon} \to P(Q)$ where $P$ is the powerset operator and for any set $A$ then $A_{\epsilon}$ is the set $A \cup \{\epsilon\}$ where $\epsilon$ is the symbol that corresponds to consuming no input. Now there’s a few things we can note here. First, that because the empty set is an element of the powerset that we’re allowed to have “empty” transitions such as $\delta(q,a) = \varnothing$ which means that in the state $q$ the NFA transitions to no states on the character $a$. This is in sharp contrast to DFAs where there needed to be exactly one transition defined per letter of the alphabet. This allows to, for example, define the NFA for the language that only contains the empty string with only a single state rather than two as follows:

We should also note that we need to change the formal definition of what it means for a string $w$ to be accepted by a NFA $N$. Recall that previously our definition of computation for a DFA was

“A string $w = w_0 \ldots w_{n-1}$ \$ of length $n$ is accepted by a DFA when there is a sequence of states $r_0 \ldots r_n$ such that

• $r_0 = q_0$
• $r_i = \delta (w_{i-1},r_{i-1})$
• $r_n \in F$

Now, looking at the type of our transition function we can see that since $\delta$ returns a set of states, not a single state, then we need to change the second condition to be $r_i \in \delta (w_{i-1},r_{i-1})$. This isn’t quite right though, as you might have already guessed. We still need to include the $\epsilon$ transitions as well! Now, I’ll follow Sipser’s definition even though I think it’s not as clear as it could be. First off, we define concatenation of $\epsilon$ with other characters as

• $\epsilon w = w$
• $w \epsilon = w$

or in words, that $\epsilon$ is the unit of concatenation of characters. Then we say instead of $w = w_0 \ldots w_{n-1}$ where $n$ is the length of the string and each $w_i$ is a character in $\Sigma$, we instead let $w = y_0 \ldots y_n$ where $n$ is no longer connected to the length of the string and each $y_i$ is an element of $\Sigma_{\epsilon}$. Of course, since we’ve modified our notion of acceptance of a string let’s think for a moment and make sure that it’s still sensible under our definition of computable. We still have “finite rules” and “finite data”, but does it still necessarily take finite time if we’re allowing this non-determinism? Consider that one can simulate non-determinism with backtracking. We try, sequentially, each possible path for processing the input string. This might end up taking much longer based on the possible branching, but since each individual path is finite and the finite number of states means the number of paths is finite, then the sum of all the time needed to try each path is finite. Therefore, NFAs still fit our informal definition of “computable”.

All this being well defined, we can perform the union in a very simple way:

which is exactly what we were hoping for in the beginning.

So before we go further into defining regular operations and showing that the regular languages are closed under them, there’s a bit of a problem: we have to show that the NFAs decide the same set of languages as the DFAs, i.e. that they really are the regular languages.

How would one prove such a thing? Well, what we can do is show that for any DFA $M$ that decides the language $L$, then there exists an NFA $N$ which also decides $L$. This would prove that the regular languages are a subset of the languages decided by NFAs. The other direction is showing that for an NFA $N$ that decides $L$, then we can construct a DFA $M$ that also decides $L$. This would prove that the languages decided by the NFAs are a subset of the regular languages. Reminding ourselves that when two sets are subsets of each other, then they are equal, this means that if we can do both of these constructions we will have shown that the languages decided by NFAs are exactly the regular languages. This is an example of proof by construction, as we discussed in the very first lecture.

Please note that I’m trying to be careful and say that the set of languages decided by DFAs and NFAs are the same. We are not directly comparing NFAs and DFAs or saying that the “set of NFAs” and the “set of DFAs” are equal, because that isn’t even a sensible question as they’re sets of different “types” of things. On the other hand, they both decide languages and we can compare sets of the same thing. In addition, languages are what we really care about here because the set of languages decided tells us about the computational power of a model.

Since we know what construction we want, let’s try building it. To start with, the easy direction is showing that for every DFA $M$ that decides a language $L$ there exists an NFA $N$ that also decides $L$. To do this, first we assume that we have our tuple $(Q,\Sigma,\delta,q_0,F)$ for the DFA $M$. Now we can make our NFA $N$ as follows

• $Q^N = Q$
• $\Sigma^N = \Sigma$
• $q^N_0 = q_0$
• $F^N = F$

and last we have the non-trivial part

• $\delta^N(q,\epsilon) = \emptyset$
• $\delta^N(q,a) = \{\delta(q,a)\}$

or in words $\delta^N$ has no $\epsilon$ transitions and on a non-epsilon input, it just returns the singleton-set of what $\delta$ returns.

This embedding is so simple that as we proceed in the class we may refer to the idea that DFAs “really are” just NFAs. To show that this recognizes the same language, we’d need to show that for a string $w$ there exists a sequence of states $r_0 \ldots r_n$ witnessing that $M$ accepts $w$ IFF there exists a sequence of states $y_0 \ldots y_k$ that witnesses that $N$ accepts $w$. For this construction, the theorem is trivial because the sequence of states is the same in both cases.

As for the other direction, that will be somewhat more complicated. We’ll start with recalling two things we’ve seen before: that for DFAs we simulated the union by using pairs of states as our new set of states and that the transition function represents non-determinism as sets of states. Combining these two ideas, we get that in order to simulate an NFA with a DFA the states of the DFA should be sets of states of the NFA. This is still a finite set of states because the powerset of a finite set is finite, though exponentially larger. The idea here is that we’re “paying” for the cost of the simulation in space, not time, since a DFA will always take time linear in the input string. This linearity is why we can’t use the perhaps more obvious trick of “backtracking” to simulate the non-determinism: it doesn’t fit the computational model of a DFA.

We can then take a stab at defining the DFA $M$, given that our NFA is described by the tuple $(Q,\Sigma,\delta,q_0,F)$ we can define our new DFA $M$ as

• $Q^M = P(Q)$, the states of $M$ are sets of states
• $q^M_0 = \{ q_0 \}$, the start state is the singleton set of the original start state
• $\delta^M(qs,a) = \bigcup_{q \in qs} \delta(q,a)$, the transition function takes a step from all its possible states and collates the results into the new set of possible states
• $F^M = \{S | \exists s : S. s \in F\}$, or that our new accepting states are the ones that contain at least one element of the old $F$. Not every state you can be in needs to be in an accept state, but you need to be in at least one accept state.Wait, though, there’s a bit of a problem here: we haven’t taken into account the epsilon transitions. We have to get rid of them somehow in order to have a valid DFA. To do that, we need to introduce a new construct: the epsilon closure of a set of states. The epsilon closure is defined as $E(A) = \{q | q \text{ is reachable from some state in } A \text{ in 0 or more epsilon transitions}\}$, and the reason why it’s “0 or more” is that we want $A \subseteq E(A)$ and the “0” guarantees that all elements of $A$ will be in $E(A)$. So given this construct, we need to use it in two places: first, the starting state should really be the epsilon closure of $q_0$ and second in the definition of $\delta^M$ we should actually have $\bigcup_{q \in qs} E(\delta(q,a))$. Now we have the correct definition of the conversion from an NFA to a DFA.

For this lecture, we’ll elide proving that this construction is correct but hopefully it is clear that this follows the prior description of how we’ll simulate non-determinism with a DFA.

Here I think I’ll end things until my next post.