# Lecture: Introduction to PDAs

Continuing from last time, we have that the “machine” that corresponds to CFGs are PDAs. Informally, our machines will be finite automata with a limited notion of memory: a stack. In our transitions, we’ll be allowed to not only look at the input character when making our decision but we’ll also be allowed to look at the top of the stack. When we make a transition, we pop a symbol from the stack and then look at both the next character of the input stream as well as the character we just popped. Note that there’s no reason why the input stream and the stack have to have the same alphabet, so in our definition of push down automata we’ll allow them to be different. After we make our transition, we will optionally push another character to the top of the stack. A PDA accepts a string when we reach an accept state at the end of processing the string. This is informally computable by the definition we’ve been using since, because you can only look at the top character our number of rules is just going to be, roughly, the product of the number of states, the size of the input alphabet, and the size of stack alphabet. We can clearly do this in finite time for the same reasons that our NFA and DFA were finite, and we only need a finite amount of data for storing the stack and the state machine. So, this is also a nice computable definition.

One thing we should address: should our PDA be deterministic or non-deterministic? If we think about our goal, which is to have a kind of machine that represents context free languages and has the same power as context free grammars, are context free grammars inherently deterministic or non-deterministic? Let’s consider a grammar such as

• $A \to 0A$
• $A \to A0$
• $A \to \epsilon$

and let’s consider the string $00$. How many ways are there to expand the start variable, $A$, into this string? Just at first blush, I believe there are four different ways. If there’s ambiguity in how we generate strings, how do we pick? Non-deterministically! Context free grammars are naturally non-deterministic. Now, you might wonder if for every CFG there exists a deterministic CFG that also describes the same language and thus the non-determinism isn’t necessary. It turns out that, indeed, the CFGs and determinstic CFGs are not equivalent. I don’t actually know a cute way to demonstrate this, but if I end up finding one I’ll share it with the class. (Also, that’s a hint to anyone reading this that if they know a cute demonstration that I’m overlooking then please share!)

We’ll include the formal definition as a tuple just like we did with NFAs/DFAs. It consists of

• A finite set of states $Q$
• $\Sigma$, the input alphabet
• $\Gamma$, the stack alphabet
• $\delta : Q \times \Sigma_{\epsilon} \times \Gamma_{\epsilon} \to P(Q \times \Gamma_{\epsilon})$
• $q_0 \in Q$ which is the start state
• $F \subseteq Q$ which is the set of accept states

Now, let’s talk about what all of this actually means. We have a state machine much like what we had with NFAs, it’s non-deterministic as we can see if we look at the presence of the power set in the type of $\delta$, and we have two different alphabets now just as we discussed above. Note, though, that the powerset isn’t just over the set of states this time but of the product $Q \times \Gamma_{\epsilon}$. That’s because the choices we have aren’t just in terms of which state to go to next, but also in terms of what to do with the stack. Continuing, we interpret $\Gamma_{\epsilon}$ on the left hand side of the arrow in the type of $\delta$ to mean that we’re popping a character from the stack, if the stack is non-empty, and looking at it in order to make our decision. If the stack is empty, then we get an $\epsilon$ instead of an element of $\Gamma$. $\Gamma_{\epsilon}$ means something slightly different on the right-hand side of the arrow, because that’s what we’re going to be pushing onto the stack. In this case, we’re either pushing a character from $\Gamma$ onto the stack or we’re optionally pushing nothing onto the stack, in which case we’re pushing $\epsilon$.

You might wonder, since we’ve been trying to keep our informal notation of computation intact so far, if there are any limits to the size of the stack. The answer will be “no”, because we’ll be using only a finite amount of stack after a finite number of steps, since we can either accept or reject a string after a finite number of steps then we know we’ll always be using just a finite amount of memory. We could, in a sense, just assume that there’s some size limit to the stack that’s hidden from us and behind the scenes for every input the PDA gets configured to set the size of the stack to be larger than we could possibly need for an input of that length. That’s a completely valid interpretation of things, mechanically, but mathematically let’s just assume that there are no hard limits on the size of the stack and just get comfortable with the fact that we only use a finite amount of it if we take a finite number of steps.

We still need to define, formally, what it means for a string to be accepted by a PDA though. First, we define what the state of the stack is at all times by defining what it is after a step of computation.

• If our stack is $c w$, where $c \in \Gamma$, and $\delta(q,a,c) = (q',c')$ where $c' \in \Gamma_{\epsilon}$ then our new stack is $c' w$. Note that we’re representing the stack as, essentially, being a string here and reusing the machinery of string concatenation to describe this. We could also introduce a list data structure, but Sipser just uses strings to represent stacks, where the leftmost character of the string is the top of the stack, and represent the empty stack as $\epsilon$. I don’t entirely agree with reusing strings for this, but can appreciate the economy of abstractions by introducing as little machinery as possible.
• If our stack is $\epsilon$, and $\delta(q,a,c) = (q',c')$ then our new stack is $c'$.

Thus, as long as we have an initial definition of the state of the stack, we can understand what the sequence of stack states as the computation progresses are.

We say that a PDA $M$ accepts a string $w$ when $w = w\_0 \ldots w_n$ where $w_i \in \Sigma_{\epsilon}$ and that there exists a sequence of states $r_0 \ldots r_n$ and stack states $s_0 \ldots s_n$ such that

• $r_0 = q_0$ and $s_0 = \epsilon$, i.e. we start in the start state and the stack is initially empty
• $(r_{i+1},a_{i+i}) \in \delta(q_i,w_i,a_i)$ where $s_i = a_i t$ and $s_{i+1} = a_{i+1} t$.
• $r_m \in F$

Now that we’ve done all of that we can go ahead and start working out examples of PDAs and show that, indeed, they can handle the kinds of CFLs we’re wanting to do. We’ll label the transitions with somethig slightly more complicated than before and all our labels will be of the form “(a,b) -> c\$ where “a” is going to be the character we read from input, “b” is the character we pop off the stack, and “c” is the character we’re pushing onto the stack. And the reason why those are formatted in ugly ascii rather than latex code is that I’m still not sure how to get latex excepted by the tool I’m using to make the inline graphs. In any case, let’s consider what the PDA looks like for the language $\{0^n1^n | n \geq 0\}$, our old friend. The basic idea is that we’re going to use the stack to track how many 0s we see before we start accepting 1s, pushing a 0 onto the stack per 0 we see in the input stream. We then pop off a 0 for each 1 we see, and then we make sure that the whole stack is empty before accepting at the end of input. Wait, shoot, how do we see if the stack is empty? We do that by pushing a special “start symbol” onto the stack during our first transition, and then by having the transition to the accept state only happen by popping the start symbol back off the stack. Also, a last notational thing is that we’ll use e for $\epsilon$. Without further ado,

We can see how this graph implements the algorithm we just saw. Now, what about the palindromes? Let’s remember that our CFG for the palindromes was

• $A \to \epsilon$
• $A \to 0A0$
• $A \to 1A1$
• $A \to 1$
• $A \to 0$

Well what we want here is to use the memory of the PDA to keep track of all the characters we saw up until we start accepting the “other half” of the string. Of course, how can you tell when you’ve “seen half” of the string? That’s where non-determinism comes in incredibly handy, because we can just make that whenever we want. Now, keep in mind, though, that there’s those two transitions that we need to get the odd palindromes as well

• $A \to 1$
• $A \to 0$

because they’ll mean that when we make the switch from “first half” to “second half” then we’ll need to use an $\epsilon$ or 1 or 0. Let’s just see what this looks like

We can see that this follows a very similar structure to the language of matched 0s and 1s and that if we trace out something like the execution for $00100$ then it should look something like the following, where we represent the computation as a triple of $w$ which will be what’s left of the string to process, $s$ which will be the state of the stack, and $q$ which is the state we’re in. So we start out in $(00100,\epsilon,M_0)$ and the correctly terminating trace of the execution becomes

• $(00100,\epsilon,M_0)$
• $(00100,\,M_1)$
• $(0100,0\,M_1)$
• $(100,00\,M_1)$
• $(00,00\,M_2$
• $(0,0\,M_2)$
• $(\epsilon, \, M_2)$
• $(\epsilon,\epsilon,M_3)$
• accept

Neat, huh?

We’ll leave this lecture here and pick up next post with a sketch of the equivalence of PDAs and CFGs and a bit on the context free pumping lemma