# Lecture: Context Free Languages, CFGs, PDAs

Now we come to our next notion of computation beyond the regular languages and their associated models of computation (regexps, NFAs, and DFAs): the context free languages. Our motivating example is going to be the language we’ve seen repeatedly at this point, $\{0^n1^n \mid n \geq 0\}$. We showed last time that there is no way to make a DFA that decides this language.

Again, we’ll define our set of languages in terms of some model of computation. To this end, we introduce context free grammars (CFGs). A context free grammar is like a regular expression but much more powerful. The basic model of computation is the same: we have a set of symbols and rules to expand them. What’s different is that RegExps come with a fixed, pre-defined set of rules for their expansion, whereas for a CFG the set of rules for expanding symbols is itself part of the definition.

To wit, a CFG that generates our troublesome language is

• $A \to 0A1$
• $A \to \epsilon$

So, for example, we can expand to get the string “00001111” by the sequence of expansions $A \to 0A1 \to 00A11 \to 000A111 \to 0000A1111 \to 00001111$. Let’s define these CFGs a bit more formally.
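This sequence of expansions is easy to mechanize. Here’s a short Python sketch (the function name and string representation are made up for illustration) that applies $A \to 0A1$ a given number of times and then $A \to \epsilon$, recording each sentential form along the way:

```python
def derivation(n):
    """Derive 0^n 1^n by applying A -> 0A1 n times and then
    A -> epsilon, recording each intermediate sentential form."""
    form = "A"
    forms = [form]
    for _ in range(n):
        form = form.replace("A", "0A1")  # apply A -> 0A1
        forms.append(form)
    form = form.replace("A", "")         # apply A -> epsilon
    forms.append(form)
    return forms
```

For example, `derivation(4)` reproduces exactly the chain of sentential forms written out above, ending in `"00001111"`.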

A context free grammar consists of:

• A finite set of variables $V$
• An alphabet $\Sigma$, where $\Sigma$ and $V$ are disjoint. These are the “terminals” of the CFG.
• A finite set of rules $R$, where a “rule” is a pair of a variable and a sequence of terminals and variables.
• A distinguished variable that’s the start variable

The set of strings generated by all the expansions of the grammar is the language described by the grammar. Again, membership is a finite, computable process: there are a finite number of rules, and any string we are testing against has a finite length, so we can brute-force check through all the expansions of the grammar that could produce a string of the target length. (Bounding the number of derivation steps takes a little care because of $\epsilon$ rules, but the normal form we introduce below makes it easy.)
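A sketch of that brute-force check in Python. The rule representation here is an assumption, and the pruning on terminal count works for grammars, like this one, where terminals are never erased once produced; the step cap is just a safety net. A fully general decider would first convert the grammar to a normal form.

```python
from collections import deque

def derives(rules, start, target, max_steps=100000):
    """Brute-force check whether `target` is generated from `start`.

    `rules` maps each variable to a list of right-hand sides, written
    as strings whose characters are terminals or variables. We only
    ever expand the leftmost variable, since every derivable string
    has a leftmost derivation.
    """
    seen = {start}
    queue = deque([start])
    steps = 0
    while queue and steps < max_steps:
        form = queue.popleft()
        steps += 1
        if form == target:
            return True
        for i, sym in enumerate(form):
            if sym in rules:
                for rhs in rules[sym]:
                    new = form[:i] + rhs + form[i + 1:]
                    # Prune forms that already hold too many terminals.
                    if sum(c not in rules for c in new) <= len(target) and new not in seen:
                        seen.add(new)
                        queue.append(new)
                break  # only the leftmost variable is expanded
    return False

rules = {"A": ["0A1", ""]}  # our grammar: A -> 0A1 | epsilon
```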

We can do a number of other things with CFGs. For example, we could have a CFG for palindromes over an alphabet.
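For instance, over $\Sigma = \{0, 1\}$, one such grammar (a sketch, with a start variable I’ll call $P$) is

• $P \to \epsilon$
• $P \to 0$
• $P \to 1$
• $P \to 0P0$
• $P \to 1P1$

Each expansion adds matching symbols to both ends of the string, so every generated string reads the same forwards and backwards.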

There’s one special form for CFGs that we should note specifically, which is Chomsky Normal Form. A CFG is in Chomsky Normal Form whenever it has the following properties:

• Every expansion of a variable is either to exactly two variables or a single terminal, i.e. is of the form $A \to BC$ or $A \to c$
• No variable except the start variable can expand to $\epsilon$
• The start variable never appears on the right hand side of any rule, i.e. no variable can expand to the start variable

This means that Chomsky Normal Form CFGs have a very simple inductive structure that we can take advantage of for proofs. What’s particularly useful is that, as we’ll show, all CFGs have an equivalent CFG in Chomsky Normal Form that generates the same language.

Now we make this construction clear in steps:

• We first introduce a new fresh start variable, $S'$, and have it expand to the old start variable $S$
• The second step is to remove all rules of the form $A \to \epsilon$. This is an iterative process: we pick a variable $A$ that has an $\epsilon$ expansion, remove that rule, and then modify the remaining rules to account for the fact that $A$ could have expanded to nothing. We do this by taking every rule that contains an $A$ on the right hand side, i.e. something like $X \to B \ldots A \ldots C$, and adding a copy of the rule with that $A$ removed, i.e. $X \to B \ldots C$, while keeping the original. (If the right hand side contains several occurrences of $A$, we add a copy for every way of dropping some of them.) In particular, if the rule is $X \to A$ then we add $X \to \epsilon$, unless we already eliminated $X \to \epsilon$ in an earlier round. Wait, aren’t we removing the $\epsilon$ rules? Yes, and so we iterate this process until the only remaining $\epsilon$ rule, if any, belongs to the start variable. We are, essentially, propagating the use of $\epsilon$ up to the top of the derivation tree.
• Next, we remove all rules of the form $A \to B$ (so-called unit rules) by inlining the possible expansions of $B$: if we had $A \to B$ and $B \to \ldots$ then we replace $A \to B$ with $A \to \ldots$ for each expansion of $B$. Note that in this step we don’t remove any of $B$’s own expansions, and if inlining produces new unit rules we keep going until none remain
• Now, finally, we take care of rules where a variable expands to more than two variables, more than one terminal, or a mixture of variables and terminals. If we have an expansion such as $A \to 0B$, we replace the $0$ with a new variable that has a single expansion, i.e. $A \to 0B$ becomes $A \to XB$ together with $X \to 0$. If we have an expansion with more than two variables, such as $A \to B C D$, then we add a new variable that expands into the sequence piecewise, i.e. the rule becomes $A \to X D$ where $X \to B C$. Note that there’s some freedom here, but no matter how you choose to split things up you’ll get an equivalent grammar in Chomsky Normal Form
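The last step above can be sketched in Python. This assumes, as the earlier steps guarantee, that $\epsilon$-rules and unit rules are already gone; the rule representation, the `is_variable` predicate, and the fresh names N0, N1, … are all assumptions made for illustration.

```python
from itertools import count

def cnf_final_step(rules, is_variable):
    """Lift terminals out of long right-hand sides and split
    right-hand sides longer than two symbols into chains of pairs.
    Rules are (lhs, rhs) pairs with rhs a tuple of symbols."""
    fresh = (f"N{i}" for i in count())
    term_var = {}  # terminal -> the variable introduced to stand for it
    out = []
    for lhs, rhs in rules:
        if len(rhs) <= 1:
            out.append((lhs, rhs))  # single terminal (or S -> epsilon)
            continue
        # Replace each terminal in a long right-hand side by a variable.
        syms = []
        for s in rhs:
            if is_variable(s):
                syms.append(s)
            else:
                if s not in term_var:
                    term_var[s] = next(fresh)
                    out.append((term_var[s], (s,)))
                syms.append(term_var[s])
        # Split right-hand sides of length > 2 into pairs, left to right.
        while len(syms) > 2:
            v = next(fresh)
            out.append((v, (syms[0], syms[1])))
            syms = [v] + syms[2:]
        out.append((lhs, tuple(syms)))
    return out
```

The left-to-right pairing is one of the arbitrary choices mentioned above; grouping from the right instead would give a different but equivalent grammar.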

It’s probably a good time for an example, so let’s consider our language above for

• $A \to 0A1$
• $A \to \epsilon$

Following step 1 of the above process, we get a new start variable (call it $S$) that expands to the old start variable $A$, so the grammar becomes

• $S \to A$
• $A \to 0A1$
• $A \to \epsilon$

Now we eliminate the $\epsilon$-rules.

• $S \to A$
• $S \to \epsilon$
• $A \to 01$
• $A \to 0A1$

You can see that everywhere there was an $A$ on the rhs, we’ve added a new rule that has the $A$ removed. Now the only place $\epsilon$ shows up is in an expansion of the start variable, which is allowed in Chomsky Normal Form.

Next, we eliminate the unit rule $S \to A$ by inlining, so now we have

• $S \to 01$
• $S \to 0A1$
• $S \to \epsilon$
• $A \to 01$
• $A \to 0A1$

Yes, this step has created a lot of redundancy in the rules. Chomsky Normal Form is useful for its simple inductive structure, but the price of simplicity is that we can no longer represent things as compactly as we’d like.

Finally, we put all the remaining rules in the proper form. First, we’ll clean up the terminals and then make the rest of the rules expand only to pairs of variables.

• $S \to XY$
• $S \to XAY$
• $S \to \epsilon$
• $A \to XY$
• $A \to XAY$
• $X \to 0$
• $Y \to 1$

and after the final bit of cleanup

• $S \to XY$
• $S \to ZY$
• $S \to \epsilon$
• $A \to XY$
• $A \to ZY$
• $X \to 0$
• $Y \to 1$
• $Z \to XA$

and our grammar is now in Chomsky Normal Form. Wow, umm, that’s a lot uglier and harder to read now isn’t it? Moving on!
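As a sanity check, we can enumerate what the final grammar generates, length by length. CNF makes this easy: every rule either produces a single terminal or splits the string into two nonempty halves, so we can recurse on the length. A sketch, where the empty tuple encoding of $S \to \epsilon$ is an assumption made for illustration:

```python
CNF = {  # the grammar we just derived; () encodes S -> epsilon
    "S": [("X", "Y"), ("Z", "Y"), ()],
    "A": [("X", "Y"), ("Z", "Y")],
    "X": [("0",)],
    "Y": [("1",)],
    "Z": [("X", "A")],
}

def strings(var, n, grammar=CNF):
    """All length-n strings derivable from var in a CNF grammar."""
    out = set()
    for rhs in grammar[var]:
        if len(rhs) == 0:
            if n == 0:
                out.add("")
        elif len(rhs) == 1:          # a single terminal
            if n == 1:
                out.add(rhs[0])
        else:                        # two variables: split the length
            left_var, right_var = rhs
            for k in range(1, n):
                for left in strings(left_var, k, grammar):
                    for right in strings(right_var, n - k, grammar):
                        out.add(left + right)
    return out
```

Enumerating small lengths yields exactly $\epsilon$, $01$, $0011$, $000111$, and so on, as expected.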

So when dealing with the regular languages, we had regular expressions, which had an interpretation as DFAs/NFAs. Now, if CFGs play the role of regexps for the context-free languages, then what plays the role of the NFA? Let’s think for a moment about why we couldn’t build an NFA for that pesky language $\{0^n1^n \mid n \geq 0\}$. We didn’t have any notion of “memory” for our NFA: there was no way to keep count of how many 0s we’d already seen, so we’d know to accept only an equal number of 1s.

That being said, if we had something that was an awful lot like an NFA yet had a notion of memory then maybe that would solve the problem. That’s exactly what we’re going to introduce: Pushdown automata (PDAs). We’ll get to those next time.