Context free languages

A context free grammar is a phrase structure grammar where every rule gives a finite set of alternates for a non-terminal symbol, each alternate being a finite list of symbols. The second “finite” is redundant because lists are always finite. It’s just there as a reminder.

All of the phrase structure grammars we’ve considered are of the form described above. They have the property that the possible expansions of a symbol are independent of what symbols appear before or after it. That’s where the term “context free” comes from. We could imagine more general grammars where the possible expansions are allowed to depend on other symbols in the list. That would take us into the realm of context sensitive grammars. We won’t do that this semester though.

A language is called context free if it can be generated by a context free grammar. Left and right regular grammars are context free grammars so regular languages are context free languages. Not every context free grammar is regular though. We saw a context free grammar for the language of balanced parentheses earlier, so it is context free, but we’ve already seen that it is not regular.

As another example of a context free language which is not regular, consider the language whose members are strings with some number of x’s, followed by the same number of y’s, followed by any positive number of z’s. We can show that this language is not regular using either the Pumping lemma or the Myhill-Nerode theorem. It is context free though, since we can write down the following simple phrase structure grammar for it.

%%
start : xsys zs
      ;

xsys  : | x xsys y
      ;

zs    : z | zs z
      ;

Similarly, the language with any positive number of x’s, followed by some number of y’s, followed by the same number of z’s is context free but not regular. In this case it’s not possible to use the usual version of the Pumping lemma but it is possible to use the second version, and it’s also possible to use the Myhill-Nerode theorem. It is straightforward to write down a phrase structure for this grammar, very similar to the one above.

Not all languages are context free. This follows from a simple counting argument since the set of phrase structure grammars for a non-empty countable set of tokens is countable but but the set of languages for the same set of tokens is uncountable.

Natural languages tend not to be context free, although some of them aren’t far off. Well designed computer languages typically are context free, which makes it relatively straightforward to write parsers for them. Languages designed by people or committees who don’t have to implement them often fail to be context free, as do languages where the language specification evolved from a preexisting compiler implementation.

I am cheating slightly though when I claim that well designed languages have context free grammars, because there may be some programs which parse correctly but are not valid due to constraints in the language specification which cannot be implemented in a context free way, like declaration before use requirements. Violating these constraints is not, strictly speaking, a syntax error, but this is admittedly a fine distinction. There are programming languages which are context free in the strictest possible sense but you probably wouldn’t enjoy debugging a program written in one.