The language of balanced parentheses

Another interesting example language is the language of balanced parentheses, also known as the von Dyck language. This language has only two tokens, “(” and “)”. It consists of those lists of tokens where we can match open and close parentheses in such a way that they are nested and occur in the correct order.

Every element of this language has the same number of (’s and )’s, but that’s not sufficient. “)(”, for example, does not belong to the language. We could also allow pairs of “[”’s and ”]”’s or “{”’s and “}”’s, in addition to \({ ( }\)’s and \({ ) }\)’s, but won’t, or at least not yet.

A grammar for the language of balanced parentheses is

a : | b ;
b : "(" ")" | "(" b ")" | "(" ")" b | "(" b ")" b

Note the empty space between the : and | in the rule for a. This indicates that an empty list of symbols is an acceptable expansion for a. We need this because an empty string has balanced parentheses, trivially, according to our definition.

Every list generated by these rules has balanced parentheses. Less obviously, every element of the language can be generated by these rules. Still less obviously, it can be generated in only one way. I won’t give the proof but here’s the key idea: Every non-empty element of the language starts with a (. This ( must have a matching ). The list of tokens in between has matching parentheses, as does the list after. Either or both of those lists could be empty.