Ambiguous grammars

Almost all of the grammars we’ve considered so far have been unambiguous, in the sense that there’s only one abstract syntax tree we can get from any given input. One exception was the fragmentary grammar for English. Grammars for natural languages are almost always ambiguous. The other exception was the language for expressions which we extracted from the grammar of the bc utility.

expression
expression + expression
number + expression
integer + expression
digit + expression
1 + expression
1 + expression + expression
1 + number + expression
1 + digit + expression
1 + 2 + expression
1 + 2 + number
1 + 2 + digit
1 + 2 + 3

and another is

expression
expression + expression
expression + number
expression + digit
expression + 3
expression + expression + 3
number + expression + 3
digit + expression + 3
1 + expression + 3
1 + number + 3
1 + digit + 3
1 + 2 + 3

These differ not just in the order in which we expanded symbols but in how the expression 1 + 2 + 3 is broken up into phrases. In the first one 1 and 2 + 3 are expressions joined by a +. In the second 1 + 2 and 3 are expressions joined by a +.

You can see that these are in fact distinct parsings by looking at the corresponding parse trees.

First parse tree for 1 + 2 + 3
Second parse tree for 1 + 2 + 3

Strictly speaking these aren’t full parse trees, since they don’t show all the symbols which are expanded when generating the input. The full version, with nodes labelled by symbols, is given just for the first of these.

Full parse tree for 1 + 2 + 3

Ambiguous grammars are allowed. In fact there are context free languages for which no unambiguous grammar exists. It’s also possible, and indeed common, for an ambiguous grammar and an unambiguous grammar to define the same language. It’s often easier to write an ambiguous grammar for a language and often easier to analyse an unambiguous one. In fact the specification for bc doesn’t require the particular grammar given in the specification, merely that whatever grammar is used should recognise the same language as this one generates. There are unambiguous grammars for this language and an implementation which used one would still be compliant.