Parse trees

The process described above, splitting a statement up into successively smaller phrases until we get to the simplest possible components, is called parsing. A common way to describe the result, both for natural and for formal languages, is what’s called an abstract syntax tree.

The following three figures give a visual representation of the abstract syntax trees for the two statements considered above.

Syntax tree for ((Probability and Statistics) or (Algebra and Geometry))
Syntax tree for ((Probability and (Statistics or Algebra)) and Geometry)
Syntax tree for (Probability and ((Statistics or Algebra) and Geometry))

A tree has elements called nodes and has arrows from one node to another. The nodes with no arrows going out are called the leaves of the tree. There is a single node with no arrows coming in, which is called the root. In our case the leaf nodes are all labelled by module names and the other nodes are all labelled by logical operators.

These visual representations are nice, but all computers, and many humans, are blind. It’s possible to describe the same information in a different way, with fully parenthesised expressions. The fully parenthesised expressions corresponding these abstract syntax trees are as follows.

((Probability and Statistics) or (Algebra and Geometry))

((Probability and (Statistics or Algebra)) and Geometry)

(Probability and ((Statistics or Algebra) and Geometry))

The internal representation a computer would use for a tree data structure isn’t either of these. The visual description and the parenthesised expressions are just for humans.

The fact that there are two possible abstract syntax trees for “Probability and (Statistics or Algebra) and Geometry”, depending on which “and” has higher precedence, shows that our grammar isn’t fully unambiguous.

When parsing statements in a formal language with a program one often wants to construct a data structure which mirrors this structure. For simple enough languages though it may be possible, as we’ll see, to use simpler data structures than a tree.