Parsing by guessing

The preceding section gave a deterministic pushdown automaton recogniser for a particular grammar. The method used was adapted to that particular language and doesn’t provide much inspiration if someone hands us another language and asks for a pushdown automaton recogniser for it. If we’re willing to give up determinism then there’s a simple recipe we can use:

This doesn’t have to terminate. What’s different about this method from the one we saw in the example is that this one processes one stack item at a time rather than one input token at a time. The stack can shrink or grow, while the remaining input only ever shrinks. In fact it’s very unusual for all computational paths to terminate.

It’s also non-deterministic because of the step where we choose an alternate from the rule for a non-terminal symbol.

If the input belongs to the language then some computational path will terminate successfully. If the input does not belong to the language then it’s possible that all computational paths terminate unsuccessfully, but it’s more likely that at least one fails to terminate at all. If so then the automaton will never provide us with an answer because at any finite stage of the computation we won’t know whether it would eventually succeed if given more time.

I’ll illustrate this method with the same input as above for the language of zeroeth order logic. It will be necessary to write this in a somewhat different format from the previous example because of its size, and because it’s organised somewhat differently, processing one stack item at a time rather than one input character at a time.

  1. input: \({ \{[(p⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: statement
  2. input: \({ \{[(p⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: expression
  3. input: \({ \{[(p⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: { expression binop expression }
  4. input: \({ [(p⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: expression binop expression }
  5. input: \({ [(p⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: [ expression binop expression ] binop expression }
  6. input: \({ (p⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: expression binop expression ] binop expression }
  7. input: \({ (p⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: ( expression binop expression ) binop expression ] binop expression }
  8. input: \({ p⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: expression binop expression ) binop expression ] binop expression }
  9. input: \({ p⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: variable binop expression ) binop expression ] binop expression }
  10. input: \({ p⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: letter binop expression ) binop expression ] binop expression }
  11. input: \({ ⊃q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: binop expression ) binop expression ] binop expression }
  12. input: \({ q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: expression ) binop expression ] binop expression }
  13. input: \({ q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: variable ) binop expression ] binop expression }
  14. input: \({ q)∧(q⊃r)]⊃(p⊃r)\} }\) stack: letter ) binop expression ] binop expression }
  15. input: \({ )∧(q⊃r)]⊃(p⊃r)\} }\) stack: ) binop expression ] binop expression }
  16. input: \({ ∧(q⊃r)]⊃(p⊃r)\} }\) stack: binop expression ] binop expression }
  17. input: \({ (q⊃r)]⊃(p⊃r)\} }\) stack: expression ] binop expression }
  18. input: \({ (q⊃r)]⊃(p⊃r)\} }\) stack: ( expression binop expression ) ] binop expression }
  19. input: \({ q⊃r)]⊃(p⊃r)\} }\) stack: expression binop expression ) ] binop expression }
  20. input: \({ (q⊃r)]⊃(p⊃r)\} }\) stack: variable binop expression ) ] binop expression }
  21. input: \({ q⊃r)]⊃(p⊃r)\} }\) stack: letter binop expression ) ] binop expression }
  22. input: \({ q⊃r)]⊃(p⊃r)\} }\) stack: letter binop expression ) ] binop expression }
  23. input: \({ ⊃r)]⊃(p⊃r)\} }\) stack: binop expression ) ] binop expression }
  24. input: \({ r)]⊃(p⊃r)\} }\) stack: expression ) ] binop expression }
  25. input: \({ r)]⊃(p⊃r)\} }\) stack: variable ) ] binop expression }
  26. input: \({ r)]⊃(p⊃r)\} }\) stack: letter ) ] binop expression }
  27. input: \({ )]⊃(p⊃r)\} }\) stack: ) ] binop expression }
  28. input: \({ ]⊃(p⊃r)\} }\) stack: ] binop expression }
  29. input: \({ ⊃(p⊃r)\} }\) stack: binop expression }
  30. input: \({ (p⊃r)\} }\) stack: expression }
  31. input: \({ (p⊃r)\} }\) stack: ( expression binop expression ) }
  32. input: \({ p⊃r)\} }\) stack: expression binop expression ) }
  33. input: \({ p⊃r)\} }\) stack: variable binop expression ) }
  34. input: \({ p⊃r)\} }\) stack: letter binop expression ) }
  35. input: \({ ⊃r)\} }\) stack: binop expression ) }
  36. input: \({ r)\} }\) stack: expression ) }
  37. input: \({ r)\} }\) stack: variable ) }
  38. input: \({ r)\} }\) stack: letter ) }
  39. input: \({ )\} }\) stack: ) }
  40. input: \({ \} }\) stack: }
  41. input: \({ }\) stack:

At no point before the stack emptied does the automaton terminate unsuccessfully and the input is empty once the stack is so the automaton terminates successfully. In other words, this string is recognised as a member of the language.

Note that this is one possible computational path. It is, in fact, the only one which terminates successfully. There are many others, some of which terminate unsuccessfully and some of which fail to terminate at all. As discussed earlier we could represent the set of all computational paths with a tree, but this tree would be infinite. It is possible to give a deterministic algorithm by, for example, traversing this tree in breadth first order. There’s no way for a deterministic pushdown automaton to do this, since maintaining the full tree requires something more powerful than a stack, but it could be done, for example, by a deterministic Turing machine. With a breadth first traversal we would only see a finite portion of the full infinite tree. I have chosen not to illustrate the portion which would be traversed because this planet is unfortunately not large enough to contain it.