I’ve described one way of constructing a pushdown automaton from a context free grammar in such a way that the automaton recognises exactly those lists which are generated by the grammar. The reverse construction is also possible. Given a pushdown automaton we can construct from is a context free grammar which generates those lists which are recognised by the automaton.
As an example, consider a pushdown automaton working with the symbols
(
and )
set up as follows:
(
it pushes it. Whenever it reads a )
it attempts to pop whatever is at the top of the stack. If it can’t pop
an item because the stack is already empty then the computation
terminates unsuccessfully.We want a grammar for the language of lists of symbols which cause this automaton to terminate successfully when given them as input.
The key idea is to introduce a new symbol whose expansion should be
precisely those lists of symbols which leave us with an empty stack if
we started with an empty stack. In other words, although the automaton
might push new symbols onto the stack it will pop all of them off of the
stack by the time it reaches the end of the list and won’t attempt to
pop off more than it has pushed on. Lacking any more creative ideas, I
will call this new symbol new
.
Although I described the set of expansions of new
in
terms of a stack which is empty at the start and end of the list the
behaviour of the automaton is essentially the same with any other
initial stack. It will end up with the same stack at the end as at the
start because it never dips below the current stack of the top. If it
did then it would cause a failure when given the empty stack. For this
reason we can describe the expansions of new
as those lists
of symbols which, when fed to the automaton at a point where there are
\({ k }\) items on the stack will leave
the stack with the same \({ k }\) items
and no others, with the size of the stack never dipping below \({ k }\) in between.
Although it’s obvious, it’s worth stating explicitly that the empty
list has the property described above and so should be a possible
expansion for the symbol new
.
As described above, the stack has size \({
k }\) at the start and end of any list of symbols which are a
valid expansion of new
and it has size at least \({ k }\) at any point in between. It might
or might not have exactly \({ k }\) at
some intermediate point.
If it has size \({ k }\) at an
intermediate point then that means we can split the list in two at that
point and each of those pieces will also be a valid expansion of
new
.
If it doesn’t have size \({ k }\) at
any intermediate point then it has size greater than \({ k }\) everywhere in between, since the
size can never be less than \({ k }\).
In other words right at the start of the list we must have pushed a
symbol and this symbol isn’t popped off until the end of the list. In
between those operations we have a situation where the stack starts with
size \({ k + 1 }\), ends with size
\({ k + 1 }\), and has size at least
\({ k + 1 }\) everywhere in between. In
other words, whatever list of symbols we read in between that first push
and that last pop must be an expansion of new
. In this case
it’s simple to figure out what the first and last things we read are,
since only a (
can cause a push and only a )
can cause a pop. So in the case we’re currently considering, the one
where the stack doesn’t have size \({ k
}\) at any intermediate point, we must have a (
,
then something of type new
, and then a )
.
In the paragraphs above we’ve seen three things new
could expand to: the empty list, a concatenation of two
new
s, or a (
followed by a new
and a )
. It’s easy to see that these are the only
possibilities. So the grammar rule for new
must be
new : | new new | "(" new ")"
What is the start symbol for this grammar? We’ve specified that the
stack is initially empty and the for a successful termination it should
be empty finally as well and that no attempt to pop from an empty stack
occurs in between. The definition of new
fits this
description exactly, so we can just take new
to be the
start symbol and the single rule above is the entire grammar.
As you may already have guessed, the language recognised by this machine is the language of balanced parentheses and the grammar above is a grammar for it. It’s not the grammar we used previously. That grammar was unambiguous while this one is ambiguous.
The sort of analysis we employed above can be applied, with some modifications, to any pushdown automaton. The main complication is the possibility of multiple states. If we have multiple states then we need a new symbol for each possible pair of initial and final state for lists of symbols which preserve the stack in the way described above. Only for those where those states are the same is the empty list a valid expansion. When we consider those with an intermediate point where the stack is of the same size as at the beginning and end we have to allow for the various states the automaton could be in at that intermediate point. In the case where this doesn’t happen, i.e. where we push something onto the stack at the start and pop something off only at the end, we have to account for all the states we could be in just after pushing and just before popping. We also need to consider all the various choices for a symbol to push and pop. Finally, if we have multiple states then we need to identify the start state or states and the accepting state or states, those which represent a successful termination. The start state will then have an alternate for each pair of start and accepting state. Considering all of these possible combinations of states and symbols leads to complicated grammars, but conceptually the procedure isn’t really different from the one we employed in the language of balanced parentheses example.