I haven’t described the grammar for our grammar specification
language. You’ve probably picked up most of it from the examples, but
in case you haven’t, here are the details. %token statements list the
types of terminals the parser can expect from the lexical analyser.
Tokens that consist of a single character don’t need to be listed. The
%start statement identifies the start symbol. A %% line separates these
declarations from the actual grammar rules. Each grammar rule consists
of the nonterminal being expanded, followed by a colon, followed by the
possible expansions, followed by a semicolon. Different possible
expansions are separated by | characters. Each expansion is just a list
of symbols, which may be empty.
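To make the format concrete, here is a minimal Python sketch that reads a grammar written this way and returns its declared tokens, its start symbol, and its rules. The example grammar, the name parse_grammar, and the tokenising regular expression are all illustrative assumptions rather than anything taken from the specification. The sketch deliberately ignores comments, actions, and precedence declarations.

```python
import re

# Hypothetical example grammar written in the format described above.
EXAMPLE = """
%token NUMBER
%start expr
%%
expr : expr '+' term
     | term
     ;
term : NUMBER
     | '(' expr ')'
     ;
"""

# A symbol is a quoted single character, a run of characters that
# contains no whitespace, quote, ':', '|' or ';', or one of the three
# punctuation marks on its own.
SYMBOL_RE = re.compile(r"'[^']'|[^\s:|;']+|[:|;]")


def parse_grammar(text):
    """Return (tokens, start, rules) for a grammar in the format above.

    Simplification: comments, actions and precedence are not handled,
    and malformed input (for example a missing ';') is not diagnosed.
    """
    # Everything before the first %% is declarations, the rest is rules.
    header, _, body = text.partition("%%")

    tokens, start = [], None
    for line in header.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "%token":
            tokens.extend(words[1:])
        elif words[0] == "%start":
            start = words[1]

    rules = {}
    symbols = SYMBOL_RE.findall(body)
    i = 0
    while i < len(symbols):
        # nonterminal, colon, expansions separated by '|', semicolon
        name = symbols[i]
        if symbols[i + 1] != ":":
            raise SyntaxError(f"expected ':' after {name}")
        i += 2
        expansions, current = [], []
        while symbols[i] != ";":
            if symbols[i] == "|":
                expansions.append(current)
                current = []
            else:
                current.append(symbols[i])
            i += 1
        expansions.append(current)
        rules.setdefault(name, []).extend(expansions)
        i += 1  # step over the ';'
    return tokens, start, rules
```

Calling parse_grammar(EXAMPLE) yields a dictionary that maps each nonterminal to a list of expansions, where each expansion is itself a list of symbols. An empty expansion shows up as an empty list.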
This is one of many conventions for listing grammar rules. The
authors of the POSIX specification chose it because it happens to be the
format expected by the parser generator yacc, which, like bc, is one of
the utilities described in the specification. So the grammar of yacc’s
own input is also defined by the standard. If you’re curious, here it
is:
/* Grammar for the input to yacc. */
/* Basic entries. */
/* The following are recognized by the lexical analyzer. */

%token    IDENTIFIER      /* Includes identifiers and literals */
%token    C_IDENTIFIER    /* identifier (but not literal)
                             followed by a :. */
%token    NUMBER          /* [0-9][0-9]* */

/* Reserved words : %type=>TYPE %left=>LEFT, and so on */

%token    LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION

%token    MARK            /* The %% mark. */
%token    LCURL           /* The %{ mark. */
%token    RCURL           /* The %} mark. */

/* 8-bit character literals stand for themselves; */
/* tokens have to be defined for multi-byte characters. */

%start    spec

%%

spec  : defs MARK rules tail
      ;
tail  : MARK
        {
          /* In this action, set up the rest of the file. */
        }
      | /* Empty; the second MARK is optional. */
      ;
defs  : /* Empty. */
      | defs def
      ;
def   : START IDENTIFIER
      | UNION
        {
          /* Copy union definition to output. */
        }
      | LCURL
        {
          /* Copy C code to output file. */
        }
        RCURL
      | rword tag nlist
      ;
rword : TOKEN
      | LEFT
      | RIGHT
      | NONASSOC
      | TYPE
      ;
tag   : /* Empty: union tag ID optional. */
      | '<' IDENTIFIER '>'
      ;
nlist : nmno
      | nlist nmno
      ;
nmno  : IDENTIFIER         /* Note: literal invalid with % type. */
      | IDENTIFIER NUMBER  /* Note: invalid with % type. */
      ;

/* Rule section */

rules : C_IDENTIFIER rbody prec
      | rules rule
      ;
rule  : C_IDENTIFIER rbody prec
      | '|' rbody prec
      ;
rbody : /* empty */
      | rbody IDENTIFIER
      | rbody act
      ;
act   : '{'
        {
          /* Copy action, translate $$, and so on. */
        }
        '}'
      ;
prec  : /* Empty */
      | PREC IDENTIFIER
      | PREC IDENTIFIER act
      | prec ';'
      ;
I mentioned in the introduction that you could have a parser
generator generate its own parser. The grammar given above is what you
would need in order to do that. Perhaps surprisingly, yacc’s grammar is
no more complicated than that of the simple calculator bc.