A final example

I haven’t described the grammar for our grammar specification language. You’ve probably picked up most of it from the examples, but in case you haven’t, here are the details. %token statements list the types of terminals the parser can expect from the lexical analyser. Single-character tokens don’t need to be listed. The %start statement identifies the start symbol. A %% line separates these declarations from the actual grammar rules. Each grammar rule consists of the nonterminal being expanded, followed by a colon, then the possible expansions, then a semicolon. Alternative expansions are separated by | characters. Each expansion is just a list of symbols, and it may be empty.
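To make that description concrete, here is a minimal Python sketch of a reader for this format. It is an illustration only, not how yacc itself works: the names read_grammar, TOKEN_RE and SPEC are my own, and the sketch assumes a specification with no actions in braces, no %{ ... %} blocks, and no declarations other than %token and %start.

```python
import re

# Quoted literals such as ';' are matched first, so a quoted semicolon,
# colon or bar is never mistaken for rule punctuation.
TOKEN_RE = re.compile(r"'(?:\\.|[^'\\])+'|%%|%\w+|[A-Za-z_.][\w.]*|[:|;]")


def read_grammar(text):
    """Return (tokens, start, rules) for a simplified yacc-style specification."""
    # Remove /* ... */ comments, then break the text into words.
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.S)
    words = TOKEN_RE.findall(text)

    # The first %% separates the declarations from the rules.
    mark = words.index("%%")
    declarations, body = words[:mark], words[mark + 1:]
    # An optional second %% ends the rules; anything after it is ignored.
    if "%%" in body:
        body = body[:body.index("%%")]

    tokens, start, current = [], None, None
    for word in declarations:
        if word.startswith("%"):
            current = word            # remember which declaration we are in
        elif current == "%token":
            tokens.append(word)
        elif current == "%start":
            start = word

    # Each rule is: nonterminal ':' expansion ('|' expansion)* ';'
    rules, i = {}, 0
    while i < len(body):
        name = body[i]
        if body[i + 1] != ":":
            raise ValueError("expected ':' after " + name)
        i += 2
        expansions = [[]]
        while body[i] != ";":
            if body[i] == "|":
                expansions.append([])  # start the next alternative
            else:
                expansions[-1].append(body[i])
            i += 1
        i += 1                         # skip the closing ';'
        rules[name] = expansions
    return tokens, start, rules


# A made-up specification, used only to exercise the reader.
SPEC = """
%token NUMBER
%start expr
%%
expr : expr '+' term
     | term
     ;
term : NUMBER
     | '(' expr ')'
     ;
"""

if __name__ == "__main__":
    print(read_grammar(SPEC))
```

Each nonterminal maps to a list of expansions, and each expansion is a list of symbols, so an empty expansion shows up as an empty list.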

This is one of many conventions for listing grammar rules. The authors of the POSIX specification chose it because it happens to be the format expected by the parser generator yacc, which, like bc, is one of the utilities described in the specification. That means the grammar of yacc’s own input is also defined by the standard. If you’re curious, here it is:

/* Grammar for the input to yacc. */
/* Basic entries. */
/* The following are recognized by the lexical analyzer. */


%token    IDENTIFIER      /* Includes identifiers and literals */
%token    C_IDENTIFIER    /* identifier (but not literal)
                             followed by a :. */
%token    NUMBER          /* [0-9][0-9]* */


/* Reserved words : %type=>TYPE %left=>LEFT, and so on */


%token    LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION


%token    MARK            /* The %% mark. */
%token    LCURL           /* The %{ mark. */
%token    RCURL           /* The %} mark. */


/* 8-bit character literals stand for themselves; */
/* tokens have to be defined for multi-byte characters. */


%start    spec


%%


spec  : defs MARK rules tail
      ;
tail  : MARK
      {
        /* In this action, set up the rest of the file. */
      }
      | /* Empty; the second MARK is optional. */
      ;
defs  : /* Empty. */
      |    defs def
      ;
def   : START IDENTIFIER
      |    UNION
      {
        /* Copy union definition to output. */
      }
      |    LCURL
      {
        /* Copy C code to output file. */
      }
        RCURL
      |    rword tag nlist
      ;
rword : TOKEN
      | LEFT
      | RIGHT
      | NONASSOC
      | TYPE
      ;
tag   : /* Empty: union tag ID optional. */
      | '<' IDENTIFIER '>'
      ;
nlist : nmno
      | nlist nmno
      ;
nmno  : IDENTIFIER         /* Note: literal invalid with % type. */
      | IDENTIFIER NUMBER  /* Note: invalid with % type. */
      ;


/* Rule section */


rules : C_IDENTIFIER rbody prec
      | rules  rule
      ;
rule  : C_IDENTIFIER rbody prec
      | '|' rbody prec
      ;
rbody : /* empty */
      | rbody IDENTIFIER
      | rbody act
      ;
act   : '{'
        {
          /* Copy action, translate $$, and so on. */
        }
        '}'
      ;
prec  : /* Empty */
      | PREC IDENTIFIER
      | PREC IDENTIFIER act
      | prec ';'
      ;

I mentioned in the introduction that you could have a parser generator generate its own parser. The grammar given above is what you would need in order to do that. Perhaps surprisingly, it is no more complicated than the grammar of the simple calculator bc.