The full language for bc is rather complicated, so let’s
concentrate on just the last bit for now:
NUMBER  : integer
        | '.' integer
        | integer '.'
        | integer '.' integer
        ;
integer : digit
        | integer digit
        ;
digit   : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
        | 8 | 9 | A | B | C | D | E | F
        ;
Don’t assume that a name means what you expect just because it is
familiar. According to this grammar, ACAB is a
NUMBER while -7 and 5,011,400 are
not. There are reasons for this. A through F
are classed as digits to allow for hexadecimal representations of
numbers. Disallowing the commas that traditionally separate groups of
three digits is a design decision. It simplifies processing and sidesteps
the awkward fact that much of the non-English-speaking world uses dots
or spaces instead of commas, while India uses commas but places them differently.
The minus sign isn’t needed because bc is a calculator, and
a further part of its grammar is a rule for expression
which includes '-' expression. A NUMBER is an
expression, and - followed by any
expression is an expression, so -7
isn’t a NUMBER but it is an expression.
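To make these claims easy to check, here is a minimal sketch in Python of a recognizer for NUMBER. The names is_number and is_expression_sketch are made up for this illustration and are not part of bc. The second function covers only the two expression alternatives mentioned above (a NUMBER, or '-' followed by an expression) and ignores how a real lexer would split the input into tokens.

```python
import re

# integer : digit | integer digit   (one or more of 0-9, A-F)
_INTEGER = r"[0-9A-F]+"

# One regex alternative per alternative of the NUMBER rule.
_NUMBER = re.compile(
    rf"{_INTEGER}\.{_INTEGER}"   # integer '.' integer
    rf"|{_INTEGER}\."            # integer '.'
    rf"|\.{_INTEGER}"            # '.' integer
    rf"|{_INTEGER}"              # integer
)


def is_number(text: str) -> bool:
    """Return True if the whole string can be generated from NUMBER."""
    return _NUMBER.fullmatch(text) is not None


def is_expression_sketch(text: str) -> bool:
    """Hypothetical, partial check: handles only NUMBER and '-' expression.

    The real expression rule has many more alternatives; this is just
    enough to show why -7 is an expression without being a NUMBER.
    """
    if text.startswith("-"):
        return is_expression_sketch(text[1:])   # '-' expression
    return is_number(text)                      # NUMBER


if __name__ == "__main__":
    for candidate in ["ACAB", "-7", "5,011,400", ".B", "5.", "5.CC"]:
        print(candidate, is_number(candidate), is_expression_sketch(candidate))
```

The regular expression is a convenience here, not how bc itself is specified. It works because the NUMBER rule never nests, so each alternative translates directly.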
In any case, let’s try the generative grammar approach and generate
some NUMBERs. We’ll start from the rule for
NUMBER and pick one alternative at random each time we have
to expand a nonterminal, until only terminals are left. Each line will
be the result of doing this to the previous line.
NUMBER
integer
digit
7
So 7 is a NUMBER. Let’s try again.
NUMBER
. integer
. digit
. B
So .B is also a NUMBER. Another two
attempts:
NUMBER
integer
digit
5

NUMBER
integer . integer
digit . integer
digit . integer digit
digit . digit digit
5 . digit digit
5 . C digit
5 . C C
So .B, 5 and 5.CC are
NUMBERs. Note that the spaces between symbols above, and in
the specification, are there only to improve readability and are not part
of the string we’re generating.
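The hand derivations above can be automated. What follows is a rough sketch in Python, not anything bc provides: the GRAMMAR table is transcribed from the rules at the top of this section, and the function name derive and its rng parameter are invented for the example. At each step it picks one remaining nonterminal and one of its alternatives at random, recording every intermediate line just as we did by hand.

```python
import random

# Each nonterminal maps to its list of alternatives; anything that is
# not a key of this table is treated as a terminal.
GRAMMAR = {
    "NUMBER": [
        ["integer"],
        [".", "integer"],
        ["integer", "."],
        ["integer", ".", "integer"],
    ],
    "integer": [["digit"], ["integer", "digit"]],
    "digit": [[d] for d in "0123456789ABCDEF"],
}


def derive(start="NUMBER", rng=None):
    """Return the list of derivation steps, from [start] to all terminals."""
    if rng is None:
        rng = random.Random()
    symbols = [start]
    steps = [list(symbols)]
    while True:
        # Positions of the nonterminals still waiting to be expanded.
        pending = [i for i, s in enumerate(symbols) if s in GRAMMAR]
        if not pending:
            return steps
        i = rng.choice(pending)
        # Replace that one symbol with a randomly chosen alternative.
        symbols[i:i + 1] = rng.choice(GRAMMAR[symbols[i]])
        steps.append(list(symbols))


if __name__ == "__main__":
    steps = derive()
    for step in steps:
        print(" ".join(step))      # spaces only for readability
    print("".join(steps[-1]))      # the generated NUMBER itself
```

The final line joins the symbols without spaces, matching the note above that the spaces are not part of the generated string. Because integer picks its recursive alternative only half the time, the process stops after a handful of steps in practice.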