The full language for bc is rather complicated, so let’s concentrate on
just the last bit for now:
NUMBER  : integer
        | '.' integer
        | integer '.'
        | integer '.' integer
        ;
integer : digit
        | integer digit
        ;
digit   : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
        | 8 | 9 | A | B | C | D | E | F
        ;
You shouldn’t assume that a name means what you think just because it
is familiar. According to this grammar ACAB is a NUMBER while -7 and
5,011,400 are not. There are reasons for this. A through F are classed
as digits to allow for hexadecimal representations of numbers.
Disallowing the commas which traditionally separate groups of three
digits is a design decision. It simplifies processing and avoids the
awkward fact that much of the non-English speaking world uses dots
instead of commas, while India uses commas but places them differently.
The minus sign isn’t needed because bc is a calculator and a further
part of its grammar is a rule for expression which includes
'-' expression. A NUMBER is an expression, and '-' followed by any
expression is an expression, so -7 isn’t a NUMBER but it is an
expression.
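To make the membership claims above concrete, here is a minimal sketch
of a recognizer for NUMBER, written in Python. It is not how bc itself
is implemented; the function names is_integer and is_number are made up
for this illustration, and it assumes, as the grammar does, that only
0-9 and uppercase A-F count as digits.

```python
# Hypothetical helper names; this mirrors the three grammar rules only.
DIGITS = "0123456789ABCDEF"  # the terminals allowed by the digit rule


def is_integer(s: str) -> bool:
    """integer : digit | integer digit  (one or more digits)."""
    return len(s) > 0 and all(ch in DIGITS for ch in s)


def is_number(s: str) -> bool:
    """NUMBER : integer | '.' integer | integer '.' | integer '.' integer"""
    if s.count(".") > 1:
        return False  # no alternative contains two dots
    if "." not in s:
        return is_integer(s)
    left, right = s.split(".")
    if not left and not right:
        return False  # a lone '.' matches no alternative
    # Each side of the dot may be absent, but if present must be an integer.
    return (not left or is_integer(left)) and (not right or is_integer(right))


if __name__ == "__main__":
    for text in ["ACAB", "-7", "5,011,400", ".B", "7."]:
        print(text, is_number(text))
```

The minus sign and the commas are rejected simply because neither is a
digit or a dot, which is exactly what the grammar says.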
In any case, let’s try the generative grammar approach and generate
some NUMBERs. We’ll start from the rule for NUMBER and pick
possibilities at random each time we have to expand a nonterminal or
choose a token for a terminal. Each line will be the result of doing
this to the previous line.
NUMBER
integer
digit
7
So 7 is a NUMBER. Let’s try again.
NUMBER
. integer
. digit
. B
So .B is also a NUMBER. Another two attempts:
NUMBER
integer
digit
5

NUMBER
integer . integer
digit . integer
digit . integer digit
digit . digit digit
5 . digit digit
5 . C digit
5 . C C
So .B, 5 and 5.CC are NUMBERs. Note that the spaces between symbols
above, and in the specification, are just there to improve readability
and are not part of the string we’re generating.
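The random expansion used above can be sketched in a few lines of
Python. This is only an illustration under some assumptions: the GRAMMAR
table and the derive function are names invented here, and the sketch
always expands the leftmost nonterminal, whereas the hand derivations
above pick whichever symbol they like. Every alternative is chosen with
equal probability, so the integer rule stops recursing half the time and
the process finishes with probability 1.

```python
import random

# The three rules from the grammar, one list of alternatives per nonterminal.
GRAMMAR = {
    "NUMBER": [
        ["integer"],
        [".", "integer"],
        ["integer", "."],
        ["integer", ".", "integer"],
    ],
    "integer": [["digit"], ["integer", "digit"]],
    "digit": [[d] for d in "0123456789ABCDEF"],
}


def derive(start="NUMBER", rng=None):
    """Return every line of one random derivation, starting from `start`."""
    rng = rng or random.Random()
    form = [start]
    steps = [" ".join(form)]
    while any(sym in GRAMMAR for sym in form):
        # Find the leftmost symbol that is still a nonterminal.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        # Replace it with one of its alternatives, chosen at random.
        form[i:i + 1] = rng.choice(GRAMMAR[form[i]])
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    steps = derive()
    for line in steps:
        print(line)
    # Spaces are only for readability, so strip them from the result.
    print("generated:", steps[-1].replace(" ", ""))
```

Each printed line is the previous line with one nonterminal expanded, in
the same style as the derivations above. Passing a seeded random.Random
as rng makes a run repeatable.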