You shouldn’t assume that a name means what you think it does just because it is familiar. In our example
number : integer | "." integer | integer "." | integer "." integer
integer : digit | integer digit
digit : "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
| "8" | "9" | "A" | "B" | "C" | "D" | "E" | "F"
ACAB is a number, while -7 and 5,011,400 are not. There are reasons for this.
A through F are classed as digits to allow for hexadecimal
representations of numbers. Disallowing the commas which traditionally
separate groups of three digits is a design decision. It simplifies
processing and avoids the awkward fact that most of the
non-English-speaking world uses dots instead of commas, while India uses
commas but places them differently.
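As a quick check of the claims above, here is a minimal sketch of a recogniser for the number rule. It is not how bc itself is implemented; the names HEX_DIGIT, NUMBER_RE and is_number are made up for this illustration, and the regular expression is simply a hand translation of the four alternatives in the rule.

```python
import re

HEX_DIGIT = "[0-9A-F]"

# One alternative per expansion of `number` in the grammar.
NUMBER_RE = re.compile(
    rf"{HEX_DIGIT}+\.{HEX_DIGIT}+"   # integer "." integer
    rf"|{HEX_DIGIT}+\."              # integer "."
    rf"|\.{HEX_DIGIT}+"              # "." integer
    rf"|{HEX_DIGIT}+"                # integer
)


def is_number(text: str) -> bool:
    """Return True if the whole string can be derived from `number`."""
    return NUMBER_RE.fullmatch(text) is not None


for candidate in ["ACAB", "-7", "5,011,400"]:
    print(candidate, is_number(candidate))
```

Because the rule has no alternative containing a minus sign or a comma, the second and third candidates are rejected however familiar they look.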
The minus sign isn’t needed because bc is a calculator, and part of its grammar that I omitted earlier is
expression : number | "(" expression ")" | "-" expression
| expression "+" expression | expression "-" expression
| expression MUL_OP expression | expression "^" expression
This is another recursive rule. MUL_OP is in upper case so we recognise
that it must be a terminal symbol. As you might guess from the name, it
includes the token * for the multiplication operator, but it also
includes a couple of other tokens (/ and % in bc). From this rule we see
that a number is an expression, and - followed by any expression is an
expression, so -7 isn’t a number but it is an expression. You might have
noticed that a - appears in two different possible expansions of
expression. In addition to the expansion "-" expression there’s also
expression "-" expression, which would allow, for example, 27-9.
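To make the difference between a number and an expression concrete, here is a rough sketch of a recogniser for the expression rule. It makes several assumptions: the names tokenize and is_expression are invented for this example, MUL_OP is taken to be *, / or %, spaces are not accepted, and the left-recursive rule is rewritten as "operands separated by binary operators", which accepts the same strings but says nothing about precedence.

```python
import re

NUMBER = r"[0-9A-F]+(?:\.[0-9A-F]*)?|\.[0-9A-F]+"
TOKEN_RE = re.compile(rf"{NUMBER}|[-+*/%^()]")
BINARY_OPS = {"+", "-", "*", "/", "%", "^"}  # MUL_OP assumed to be * / %


def tokenize(text):
    """Split text into number, operator and parenthesis tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(0))
        pos = match.end()
    return tokens


def is_expression(text):
    """Return True if the whole string can be derived from `expression`."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    pos = 0

    def operand():
        nonlocal pos
        # "-" expression: skip any leading minus signs.
        while pos < len(tokens) and tokens[pos] == "-":
            pos += 1
        if pos >= len(tokens):
            return False
        token = tokens[pos]
        if token == "(":  # "(" expression ")"
            pos += 1
            if not expression():
                return False
            if pos >= len(tokens) or tokens[pos] != ")":
                return False
            pos += 1
            return True
        if token in BINARY_OPS or token == ")":
            return False
        pos += 1  # a number token
        return True

    def expression():
        nonlocal pos
        if not operand():
            return False
        # expression OP expression, repeated as often as needed.
        while pos < len(tokens) and tokens[pos] in BINARY_OPS:
            pos += 1
            if not operand():
                return False
        return True

    return expression() and pos == len(tokens)


for candidate in ["-7", "27-9", "-(3*-2)", "7-", "5,011,400"]:
    print(candidate, is_expression(candidate))
```

The first three candidates are accepted and the last two are rejected: 7- has a binary operator with nothing to its right, and the comma in 5,011,400 is not a token at all.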
In any case, let’s try the generative grammar approach and generate
some numbers. We’ll start from the rule for number and pick
possibilities at random each time we have to expand a nonterminal or
choose a token for a terminal. Each line will be the result of doing
this to the previous line.
number
integer
digit
7
So 7 is a number. Let’s try again.
number
. integer
. digit
. B
So .B is also a number. Another two attempts:
number
integer
digit
5
number
integer . integer
digit . integer
digit . integer digit
digit . digit digit
5 . digit digit
5 . C digit
5 . C C
So .B, 5 and 5.CC are numbers. Note that the spaces between symbols
above, and in the specification, are just there to improve readability
and are not part of the string we’re generating.
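The hand derivations above can also be done mechanically. The following is only a sketch under a few assumptions: the GRAMMAR dictionary and the derive function are names chosen for this example, each alternative is picked with equal probability, and the nonterminal to expand at each step is itself chosen at random, as in the last derivation above.

```python
import random

# Each nonterminal maps to its list of possible expansions.
GRAMMAR = {
    "number": [["integer"], [".", "integer"], ["integer", "."],
               ["integer", ".", "integer"]],
    "integer": [["digit"], ["integer", "digit"]],
    "digit": [[d] for d in "0123456789ABCDEF"],
}


def derive(start="number", rng=random):
    """Return every line of a random derivation, from `start` to terminals."""
    form = [start]
    steps = [form]
    while True:
        # Positions of the symbols that can still be expanded.
        nonterminals = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        if not nonterminals:
            return steps
        i = rng.choice(nonterminals)
        expansion = rng.choice(GRAMMAR[form[i]])
        form = form[:i] + expansion + form[i + 1:]
        steps.append(form)


steps = derive()
for form in steps:
    print(" ".join(form))       # spaces are for readability only
print("".join(steps[-1]))       # the generated string itself
```

Each run prints a different derivation. Joining the final line without spaces gives the actual string, which is always a number according to the grammar.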