We can construct complicated languages from simpler languages in a variety of ways. It’s useful to be able to construct a grammar for the more complicated language from grammars for the simpler languages from which it’s built.
Languages are sets of lists. As sets it makes sense to talk about unions, intersections and relative complements. The union of two languages is again a language, as is the intersection or relative complement. Given left regular grammars for a pair of languages can we give a left regular grammar for their union? For the intersection? For the relative complement? The answer in each case is yes, but this is only easy to do for the union. We just need to create a new rule for the start symbol, which includes all the alternates for the start symbols of the original two languages, and copy all the rule for the other symbols, changing names if necessary to avoid duplicates. So the union of the two languages above has the grammar
%start union
%%
union : xxyy | zero | pos_int | neg_int
;
xxyy : | xxyy y | xx x
;
xx : | xx x | error y
;
error : error x | error y
;
zero : 0
;
neg_int : - pos_int
;
pos_int : 1 digits | 2 digits | 3 digits
| 4 digits | 5 digits | 6 digits
| 7 digits | 8 digits | 9 digits
;
digits : | 0 digits
| 1 digits | 2 digits | 3 digits
| 4 digits | 5 digits | 6 digits
| 7 digits | 8 digits | 9 digits
;
Of course the same remarks apply to right regular grammars as well. The union of two languages with right regular grammars has a right regular grammar. Also, the construction above is easily adapted to the union of finitely many languages.
We can also define a language whose members are the concatenation of
a member of the first language and a member of the second. As with the
union, we can construct a grammar for this new language from grammars
for the two old languages by a purely mechanical procedure, though this
time it’s rather more complicated. It may be easiest to understand the
procedure through examples. Consider, then, the language consisting of
integers followed by some number of x
’s and then some
number of y
’s.
We can start from our right regular grammar for the integers. Instead of an empty string at the end the of the input we should now have a string in the xy language, so we replace the empty alternatives in the right regular grammar for integers with the start symbol in the right regular grammar for the xy language.
%start intxxyy
%%
intxxyy : zero | pos_int | neg_int
;
zero : 0
;
neg_int : - pos_int
;
pos_int : 1 digits | 2 digits | 3 digits
| 4 digits | 5 digits | 6 digits
| 7 digits | 8 digits | 9 digits
;
digits : weird | 0 digits
| 1 digits | 2 digits | 3 digits
| 4 digits | 5 digits | 6 digits
| 7 digits | 8 digits | 9 digits
;
In this case there was only empty alternative, in the rule for the
digits
symbol. I haven’t added in the rules from the xxyy
grammar yet because we have a problem. weird
is a
non-terminal symbol and digits
is also a non-terminal
symbol, but the alternatives in a rule for a non-terminal symbol in a
right regular grammar should be empty or a terminal followed by a
non-terminal. As a first step to fixing this we need to replace
weird
by its possible expansions.
digits : xxyy | 0 digits
| 1 digits | 2 digits | 3 digits
| 4 digits | 5 digits | 6 digits
| 7 digits | 8 digits | 9 digits
;
There was in fact only one alternate, namely xxyy
. This
also isn’t suitable as an alternate in a rule for a non-terminal symbol
so we need to replace it by it’s possible expansions.
digits : | x xxyy | y yy | 0 digits
| 1 digits | 2 digits | 3 digits
| 4 digits | 5 digits | 6 digits
| 7 digits | 8 digits | 9 digits
;
Now we have a rule of the required form. We need to include more
rules from the right regular grammar for the xy language so we can
expand the symbols xxyy
and yy
. The complete
grammar for the concatenation language is
%start intxxyy
%%
intxxyy : zero | pos_int | neg_int
;
zero : 0
;
neg_int : - pos_int
;
pos_int : 1 digits | 2 digits | 3 digits
| 4 digits | 5 digits | 6 digits
| 7 digits | 8 digits | 9 digits
;
digits : | x xxyy | y yy | 0 digits
| 1 digits | 2 digits | 3 digits
| 4 digits | 5 digits | 6 digits
| 7 digits | 8 digits | 9 digits
;
xxyy : | x xxyy | y yy
;
yy : | y yy | x error
;
error : x error | y error
;
We can also construct a left regular grammar for this concatenation language from the left regular grammars for the integer language and the xy language. This time we start from the left regular grammar for the xy language and replace the empty alternates with the start symbol for the integer language.
%start intxxyy
%%
intxxyy : xxyy
;
xxyy : integer | xxyy y | xx x
;
xx : integer | xx x | error y
;
error : error x | error y
;
This grammar is not of the required form though so we need to replace
integer
by its possible expansions. Those rules will still
not be be of required form, so we replace those replacements. The new
rules for xxyy
and xx
are then
xxyy : zero | nonzero | xxyy y | xx x
;
xx : zero | nonzero | xx x | error y
;
Those rules will still not be be of required form, so we replace
those replacements. The new rules for xxyy
and
xx
are then
xxyy : empty 0 |
| nonzero 0 | nonzero 1 | nonzero 2 | nonzero 3 | nonzero 4
| nonzero 5 | nonzero 6 | nonzero 7 | nonzero 8 | nonzero 9
| head 1 | head 2 | head 3 | head 4 | head 5
| head 6 | head 7 | head 8 | head 9
| xxyy y | xx x
;
xx : empty 0 |
| nonzero 0 | nonzero 1 | nonzero 2 | nonzero 3 | nonzero 4
| nonzero 5 | nonzero 6 | nonzero 7 | nonzero 8 | nonzero 9
| head 1 | head 2 | head 3 | head 4 | head 5
| head 6 | head 7 | head 8 | head 9
| xx x | error y
;
The full grammar is then
%start intxxyy
%%
intxxyy : xxyy
;
xxyy : empty 0 |
| nonzero 0 | nonzero 1 | nonzero 2 | nonzero 3 | nonzero 4
| nonzero 5 | nonzero 6 | nonzero 7 | nonzero 8 | nonzero 9
| head 1 | head 2 | head 3 | head 4 | head 5
| head 6 | head 7 | head 8 | head 9
| xxyy y | xx x
;
xx : empty 0 |
| nonzero 0 | nonzero 1 | nonzero 2 | nonzero 3 | nonzero 4
| nonzero 5 | nonzero 6 | nonzero 7 | nonzero 8 | nonzero 9
| head 1 | head 2 | head 3 | head 4 | head 5
| head 6 | head 7 | head 8 | head 9
| xx x | error y
;
error : error x | error y
;
head : | empty -
;
empty :
;
The general procedure constructing a grammar for the concatenation language from the grammars for a pair of languages is as follows.
As a further example I’ll construct a right regular grammar for the concatenation of the xy language with itself. This time we’ll need the renaming step mentioned above. We take two copies of the the right regular grammar for the xy language.
%start weirdl
%%
weirdl : xxyyl
;
xxyyl : | x xxyyl | y yyl
;
yyl : | y yyl | x errorl
;
errorl : x errorl | y errorl
;
and
%start weirdr
%%
weirdr : xxyyr
;
xxyyr : | x xxyyr | y yyr
;
yyr : | y yyr | x errorr
;
errorr : x errorr | y errorr
;
We’re constructing a right regular grammar so we start from the grammar for the left language.
%start weirdl
%%
weirdl : xxyyl
;
xxyyl : | x xxyyl | y yyl
;
yyl : | y yyl | x errorl
;
errorl : x errorl | y errorl
;
We then replace both empty alternates with weirdr
and
then replace that with xxyyr
and then that with
| x xxyyr | y yyr
.
%start weirdl
%%
weirdl : xxyyl
;
xxyyl : | x xxyyr | y yyr | x xxyyl | y yyl
;
yyl : | x xxyyr | y yyr | y yyl | x errorl
;
errorl : x errorl | y errorl
;
Then we add in rules from the other language.
%start weirdl
%%
weirdl : xxyyl
;
xxyyl : | x xxyyr | y yyr | x xxyyl | y yyl
;
yyl : | x xxyyr | y yyr | y yyl | x errorl
;
errorl : x errorl | y errorl
;
xxyyr : | x xxyyr | y yyr
;
yyr : | y yyr | x errorr
;
errorr : x errorr | y errorr
;
It’s important to understand which language we’ve just constructed a grammar for. A string is in this language if and only if it is the concatenation of two strings in the xy language. Those two strings could be the same but they don’t have to be. The question of whether we can construct a grammar for the language of two repetitions of the same string is one we’ll return to later.
Once we know how to construct a grammar for the concatenation of two languages we can construct a grammar for the concatenation of finitely many, by considering it at as a repeated concatenation.
Given a left or right regular grammar we can, using the techniques of the preceding section, construct, for each positive number \({ n }\), a grammar for the language whose members are concatenations of \({ n }\) members of the original language. Of course we can can also do this for \({ n = 0 }\). In this case the language consists of only the empty list and it has the grammar
%start start
%%
start : empty
;
empty :
;
We can also construct a grammar for the language of concatenations of between \({ m }\) and \({ n }\) members of a language, for natural numbers \({ m }\) and \({ n }\), using the union construction earlier. What’s less obvious is that we can construct a grammar for the language of concatenations of arbitrarily many members of a language.
Given a language the Kleene star of the language is set of all lists which can be found by concatenating arbitrarily many members of the language. This includes the empty list. The Kleene plus of the language is the set of lists which can be obtained by concatenating an arbitrary positive number of members of the language. If the original language had the empty list as a member then its Kleene star and Kleene plus are the same. If not then the Kleene star has the empty set as a member while the Kleene plus does not, but otherwise they are the same.
Given a regular grammar for a language we can find a regular grammar for its Kleene plus by looking through its rules for occurrences of the empty list and then adding alternates to any such rules. The alternates to be added are the alternates of alternates of the start symbol. To get a grammar for the the Kleene star we can apply our union construction discussed earlier to the Kleene plus grammar and the grammar given above for the language with just the empty list.
Applying the construction above to the xy language gives the following right regular grammar for its Kleene plus
%start weird
%%
weird : xxyy
;
xxyy : | x xxyy | y yy
;
yy : | y yy | x error | x xxyy | y yy
;
error : x error | y error
;
and the following grammar for its Kleene star
%start weird
%%
weird : xxyy | empty
;
xxyy : | x xxyy | y yy
;
yy : | y yy | x error | x xxyy | y yy
;
error : x error | y error
;
empty :
;
The constructions above are meant to show that regular grammars can
be constructed for these languages. They do not attempt to find
efficient grammars for them. For example, the Kleene star of the xy
language is just the set of all lists of x
’s and
y
’s. A much simpler grammar for this language is
%start ksxy
%%
ksxy : xyxy
;
xyxy : | x xyxy | y xyxy
;
The reversal of a language is simply the set of the reversals of its members, where the reversal of a list is the list with the same items in reverse order. Given a left regular grammar for a language we can easily construct a right regular grammar for its reversal and vice versa. We just take the alternates in the rules for the grammar and reverse the order of the symbols. Here, for example, is a right regular grammar for the reversed integers, constructed from the left regular grammar for the integers.
integer : zero | nonzero
;
zero : 0 empty
;
nonzero : 0 nonzero | 1 nonzero | 2 nonzero | 3 nonzero | 4 nonzero
| 5 nonzero | 6 nonzero | 7 nonzero | 8 nonzero | 9 nonzero
| 1 head | 2 head | 3 head | 4 head | 5 head
| 6 head | 7 head | 8 head | 9 head
;
head : | - empty
;
empty :
;
What’s less clear is how to construct a left regular grammar for the reversal from a left regular grammar for the original language or a right regular grammar for the reversal from a regular regular grammar for the original. This is a question we’ll return to later.