Regular languages

An important category of languages is the regular languages. These can be characterised in a variety of ways, via regular grammars, finite state automata, regular expressions, or syntactic monoids. It’s important to understand all of these points of view because often a problem which is hard to solve using one description is easy in another.

To simplify things all of our examples in this chapter will be languages where the tokens are single characters and we’ll write lists of tokens as strings. When writing grammars each terminal symbol will have only a single token, i.e. character, and will be denoted by that character. The characters ‘%’, ‘:’, ‘|’, and ‘;’, which have a special meaning in our language for describing languages, will not be tokens in any of our example languages, nor will any whitespace characters. Non-terminal symbols will always be denoted by a string more than one character long. These aren’t limitations imposed by the theory, just ways of making the examples easier for you to read.

List of characters are strings. Because all tokens in our examples are characters all lists of tokens are strings. Strings are more familiar than lists of tokens so I’ll often refer to them as strings when doing examples. I may occasionally make the mistake of referring to strings rather than lists of tokens in the general theory as well.