A regular language

The module enrollment problem we’ve been discussing requires input from staff, about which combinations of modules students should be able to take, and from students, about which modules each student wants to take. So far we only have a language for the input from staff. In reality the students would probably select modules from some sort of web interface, but for the implementer it would be much easier just to provide a language for their input as well. The simplest such language would have statements which are just lists of modules. The statement “Statistics Algebra”, for example, would have the interpretation “I want to take Statistics and Algebra and nothing else”.

If our language includes all such lists of modules then no parsing is really needed. The lexical analyser, which splits the input into tokens, i.e. module names, does all the work.

There’s another option though. At the point where students are entering their module selections the staff have already entered all the information about allowed combinations. We could define a language consisting of precisely those module lists which are allowed. The information collected from the staff implicitly gives this language a grammar and grammatically correct just means allowed by the module selection rules. Of course any change to those rules gives us a new language.

What sort of language is this? It turns out to be regular. It is possible to create a finite state automaton which recognises it. In fact the example of a finite state automaton I gave you earlier is essentially the one which enforces the rule “Probability ∧ Statistics ∨ Algebra ∧ Geometry”. I just shortened each of the module names to just their initial letter to avoid clutter in the diagram.

One way to construct a module enrollment system would be to use the following components:

A parser generator. There are parser generators freely available which efficiently generate efficient parsers so we don’t need to write anything.
A simple lexical analyser. It just needs to distinguish module names from the Boolean operators ∧, ∨ and ¬ so it’s easy to write. It’s helpful if it has an option to throw an error whenever it sees a Boolean operator. That way it can be used, with the option unset, for input from staff entering module selection rules and, with the option set, for input from students selecting modules.
A grammar for the module rule language, which is the same as the language of the propositional calculus, written in the language which the parser generator accepts as input. This is very easy to write since the grammar is very simple.
The parser generated from this grammar. This may be complicated, but it’s generated for you by the parser generator.
A procedure which converts parsed statements to a grammar for the language of module selections for which those statements are true. This is the hardest part and unfortunately is very hard to do efficiently. It’s possible to arrange that the grammar is a regular grammar.
The parser generated by the parser generator from that grammar. Again, this parser will probably be complicated but it’s generated for us by the parser generator. Since the grammar is regular it could generate a finite state machine parser, but might choose not to. The actual parsing isn’t really what’s needed, just the check that the module selection is grammatically correct.

I’m not saying you should construct a module enrollment system this way, merely noting that you can.