Thinking backwards

When I discussed languages earlier it was from the point of view of parsing or at least recognising them. We receive a list of tokens from the lexical analyser and want to piece them together into larger and larger phrases until we have one phrase encompassing the whole input. This process should be entirely deterministic and should terminate at some point.

The way the grammar describes the language is completely the opposite. Its starting point is the start symbol. It then “expands” that into a list of other symbols, which are then further expanded. We can only expand nonterminals. Once we reach a terminal we have to choose from among tokens composing that terminal and no further expansion is possible. I wrote the word “expand” in quotation marks because the “expansion” might not be any larger than what we started with–it could be a single symbol–and it could even be smaller–an empty list of symbols.

You should think of the grammar as describing a method for generating elements of our language. A list of tokens belongs to the language if and only if this process of expansion starting from the start system could eventually produce it. Interesting languages tend to be infinite and the expansion process described above is nondeterministic because of the multiple possibilities for expanding each symbol, and it need not terminate, so this isn’t a definition which is testable in any obvious way even when you have the full grammar specification.

As an approach to linguistics this is called “generative grammar”. It was developed by Dakṣiputra Pāṇini about two and a half millenia ago.