The definition of language given above is deliberately very broad, so broad that it is not useful on its own. In this it is similar to notions like binary relation or binary operation discussed earlier. Practically useful examples have more structure. As in abstract algebra, there is a hierarchy of levels of structure. The main levels of this hierarchy, from most restrictive to least, are:
finite
regular
deterministic context free
context free
context sensitive
recursive
recursively enumerable
general
The easiest of these to define are finite, which just means a finite set of lists of tokens, and general, which is any set of lists of tokens. The levels in between have more complicated definitions, but are more useful.
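As a minimal sketch of the two extremes, suppose tokens are represented as Python strings and lists of tokens as tuples. That representation, and the names `FINITE_LANGUAGE`, `in_finite_language` and `in_even_length_language`, are assumptions made for illustration only. A finite language can be written out in full, while an infinite language has to be described by some rule for membership.

```python
# A finite language: a finite set of lists of tokens, written out in full.
# Tokens are modelled as strings and token lists as tuples (an assumption
# made for this sketch only).
FINITE_LANGUAGE = frozenset({
    ("yes",),
    ("no",),
    ("maybe", "later"),
})


def in_finite_language(tokens):
    """Membership in a finite language is a plain set lookup."""
    return tuple(tokens) in FINITE_LANGUAGE


def in_even_length_language(tokens):
    """An infinite language given by a rule: all token lists of even length."""
    return len(tokens) % 2 == 0
```

The second function happens to be computable. A language in general need not have any program that decides membership; only languages at the recursive level or below do.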
Each level in the hierarchy above includes all the levels listed before it, so every finite language is regular, every regular language is context free, and so on. The step which is most likely to cause confusion is that every context free language is context sensitive. “Context sensitive” doesn’t really mean that the language is sensitive to context, merely that it could be, while “context free” means that it definitely isn’t.
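The first of these inclusions can be made concrete. The sketch below assumes each token is a single character, so that a list of tokens is a Python string, and uses Python's `re` module as a stand-in for "regular"; the function name `finite_language_to_regex` is hypothetical. It turns any finite language into a regular expression by listing the members as alternatives.

```python
import re


def finite_language_to_regex(words):
    """Build a compiled pattern that matches exactly the strings in `words`.

    `words` is a finite collection of strings. Each member is escaped so
    that it is matched literally, and the members are joined with `|`.
    """
    if not words:
        # The empty language: a pattern that can never match anything.
        return re.compile(r"(?!)")
    alternatives = (re.escape(word) for word in sorted(words))
    return re.compile("|".join(alternatives))


# Usage: membership is tested with fullmatch, so the whole string must match.
pattern = finite_language_to_regex({"a", "ab", "ba"})
print(pattern.fullmatch("ab") is not None)
print(pattern.fullmatch("abb") is not None)
```

The construction only ever uses alternation of literal strings, which is why nothing beyond the regular level is needed to describe a finite language.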
This sort of terminology is often used in mathematics. In the theory of linear equations we make a distinction between homogeneous equations and inhomogeneous equations. Homogeneous equations have zero constant term. Inhomogeneous equations aren’t required to have zero constant term but are certainly allowed to. Under this convention every homogeneous equation is also inhomogeneous. That certainly sounds weird, but we define things in this way because there’s simply nothing of interest to be said about equations whose constant term is non-zero which doesn’t apply equally well when the constant term is zero. Similarly, the class of languages which are context sensitive but not context free simply has no interesting properties and therefore isn’t worth naming.
A good rule of thumb when developing a language for a specific purpose is to choose the most restrictive level of the hierarchy that will do the job, and to describe the language by a grammar at that level, or not much higher. Most modern programming languages are technically context sensitive, but try to segregate their context sensitive features as much as possible. The remainder is context free, with significant parts which are regular or even finite.
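One way to picture this segregation is a checker that works in two phases. The sketch below uses an invented toy language, not any real one: a program is a sequence of statements of the form `let NAME ;` or `print NAME ;`, and a name may only be printed after it has been declared. The first phase checks the shape of the statements and knows nothing about which names exist. The declared-before-use rule, which is the kind of rule usually cited as context sensitive, is confined to the second phase. All names here (`parse`, `check_declarations`, `accepts`) are hypothetical.

```python
KEYWORDS = {"let", "print"}


def parse(tokens):
    """Context-free phase: check the shape `(let|print) NAME ;` repeated.

    Returns a list of (keyword, name) pairs, or raises ValueError.
    """
    if len(tokens) % 3 != 0:
        raise ValueError("incomplete statement")
    statements = []
    for i in range(0, len(tokens), 3):
        keyword, name, terminator = tokens[i:i + 3]
        if keyword not in KEYWORDS:
            raise ValueError(f"expected keyword, got {keyword!r}")
        if name in KEYWORDS or not name.isidentifier():
            raise ValueError(f"expected name, got {name!r}")
        if terminator != ";":
            raise ValueError(f"expected ';', got {terminator!r}")
        statements.append((keyword, name))
    return statements


def check_declarations(statements):
    """Context-sensitive phase: every printed name needs an earlier `let`."""
    declared = set()
    for keyword, name in statements:
        if keyword == "let":
            declared.add(name)
        elif name not in declared:
            raise ValueError(f"{name!r} used before declaration")


def accepts(tokens):
    """A token list is in the language if both phases succeed."""
    try:
        check_declarations(parse(tokens))
    except ValueError:
        return False
    return True
```

For example, `accepts("let x ; print x ;".split())` holds, while `accepts("print x ; let x ;".split())` does not, even though both token lists pass the first phase. Keeping the two phases separate means the grammar for the first phase can stay at a low level of the hierarchy.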