Set inclusion provides a notion of size of sets. A set is at least as large as any of its subsets and is strictly larger than any of its proper subsets. Inclusion is reflexive, transitive and antisymmetric, since \({ A ⊆ A }\), \({ A ⊆ B }\) and \({ B ⊆ C }\) imply \({ A ⊆ C }\) and \({ A ⊆ B }\) and \({ B ⊆ A }\) imply \({ A = B }\). If there were a set of all sets then \({ R = \{ ( A , B ) : A ⊆ B \} }\) would be a partial order on it, but we’ve already seen that there can be no such set. The construction above does work for any set of sets though and was in fact one of our examples in the section on partial orders. It just doesn’t make sense to apply it to the set of sets because there is no such thing.
Inclusion doesn’t really provide a notion of size which agrees with our intuitive notion of size though. If \({ x }\), \({ y }\) and \({ z }\) are distinct then we would like to be able to say that the set \({ \{ x \} }\) is strictly smaller than the set \({ \{ y , z \} }\), even though it’s not one of its proper subsets. The obvious way to do this is to count the members, but we would like a definition which also works for infinite sets. The standard way to do this is to define the notion of size in terms of the existence of injective functions. There is an injective function from \({ \{ x \} }\) to \({ \{ y , z \} }\). In fact there are two, \({ \{ ( x , y ) \} }\) and \({ \{ ( x , z ) \} }\). There is no injective function from \({ \{ y , z \} }\) to \({ \{ x \} }\).
Motivated by this example, we say that \({ A }\) is no larger than \({ B }\) if there is an injective function from \({ A }\) to \({ B }\). We say that \({ A }\) is of the same size as \({ B }\) if \({ A }\) is no larger than \({ B }\) and \({ B }\) is no larger than \({ A }\). We say that \({ A }\) is strictly smaller than \({ B }\) if \({ A }\) is no larger than \({ B }\) and there is not an injective function from \({ B }\) to \({ A }\).
One case where we know there is an injective function is when \({ A }\) is a subset of \({ B }\). In this case \({ Δ _ A }\) is an injective function from \({ A }\) to \({ B }\). So we get the rather unsurprising result that a subset is no larger than the set of which it’s a subset.
Unwrapping the definitions, \({ A }\) is of the same size as \({ B }\) if there is an injective function from \({ A }\) to \({ B }\) and an injective function from \({ B }\) to \({ A }\). This is certainly true if there is a bijective function from \({ A }\) to \({ B }\). If \({ F }\) is such a function then \({ F }\) is an injective function from \({ A }\) to \({ B }\) and \({ F ^ { - 1 } }\) is an injective function from \({ B }\) to \({ A }\). For finite sets this is the only way for two sets to have the same size. In other words, if \({ A }\) and \({ B }\) are finite sets and \({ F }\) is an injective function from \({ A }\) to \({ B }\) and \({ G }\) is an injective function from \({ B }\) to \({ A }\) then \({ F }\) and \({ G }\) are bijective, although it’s not necessarily the case that \({ G = F ^ { - 1 } }\).
For infinite sets the situation is more complicated. It’s possible for there to be an injective but not bijective function function \({ F }\) from \({ A }\) to \({ B }\) and an injective but not bijective function \({ G }\) from \({ B }\) to \({ A }\). In fact it’s possible to give a simple example. Let \({ A = N }\) and \({ B = N }\) and let \({ F = G = \{ ( x , y ) ∈ N × N : y = x + 1 \} }\). This is the increment function. It’s injective because for any natural number \({ x }\) there is a natural number \({ y }\) such that \({ y = x + 1 }\). It’s not surjective because there is a natural number \({ y }\) for which there is no natural number \({ x }\) with \({ y = x + 1 }\). \({ y = 0 }\) is such a number, and is in fact the only such number. So we can certainly have an injective but not bijective function function \({ F }\) from \({ A }\) to \({ B }\) and an injective but not bijective function \({ G }\) from \({ B }\) to \({ A }\). There is however a useful theorem, the Schröder-Bernstein theorem, which says that in such a case there is always some bijective function \({ H }\) from \({ A }\) to \({ B }\). It follows that \({ A }\) and \({ B }\) are of the same size if and only if there is a bijective function from \({ A }\) to \({ B }\). This isn’t the definition of having the same size but it is equivalent to that definition as a consequence of the Schröder-Bernstein theorem.
One other difference between finite and infinite sets, or perhaps more accurately the same difference from a different point of view, is that an infinite set can be of the same size as one of its proper subsets. For example the set of natural numbers and the set of positive integers are of the same size since the increment function is a bijective function from one to the other, but it is a proper subset.
The notion of size based on injective functions is called cardinality. Sets of the same size are said to have the same cardinality and a set which is strictly smaller than another set is said to have a lower cardinality than it.
Cardinality behaves somewhat like a partial order. It is reflexive in the sense that any set \({ A }\) is of the same size as itself, since the identity function is injective. It is transitive in the sense that if \({ A }\) is no larger than \({ B }\) and \({ B }\) is no larger than \({ C }\) then \({ A }\) is no larger than \({ C }\). This is a consequence of the fact that the composition of an injective function from \({ A }\) to \({ B }\) with an injective function from \({ B }\) to \({ C }\) is an injective function from \({ A }\) to \({ C }\). It is sort of antisymmetric in the sense that if \({ A }\) is no larger than \({ B }\) and \({ B }\) is no larger than \({ A }\) then \({ A }\) is of the same size as \({ B }\). For true antisymmetry this would have to imply that \({ A = B }\) rather than merely that they’re of the same size. Of course “is no larger than” isn’t a true relation because there is no set of sets for it to be a relation on. When we restrict it to subsets of a given set it does become a relation though.
The distinction between a partial order on a set and a total order on a set is that the latter has the additional requirement that for all \({ x }\) and \({ y }\) either \({ ( x , y ) }\) or \({ ( y , x ) }\) is a member. Even though there is no set of sets we can still ask whether for all sets \({ A }\) and \({ B }\) it is true that \({ A }\) is no larger than \({ B }\) or \({ B }\) is no larger than \({ A }\). The answer is yes if at least one of the sets is finite. For infinite sets the answer, based on the axioms presented so far, is maybe. It is not possible to prove this but it is also not possible to disprove it.
Suppose \({ A }\) is a set and \({ B }\) is \({ P A }\), i.e. the set of subsets of \({ A }\). Let \({ F }\) be the set of ordered pairs of the form \({ ( x , \{ x \} ) }\) for \({ x }\) in \({ A }\). It’s easy to check that \({ F }\) is both left total and right unique so it is a function. It’s also easy to see that it is left unique and so is an injective function. It is not right total though because there is no \({ w ∈ A }\) such that \({ ( w , ∅ ) ∈ F }\). So \({ F }\) is not surjective. Since there is an injective function from \({ A }\) to \({ B }\) we conclude that \({ A }\) is no larger than \({ B }\).
Is there some other function \({ G }\) from \({ A }\) to \({ B }\) which is surjective? If there were then we could form the set \[ C = \{ x ∈ A : ∃ D ∈ B : ( x , D ) ∈ G ∧ [ ¬ x ∈ D ] \} . \] \({ G }\) was assumed to be surjective so there is a \({ y ∈ A }\) such that \({ ( y , C ) ∈ G }\). Either \({ y }\) is a member of \({ C }\) or it isn’t. If \({ y }\) is a member of \({ C }\) then there is a \({ D }\) such that \({ ( y , D ) ∈ G }\) and \({ ¬ y ∈ D }\). Now \({ y }\) is a member of \({ C }\) but not of \({ D }\) so \({ C }\) and \({ D }\) are not equal. But \({ ( y , C ) }\) and \({ ( y , D ) }\) belong to \({ G }\), which is right unique, so \({ C }\) must be equal to \({ D }\). So the assumption that \({ y }\) is a member of \({ C }\) leads to a contradiction. Suppose then that \({ y }\) is not a member of \({ C }\). Then there is a set \({ D }\) such that \({ ( y , G ) ∈ G }\) and \({ ¬ y ∈ D }\). Indeed \({ D = C }\) has both these properties. But then the definition of \({ C }\) tells us that \({ y ∈ C }\), which contradicts our assumption that \({ y }\) is not a member of \({ C }\). So \({ y }\) is neither a member of \({ C }\) nor not a member of \({ C }\). The only way to resolve this paradox is that the set \({ C }\) does not in fact exist. But the existence of \({ C }\) follows from that of \({ G }\) by Separation, so \({ G }\) does not exist either. In other words there is no surjective function from \({ A }\) to \({ B }\).
The argument above is known as the Cantor diagonalisation argument.
Using the Schröder-Bernstein theorem one can sharpen this result somewhat. We’ve already seen that there is an injective function from \({ A }\) to \({ B }\). If there were an injective function from \({ B }\) to \({ A }\) then the Schröder-Bernstein theorem would imply the existence of a bijective function from \({ A }\) to \({ B }\) and hence a surjective function from \({ A }\) to \({ B }\). We’ve just seen that there is no such function so there can’t be an injective function from \({ A }\) to \({ B }\). In other words \({ A }\) is strictly smaller than \({ B }\).
For finite sets the size is determined by the number of members and if \({ A }\) has \({ m }\) members then \({ B }\) has \({ 2 ^ m }\) members. We can conclude that \({ m < 2 ^ m }\) for all \({ m }\). This is indeed true, but hardly surprising. For infinite sets we get a more interesting conclusion. If \({ A }\) is an infinite set then \({ P A }\) is strictly larger than \({ A }\), so there are infinite sets which are not of the same size. We don’t have to stop there though. \({ P P A }\) is strictly larger than \({ P A }\) and \({ P P P A }\) is strictly larger than \({ P P A }\). There is no limit on the number of infinite sets of different sizes we can construct.
Incidentally, this gives us a different proof of the fact that there is no set of all sets. If there were then every subset of it would be a set and hence a member of itself so the set of sets would contain its own power set. It would therefore have a power set which is no larger than itself, in contradiction to what we’ve just proved.
We’ve just seen that there are infinite sets of different sizes. We want a notion of sets which are not too infinite. A set is said to be countable if it is no larger than the set of natural numbers.
There are unfortunately two conflicting terminologies in use. One convention is the one given above. The other defines the countable sets to be those which are of the same size as the set of natural numbers. Under the convention I’m using finite sets are countable. Under the other convention they are not. Both conventions agree on calling a set uncountable if the set of natural numbers is strictly smaller than it. The alternative convention has the rather unfortunate property that “uncountable” and “not countable” are not synonyms. Finite sets are neither countable nor uncountable in this convention. Perhaps more importantly, the condition that a set is no larger than the set of natural numbers arises more frequently in both the hypotheses and conclusions of theorems than the condition that a set if of the same size as the set of natural numbers so it’s much more convenient to have a short name for the former condition than for the latter.
There are two unfortunate consequences of this terminological confusion. First, if you read the word countable by itself somewhere other than these notes you can’t be sure what the authors mean unless they have explicitly said which convention they follow. Second, if you write the word countable by itself and don’t specify which convention you follow then no one can be sure what you mean. The standard way to avoid the second problem is to refer to sets which are countable according to the definition at the beginning of this section as “at most countable” and to refer to sets which are countable according to the other convention as “countably infinite”. This involves some redundancy. According to the convention of these notes the words “at most” in “at most countable” are redundant. According to the other convention the word “infinite” in “countably infinite” is redundant.
Using the convention described above, finite sets are countable. This is reasonably straightforward to prove by induction on sets. \({ ∅ }\) satisfies all the requirements to be an injective function from \({ ∅ }\) to \({ N }\) so \({ ∅ }\) is no larger than \({ N }\) and is therefore countable. Suppose \({ A }\) is a countable set. Then there is an injective function from \({ A }\) to \({ N }\). If \({ x }\) is a member of \({ A }\) then \({ A ⋃ \{ x \} = A }\) and so \({ F }\) is also an injective function from \({ A ⋃ \{ x \} }\) to \({ N }\). If \({ x }\) is not a member of \({ A }\) then we can define a set of ordered pairs \({ G }\) whose members are \({ ( x , 0 ) }\) and \({ ( y , m + 1 ) }\) for all \({ ( y , m ) }\) in \({ F }\). This \({ G }\) is an injective function from \({ A ⋃ \{ x \} }\) to \({ N }\). So in either case there is an injective function from \({ A ⋃ \{ x \} }\) to \({ N }\) and so \({ A ⋃ \{ x \} }\) is countable. So \({ ∅ }\) is countable and if \({ A }\) is countable then so is \({ A ⋃ \{ x \} }\). By induction on sets it follows that all finite sets are countable.
Subsets of countable sets are countable. Suppose that \({ A }\) is a subset of \({ B }\) and \({ B }\) is countable, i.e. there is an injective function \({ G }\) from \({ B }\) to \({ N }\). Define \({ F }\) to be the subset of ordered pairs in \({ G }\) whose left element is a member of \({ A }\). Then \({ F }\) is an injective function from \({ A }\) to \({ N }\), so \({ A }\) is countable. It follows from this that if \({ B }\) is countable and \({ C }\) is a set then both \({ B ⋂ C }\) and \({ B ∖ C }\) are countable, since they are subsets of \({ B }\).
The union of two countable sets is countable. To see this, suppose \({ A }\) and \({ B }\) are countable. We’ve just seen above that \({ A ∖ B }\) is then countable, i.e. that there is an injective function from \({ A ∖ B }\) to \({ N }\). Let \({ F }\) be such a function. \({ B }\) is countable so there is an injective function \({ G }\) from \({ B }\) to \({ N }\). Define \({ H }\) to be the set of pairs either of the form \({ ( x , 2 · m ) }\) where \({ ( x , m ) ∈ F }\) or of the form \({ ( y , 2 · m + 1 ) }\) where \({ ( y , m ) ∈ G }\). Then \({ H }\) is an injective function from \({ A ⋃ B }\) to \({ N }\) so \({ A ⋃ B }\) is countable.
\({ N }\) itself is countable since \({ Δ _ N }\) is an injective function from \({ N }\) to \({ N }\). Perhaps surprisingly \({ N × N }\) is also countable. It’s possible to write down an injective function from \({ N × N }\) to \({ N }\) explicitly. Such a function is given by the set of ordered pairs \({ ( ( i , j ) , k ) }\) where \[ k = ( i + j ) ( i + j + 1 ) / 2 + j . \] The division by two is permissible because \({ ( i + j ) ( i + j + 1 ) }\) is always even, as we can prove by induction.
The function above may appear mysterious but it is easily explained by the following picture. \[ \begin{array}{c|ccccccc} \vdots \\ 5 & 20 \\ 4 & 14 & 19 \\ 3 & 9 & 13 & 18 \\ 2 & 5 & 8 & 12 & 17 \\ 1 & 2 & 4 & 7 & 11 & 16 \\ 0 & 0 & 1 & 3 & 6 & 10 & 15 \\ \hline & 0 & 1 & 2 & 3 & 4 & 5 & \cdots \end{array} \] The horizontal axis is labelled by the \({ i }\) values, the vertical axis by the \({ j }\) values and the element in the \({ i }\)’th column, \({ j }\), row, counting from the bottom left and starting at \({ 0 }\), is the corresponding \({ k }\) value. You can see that these numbers are obtained by visiting the pairs in a particular order, working one diagonal at a time and going from the lower right to the upper left within that diagonal. Working out how many points in the grid are visited before the given point gives exactly the expression above. and the fact that this function is injective is simply the fact that this procedure never reuses a natural number. This is visually obvious but rather tedious to prove.
More generally, if \({ A }\) and \({ B }\) are countable then so is \({ A × B }\). To see this note that in this case there are injective functions \({ F }\) from \({ A }\) to \({ N }\) and \({ G }\) from \({ B }\) to \({ N }\). Define \({ H }\) to be set of pairs of pairs \({ ( ( x , y ) , ( m , n ) ) }\) such that \({ ( x , m ) ∈ F }\) and \({ ( y , n ) ∈ G }\). Then \({ H }\) is an injective function from \({ A × B }\) to \({ N × N }\). Composing this with the injective function we already have from \({ N × N }\) to \({ N }\) gives an injective function from \({ A × B }\) to \({ N }\), so \({ A × B }\) is countable.
In particular, if \({ A }\) is countable then so is \({ A ^ 2 }\). Since the Cartesian product of two countable sets is countable it follows that \({ A ^ 2 × A }\) is countable. There is an injective function from \({ A ^ 3 }\) to \({ A ^ 2 × A }\) consisting of the ordered pairs of the form \({ ( ( x , y , z ) , ( ( x , y ) , z ) ) }\). This function is also surjective, but we won’t need that. The fact that it is injective means, together with the fact we just proved that \({ A ^ 2 × A }\), implies that \({ A ^ 3 }\) is countable.
Less obviously, if \({ A }\) is countable then so is the set of all lists all of whose items are members of \({ A }\). This fact is of great importance in the study of formal languages. Since subsets of countable sets are countable and languages are sets of lists of tokens it follows that every language with a countable number of tokens is countable.
Here is a sketch of a proof of the statement above. \({ A }\) is countable so there is an injective function \({ F }\) from \({ A }\) to \({ N }\). Define a function \({ G }\) from \({ A ^ * }\) to \({ N ^ 3 }\) as follows. \({ ( w , ( x , y , z ) ) ∈ G }\) if \({ x }\) is the number of elements in the list \({ w }\), \({ y }\) is the least natural number \({ n }\) such that if \({ v }\) is an item in \({ w }\) and \({ ( v , m ) ∈ F }\) then \({ m < n }\), and \({ z }\) is natural number whose base \({ n }\) representation has as it’s \({ j }\)’th digit the number \({ k }\) where \({ ( v , k ) ∈ F }\) and \({ ( v ) }\) is the \({ j }\)’th item in the list. This \({ G }\) is an injective function. There is an injective function \({ H }\) from \({ N ^ 3 }\) to \({ N }\). Then \({ H ∘ G }\) is an injective functions from \({ A ^ * }\) to \({ N }\), so \({ A ^ * }\) is countable.
It’s easy to produce uncountable sets. \({ P N }\) is uncountable. If it were countable then there would be an injective function from \({ P N }\) to \({ N }\) but we’ve already seen that there can be no such function.
Let \({ A }\) be the set of arithmetic sets, i.e. subsets of \({ N }\) for which there is a Boolean expression in our language for arithmetic which is a necessary and sufficient condition for membership in the set. Choose some encoding of that language into \({ N }\). Consider those pairs \({ ( B , x ) }\) with \({ B ∈ A }\) and \({ x ∈ N }\) such that \({ x }\) is the natural number which encodes a Boolean expression characterising membership in \({ B }\). The set of such pairs is an injective function from \({ A }\) to \({ N }\), so \({ A }\) is countable.
\({ A }\) is not \({ P N }\) because \({ A }\) is countable and \({ P N }\) is uncountable. \({ A }\) is a subset of \({ P N }\) so there must therefore be a member of \({ P N }\) which is not a member of \({ A }\). In other words there is a subset of \({ N }\) which is not arithmetic. We’ve already seen an example, without a proof, of such a set, namely the set of encodings of true statements. That’s a hard theorem though while the proof above, while it doesn’t provide any examples, is quite easy.
A similar argument shows that there is a language which has no phrase structure grammar. We choose as our set of tokens a non-empty countable set. Let \({ A }\) be the set of lists of tokens. \({ A }\) is then countable. We can say a bit more than that though. Since the set of tokens is non-empty we can choose one and look at the set of lists using only that token. This, as we discussed, is essentially a copy of \({ N }\). So \({ N }\) is no larger than \({ A }\) but \({ A }\) is also no larger than \({ N }\) because it’s countable. Therefore \({ A }\) is of the same size as \({ N }\). It follows that \({ P A }\), which is the list of languages using only those tokens, is uncountable. Any phrase structure grammar for \({ A }\) is a list of tokens. These tokens belong to the original list of tokens or are tokens like “:”, “|”, or “;” which belong to our language for describing languages. There are only finitely many of the latter though so the full set of tokens is still countable and therefore the set of phrase structure grammars is countable. There are fewer phrase structure grammars than languages so there is a language without a grammar.
As with arithmetic sets, it is possible to give concrete examples of languages with no phrase structure grammar but the proofs are much harder than the simple counting argument above.