Choice

The axioms we’ve seen so far are arguably sufficient for all of computer science and are sufficient for some mathematics, including at least elementary arithmetic. There is some standard mathematics for which they are insufficient though.

From now on I’ll use \({ N }\) to denote the set of natural numbers. It won’t matter what representation of \({ N }\) we use.

Dependent choice

Consider the following two statements.

Suppose \({ R }\) is a left total relation on a set \({ A }\). For every \({ x ∈ A }\) there is a function \({ F }\) from \({ N }\) to \({ A }\) such that \({ ( 0 , x ) ∈ F }\) and if \({ ( n , y ) ∈ F }\) and \({ ( n + 1 , z ) ∈ F }\) then \({ ( y , z ) ∈ R }\).
For every set \({ A }\) and partial order \({ R }\) on \({ A }\) if every chain in \({ A }\) is finite then \({ A }\) contains a maximal element.

The chains and maximal elements in the second statement are understood as being with respect to the partial order \({ R }\).

The meaning of these statements may not be immediately obvious but the first one, at least, has a fairly nice interpretation in terms non-deterministic computations. We think of \({ A }\) as the state space and \({ x }\) as the initial state. The function \({ F }\) simply tells us which state the computation is in after a given number of steps. The relation \({ R }\) is the one which gives the allowed transitions. The condition that if \({ ( n , y ) ∈ F }\) and \({ ( n + 1 , z ) ∈ F }\) then \({ ( y , z ) ∈ R }\) is then the statement that each transition is one of the allowed ones and the condition that \({ ( 0 , x ) ∈ F }\) is of course the condition that that the computation starts in the start state. The condition that \({ R }\) is left total means that for any allowed state there is at least one state to which the system can transition, so the computation is never forced to terminate due to a lack of options. We can therefore give the first statement the interpretation that a non-deterministic computation for which there is at least one transition from each allowed state can run forever.

There are two important conditions under which we can prove the first statement using the axioms we’ve introduced so far. One is when the computation is deterministic, i.e. when \({ F }\) is not just a left total relation but a function. The other is when the set \({ A }\) of possible states is countable. Those two cases are probably sufficient for computer science.

The second statement is harder to attach an intuitive meaning to but it is in some sense simpler. It’s certainly shorter and it doesn’t refer to the natural numbers so it makes sense even without the Axiom of Infinity.

If we assume the Axiom of Infinity then the two statements above are in fact equivalent, in the sense that from either we can prove the other. They are independent of our axioms though in the sense that neither of them can be proved from those axioms. If we want them to be true then we’ll need to take one of them as an axiom, or choose some other axiom which implies them. I’ll choose the second one.

Dependent Choice: For every set \({ A }\) and partial order \({ R }\) on \({ A }\) if every chain in \({ A }\) is finite then \({ A }\) contains a maximal element.

Zorn’s Lemma

If we think of classical mathematics as covering the integral and differential calculus, complex analysis, ordinary and partial differential equations, number theory, differential geometry and classical algebraic geometry then the axioms above are a sufficient foundation. There are some parts of modern mathematics which require something stronger than the Axiom of Dependent Choice though. There are a few different, but equivalent, choices for what this something is but the most popular is the following.

Zorn’s Lemma: For every set \({ A }\) and partial order \({ R }\) on \({ A }\) if every chain in \({ A }\) has an upper bound in \({ A }\) then \({ A }\) contains a maximal element.

Every finite chain has a maximal element and this maximal element is an upper bound so the Axiom of Dependent Choice follows from Zorn’s Lemma. Zorn’s Lemma does not follow from the Axiom of Dependent Choice though. We can therefore think of Zorn’s Lemma as a generalisation of the Axiom of Dependent Choice.

Despite its name, Zorn’s Lemma is normally taken as an axiom and so is not a lemma. It’s also due to Kuratowski rather than to Zorn. The word lemma in its name dates to a time when it was traditional to take the following as an axiom:

Choice: Suppose \({ A }\) is a set of non-empty sets such that if \({ B ∈ A }\) and \({ C ∈ A }\) then \({ B = C }\) or \({ B ∩ C = ∅ }\). Then there is a set \({ D }\) such that for all \({ B ∈ A }\) the intersection \({ B ∩ D }\) has exactly one member.

\({ D }\) can be thought of as selecting a single member from each member of \({ A }\). That’s why it’s called the Axiom of Choice.

Kuratowski showed that the “lemma” is a consequence of this axiom. Zorn stated the axiom also follows from the lemma and promised to prove this, but didn’t. It is nonetheless true. The two statements are equivalent, just as the two statements in the previous section were. Which one to take as an axiom and which to prove as a consequence is then just a matter of convenience. The modern approach is to take the “lemma” as an axiom and prove the “axiom” as a theorem.

Banach-Tarski

Unfortunately the Axiom of Choice has some rather unsettling consequences. Perhaps the most counterintuitive of these is the Banach-Tarski Paradox in geometry. Assuming Zermelo-Fraenkel plus the Axiom of Choice one can show that there are sets \({ A _ 1 }\), \({ A _ 2 }\), \({ A _ 3 }\), \({ A _ 4 }\), \({ A _ 5 }\), \({ B _ 1 }\), \({ B _ 2 }\), \({ B _ 3 }\), \({ C _ 1 }\), \({ C _ 2 }\), \({ C _ 3 }\), \({ C _ 4 }\) and \({ C _ 5 }\) in three dimensional Euclidean space with the following properties.

\({ B _ 1 }\), \({ B _ 2 }\) and \({ B _ 3 }\) are disjoint balls of radius 1.
\({ A _ 1 }\) is congruent to \({ C _ 1 }\), \({ A _ 2 }\) is congruent to \({ C _ 2 }\), \({ A _ 3 }\) is congruent to \({ C _ 3 }\), \({ A _ 4 }\) is congruent to \({ C _ 4 }\), and \({ A _ 5 }\) is congruent to \({ C _ 5 }\).
\({ A _ 1 }\), \({ A _ 2 }\), \({ A _ 3 }\), \({ A _ 4 }\) and \({ A _ 5 }\) are disjoint and their union is \({ B _ 1 ∪ B _ 2 }\).
\({ C _ 1 }\), \({ C _ 2 }\), \({ C _ 3 }\), \({ C _ 4 }\) and \({ C _ 5 }\) are disjoint and their union is \({ B _ 3 }\).

In other words, we can take a ball, split it into five pieces, move those pieces via a rigid motion, i.e. a combination of translations, reflections and rotations, and reassemble them to form two balls of the same radius as the original one.

Most mathematicians are not particular bothered by paradoxes like the one above. In their view it shows that it’s possible to find really weird bounded subsets of Euclidean space, weird enough that one can’t attach a notion of volume to them in any consistent way, but not as a problem with set theory. Some mathematicians reject the Axiom of Choice entirely. Others accept only weaker versions, like the Axiom of Dependent choice, which do not imply the existence of the paradoxical sets appearing in Banach-Tarski theorem.