Axiom(s) of choice

There are a variety of axioms with similar names, all of which involve the word choice in some way. It’s unclear whether any of them are really needed for Computer Science. Large parts of mathematics also don’t require any of them, but certain subjects need at least the Axiom of Dependent Choice in order to prove some of their main theorems. There is a stronger axiom, known just as the Axiom of Choice, which is often assumed. As we’ll see though, it has some disturbing consequences.

Computational paths

One way to think of the Axiom of Dependent Choice is in terms of the computational paths of non-deterministic state machines. We’ll consider a particularly simple type of such machine. These do not read any input. They have a well-defined initial state, and for each possible state there are one or more possibilities for the next state. Because there is at least one possibility in each state the computation will never terminate. To model such a machine we need a set \({ A }\) of possible states, a particular \({ w ∈ A }\) to serve as the initial state and a relation \({ T }\) on \({ A }\) describing the possible state transitions. More precisely, \({ ( x , y ) ∈ T }\) if and only if the machine can transition from \({ x }\) to \({ y }\) in a single step. The condition that there is at least one possible transition from each state is then equivalent to the statement that \({ T }\) is left total. The machine is deterministic if there is also at most one possible transition from each state, i.e. if \({ T }\) is right unique. In this case \({ T }\) is a function from \({ A }\) to \({ A }\).
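
To make this concrete, here is a minimal sketch in Python of one way such a machine might be represented; the names and helper functions are mine and not part of the formal development.

```python
# A machine is modelled by a set of states, an initial state and a set of
# transition pairs (x, y), meaning the machine may move from x to y in one step.

def is_left_total(states, transitions):
    """Every state has at least one possible successor."""
    return all(any(a == x for (a, _) in transitions) for x in states)

def is_right_unique(transitions):
    """No state has two different possible successors."""
    successor = {}
    for (x, y) in transitions:
        if x in successor and successor[x] != y:
            return False
        successor[x] = y
    return True

# A tiny example: from either state the machine may move to either state.
states = {0, 1}
initial = 0
transitions = {(0, 0), (0, 1), (1, 0), (1, 1)}

print(is_left_total(states, transitions))   # True: the machine never gets stuck
print(is_right_unique(transitions))         # False: the machine is non-deterministic
```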

A simple deterministic state machine of this kind can compute the powers of two. We just take \({ A = N }\), \({ w = 1 }\) and take \({ T }\) to be the set of pairs of the form \({ ( x , 2 · x ) }\). A more interesting example is the Fibonacci sequence. This might not seem to fit the pattern described above because the \({ n }\)’th Fibonacci number depends on the \({ n - 1 }\)’st and the \({ n - 2 }\)’nd. This is easily fixed though. We take \({ A = N ^ 2 }\), \({ w = ( 0 , 1 ) }\), and take \({ T }\) to be the set of pairs of the form \({ ( ( j , k ) , ( k , j + k ) ) }\). The left element of the state after \({ n }\) steps is then the \({ n }\)’th Fibonacci number.
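
Both machines are easy to simulate. The sketch below is only an illustration of the definitions above; since the state sets are infinite the transition relation is represented by a function giving the unique next state.

```python
def step_powers_of_two(x):
    # From the state x the only allowed transition is to 2 * x.
    return 2 * x

def step_fibonacci(state):
    # From the state (j, k) the only allowed transition is to (k, j + k).
    j, k = state
    return (k, j + k)

def run(step, initial, n):
    """Follow the unique computational path for n steps."""
    path = [initial]
    for _ in range(n):
        path.append(step(path[-1]))
    return path

print(run(step_powers_of_two, 1, 6))                      # [1, 2, 4, 8, 16, 32, 64]
print([j for (j, _) in run(step_fibonacci, (0, 1), 8)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21]
```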

By a computational path we will mean a sequence, finite or infinite, of states through which the process can proceed starting from the initial state and making only those transitions allowed by the transition relation. More formally, it’s a function \({ F }\) whose domain is either all of \({ N }\) or a subset of the form \({ \{ 0 , 1 , \ldots , n \} }\) with the properties that

\({ ( 0 , w ) ∈ F }\)

and

if \({ ( m , x ) ∈ F }\) and \({ ( m + 1 , y ) ∈ F }\) then \({ ( x , y ) ∈ T }\).

The interpretation is that \({ ( m , x ) ∈ F }\) means that the machine is in the state \({ x }\) after \({ m }\) steps. The first of the statements above expresses the condition that the initial state is \({ w }\) while the second expresses the condition that the transitions from one state to the next are only those allowed by \({ T }\). We also need to express the condition on the domain of \({ F }\) stated above. The simplest way to do this is as follows:

if \({ ( n , x ) ∈ F }\) and \({ m ≤ n }\) then there is a \({ y }\) such that \({ ( m , y ) ∈ F }\).

This says that if \({ n }\) is in the domain of \({ F }\) then so are all smaller numbers. Finally we need to express the fact that \({ F }\) is right unique, i.e. that the machine can only be in one state at a given time:

if \({ ( n , x ) ∈ F }\) and \({ ( n , y ) ∈ F }\) then \({ x = y }\).

A computational path is any subset of \({ N × A }\) satisfying the conditions above. I’ll denote the set of all computational paths by \({ B }\).
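
For a finite candidate path the conditions above can all be checked mechanically. The following sketch does this for a path represented, as in the definition, by a set of pairs \({ ( m , x ) }\); the function name and the example relation are mine.

```python
def is_computational_path(F, w, T):
    """Check the defining conditions for a finite set F of pairs (step, state)."""
    # Initial state: (0, w) must belong to F.
    if (0, w) not in F:
        return False
    # Allowed transitions: consecutive states must be related by T.
    for (m, x) in F:
        for (n, y) in F:
            if n == m + 1 and (x, y) not in T:
                return False
    # Domain condition: if n is in the domain then so is every smaller number.
    domain = {m for (m, _) in F}
    if any(m - 1 not in domain for m in domain if m > 0):
        return False
    # Right uniqueness: only one state at each step, so no step number repeats.
    return len(domain) == len(F)

T = {(0, 0), (0, 1), (1, 0), (1, 1)}
print(is_computational_path({(0, 0), (1, 1), (2, 0)}, 0, T))   # True
print(is_computational_path({(0, 0), (2, 0)}, 0, T))           # False: gap in the domain
```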

The first thing I’ll show is that there is no upper bound on the length of computational paths. More precisely, for any \({ n ∈ N }\) there is an \({ F ∈ B }\) and an \({ x }\) in \({ A }\) such that \({ ( n , x ) ∈ F }\). This is proved by induction on \({ n }\). For \({ n = 0 }\) it’s true because \({ x = w }\) and \({ F = \{ ( 0 , w ) \} }\) has the required properties. If it’s true for \({ n }\) then we can prove it for \({ n + 1 }\) as follows. Suppose \({ F ∈ B }\) and \({ x ∈ A }\) are such that \({ ( n , x ) ∈ F }\). Since \({ T }\) is left total there is a \({ y }\) such that \({ ( x , y ) ∈ T }\). Let \({ G }\) be the set consisting of those pairs \({ ( k , z ) }\) where either \({ k ≤ n }\) and \({ ( k , z ) ∈ F }\) or \({ k = n + 1 }\) and \({ z = y }\). Then \({ G ∈ B }\) and \({ ( n + 1 , y ) ∈ G }\). This establishes the inductive step.
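
The inductive step is constructive, and the following sketch mirrors it: given a finite path ending after \({ n }\) steps in the state \({ x }\), it appends a pair \({ ( n + 1 , y ) }\) for some \({ y }\) with \({ ( x , y ) ∈ T }\). The example relation is invented purely for illustration.

```python
def extend_path(F, n, x, T):
    """Return a path reaching step n + 1, mirroring the inductive step above.

    Keeps the pairs of F with index at most n and appends (n + 1, y) for some
    y with (x, y) in T; left totality of T guarantees such a y exists.
    """
    y = next(b for (a, b) in T if a == x)
    return {(k, z) for (k, z) in F if k <= n} | {(n + 1, y)}

T = {(0, 1), (1, 0), (1, 2), (2, 0)}           # a non-deterministic relation on {0, 1, 2}
F = {(0, 0)}                                   # the path of length 0, with w = 0
for n in range(4):
    x = next(z for (k, z) in F if k == n)      # the state after n steps
    F = extend_path(F, n, x, T)
print(sorted(F))                               # e.g. [(0, 0), (1, 1), (2, 0), (3, 1), (4, 2)]
```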

It would also have been possible to use set induction rather than arithmetic induction above but arithmetic induction is probably more familiar.

If \({ F }\) and \({ G }\) are both members of \({ B }\) their union \({ F ⋃ G }\) may fail to be a member for a variety of reasons, including not being right unique. This can’t happen for deterministic machines though. If \({ T }\) is right unique, \({ F ∈ B }\) and \({ G ∈ B }\) then \({ F ⋃ G ∈ B }\). The hardest part of showing this is the proof that \({ F ⋃ G }\) is right unique. This is proved by induction. More precisely, we show by induction on \({ n }\) that if \({ ( n , v ) ∈ F }\) and \({ ( n , x ) ∈ G }\) then \({ v = x }\). For \({ n = 0 }\) this is certainly true because then we must have \({ v = w }\) and \({ x = w }\). If it’s true for \({ n }\) then it’s true for \({ n + 1 }\). To see this, suppose \({ ( n + 1 , z ) ∈ F }\) and \({ ( n + 1 , y ) ∈ G }\). By the condition on domains there are \({ v }\) and \({ x }\) such that \({ ( n , v ) ∈ F }\) and \({ ( n , x ) ∈ G }\), and then \({ ( v , z ) ∈ T }\) and \({ ( x , y ) ∈ T }\). But \({ v = x }\) by the inductive hypothesis, so the assumed right uniqueness of \({ T }\) implies that \({ y = z }\).

Still assuming that \({ T }\) is right unique, we have \({ ⋃ B ∈ B }\). Again the right uniqueness is the hardest part to prove. If \({ ( n , v ) ∈ ⋃ B }\) and \({ ( n , x ) ∈ ⋃ B }\) then there must be \({ F ∈ B }\) and \({ G ∈ B }\) such that \({ ( n , v ) ∈ F }\) and \({ ( n , x ) ∈ G }\). But we’ve just seen that this implies \({ v = x }\), so \({ ⋃ B }\) is right unique.

We’ve already seen that for every \({ n ∈ N }\) there is an \({ F ∈ B }\) and an \({ x ∈ A }\) such that \({ ( n , x ) ∈ F }\). We then also have \({ ( n , x ) ∈ ⋃ B }\), so \({ n }\) is a member of the domain of \({ ⋃ B }\). This is true for all \({ n ∈ N }\), so the domain of \({ ⋃ B }\) is all of \({ N }\). Thus \({ ⋃ B }\) is a computational path of infinite length. Previously we knew that there was no upper bound on the lengths of computational paths but we didn’t know there was a computational path of infinite length.
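
For a deterministic machine this infinite path can also be produced lazily, one pair at a time, as in the following sketch; the generator is mine and is not part of the formal argument.

```python
from itertools import islice

def infinite_path(step, w):
    """Lazily yield the pairs (n, state) of the unique infinite computational path."""
    n, x = 0, w
    while True:
        yield (n, x)
        n, x = n + 1, step(x)

# The powers-of-two machine again: every finite computational path is an
# initial segment of this infinite one.
print(list(islice(infinite_path(lambda x: 2 * x, 1), 6)))
# [(0, 1), (1, 2), (2, 4), (3, 8), (4, 16), (5, 32)]
```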

Compared to a deterministic state machine a non-deterministic state machine offers at least as many options at each step so it might seem intuitive that if every deterministic state machine has an infinite computational path then so does every non-deterministic one. The proof above used the assumption that \({ T }\) is right unique in an essential way though.

In the case where \({ A }\) is countable one can still prove the existence of infinite computational paths without the assumption that \({ T }\) is right unique. If \({ A }\) is uncountable then this is no longer possible: the existence of an infinite computational path can no longer be proved from the axioms we already have. If we want the statement to be true then we need to add it as an axiom.

Dependent choice

Our new axiom is as follows.

If \({ A }\) is a set, \({ w ∈ A }\) and \({ T }\) is a left total relation on \({ A }\) then there is a right unique \({ F ⊆ N × A }\) whose domain is all of \({ N }\) such that \({ ( 0 , w ) ∈ F }\) and such that if \({ ( n , x ) ∈ F }\) and \({ ( n + 1 , y ) ∈ F }\) then \({ ( x , y ) ∈ T }\).

As you can see, this is an exact translation of the statement that non-deterministic state machines of the type considered in the previous section have infinite computational paths.

This formulation of the axiom, which is the standard one, has an intuitive appeal, but its formalisation is quite complicated because it has notions like functions, relations and the set of natural numbers buried in it. There are equivalent axioms which are less intuitively appealing but easier to formalise. The following simple one appears to be due to Wolk:

By a chain here we mean a subset such that the partial order, when restricted to the subset, is a total order. You can check that this usage is consistent with the definition of Kuratowski chains given earlier, in the sense that every Kuratowski chain is a chain, with the partial order being given by set inclusion.

It’s worth noting that for the application to state machines we only really need the Axiom of Dependent Choice when the set of states is uncountable. If \({ A }\) is countable and \({ T }\) is a left total relation on \({ A }\) we can construct a function \({ F }\) from \({ A }\) to \({ A }\) as follows. The countability of \({ A }\) means there is an injective function \({ G }\) from \({ A }\) to \({ N }\). For any \({ x ∈ A }\) the set of \({ z ∈ N }\) for which there is a \({ y ∈ A }\) with \({ ( x , y ) ∈ T }\) and \({ ( y , z ) ∈ G }\) is non-empty, since \({ T }\) and \({ G }\) are left total, and so has a least member. If we call this least member \({ z }\) then there is only one \({ y ∈ A }\) such that \({ ( y , z ) ∈ G }\), since \({ G }\) is injective. We take \({ ( x , y ) }\) to be a member of \({ F }\), and do not take any other ordered pair whose left component is \({ x }\). Then \({ F }\) is a function and \({ F ⊆ T }\). Because \({ F }\) is a function the deterministic state machine for which it is the transition relation must have an infinite computational path, even without assuming the Axiom of Dependent Choice. Because \({ F }\) is a subset of \({ T }\) any computational path for this deterministic state machine is also a valid computational path for the non-deterministic machine with transition relation \({ T }\). There aren’t many situations in computer science where you’d want to consider a state machine with uncountably many states. There are quite a few in mathematics though.
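
The construction just described is entirely explicit, and for a small example it can be carried out directly, as in the following sketch. The particular states, relation and injection \({ G }\) are invented purely for illustration.

```python
def determinise(states, T, G):
    """For each state pick the allowed successor whose index under the
    injection G into the natural numbers is least."""
    F = {}
    for x in states:
        successors = [y for (a, y) in T if a == x]    # non-empty since T is left total
        F[x] = min(successors, key=lambda y: G[y])    # least index wins; G is injective
    return F

states = {'a', 'b', 'c'}
T = {('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'c')}
G = {'a': 0, 'b': 1, 'c': 2}                          # an injection into N

F = determinise(states, T, G)
print(F)          # {'a': 'b', 'b': 'a', 'c': 'c'}, possibly listed in another order

# F is a subset of T, so the unique infinite path of the deterministic machine
# with transition function F is also a computational path of the original machine.
x, path = 'a', []
for _ in range(5):
    path.append(x)
    x = F[x]
print(path)       # ['a', 'b', 'a', 'b', 'a']
```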

The Axiom of Choice

Zermelo had one further axiom, which is not generally included in what we now call Zermelo-Fraenkel set theory. This is the Axiom of Choice. In his initial formulation this axiom said that for any set of disjoint non-empty sets there is a set which has precisely one member from each of those sets. The restriction to non-empty sets is clearly necessary since you can’t have precisely one member from a set with no members.
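
For a finite set of disjoint non-empty sets no axiom is needed: a choice set can simply be written down, as in the sketch below. The point of the axiom is that it asserts the existence of such a set even when there is no explicit rule available for making the choices.

```python
# A finite family of disjoint non-empty sets.
family = [{1, 2}, {3}, {4, 5, 6}]

# One explicit rule: take the least element of each set.
choice_set = {min(s) for s in family}
print(choice_set)   # {1, 3, 4}: exactly one member from each set in the family
```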

It turns out that there are many different axioms one could take which are equivalent to this one, in the sense that if one assumes any one of them as an axiom then the rest all become theorems. Some of these formulations have more intuitive appeal than others. Zermelo’s version at least has the advantage that it’s clear why it’s named the Axiom of Choice. The set whose existence it asserts is obtained by choosing one element from each of the sets in our set. There are whole books devoted to equivalents of the Axiom of Choice.

Adding the Axiom of Choice to Zermelo-Fraenkel is known to give a consistent formal system, at least if we assume that Zermelo-Fraenkel itself is consistent. It’s also known not to be a theorem in the Zermelo-Fraenkel system, again assuming ZF is consistent. In other words one can prove that it can’t be proved. The same, of course, applies to any of the statements known to be equivalent to the Axiom of Choice.

For the Axiom of Dependent Choice there was an equivalent statement in terms of chains. There is one for the Axiom of Choice as well. It goes by the name of Zorn’s lemma, even though it is often taken as an axiom in place of the Axiom of Choice.

If \({ A }\) is a set with a partial order \({ R }\) such that every chain is bounded then \({ A }\) has a maximal element.

A subset \({ B }\) of a set \({ A }\) with a partial order \({ R }\) is said to be bounded if there is a \({ y ∈ A }\) such that \({ ( x , y ) ∈ R }\) for all \({ x ∈ B }\). This might look like the definition of a greatest element but there is a crucial difference. We only require \({ y }\) to be a member of \({ A }\), not of \({ B }\). An example of a bounded subset is the set of rational numbers \({ x }\) such that \({ 0 < x }\) and \({ x < 1 }\). This is a bounded subset of the rationals, with the partial order given by \({ ≤ }\), because \({ x ≤ 1 }\) for all such \({ x }\). In fact this set is also a chain. You may wonder whether this example contradicts Zorn’s lemma, since the rationals do not have a maximal element. It doesn’t. The hypothesis in Zorn’s lemma is that every chain is bounded. This particular one is, but some others are not. \({ N }\), the set of natural numbers, is an example of an unbounded chain.
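
For a finite partial order all of these notions can be computed directly. The following sketch does so for divisibility on a small set of natural numbers; the example and the helper names are mine.

```python
from itertools import combinations

# A finite poset: divisibility on {1, 2, 3, 4, 6}.  (x, y) is in R when x divides y.
A = {1, 2, 3, 4, 6}
R = {(x, y) for x in A for y in A if y % x == 0}

def is_chain(B):
    """Any two elements of B are comparable."""
    return all((x, y) in R or (y, x) in R for x, y in combinations(B, 2))

def is_bounded(B):
    """Some element of A lies above every element of B (it need not be in B)."""
    return any(all((x, y) in R for x in B) for y in A)

maximal = {y for y in A if all(x == y or (y, x) not in R for x in A)}

print(is_chain({1, 2, 4}))    # True: 1 divides 2, which divides 4
print(is_chain({2, 3}))       # False: 2 and 3 are incomparable
print(is_bounded({2, 3}))     # True: 6 is an upper bound, even though {2, 3} is not a chain
print(maximal)                # {4, 6}: this finite poset does have maximal elements
```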

Banach-Tarski

Unfortunately the Axiom of Choice has some rather unsettling consequences. Perhaps the most counterintuitive of these is the Banach-Tarski Paradox in geometry. Assuming Zermelo-Fraenkel plus the Axiom of Choice one can show that there are sets \({ A _ 1 }\), \({ A _ 2 }\), \({ A _ 3 }\), \({ A _ 4 }\), \({ A _ 5 }\), \({ B _ 1 }\), \({ B _ 2 }\), \({ B _ 3 }\), \({ C _ 1 }\), \({ C _ 2 }\), \({ C _ 3 }\), \({ C _ 4 }\) and \({ C _ 5 }\) in three dimensional Euclidean space with the following properties.

In other words, we can take a ball, split it into five pieces, move those pieces via a rigid motion, i.e. a combination of translations, reflections and rotations, and reassemble them to form two balls of the same radius as the original one.

Most mathematicians are not particularly bothered by paradoxes like the one above. In their view it shows that it’s possible to find really weird bounded subsets of Euclidean space, weird enough that one can’t attach a notion of volume to them in any consistent way, but it is not a problem with set theory. Some mathematicians reject the Axiom of Choice entirely. Others accept only weaker versions, like the Axiom of Dependent Choice, which do not imply the existence of the paradoxical sets appearing in the Banach-Tarski theorem.