A binary relation is a set of ordered pairs. Since we're mostly concerned with binary relations, from now on I'll just use relation as shorthand for binary relation unless otherwise specified. The definition above is too general to be of much use. We really need to impose more conditions to get any interesting properties, but there are a few useful definitions that make sense even at this level of generality.
A relation \({ R }\) is called diagonal if \({ ( x , y ) ∈ R }\) implies \({ x = y }\). For any set \({ A }\) we can define the relation \({ I _ A }\) as the set of all ordered pairs \({ ( x , x ) }\) for \({ x ∈ A }\). This is a diagonal relation and is called the diagonal relation on \({ A }\). These are in fact the only examples of diagonal relations.
The domain of a relation \({ R }\) is the set of \({ x }\) such that there is a \({ y }\) with \({ ( x , y ) ∈ R }\). The range of \({ R }\) is the set of \({ y }\) such that there is an \({ x }\) with \({ ( x , y ) ∈ R }\). The range and domain of \({ I _ A }\) are just \({ A }\).
The inverse of a relation \({ R }\) is the set of all ordered pairs \({ ( x , y ) }\) such that \({ ( y , x ) ∈ R }\). I’ll denote it by \({ R ^ { - 1 } }\). Note that \({ R ^ { - 1 } }\) is a set of ordered pairs so it is also a relation. We can therefore take its inverse, \({ { R ^ { - 1 } } ^ { - 1 } }\). Now \({ ( x , y ) ∈ { R ^ { - 1 } } ^ { - 1 } }\) if and only if \({ ( y , x ) ∈ { R ^ { - 1 } } }\), which happens if and only if \({ ( x , y ) ∈ R }\). By the Axiom of Extensionality it follows that \[ { R ^ { - 1 } } ^ { - 1 } = R . \]
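These definitions are easy to experiment with on a computer. In this sketch, and in the ones that follow, I'll model a relation as a Python set of ordered pairs; the representation and the sample relation are choices of mine for illustration, not part of the development.

```python
# A relation is modelled as a Python set of ordered pairs (2-tuples).
def inverse(R):
    """The inverse relation: all (x, y) such that (y, x) is in R."""
    return {(y, x) for (x, y) in R}

R = {(1, 2), (1, 3), (2, 3)}
assert inverse(inverse(R)) == R   # (R^-1)^-1 = R
```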
Given two relations \({ R }\) and \({ S }\) we can define their composition, written \({ R ∘ S }\) and defined to be the set of ordered pairs \({ ( x , z ) }\) such that there is a \({ y }\) with \({ ( x , y ) ∈ S }\) and \({ ( y , z ) ∈ R }\). The name is standard and the notation somewhat standard, but most authors reverse the roles of \({ R }\) and \({ S }\) in the definition. The problem with doing that is that functions, as we’ll see, are a kind of relation and the standard notation for composition of functions writes them in reverse order, i.e. \({ f ∘ g }\) is the result of applying \({ g }\) and then \({ f }\). To accommodate this convention, which is unfortunate but too well established to attempt to change, it’s necessary to do composition of relations in the reverse order as well. The composition of relations is also a relation, and so can be composed with other relations. Composition has the associativity property \[ ( R ∘ S ) ∘ T = R ∘ ( S ∘ T ) . \] If the domain of \({ R }\) is a subset of \({ A }\) then \({ R ∘ I _ A = R }\). If the range of \({ R }\) is a subset of \({ A }\) then \({ I _ A ∘ R = R }\).
Another useful identity is \[ ( R ∘ S ) ^ { - 1 } = ( S ^ { - 1 } ) ∘ ( R ^ { - 1 } ) . \]
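Composition and the identities above can be checked mechanically on small examples. A sketch, again with relations as Python sets of pairs and with sample relations chosen purely for illustration:

```python
def inverse(R):
    return {(y, x) for (x, y) in R}

def compose(R, S):
    """R ∘ S: all (x, z) such that (x, y) ∈ S and (y, z) ∈ R for some y."""
    return {(x, z) for (x, y) in S for (w, z) in R if y == w}

def diagonal(A):
    """I_A, the diagonal relation on A."""
    return {(x, x) for x in A}

R = {(1, 'a'), (2, 'b')}
S = {('p', 1), ('q', 2), ('q', 1)}
T = {(True, 'p'), (False, 'q')}

# Associativity and the identity laws stated above.
assert compose(compose(R, S), T) == compose(R, compose(S, T))
assert compose(S, diagonal({'p', 'q'})) == S      # dom S ⊆ {'p','q'}
assert compose(diagonal({1, 2}), S) == S          # ran S ⊆ {1, 2}
# (R ∘ S)^-1 = S^-1 ∘ R^-1
assert inverse(compose(R, S)) == compose(inverse(S), inverse(R))
```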
A relation \({ R }\) is said to be symmetric if \({ R = R ^ { - 1 } }\), i.e. if \({ ( x , y ) ∈ R }\) if and only if \({ ( y , x ) ∈ R }\). It’s said to be transitive if \({ R ∘ R ⊆ R }\), i.e. if \({ ( x , z ) ∈ R }\) whenever \({ ( x , y ) ∈ R }\) and \({ ( y , z ) ∈ R }\). The diagonal relation on a set is always symmetric and transitive. If \({ R }\) is transitive then so is \({ R ^ { - 1 } }\).
A relation \({ R }\) is said to be antisymmetric if \({ R ∩ R ^ { - 1 } }\) is diagonal or, equivalently, if \({ ( x , y ) ∈ R }\) and \({ ( y , x ) ∈ R }\) imply \({ x = y }\). The terminology is unfortunate since antisymmetric is not the opposite of symmetric. A relation can be both symmetric and antisymmetric. Diagonal relations, for example, are both symmetric and antisymmetric. It’s also possible for a relation to be neither symmetric nor antisymmetric. Note that if \({ R }\) is antisymmetric then so is \({ R ^ { - 1 } }\).
A relation \({ R }\) is said to be left unique if \({ R ^ { - 1 } ∘ R }\) is diagonal. In other words, if \({ x = z }\) whenever there is a \({ y }\) such that \({ ( x , y ) ∈ R }\) and \({ ( y , z ) ∈ R ^ { - 1 } }\) or, equivalently, whenever \({ ( x , y ) ∈ R }\) and \({ ( z , y ) ∈ R }\). In other words, for any \({ y }\) there is at most one ordered pair in \({ R }\) with \({ y }\) as its right element. Similarly \({ R }\) is said to be right unique if \({ R ∘ R ^ { - 1 } }\) is diagonal, which is equivalent to saying that for any \({ x }\) there is at most one ordered pair with \({ x }\) as its left element. This may seem backwards but this use of left and right is standard.
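All of the properties defined so far can be written directly in terms of inverses and compositions, which makes them one-liners to test. A sketch using the algebraic characterizations above:

```python
def inverse(R):
    return {(y, x) for (x, y) in R}

def compose(R, S):
    return {(x, z) for (x, y) in S for (w, z) in R if y == w}

def is_diagonal(R):
    return all(x == y for (x, y) in R)

def is_symmetric(R):
    return R == inverse(R)                       # R = R^-1

def is_transitive(R):
    return compose(R, R) <= R                    # R ∘ R ⊆ R

def is_antisymmetric(R):
    return is_diagonal(R & inverse(R))           # R ∩ R^-1 diagonal

def is_left_unique(R):
    return is_diagonal(compose(inverse(R), R))   # R^-1 ∘ R diagonal

def is_right_unique(R):
    return is_diagonal(compose(R, inverse(R)))   # R ∘ R^-1 diagonal
```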
If the relation \({ R }\) is a subset of the Cartesian product \({ A × B }\) then we say that it’s a relation from \({ A }\) to \({ B }\) and if \({ R }\) is a subset of \({ A × A }\) then we say that it’s a relation on \({ A }\). If \({ R }\) is a relation from \({ A }\) to \({ B }\) then \({ R ^ { - 1 } }\) is a relation from \({ B }\) to \({ A }\). In particular if \({ R }\) is a relation on \({ A }\) then so is \({ R ^ { - 1 } }\). If \({ R }\) is a relation from \({ B }\) to \({ C }\) and \({ S }\) is a relation from \({ A }\) to \({ B }\) then \({ R ∘ S }\) is a relation from \({ A }\) to \({ C }\). In particular if \({ R }\) and \({ S }\) are relations on \({ A }\) then so is \({ R ∘ S }\).
As examples of the properties above, consider the following relations on the set of natural numbers:
\({ R }\) is the set of \({ ( x , y ) }\) with \({ x = y }\).
\({ S }\) is the set of \({ ( x , y ) }\) with \({ x ≤ y }\).
\({ T }\) is the set of \({ ( x , y ) }\) with \({ x < y }\).
\({ U }\) is the set of all \({ ( x , y ) }\).
\({ V }\) is the set of \({ ( x , y ) }\) with \({ x ≠ y }\).
The domain of \({ R }\), \({ S }\), \({ T }\), \({ U }\), and \({ V }\) is the set of natural numbers. The range is also the set of natural numbers in each case, except that of \({ T }\), whose range is the set of positive integers since every positive integer is greater than some natural number and every natural number which is greater than some natural number is a positive integer.
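We can't build these relations on all of the natural numbers, but a finite initial segment is enough to experiment with, as long as we keep in mind that truncation introduces boundary artefacts. A sketch; the cutoff of 10 is an arbitrary choice of mine:

```python
from itertools import product

N = range(10)   # a finite initial segment standing in for the naturals

R = {(x, y) for x, y in product(N, N) if x == y}
S = {(x, y) for x, y in product(N, N) if x <= y}
T = {(x, y) for x, y in product(N, N) if x < y}
U = set(product(N, N))
V = {(x, y) for x, y in product(N, N) if x != y}

def domain(Rel):
    return {x for (x, y) in Rel}

def rng(Rel):   # 'range' would shadow the Python builtin
    return {y for (x, y) in Rel}

# Truncation artefact: on the full naturals the domain of T is everything,
# but 9, the top of our segment, is less than nothing in the segment.
assert domain(T) == set(range(9))
assert rng(T) == set(range(1, 10))   # the positive numbers in the segment
```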
\({ R }\) is diagonal. None of the other relations are. It is also the only one which is left or right unique.
Now
\({ R ^ { - 1 } }\) is the set of \({ ( x , y ) }\) with \({ x = y }\), i.e. just \({ R }\), so \({ R }\) is symmetric and antisymmetric. As mentioned above, diagonal relations are always symmetric and antisymmetric.
\({ S ^ { - 1 } }\) is the set of \({ ( x , y ) }\) with \({ x ≥ y }\), which is not the same as \({ S }\), so \({ S }\) is not symmetric. \({ S ∩ S ^ { - 1 } = R }\) and \({ R }\) is diagonal so \({ S }\) is antisymmetric.
\({ T ^ { - 1 } }\) is the set of \({ ( x , y ) }\) with \({ x > y }\), which is not the same as \({ T }\), so \({ T }\) is also not symmetric. \({ T ∩ T ^ { - 1 } = ∅ }\) and \({ ∅ }\) is diagonal so \({ T }\) is antisymmetric.
\({ U ^ { - 1 } }\) is the set of all \({ ( x , y ) }\), which is the same as \({ U }\), so \({ U }\) is symmetric. \({ U ∩ U ^ { - 1 } = U }\) and \({ U }\) is not diagonal, so \({ U }\) is not antisymmetric.
\({ V ^ { - 1 } }\) is the set of \({ ( x , y ) }\) with \({ x ≠ y }\), which is the same as \({ V }\), so \({ V }\) is also symmetric. \({ V ∩ V ^ { - 1 } = V }\) and \({ V }\) is not diagonal so \({ V }\) is not antisymmetric.
and
\({ R ∘ R }\) is the set of \({ ( x , y ) }\) with \({ x = y }\), i.e. \({ R }\), which is a subset of \({ R }\), so \({ R }\) is transitive. As mentioned above diagonal relations are always transitive.
\({ S ∘ S }\) is the set of \({ ( x , y ) }\) with \({ x ≤ y }\), i.e. \({ S }\), so \({ S }\) is transitive.
\({ T ∘ T }\) is the set of \({ ( x , y ) }\) with \({ x + 1 < y }\), which is a subset of \({ T }\), so \({ T }\) is transitive. Note that \({ T ∘ T }\) is a proper subset of \({ T }\), unlike what we saw for \({ R }\) and \({ S }\), but the relation is still transitive.
\({ U ∘ U }\) is the set of all \({ ( x , y ) }\), i.e. \({ U }\), so \({ U }\) is transitive.
\({ V ∘ V }\) is the set of all \({ ( x , y ) }\), i.e. \({ U }\), which is not a subset of \({ V }\) so \({ V }\) is not transitive.
Most of these are fairly straightforward. If \({ ( x , z ) ∈ S ∘ S }\) then there is a \({ y }\) such that \({ ( x , y ) ∈ S }\) and \({ ( y , z ) ∈ S }\), i.e. such that \({ x ≤ y }\) and \({ y ≤ z }\). It follows that \({ x ≤ z }\), i.e. that \({ ( x , z ) ∈ S }\). So \({ S ∘ S ⊆ S }\). This is all we need for transitivity, but if we want to prove the statement made above that \({ S ∘ S = S }\) then we also need to show the reverse inclusion \({ S ⊆ S ∘ S }\). In other words we need to show that if \({ x ≤ z }\) then there is a \({ y }\) such that \({ x ≤ y }\) and \({ y ≤ z }\). This is easy: either \({ y = x }\) or \({ y = z }\) will work. The argument for \({ T }\) is similar. If \({ ( x , z ) ∈ T ∘ T }\) then there is a \({ y }\) such that \({ ( x , y ) ∈ T }\) and \({ ( y , z ) ∈ T }\), i.e. such that \({ x < y }\) and \({ y < z }\). It follows that \({ x < z }\), i.e. that \({ ( x , z ) ∈ T }\). So \({ T ∘ T ⊆ T }\) and \({ T }\) is transitive. To prove the stronger statement given above we note that since we’re dealing with natural numbers \({ x < y }\) and \({ y < z }\) imply \({ x + 1 ≤ y }\) and \({ y + 1 ≤ z }\), from which we get \({ x + 2 ≤ z }\) and then \({ x + 1 < z }\). To see that \({ V ∘ V = U }\), note that if \({ ( x , z ) ∈ U }\) then there is a natural number \({ y }\) distinct from both \({ x }\) and \({ z }\). To be more concrete, the numbers 0, 1, and 2 are all distinct, so at least one of them is different from both \({ x }\) and \({ z }\). Call the least such number \({ y }\). Then \({ ( x , y ) ∈ V }\) and \({ ( y , z ) ∈ V }\) so \({ ( x , z ) ∈ V ∘ V }\). So \({ U ⊆ V ∘ V }\). The reverse inclusion is trivial since every relation on the natural numbers is a subset of \({ U }\), essentially by definition.
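All of the claims in the two lists above can also be verified mechanically on a finite initial segment of the natural numbers, with the same caveat about truncation as before. A sketch:

```python
from itertools import product

N = range(10)
R = {(x, y) for x, y in product(N, N) if x == y}
S = {(x, y) for x, y in product(N, N) if x <= y}
T = {(x, y) for x, y in product(N, N) if x < y}
U = set(product(N, N))
V = {(x, y) for x, y in product(N, N) if x != y}

def inverse(Rel):
    return {(y, x) for (x, y) in Rel}

def compose(Rel1, Rel2):
    return {(x, z) for (x, y) in Rel2 for (w, z) in Rel1 if y == w}

# Symmetry and antisymmetry, as claimed above.
assert inverse(R) == R and inverse(U) == U and inverse(V) == V
assert inverse(S) != S and inverse(T) != T
assert S & inverse(S) == R          # so S is antisymmetric
assert T & inverse(T) == set()      # so T is antisymmetric

# Transitivity: R ∘ R = R, S ∘ S = S, T ∘ T ⊊ T, U ∘ U = U, V ∘ V = U.
assert compose(R, R) == R
assert compose(S, S) == S
assert compose(T, T) < T            # proper subset, but still transitive
assert compose(U, U) == U
assert compose(V, V) == U
assert not compose(V, V) <= V       # V is not transitive
```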
If \({ R }\) is a relation from \({ A }\) to \({ B }\) then the domain of \({ R }\) is a subset of \({ A }\) and the range of \({ R }\) is a subset of \({ B }\). We say that \({ R }\) is left total if the domain of \({ R }\) is all of \({ A }\) and that it’s right total if the range of \({ R }\) is all of \({ B }\). It follows that \({ R ^ { - 1 } }\) is left total if \({ R }\) is right total and vice versa.
Some properties of a relation from \({ A }\) to \({ B }\) depend only on the relation, i.e. on the set of ordered pairs, while others depend on the sets \({ A }\) and \({ B }\) as well. Left and right uniqueness, for example, depend only on the relation while left and right totality depend on \({ A }\) and \({ B }\) too.
A relation which is left total and right unique is called a function. If \({ F }\) is a function from \({ B }\) to \({ C }\) and \({ G }\) is a function from \({ A }\) to \({ B }\) then \({ F ∘ G }\) is a function from \({ A }\) to \({ C }\). It’s possible for \({ F ∘ G }\) to be a function without \({ F }\) or \({ G }\) being functions though. The precise conditions needed are the two below; a small example follows them.
For each \({ x }\) in \({ A }\) there is a \({ y }\) in the domain of \({ F }\) such that \({ ( x , y ) ∈ G }\).
If \({ ( x , y ) ∈ G }\), \({ ( y , z ) ∈ F }\), \({ ( x , w ) ∈ G }\) and \({ ( w , v ) ∈ F }\) then \({ v = z }\).
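Here is a small example, with relations invented purely for illustration, of a composition \({ F ∘ G }\) which is a function even though neither \({ F }\) nor \({ G }\) is right unique; it satisfies the two conditions above:

```python
def compose(R, S):
    return {(x, z) for (x, y) in S for (w, z) in R if y == w}

def is_function(Rel, A):
    """Left total on A and right unique."""
    left_total = all(any(x == a for (x, y) in Rel) for a in A)
    right_unique = all(y1 == y2
                       for (x1, y1) in Rel for (x2, y2) in Rel if x1 == x2)
    return left_total and right_unique

A = {0}
G = {(0, 'a'), (0, 'b')}                      # not right unique: not a function
F = {('a', 1), ('b', 1), ('c', 2), ('c', 3)}  # not right unique either

FG = compose(F, G)
assert FG == {(0, 1)}
assert is_function(FG, A)
assert not is_function(G, A)
assert not is_function(F, {'a', 'b', 'c'})
```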
Although this might seem like an obscure way to construct a function compared to composition of functions it is in fact frequently used.
Every function is left total by definition. Functions which are also right total are called surjective. Every function is right unique by definition. Functions which are also left unique are called injective. Functions which are both right total and left unique are called bijective, or invertible. Every function is a relation and so has an inverse, which is also a relation. For bijective functions this inverse relation is also a function. If \({ F }\) is a bijective function from \({ A }\) to \({ B }\) then \({ F ∘ F ^ { - 1 } = I _ B }\) and \({ F ^ { - 1 } ∘ F = I _ A }\).
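A quick check of the two identities for a small bijective function; the sets and pairs here are arbitrary choices of mine:

```python
def inverse(R):
    return {(y, x) for (x, y) in R}

def compose(R, S):
    return {(x, z) for (x, y) in S for (w, z) in R if y == w}

def diagonal(A):
    return {(x, x) for x in A}

A = {0, 1, 2}
B = {'a', 'b', 'c'}
F = {(0, 'a'), (1, 'b'), (2, 'c')}   # a bijective function from A to B

assert compose(F, inverse(F)) == diagonal(B)   # F ∘ F^-1 = I_B
assert compose(inverse(F), F) == diagonal(A)   # F^-1 ∘ F = I_A
```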
If you’re accustomed to thinking of functions as being defined by algorithms then the definition above does not correspond to your intuition. Different algorithms can certainly give the same function. For example, taking a number and adding it to itself and taking a number and multiplying it by two are different algorithms but they correspond to the same function according to the definition above. Later we will see that there are functions for which there is no corresponding algorithm as well. If you’re used to thinking of functions in terms of graphs, on the other hand, then the definition above is exactly your intuition. Functions are simply defined as graphs. Note that this is the one place in these notes where I use the word graph in the sense that it’s used in algebra and calculus. Everywhere else it will be used in the same sense as in graph theory.
There are relatively few terminological conflicts between computer science and related fields like mathematics, logic and linguistics. Sometimes computer scientists use a different term for the same concept but it’s rare for them to use the same term for a different concept. This is unfortunately one of the exceptions. Computer scientists refer to the algorithmic notion of functions as functions. What word do they use for the graph notion of functions? Also function! This is confusing, but less of a problem than it might appear since the algorithmic notion is much more common. Logicians are the only people who have an adequate terminology. They refer to the algorithmic notion as intensional functions and the graph notion as extensional functions. Function without an adjective normally means extensional unless otherwise specified. Note that intensional is not a typo for intentional. The term from logic has an s rather than a t.
A useful fact about finite sets is that if \({ A }\) is a finite set and \({ F }\) is an injective function from \({ A }\) to \({ A }\) then \({ F }\) is also a surjective function from \({ A }\) to \({ A }\). This can be proved by induction on sets. The only function from \({ ∅ }\) to \({ ∅ }\) is \({ ∅ }\), because there are no ordered pairs \({ ( x , y ) }\) with \({ x ∈ ∅ }\) and \({ y ∈ ∅ }\). Now \({ ∅ }\) is trivially right total so every injective function from \({ ∅ }\) to \({ ∅ }\) is a surjective function from \({ ∅ }\) to \({ ∅ }\). It therefore suffices to prove that if every injective function from \({ A }\) to itself is surjective then every injective function from \({ A ∪ \{ x \} }\) to itself is surjective. This is certainly true if \({ x ∈ A }\) so we can limit our attention to the case where \({ x }\) is not a member of \({ A }\). Assume then that \({ F }\) is an injective function from \({ A ∪ \{ x \} }\) to itself and \({ x }\) is not a member of \({ A }\). \({ F }\) is left total so there is a \({ y ∈ A ∪ \{ x \} }\) such that \({ ( x , y ) ∈ F }\). \({ F }\) is left unique so there is no \({ w ∈ A }\) such that \({ ( w , y ) ∈ F }\). \({ F }\) is left total and right unique so for all \({ w ∈ A }\) there is a unique \({ z ∈ A ∪ \{ x \} }\) such that \({ ( w , z ) ∈ F }\). If \({ y = x }\) then this \({ z }\) is not \({ x }\) and so must be in \({ A }\). In this case the set of pairs \({ ( w , z ) ∈ F }\) with \({ w ∈ A }\) is an injective function from \({ A }\) to itself. It must therefore be surjective. There is then, for each \({ z ∈ A }\), a \({ w ∈ A }\) such that \({ ( w , z ) ∈ F }\). \({ F }\) is left unique so we can’t also have \({ ( x , z ) ∈ F }\) for any \({ z ∈ A }\), and therefore \({ y = x }\). In other words, \({ x = y }\) if and only if there is, for each \({ w ∈ A }\), a \({ z ∈ A }\) such that \({ ( w , z ) ∈ F }\), and in this case every member of \({ A }\) is in the range of \({ F }\) and so is \({ x }\), so \({ F }\) is a surjective function from \({ A ∪ \{ x \} }\) to itself. It remains to consider the case where \({ y }\) is not equal to \({ x }\), and so is a member of \({ A }\), and there is some \({ w ∈ A }\) for which there is no \({ z ∈ A }\) with \({ ( w , z ) ∈ F }\). \({ F }\) is left total so there is some \({ z ∈ A ∪ \{ x \} }\) with \({ ( w , z ) ∈ F }\) and so we must have \({ z = x }\) for such \({ w }\). Since \({ F }\) is left unique there is at most one such \({ w }\) and we already know there’s at least one so there must be exactly one. Let \[ G = ( F ∖ \{ ( w , x ) , ( x , y ) \} ) ∪ \{ ( w , y ) \} . \] This is an injective function from \({ A }\) to itself and so is also a surjective function. So for all \({ z ∈ A }\) there is a \({ v ∈ A }\) such that \({ ( v , z ) ∈ G }\). If \({ z }\) is not \({ y }\) then \({ ( v , z ) ∈ F }\) so \({ z }\) is in the range of \({ F }\). If \({ z }\) is \({ y }\) then \({ z }\) is also in the range of \({ F }\) because then \({ ( x , z ) ∈ F }\). So all members of \({ A }\) are in the range of \({ F }\). \({ x }\) is also in the range of \({ F }\) since \({ ( w , x ) ∈ F }\) so all of \({ A ∪ \{ x \} }\) is in the range of \({ F }\), which therefore must be surjective. \({ F }\) was an arbitrary injective function from \({ A ∪ \{ x \} }\) to itself so all injective functions from \({ A ∪ \{ x \} }\) to itself are surjective.
We’ve shown that all injective functions from \({ ∅ }\) to itself are surjective and that if all injective functions from \({ A }\) to itself are surjective then all injective functions from \({ A ∪ \{ x \} }\) to itself are surjective. By induction on sets it follows that all injective functions from a finite set to itself are surjective.
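For very small sets the whole statement can also be checked by brute force, enumerating every function from \({ A }\) to \({ A }\). This is no substitute for the proof, just a sanity check:

```python
from itertools import product

def injective_implies_surjective(A):
    """Check by brute force that every injective function A -> A is surjective."""
    elems = sorted(A)
    for images in product(elems, repeat=len(elems)):
        F = set(zip(elems, images))              # a function from A to A
        injective = len({y for (x, y) in F}) == len(F)
        surjective = {y for (x, y) in F} == A
        if injective and not surjective:
            return False
    return True

assert all(injective_implies_surjective(set(range(n))) for n in range(5))
```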
As was mentioned earlier, it would be nice to be able to define a set by giving a Boolean expression with a single free variable, taking the set to consist of those values of the variable for which the expression is true, but this doesn’t work because we could use it to show the existence of the set of all sets, which leads to a contradiction. That’s why the Axiom of Separation has the form that it does.
Similarly, we would like to be able to define a function by giving a Boolean expression with two free variables, one representing the argument of the function and one representing the value of the function at that argument. We would need some additional restrictions to make sure that the relation we get in this way is left total and right unique, but we also need to use some set as the domain, to avoid accidentally creating the set of all sets. We could, for example, assume the following: for any set \({ A }\), if for each \({ x ∈ A }\) there is a unique \({ y }\) such that \({ P }\) holds then there is a set \({ F }\) whose members are exactly those ordered pairs \({ ( x , y ) }\) with \({ x ∈ A }\) for which \({ P }\) holds. This \({ F }\) is then a function with domain \({ A }\). This assumption is called Replacement.
Formally, the statement that for each \({ x }\) there is a unique \({ y }\) such that \({ P }\) holds is the result of linking with a \({ ∧ }\) the two statements \[ [ ∀ x ∈ A . [ ∃ y . P ] ] \] and \[ [ ∀ x ∈ A . [ ∀ y . [ ∀ z . [ [ P ∧ Q ] ⊃ [ y = z ] ] ] ] ] \] where \({ Q }\) is \({ P }\) with all free occurrences of \({ y }\) replaced by \({ z }\). For this to work we need to make the additional assumption that \({ z }\) does not appear in \({ P }\). The first statement is what gives us left totality and the second is what gives us right uniqueness. We can simplify this a little by combining the \({ ∀ x ∈ A }\)’s in the two statements: \[ [ ∀ x ∈ A . [ [ ∃ y . P ] ∧ [ ∀ y . [ ∀ z . [ [ P ∧ Q ] ⊃ [ y = z ] ] ] ] ] ] . \]
The statement about \({ F }\) is the result of joining \[ [ ∀ x . [ ∀ y . [ [ [ x ∈ A ] ∧ P ] ⊃ [ ( x , y ) ∈ F ] ] ] ] \] and \[ [ ∀ x . [ ∀ y . [ [ ( x , y ) ∈ F ] ⊃ [ [ x ∈ A ] ∧ P ] ] ] ] \] Again, we can simplify this slightly by combining the quantifiers they have in common: \[ [ ∀ x . [ ∀ y . [ [ [ [ x ∈ A ] ∧ P ] ⊃ [ ( x , y ) ∈ F ] ] ∧ [ [ ( x , y ) ∈ F ] ⊃ [ [ x ∈ A ] ∧ P ] ] ] ] ] . \] The full formal version is then \[ \begin{split} & [ ∀ A . [ [ ∀ x ∈ A . [ [ ∃ y . P ] ∧ [ ∀ y . [ ∀ z . [ [ P ∧ Q ] ⊃ [ y = z ] ] ] ] ] ] \\ & \quad {} ⊃ [ ∃ F . [ ∀ x . [ ∀ y . [ [ [ [ x ∈ A ] ∧ P ] ⊃ [ ( x , y ) ∈ F ] ] ∧ [ [ ( x , y ) ∈ F ] ⊃ [ [ x ∈ A ] ∧ P ] ] ] ] ] ] ] ] . \end{split} \] Like Separation, Replacement is really an axiom schema rather than a single axiom. There is one axiom for each choice of \({ P }\).
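Replacement has a rough computational analogue: given the defining expression and a set \({ A }\), check the uniqueness hypothesis and then form \({ F }\) by comprehension. Of course the axiom quantifies \({ y }\) over all sets, which we can't enumerate, so this sketch restricts \({ y }\) to a finite search space of my own choosing, with an illustrative predicate:

```python
A = {0, 1, 2, 3}
candidates = range(100)         # an enumerable stand-in for "all sets"

def P(x, y):
    return y == x * x           # an illustrative choice of defining expression

# The hypothesis of Replacement: for each x in A, exactly one y satisfies P.
assert all(sum(1 for y in candidates if P(x, y)) == 1 for x in A)

# The conclusion: the function F exists as a set of ordered pairs.
F = {(x, y) for x in A for y in candidates if P(x, y)}
assert F == {(0, 0), (1, 1), (2, 4), (3, 9)}
```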
For finite sets we don’t need a new axiom since we can prove the statement above from the axioms we already have by set induction. For infinite sets though this doesn’t follow from our existing axioms. In contrast to Separation, which is used all the time, Replacement is almost never used outside of set theory, and rarely used even within it. It is fairly plausible though and it can be shown that if the system is sound without it then it is also sound with it so we can be sure that we haven’t accidentally introduced contradictions if we adopt it. This would not be the case if we didn’t require the domain of the function to be specified.
The axiom schema above isn’t quite the usual one. The usual choice gives the existence of the range of \({ F }\) rather than \({ F }\) itself. This is equivalent, in the sense that from either axiom we can derive the other. The usual choice has the advantage that we don’t need to mention ordered pairs, just sets, and so the axiom schema can be introduced much earlier, before we’ve defined ordered pairs. On the other hand the axiom schema isn’t needed earlier and the version with functions makes the motivation for its introduction much clearer.
A relation \({ R }\) on a set \({ A }\) is called reflexive if \({ I _ A ⊆ R }\), i.e. if \({ ( x , x ) ∈ R }\) for all \({ x ∈ A }\). Note that if \({ R }\) is reflexive then so is \({ R ^ { - 1 } }\). Of our earlier examples \({ R }\), \({ S }\) and \({ U }\) are reflexive while \({ T }\) and \({ V }\) are not.
A relation which is reflexive, transitive and antisymmetric is called a partial order. We just noted that if \({ R }\) is reflexive then so is \({ R ^ { - 1 } }\). We’ve previously seen that if \({ R }\) is transitive then so is \({ R ^ { - 1 } }\) and that if \({ R }\) is antisymmetric then so is \({ R ^ { - 1 } }\). It follows that if \({ R }\) is a partial order then so is \({ R ^ { - 1 } }\). It’s said to be a total order if in addition \({ R ∪ R ^ { - 1 } = A × A }\), i.e. if for all \({ x ∈ A }\) and \({ y ∈ A }\) at least one of \({ ( x , y ) ∈ R }\) or \({ ( y , x ) ∈ R }\) holds.
Of our earlier example relations, \({ R }\) and \({ S }\) are partial orders and \({ S }\) is a total order. None of the others are partial orders and \({ R }\) is not a total order.
If we restrict a partial order to a subset then the result will always be a partial order on the subset and may or may not be a total order on the subset. If it is a total order then that subset is called a chain.
A relation \({ R }\) on a set \({ A }\) is said to be an equivalence relation if it is reflexive, transitive and symmetric. Of our earlier examples, both \({ R }\) and \({ U }\) are equivalence relations, while \({ S }\), \({ T }\) and \({ V }\) are not. There is an important equivalence relation, equivalence modulo \({ n }\), on the set of natural numbers, defined for each natural number \({ n }\). This is the set of ordered pairs \({ ( x , y ) }\) for which there is a natural number \({ m }\) such that either \({ x = y + m · n }\) or \({ x + m · n = y }\). The special case \({ n = 0 }\) gives the relation \({ R }\) from earlier and the special case \({ n = 1 }\) gives the relation \({ U }\) but the cases where \({ n > 1 }\), and particularly where \({ n }\) is prime, are more important.
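A sketch checking, on a finite initial segment, that equivalence modulo \({ n }\) is indeed reflexive, symmetric and transitive, and that the cases \({ n = 0 }\) and \({ n = 1 }\) give \({ R }\) and \({ U }\); the cutoffs are arbitrary choices of mine:

```python
from itertools import product

N = range(12)

def mod_relation(n):
    """Pairs (x, y) with x = y + m*n or x + m*n = y for some natural m."""
    return {(x, y) for x, y in product(N, N)
            if any(x == y + m * n or x + m * n == y for m in range(len(N)))}

def inverse(R):
    return {(y, x) for (x, y) in R}

def compose(R, S):
    return {(x, z) for (x, y) in S for (w, z) in R if y == w}

for n in range(4):
    E = mod_relation(n)
    assert {(x, x) for x in N} <= E        # reflexive
    assert E == inverse(E)                 # symmetric
    assert compose(E, E) <= E              # transitive

assert mod_relation(0) == {(x, y) for x, y in product(N, N) if x == y}  # R
assert mod_relation(1) == set(product(N, N))                            # U
```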
Suppose \({ R }\) is a partial order on \({ A }\). \({ y ∈ A }\) is said to be a greatest member of \({ A }\) if \({ ( x , y ) ∈ R }\) for all \({ x ∈ A }\). \({ y ∈ A }\) is said to be a maximal member if \({ z ∈ A }\) and \({ ( y , z ) ∈ R }\) imply \({ y = z }\). \({ x ∈ A }\) is said to be a least member of \({ A }\) if \({ ( x , y ) ∈ R }\) for all \({ y ∈ A }\). \({ x }\) is said to be a minimal member of \({ A }\) if \({ w ∈ A }\) and \({ ( w , x ) ∈ R }\) imply \({ w = x }\).
If there is a greatest member then there is only one and it is also a maximal member: if \({ y }\) and \({ z }\) are both greatest then \({ ( y , z ) ∈ R }\) and \({ ( z , y ) ∈ R }\), so \({ y = z }\) by antisymmetry, and if \({ y }\) is greatest, \({ z ∈ A }\) and \({ ( y , z ) ∈ R }\) then \({ ( z , y ) ∈ R }\) as well, so again \({ y = z }\). If there is a least member then there is only one and it is also a minimal member, by the same argument applied to \({ R ^ { - 1 } }\).
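These four definitions translate directly into code. As an assumed example I'll use divisibility on \({ \{ 1 , … , 12 \} }\), which is a partial order with a least member but no greatest member, and whose maximal members are not greatest, which is exactly the distinction the definitions are drawing:

```python
A = set(range(1, 13))
R = {(x, y) for x in A for y in A if y % x == 0}   # (x, y) ∈ R means x divides y

def greatest(A, R):
    return [y for y in A if all((x, y) in R for x in A)]

def maximal(A, R):
    return [y for y in A if all(y == z for z in A if (y, z) in R)]

def least(A, R):
    return [x for x in A if all((x, y) in R for y in A)]

def minimal(A, R):
    return [x for x in A if all(w == x for w in A if (w, x) in R)]

assert least(A, R) == [1]                 # 1 divides everything
assert minimal(A, R) == [1]
assert greatest(A, R) == []               # nothing in A is divisible by all of A
assert sorted(maximal(A, R)) == [7, 8, 9, 10, 11, 12]
```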
If \({ A }\) is a set of sets then \({ R = \{ ( B , C ) ∈ A × A : B ⊆ C \} }\) is a partial order since \({ B ⊆ B }\) for all \({ B ∈ A }\), \({ B ⊆ D }\) whenever \({ B ⊆ C }\) and \({ C ⊆ D }\), and \({ B ⊆ C }\) and \({ C ⊆ B }\) imply \({ B = C }\); in other words, \({ R }\) is reflexive, transitive and antisymmetric.
As an example of the definitions above, suppose \({ A }\) is the set of non-empty finite subsets of some non-empty set \({ E }\).
For any \({ x ∈ E }\) we have \({ \{ x \} ∈ A }\). \({ \{ x \} }\) is in fact a minimal member since if \({ B }\) is a finite non-empty subset of \({ \{ x \} }\) then \({ B = \{ x \} }\). If \({ x }\) is the only member of \({ E }\) then \({ \{ x \} }\) is also a least member of \({ A }\), but if there is some \({ y ∈ E }\) with \({ x ≠ y }\) then \({ A }\) has no least member. A least member would have to be a subset of every member of \({ A }\) and hence a subset of both \({ \{ x \} }\) and \({ \{ y \} }\). The only set with this property is \({ ∅ }\), but it is not a member of \({ A }\).
If \({ E }\) is finite then \({ E ∈ A }\) and \({ E }\) is a greatest member of \({ A }\) since every member of \({ A }\) is a subset of \({ E }\). If \({ E }\) is infinite then \({ E }\) is not a member of \({ A }\) and so can’t be a greatest member or a maximal member. In fact there is no maximal member in this case and hence also no greatest member. Suppose \({ B }\) is a member of \({ A }\). Then \({ B }\) is finite and \({ E }\) is infinite so \({ B }\) is not \({ E }\). \({ B }\) is a member of \({ A }\) and all members of \({ A }\) are subsets of \({ E }\) so \({ B }\) is a subset of \({ E }\) and must be a proper subset since \({ B }\) is not equal to \({ E }\). There is therefore some \({ x }\) which is a member of \({ E }\) but not of \({ B }\). Let \({ C = B ∪ \{ x \} }\). Then \({ B }\) is a subset of \({ C }\) and \({ C }\) is a subset of \({ E }\). \({ C }\) is finite, as we proved earlier, and non-empty since \({ x ∈ C }\). So \({ C ∈ A }\). From this and \({ B ⊆ C }\) it would follow that \({ B = C }\) if \({ B }\) were maximal, but \({ x }\) is a member of \({ C }\) and not of \({ B }\) so this is impossible. Therefore \({ B }\) is not maximal. Since \({ B }\) was an arbitrary member of \({ A }\) it follows that no member of \({ A }\) is maximal.
When defining finiteness earlier I used the terms minimal and maximal. You can check that the definitions given there agree with the definitions of minimal and maximal given above, with the relation being the set inclusion relation.
For any non-empty finite set \({ A }\) and partial order \({ R }\) on \({ A }\) there is a minimal member and a maximal member. This is proved by induction on sets. Let \({ B }\) be the set of subsets \({ C }\) of \({ A }\) such that \({ C }\) is empty or has a minimal and maximal member. Then \({ ∅ ∈ B }\). If \({ C ∈ B }\) then \({ C ∪ \{ x \} ∈ B }\) for all \({ x ∈ A }\). This is proved as follows. If \({ C = ∅ }\) then \({ x }\) is both a minimal and maximal member of \({ C ∪ \{ x \} }\). If \({ C }\) is not empty then it has a minimal and maximal member. Let \({ z }\) be a minimal member of \({ C }\). Then \({ y ∈ C }\) and \({ ( y , z ) ∈ R }\) imply \({ y = z }\). \({ ( x , z ) }\) either is or isn’t a member of \({ R }\). If it isn’t then \({ y ∈ C ∪ \{ x \} }\) and \({ ( y , z ) ∈ R }\) imply \({ y = z }\) so \({ z }\) is a minimal member of \({ C ∪ \{ x \} }\). If \({ ( x , z ) ∈ R }\) then for any \({ y ∈ C }\) such that \({ ( y , x ) ∈ R }\) we have \({ ( y , z ) ∈ R }\) by the transitivity of \({ R }\) and so \({ y = z }\), since \({ z }\) is a minimal member of \({ C }\). But then \({ ( z , x ) ∈ R }\), and \({ R }\) is antisymmetric, so \({ ( x , z ) ∈ R }\) and \({ ( z , x ) ∈ R }\) imply \({ z = x }\) and hence \({ y = x }\). In other words, whenever \({ y ∈ C }\) and \({ ( y , x ) ∈ R }\) we have \({ y = x }\). Therefore \({ y ∈ C ∪ \{ x \} }\) and \({ ( y , x ) ∈ R }\) imply \({ y = x }\). In other words \({ x }\) is a minimal member of \({ C ∪ \{ x \} }\). So either \({ x }\) or \({ z }\) is a minimal member of \({ C ∪ \{ x \} }\). A similar argument shows that \({ C ∪ \{ x \} }\) has a maximal member, so \({ C ∪ \{ x \} ∈ B }\). By induction on sets every subset of \({ A }\) is a member of \({ B }\). In particular \({ A ∈ B }\), and \({ A }\) is non-empty, so \({ A }\) has a minimal and a maximal member.
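The induction step of this proof is effectively an algorithm: scan the members one at a time, keeping a current candidate \({ z }\) and replacing it by \({ x }\) whenever \({ ( x , z ) ∈ R }\). A sketch, reusing the divisibility order from before as an assumed example:

```python
def minimal_member(A, R):
    """Find a minimal member of a non-empty finite A under a partial order R,
    following the induction step of the proof above."""
    it = iter(A)
    z = next(it)                 # minimal member of a one-element subset
    for x in it:
        if (x, z) in R:          # the case (x, z) ∈ R: x becomes minimal
            z = x
    return z

A = set(range(1, 13))
R = {(x, y) for x in A for y in A if y % x == 0}
z = minimal_member(A, R)
assert all(w == z for w in A if (w, z) in R)   # z really is minimal
```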
If \({ R }\) is an equivalence relation on a set \({ A }\) then we say that \({ B }\) is an equivalence class if \({ B }\) is a subset of \({ A }\), for all \({ x ∈ B }\) and \({ y ∈ B }\) we have \({ ( x , y ) ∈ R }\), and if \({ ( x , y ) ∈ R }\) and \({ y ∈ B }\) then \({ x ∈ B }\).
Every element of \({ A }\) is a member of exactly one equivalence class. In fact, if \({ x ∈ A }\) then \({ B = \{ y ∈ A : ( x , y ) ∈ R \} }\) is an equivalence class of which \({ x }\) is a member and if \({ C }\) is an equivalence class with \({ x ∈ C }\) then \({ C = B }\).
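A sketch computing the equivalence classes of equivalence modulo 3 on a finite segment and checking that every element lies in exactly one class; the relation and cutoff are illustrative choices:

```python
A = set(range(10))
R = {(x, y) for x in A for y in A if x % 3 == y % 3}   # equivalence mod 3

def equivalence_class(x, A, R):
    """B = { y ∈ A : (x, y) ∈ R }"""
    return frozenset(y for y in A if (x, y) in R)

classes = {equivalence_class(x, A, R) for x in A}
assert classes == {frozenset({0, 3, 6, 9}),
                   frozenset({1, 4, 7}),
                   frozenset({2, 5, 8})}
# Every element lies in exactly one class.
assert all(sum(1 for B in classes if x in B) == 1 for x in A)
```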
From a relation which is reflexive and transitive we can construct an equivalence relation in a natural way: if \({ R }\) is such a relation on \({ A }\) then \({ S = R ∩ R ^ { - 1 } }\) is an equivalence relation. Another way to state this equation is to say that \({ ( x , y ) ∈ S }\) if and only if \({ ( x , y ) ∈ R }\) and \({ ( y , x ) ∈ R }\). Note that if \({ R }\) is also antisymmetric, i.e. a partial order, then \({ S }\) is just the diagonal relation \({ I _ A }\), so the construction is mainly of interest for relations which are reflexive and transitive but not antisymmetric.
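A small example: comparing strings by length is reflexive and transitive but not antisymmetric, and intersecting it with its inverse yields the "same length" equivalence relation. The word set is an arbitrary choice of mine:

```python
words = {'a', 'to', 'of', 'the', 'cat'}
# Reflexive and transitive, but not antisymmetric: comparison by length.
R = {(x, y) for x in words for y in words if len(x) <= len(y)}

def inverse(Rel):
    return {(y, x) for (x, y) in Rel}

S = R & inverse(R)   # (x, y) ∈ S iff len(x) <= len(y) and len(y) <= len(x)
assert S == {(x, y) for x in words for y in words if len(x) == len(y)}
```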
You have no doubt noticed that this is not the usual way to write functions or relations. In place of \({ ( x , y ) ∈ F }\) or \({ ( x , y ) ∈ R }\) we usually write \({ y = F ( x ) }\) or \({ x R y }\). This is convenient, but dangerous. As I’ve mentioned before, first order logic does not cope well with meaningless expressions, like \({ F ( x ) }\) where \({ x }\) is not in the domain of \({ F }\). For this reason I’ll be careful not to use the usual notation in this chapter, although I will use it in later chapters. You should be aware though that some rules of inference which are sound if we stick to the ordered pair notation become unsound when the usual functional notation is used. The most important of these is substitution. For real numbers, for example, we have the basic fact, known as the Law of Trichotomy, that \[ [ ∀ y . [ [ y < 0 ] ∨ [ y = 0 ] ∨ [ y > 0 ] ] ] . \] If we substitute the numerical expression \({ F ( x ) }\) for \({ y }\) we get \[ [ [ F ( x ) < 0 ] ∨ [ F ( x ) = 0 ] ∨ [ F ( x ) > 0 ] ] . \] This is fine if \({ x }\) is in the domain of \({ F }\) but the usual way of interpreting a statement like \({ F ( x ) = 0 }\) is what we’ve written above as \({ ( x , 0 ) ∈ F }\), i.e. that \({ x }\) is in the domain of \({ F }\) and the value of \({ F }\) at \({ x }\) is \({ 0 }\). So the statement \[ [ [ F ( x ) < 0 ] ∨ [ F ( x ) = 0 ] ∨ [ F ( x ) > 0 ] ] \] carries an implicit assumption that \({ x }\) is in the domain of \({ F }\), which may not be true. This turns substitution from a mechanical process into one which requires actual thought, checking that the expressions which are given as arguments to functions represent values within the domains of those functions. Mathematicians generally consider the ease of use of the usual functional notation to be worth the extra work but it’s important to realise that there is a trade-off here.
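Python makes a similar distinction. With the ordered pair notation, membership outside the domain is simply false; with functional notation (a dict lookup, say), it is an error rather than a falsehood. A sketch with an arbitrary small function of my own choosing:

```python
F = {(0, 0), (1, 1), (2, 4)}   # a function, as a set of ordered pairs
Fdict = dict(F)                # the same function in Python's native form

assert (3, 0) not in F         # x ∉ dom F: the membership statement is just false
try:
    value = Fdict[3]           # x ∉ dom F: the lookup is an error, not a falsehood
except KeyError:
    value = None
assert value is None
```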