Natural deduction

Some people find proving theorems in formal systems like Nicod’s or Łukasiewicz’s an entertaining sort of puzzle. Other people do not. What’s undeniable is that such proofs have a very different flavour from those of the rest of mathematics. There was a reaction against these and similar axiomatic systems which led to what’s known as “natural deduction”. One of the most important people behind this reaction was Łukasiewicz himself. Natural deduction systems are still formal systems, but their rules better reflect the way mathematicians typically think.

A formal system for natural deduction

There is a wide variety of natural deduction systems. We’ll use one due to Douglas Hofstadter, with a few extra rules of inference. Having more rules of inference makes constructing proofs in the formal system easier, but it comes at the expense of making most proofs of statements about the formal system harder. The language of this system includes only the operators \({ ∧ }\), \({ ∨ }\), \({ ¬ }\), and \({ ⊃ }\). It has no axioms! In contrast, it has a lot of rules of inference:

  1. From statements \({ P }\) and \({ Q }\) we can deduce the statement \({ ( P ∧ Q ) }\). Also, from any statement of the form \({ ( P ∧ Q ) }\) we can deduce the statement \({ P }\) and the statement \({ Q }\).
  2. From the statement \({ P }\) we can deduce the statement \({ ( P ∨ Q ) }\), where \({ Q }\) is any expression.
  3. From \({ P }\) and \({ ( P ⊃ Q ) }\) we can deduce \({ Q }\).
  4. The expressions \({ [ ¬ ( ¬ P ) ] }\) and \({ P }\) are freely interchangeable. In other words, anywhere an expression of one of these forms appears in a statement we may deduce the statement where it has been replaced by the other.
  5. The expressions \({ ( P ⊃ Q ) }\) and \({ [ ( ¬ Q ) ⊃ ( ¬ P ) ] }\) are freely interchangeable.
  6. The expressions \({ [ ( ¬ P ) ∧ ( ¬ Q ) ] }\) and \({ [ ¬ ( P ∨ Q ) ] }\) are freely interchangeable.
  7. The expressions \({ [ ( ¬ P ) ∨ ( ¬ Q ) ] }\) and \({ [ ¬ ( P ∧ Q ) ] }\) are freely interchangeable.
  8. The expressions \({ ( P ∨ Q ) }\) and \({ [ ( ¬ P ) ⊃ Q ] }\) are freely interchangeable, as are \({ ( P ⊃ Q ) }\) and \({ [ ( ¬ P ) ∨ Q ] }\).
  9. The “Rule of Fantasy”, to be described below.
  10. The “Rule of Substitution”, subject to restrictions to be discussed below.

The first few rules specify the behaviour of the four logical operators. They are closely related to our tableaux rules. Half of the first rule, which is called the rule of joining and separation, can be thought of as equivalent to the tableau rule \[ \begin{array}{r|l} ( P ∧ Q ) & \cr \hline P & \cr Q & \end{array} \] for example. It reflects the fact that if \({ ( P ∧ Q ) }\) is true then \({ P }\) and \({ Q }\) are true. It’s important to remember though that that’s a property of the intended interpretation, or rather interpretations, of the system, while the rule above is a rule of inference within the system. The third rule is one we’ve met before, under the name modus ponens. The fourth rule above is related to the tableau rule \[ \begin{array}{r|l} ( ¬ P ) & \cr \hline & P \end{array} \qquad \begin{array}{r|l} & ( ¬ P ) \cr \hline P & \end{array} \] with the first one applied with \({ ( ¬ P ) }\) in place of \({ P }\). It reflects the “fact” that if \({ P }\) is not not true then it is true. The quotation marks reflect the reality that not everyone accepts this as a logical principle. This is one of the dividing lines between “classical” and “intuitionist” logic. The tableau counterpart of the third rule, modus ponens, is more complicated. It consists of following both branches from a \({ ( P ⊃ Q ) }\) but then using the \({ P }\) to immediately close off the left branch, which would have a \({ P }\) to the right of the bar.

The fifth rule is called the rule of the contrapositive; it is the basis of proofs by contradiction. It is another dividing line between “classical” and “intuitionist” logic. The sixth and seventh rules are the two rules known as De Morgan’s laws. The eighth rule of inference is really just the observation we made when discussing Łukasiewicz’s system that \({ ∨ }\) is expressible in terms of \({ ¬ }\) and \({ ⊃ }\).
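
The first few rules are purely mechanical, and it can help to think of them as operations on expressions. Here is a minimal sketch in Python, not part of the formal system itself, which represents expressions as nested tuples and implements the rule of joining and separation and modus ponens; the representation and the function names are illustrative choices of mine rather than anything standard.

# A minimal sketch: expressions as nested tuples, e.g. ('∧', 'p', 'q') for ( p ∧ q ),
# ('¬', 'p') for ( ¬ p ) and ('⊃', 'p', 'q') for ( p ⊃ q ).
def join(p, q):
    """Rule 1, first half: from P and Q deduce ( P ∧ Q )."""
    return ('∧', p, q)
def separate(s):
    """Rule 1, second half: from ( P ∧ Q ) deduce P and Q."""
    assert s[0] == '∧', "separation only applies to conjunctions"
    return s[1], s[2]
def modus_ponens(p, implication):
    """Rule 3: from P and ( P ⊃ Q ) deduce Q."""
    assert implication[0] == '⊃' and implication[1] == p, "premises do not match"
    return implication[2]
# Example: from p and ( p ⊃ q ) we get q, and joining q with p gives ( q ∧ p ).
q = modus_ponens('p', ('⊃', 'p', 'q'))
print(join(q, 'p'))   # ('∧', 'q', 'p')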

Introducing and discharging hypotheses

The ninth rule, the “rule of fantasy” in Hofstadter’s terminology, is more complicated to explain, but reflects a common practice in informal proofs. We often say “Suppose \({ P }\)”. We then reason for a while under that assumption and reach a conclusion \({ Q }\). We then conclude “So if \({ P }\) then \({ Q }\).” There are two common circumstances in which we do this. One is proof by contradiction, where we then immediately apply the rule of the contrapositive. The other is case by case analysis, which was the basis of our tableau method. In such applications there will be a separate application of the rule of fantasy for each possible case. Writers of informal proofs are under no obligation to tell you which of these two uses they have in mind, and sometimes you have to read the whole proof to find out, but often a clue is in the verb forms used. In a proof by contradiction people are more likely to write “Suppose … were true” rather than “Suppose … is true”. But this is not something you can entirely rely on.

I’ve just described what the rule of fantasy is intended to do, but not the precise rules governing it. They’re just a formalised version of the rules which mathematicians follow in informal arguments. We need some terminology. The step of saying “Suppose \({ P }\)” is called introducing the assumption or hypothesis \({ P }\). The step of saying “So if \({ P }\) then \({ Q }\)” is called discharging this hypothesis or assumption. Everything in between is called the scope of the hypothesis. It’s possible, and indeed common, to introduce further hypotheses within the scope of an existing one, and so have nested scopes. In arguments based on tableaux this corresponds to branches within branches.

Scope determines which statements are accessible for use by the other rules at any point in a proof. When you enter a new scope by introducing a hypothesis you retain access to everything in the scope you were in. When you leave that scope by discharging that hypothesis you lose access to all statements since you entered it. The only trace of any of the reasoning which took place within that scope is the single statement \({ ( P ⊃ Q ) }\) generated by discharging the hypothesis. This restriction on the accessible statements is needed to ensure that the only statements you can deduce by introducing \({ P }\) as a hypothesis are those of the form \({ ( P ⊃ Q ) }\). Otherwise you could prove all statements, true or false, by introducing them as a hypothesis and then using them outside of the scope of that hypothesis. All the other rules are to be interpreted as implicitly subject to this restriction. So when we say that from \({ ( P ∧ Q ) }\) we can deduce \({ P }\) and \({ Q }\) we mean that in a scope where \({ ( P ∧ Q ) }\) is accessible we can deduce \({ P }\) and \({ Q }\). It doesn’t allow us to deduce \({ P }\) or \({ Q }\) if the statement \({ ( P ∧ Q ) }\) appeared within the scope of some hypothesis which has already been discharged.

Statements outside the scope of any hypotheses are said to have “global” scope. Only such statements are theorems.

In informal proofs it can be difficult to spot where hypotheses are introduced and where they’re discharged and therefore difficult to know which ones are accessible. This is particularly problematic in proofs by contradiction. The whole point of a proof by contradiction is that the hypothesis which is introduced will later be shown to be false. Everything in the scope of that hypothesis could, and usually does, depend on that false hypothesis and therefore should never be used outside that particular proof by contradiction argument. This is a common source of error for students. If you scan through a textbook looking for things which might be useful in a problem you’re attempting then you may find useful statements in the proof of a theorem. They’ll typically depend on various hypotheses which have been introduced though and can’t safely be used in a context where those hypotheses aren’t known to be true.

In a formal system we need some way to indicate the introduction and discharging of hypotheses.

The safest way to do it would be to include a list of all active, i.e. not yet discharged, hypotheses before each statement. That solves the problem described above of using statements outside of their scope, but at the cost of making proofs very long and repetitive.

Jaśkowski, one of the founders of the theory of natural deduction, used boxes. A box encloses all the statements within a given scope and it’s straightforward to see which statements are available within it. Starting from wherever we currently are we have available any statement we can reach by crossing zero or more box boundaries from inside to outside, but we are not allowed to cross any from outside to inside. This notational convention would probably have been more popular if it weren’t a nightmare to typeset.

A popular alternative is to use indentation. Every time we introduce a hypothesis we increase the indentation and every time we discharge one we restore it to its previous value. The first statement after the indentation is increased is the newly introduced hypothesis. The first statement after the indentation has been restored is the result of discharging the hypothesis, i.e. the statement \({ ( P ⊃ Q ) }\) where \({ P }\) is the hypothesis and \({ Q }\) is the last statement before the indentation was restored. This is a very compact notation but using spaces for indentation can cause problems. Screen readers will generally ignore spaces. Even for sighted readers judging the number of spaces at the start of a line is error-prone. It’s better to use a non-whitespace character. We’ll use dots.
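
As a small illustration of how the dot notation determines accessibility, here is a short Python sketch, mine rather than anything standard, which takes the number of dots at the start of each line of a proof and works out which earlier lines are still accessible from the last one: an earlier line remains accessible as long as the depth never drops below that line’s depth afterwards.

# Sketch: depths[i] is the number of dots at the start of line i of a proof.
# An earlier line is accessible from the last line provided no later line has a
# smaller depth, i.e. provided the scope it lives in has not yet been closed.
def accessible(depths):
    current = len(depths) - 1
    return [i for i in range(current)
            if min(depths[i + 1:current + 1]) >= depths[i]]
# The proof of { p ⊃ [ q ⊃ ( p ∧ q ) ] } given below has depths 1, 2, 2, 1, 0.
print(accessible([1, 2, 2]))        # [0, 1]: both hypotheses are accessible at the third line
print(accessible([1, 2, 2, 1, 0]))  # []: from the final, global-scope line nothing earlier is accessible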

The “rule of fantasy” is Hofstadter’s terminology. It accurately reflects the use of the rule, to explore the consequences of a statement not known to be true, but don’t expect anyone to understand you if you use the term outside of this module.

There is some redundancy in the rules above, in the sense that there are proper subsets of those rules with the property that any statement which has a proof using the full set also has a proof using only the subset. But the point of a natural deduction system is to formalise something close to the way mathematicians actually reason rather than to have an absolutely minimal system. If you like minimal systems then you’re better off with Nicod.

There are, as we’ll see, some restrictions needed on the rule of substitution, but I’ll get to those once we have some examples of proofs. In the interim I will be careful not to use that rule.

Some proofs

The shortest proof in this system is \[ \begin{array}{l@{\quad}l} 1 & . \quad p \cr 2 & ( p ⊃ p ) \end{array} \] which introduces \({ p }\) as a hypothesis and immediately discharges it. The line numbering isn’t technically part of the proof; I’ve just added it to make it easier to refer to individual lines. This proof shows that the statement \({ ( p ⊃ p ) }\) is a theorem in this system. We saw that it was a theorem in Łukasiewicz’s system as well, but with a considerably longer proof.

As another example, consider this proof of the statement \({ \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} }\). \[ \begin{array}{l@{\quad}l} 1 & . \quad p \cr 2 & . \quad . \quad q \cr 3 & . \quad . \quad ( p ∧ q ) \cr 4 & . \quad [ q ⊃ ( p ∧ q ) ] \cr 5 & \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} \end{array} \] Our first step is again to introduce a hypothesis. In this system the first step is always to introduce a hypothesis. There are no axioms and every other rule deduces a statement from previous statements, of which we have none. It’s not hard to guess which hypothesis to introduce. The statement we want to prove is \({ \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} }\) and it starts with \({ p ⊃ }\) so if we introduce \({ p }\) and then manage to prove \({ [ q ⊃ ( p ∧ q ) ] }\) then we will be done. So we introduce \({ p }\). What next? We want to prove \({ [ q ⊃ ( p ∧ q ) ] }\). It starts with \({ q ⊃ }\) so try the same thing, introducing \({ q }\) as a further hypothesis. If we can prove \({ ( p ∧ q ) }\) within the scope of this hypothesis then we will have proved \({ [ q ⊃ ( p ∧ q ) ] }\) within the scope of the hypothesis \({ p }\) and therefore will have proved \({ \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} }\) within the global scope, i.e. in the absence of any hypotheses. At this point we have two statements available within our current scope \({ p }\) and \({ q }\). We just introduced \({ q }\). We inherited \({ p }\) from the outer scope. This is what I meant when I said that when you enter a new scope by introducing a hypothesis you retain access to everything in the scope you were in. So we have \({ p }\) and \({ q }\) and want \({ ( p ∧ q ) }\). Fortunately our first rule of inference, the rule of joining and separation, does exactly this. So the proof may look strange at first but really at each stage we do the only thing we can and it works.

As another example, consider \({ [ p ∨ ( ¬ p ) ] }\), which is often called the “law of the excluded middle”. In this case it’s less obvious how to start. For the reasons discussed above we must start by introducing a hypothesis, but what hypothesis? The statement isn’t of the form \({ ( P ⊃ Q ) }\) for any choice of expressions \({ P }\) and \({ Q }\). Looking at our rules though we see that one of them allows us to derive \({ [ p ∨ ( ¬ p ) ] }\) from \({ [ ( ¬ p ) ⊃ ( ¬ p ) ] }\). That is easily proved with the fantasy rule. We introduce the hypothesis \({ ( ¬ p ) }\) and then immediately discharge it. The complete proof is \[ \begin{array}{l@{\quad}l} 1 & . \quad ( ¬ p ) \cr 2 & [ ( ¬ p ) ⊃ ( ¬ p ) ] \cr 3 & [ p ∨ ( ¬ p ) ] \end{array} \]

A slightly more complicated example is the following proof of \({ \{ [ p ∧ ( ¬ p ) ] ⊃ q \} }\). This time I’ll just indicate which rule is used in each line and not explain the strategy behind it. \[ \begin{array}{l@{\quad}l} 1 & . \quad [ p ∧ ( ¬ p ) ] \cr 2 & . \quad p \cr 3 & . \quad ( ¬ p ) \cr 4 & . \quad . \quad ( ¬ q ) \cr 5 & . \quad . \quad [ ¬ ( ¬ p ) ] \cr 6 & . \quad \{ ( ¬ q ) ⊃ [ ¬ ( ¬ p ) ] \} \cr 7 & . \quad [ ( ¬ p ) ⊃ q ] \cr 8 & . \quad q \cr 9 & \{ [ p ∧ ( ¬ p ) ] ⊃ q \} \end{array} \]

The first line is the introduction of a hypothesis, also known as the rule of fantasy. The hypothesis is \({ [ p ∧ ( ¬ p ) ] }\). The second and third lines are from our first rule of inference, “From statements \({ P }\) and \({ Q }\) we can deduce the statement \({ ( P ∧ Q ) }\). Also, from any statement of the form \({ ( P ∧ Q ) }\) we can deduce the statement \({ P }\) and the statement \({ Q }\).” This is the second part of it, applied to the first line. The fourth line introduces the hypothesis \({ ( ¬ q ) }\). The fifth line uses the fourth rule: “The expressions \({ [ ¬ ( ¬ P ) ] }\) and \({ P }\) are freely interchangeable. In other words, anywhere an expression of one of these forms appears in a statement we may deduce the statement where it has been replaced by the other.” This is applied to the second line. The sixth line discharges the hypothesis \({ ( ¬ q ) }\) from the fourth line. The seventh line uses the fifth rule of inference: “The expressions \({ ( P ⊃ Q ) }\) and \({ [ ( ¬ Q ) ⊃ ( ¬ P ) ] }\) are freely interchangeable.” Here \({ P }\) is the expression \({ ( ¬ p ) }\) and \({ Q }\) is the expression \({ q }\). The eighth line uses the third rule of inference: “From \({ P }\) and \({ ( P ⊃ Q ) }\) we can deduce \({ Q }\).” This is the rule known as modus ponens. It’s applied to the third and seventh lines. The ninth and final line discharges the hypothesis \({ [ p ∧ ( ¬ p ) ] }\).

The theorem \({ \{ [ p ∧ ( ¬ p ) ] ⊃ q \} }\) is known as the “Principle of Explosion”. Unlike other fanciful names, like the “Rule of Fantasy”, this name is quite standard and people should understand what you’re talking about if you use it. This theorem shows that from a contradiction it’s possible to derive anything at all. In a theory with contradictions all statements are theorems.

A useful substitution instance of \({ \{ [ p ∧ ( ¬ p ) ] ⊃ q \} }\) is \({ \{ [ p ∧ ( ¬ p ) ] ⊃ p \} }\).

Here are the proofs of some other useful theorems. It would be a good idea to go through at least some of them and check that you can identify which rule of inference is being used at each step. \[ \begin{array}{l@{\quad}l} 1 & . \quad ( ¬ p ) \cr 2 & . \quad [ ( ¬ p ) ∧ ( ¬ p ) ] \cr 3 & . \quad [ ¬ ( p ∨ p ) ] \cr 4 & . \quad \{ ¬ [ ( ¬ p ) ⊃ p ] \} \cr 5 & [ ( ¬ p ) ⊃ \{ ¬ [ ( ¬ p ) ⊃ p ] \} ] \cr 6 & \{ [ ( ¬ p ) ⊃ p ] ⊃ p \} \end{array} \] \[ \begin{array}{l@{\quad}l} 1 & . \quad q \cr 2 & . \quad [ ( ¬ p ) ∨ q ] \cr 3 & . \quad ( p ⊃ q ) \cr 4 & [ q ⊃ ( p ⊃ q ) ] \cr 5 & \{ [ ¬ ( p ⊃ q ) ] ⊃ ( ¬ q ) ] \} \end{array} \] \[ \begin{array}{l@{\quad}l} 1 & . \quad ( ¬ p ) \cr 2 & . \quad [ ( ¬ p ) ∨ q ] \cr 3 & . \quad ( p ⊃ q ) \cr 4 & [ ( ¬ p ) ⊃ ( p ⊃ q ) ] \cr 5 & \{ [ ¬ ( p ⊃ q ) ] ⊃ [ ¬ ( ¬ p ) ] \} \cr 6 & \{ [ ¬ ( p ⊃ q ) ] ⊃ p ] \} \end{array} \] \[ \begin{array}{l@{\quad}l} 1 & . \quad ( p ⊃ q ) \cr 2 & . \quad [ ( ¬ q ) ⊃ ( ¬ p ) ] \cr 3 & . \quad [ q ∨ ( ¬ p ) ] \cr 4 & \{ ( p ⊃ q ) ⊃ [ q ∨ ( ¬ p ) ] \} \end{array} \] \[ \begin{array}{l@{\quad}l} 1 & . \quad [ ( p ⊃ r ) ∧ ( q ⊃ r ) ] \cr 2 & . \quad ( p ⊃ r ) \cr 3 & . \quad [ ( ¬ r ) ⊃ ( ¬ p ) ] \cr 4 & . \quad ( q ⊃ r ) \cr 5 & . \quad [ ( ¬ r ) ⊃ ( ¬ q ) ] \cr 6 & . \quad . \quad ( ¬ r ) \cr 7 & . \quad . \quad ( ¬ p ) \cr 8 & . \quad . \quad ( ¬ q ) \cr 9 & . \quad . \quad [ ( ¬ p ) ∧ ( ¬ q ) ] \cr 10 & . \quad . \quad [ ¬ ( p ∨ q ) ] \cr 11 & . \quad \{ ( ¬ r ) ⊃ [ ¬ ( p ∨ q ) ] \} \cr 12 & . \quad [ ( p ∨ q ) ⊃ r ] \cr 13 & \{ [ ( p ⊃ r ) ∧ ( q ⊃ r ) ] ⊃ [ ( p ∨ q ) ⊃ r ] \} \end{array} \]
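
For instance, in the first of these proofs line 1 introduces the hypothesis \({ ( ¬ p ) }\), line 2 joins that hypothesis with itself using the first rule, line 3 applies one of De Morgan’s laws (the sixth rule), line 4 uses the eighth rule to replace \({ ( p ∨ p ) }\) by \({ [ ( ¬ p ) ⊃ p ] }\) inside the negation, line 5 discharges the hypothesis, and line 6 applies the rule of the contrapositive to line 5.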

As a final example we consider the tautology \({ \{ [ (p ⊃ q) ∧ (q ⊃ r) ] ⊃ (p ⊃ r) \} }\) we encountered earlier. A proof is \[ \begin{array}{l@{\quad}l} 1 & . \quad [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \cr 2 & . \quad . \quad p \cr 3 & . \quad . \quad ( p ⊃ q ) \cr 4 & . \quad . \quad ( q ⊃ r ) \cr 5 & . \quad . \quad q \cr 6 & . \quad . \quad r \cr 7 & . \quad ( p ⊃ r ) \cr 8 & \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} \end{array} \]

Although formal proofs may look baffling when you first encounter them, it’s often easy to translate them into informal proofs of a familiar type. For example, the proof above can be translated as follows.

Suppose \({ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] }\) is true. Then \({ ( p ⊃ q ) }\) and \({ ( q ⊃ r ) }\) are both true. If \({ p }\) is true then \({ q }\) is true and therefore \({ r }\) is true. So \({ ( p ⊃ r ) }\). This holds under our assumption that \({ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] }\) is true, so \({ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} }\).

Substitution

This is where we should talk about the rule of substitution. A very common style of mathematical argument, and one which we formalised in the tableau method, is case by case analysis. The simplest type of case by case analysis is where we have only two cases, one where a certain statement is true and one where it’s false. If we can prove a certain conclusion in both of those cases then that conclusion must be true. Or at least it must be if you accept the law of the excluded middle. Intuitionists don’t.

But to apply the case by case analysis above we need the law of the excluded middle for expressions and not just for variables. In other words we need \({ [ P ∨ ( ¬ P ) ] }\) to be a theorem for every expression \({ P }\). There are three ways of accomplishing this. One is to run exactly the same argument as above with \({ p }\) replaced everywhere by \({ P }\). The rules we used, of which there were only two, refer to expressions rather than to variables and so no change is needed. We could do the same with \({ \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} }\), by the way. For any expressions \({ P }\) and \({ Q }\) we could replace every \({ p }\) by the expression \({ P }\) and every \({ q }\) by the expression \({ Q }\) in the proof of \({ \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} }\) and obtain a proof of \({ \{ P ⊃ [ Q ⊃ ( P ∧ Q ) ] \} }\). A disadvantage of this approach is that we need to repeat the argument for each \({ P }\) and \({ Q }\) we need the result for. We can’t just do it with the letters \({ P }\) and \({ Q }\) in place of the letters \({ p }\) and \({ q }\) because \({ P }\) and \({ Q }\) aren’t even elements of our language, just elements of the language we use to talk about our language. We have to substitute the actual expressions, and so we’ll need to redo that work whenever we need the result for a different pair of expressions.
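
For instance, taking \({ P }\) to be the expression \({ ( p ∧ q ) }\), the three-line proof of the law of the excluded middle above becomes \[ \begin{array}{l@{\quad}l} 1 & . \quad [ ¬ ( p ∧ q ) ] \cr 2 & \{ [ ¬ ( p ∧ q ) ] ⊃ [ ¬ ( p ∧ q ) ] \} \cr 3 & \{ ( p ∧ q ) ∨ [ ¬ ( p ∧ q ) ] \} \end{array} \] which is a proof of the law of the excluded middle for that particular expression.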

Another option is to leave the realm of formal proof and enter the realm of semiformal proof. The argument above shows that for any expressions \({ P }\) and \({ Q }\) the statements \({ [ P ∨ ( ¬ P ) ] }\) and \({ \{ P ⊃ [ Q ⊃ ( P ∧ Q ) ] \} }\) are theorems. Anything you can derive from them using our rules of inference is also a theorem. In other words, substitution is a derived rule of inference for this system, just as it was for the Nicod system. But now we’re not giving proofs of statements but rather proofs that statements have proofs. That’s a semiformal proof rather than a formal one.

The third option is to bring in the rule of substitution, which was a rule of inference in the Łukasiewicz system. This seems redundant, since we’ve just seen how one can get around the lack of a rule of substitution by just substituting expressions for variables within a proof, but it’s convenient to have and we already decided we aren’t trying for a minimal system.

There’s a subtle danger here though. Consider the following proof: \[ \begin{array}{l@{\quad}l} 1 & . \quad p \cr 2 & . \quad q \cr 3 & ( p ⊃ q ) \end{array} \] The first step is to introduce the hypothesis \({ p }\). The second is to apply the rule of substitution, replacing the variable \({ p }\) with the expression \({ q }\). Variables are expressions so this is okay. The third step is to discharge the hypothesis. But \({ ( p ⊃ q ) }\) is not a tautology. There is an interpretation under which it is false, namely the one where \({ p }\) is assigned the value true and \({ q }\) is assigned the value false, and therefore it should not be a theorem.

What went wrong? There’s a difference in interpretation between statements in the Nicod or Łukasiewicz systems and in a natural deduction system. In the Nicod or Łukasiewicz systems every statement is meant to be unconditionally true. We could stop a proof at any point and have a proof of the last statement before we stopped. This is not the case for natural deduction systems. Only statements in global scope are meant to be unconditionally true. All other statements are meant to be true if all the active hypotheses for their scope are true. In other words they’re only conditionally true. That’s why I had to specify that only statements in global scope are theorems. The problem with the argument above is that once we’ve introduced \({ p }\) as a hypothesis it’s no longer just any variable. It’s the specific variable whose truth everything will be dependent upon until we discharge that hypothesis. Replacing it with some other variable, or some other expression, can’t safely be allowed.

How can we repair this? We could sacrifice the rule of fantasy but the rule of fantasy is the foundation of our system. It is literally impossible to prove anything without it. We could limit our rule of substitution by saying that only statements with global scope are available for substitution. This is a sound rule of inference. We know it’s sound because the other rules are sound and this one is redundant. Anything we could prove with it we could also prove without it, by the technique we discussed earlier of repeating the proof, but with expressions substituted in for the variables. It’s unnecessarily restrictive though, since it can sometimes be safe to substitute for some variables in a statement within the scope of a hypothesis. The precise rule is that in any available statement we may substitute any expression for any variable which does not appear in any hypothesis which was active in its scope. In the global scope no hypotheses are active and so we can substitute for any variable, but elsewhere certain variables will not be substitutable. Note that which variables are substitutable depends on the scope of the statement into which we’re substituting, not on our current scope at the point where we want to make the substitution.
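
As a small illustration, in the following fragment the final step is a legitimate use of the rule of substitution: the only hypothesis active in the scope of line 4 is \({ p }\), and \({ q }\) does not appear in it, so we may substitute \({ r }\) for \({ q }\) there, while substituting anything for \({ p }\) within that scope would not be allowed. \[ \begin{array}{l@{\quad}l} 1 & . \quad p \cr 2 & . \quad . \quad ( ¬ q ) \cr 3 & . \quad [ ( ¬ q ) ⊃ ( ¬ q ) ] \cr 4 & . \quad [ q ∨ ( ¬ q ) ] \cr 5 & . \quad [ r ∨ ( ¬ r ) ] \end{array} \]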

This is the first example we’ve seen of a phenomenon where not all variables are equally variable. Some have special status in a particular context which restricts what we can do with them. It won’t be the last such example. Something similar will happen when we move on to first order logic.

Although I’ve put in some effort above to ensure that you can substitute into statements within the scope of a hypothesis in those cases where it’s safe, you shouldn’t generally structure your proofs in a way which makes that necessary. If you’re going to need multiple substitution instances of a statement then you should prove that statement in global scope so that it’s available everywhere. Often that means writing your proofs out of order. Once you realise you’ll need multiple instances of a statement you need to go back and insert a proof of that statement at the start of your argument, before introducing any hypotheses which are not needed in its proof.

When reading proofs it’s useful to know that people do this. If an author starts by proving a bunch of random facts whose usefulness isn’t immediately apparent and which don’t reappear for several pages, that’s not necessarily just bad exposition. It may well be that they’re being proved there to make it clear that they’re in global scope, i.e. that their truth is not contingent on hypotheses which will be made later.

From tableaux to proofs

We can generate a formal proof in our natural deduction system from a tableau. This generally won’t give a particularly efficient proof, but it will be a formal proof.

There’s a preamble which we can use for all tableaux, consisting of proofs of theorems we’ve already proved or easy consequences of those, whose substitution instances we’ll find useful. This preamble is \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 1 & . \quad [ p ∧ ( ¬ p ) ] \cr 2 & . \quad p \cr 3 & . \quad ( ¬ p ) \cr 4 & . \quad . \quad ( ¬ q ) \cr 5 & . \quad . \quad [ ¬ ( ¬ p ) ] \cr 6 & . \quad \{ ( ¬ q ) ⊃ [ ¬ ( ¬ p ) ] \} \cr 7 & . \quad [ ( ¬ p ) ⊃ q ] \cr 8 & . \quad q \cr 9 & \{ [ p ∧ ( ¬ p ) ] ⊃ q \} \end{array} \] \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 10 & . \quad ( ¬ p ) \cr 11 & . \quad [ ( ¬ p ) ∨ q ] \cr 12 & . \quad ( p ⊃ q ) \cr 13 & [ ( ¬ p ) ⊃ ( p ⊃ q ) ] \cr 14 & \{ [ ¬ ( p ⊃ q ) ] ⊃ [ ¬ ( ¬ p ) ] \} \cr 15 & \{ [ ¬ ( p ⊃ q ) ] ⊃ p ] \} \end{array} \] \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 16 & . \quad q \cr 17 & . \quad [ ( ¬ p ) ∨ q ] \cr 18 & . \quad ( p ⊃ q ) \cr 19 & [ q ⊃ ( p ⊃ q ) ] \cr 20 & \{ [ ¬ ( p ⊃ q ) ] ⊃ ( ¬ q ) ] \} \end{array} \] \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 21 & . \quad [ ( p ⊃ r ) ∧ ( q ⊃ r ) ] \cr 22 & . \quad ( p ⊃ r ) \cr 23 & . \quad [ ( ¬ r ) ⊃ ( ¬ p ) ] \cr 24 & . \quad ( q ⊃ r ) \cr 25 & . \quad [ ( ¬ r ) ⊃ ( ¬ q ) ] \cr 26 & . \quad . \quad ( ¬ r ) \cr 27 & . \quad . \quad ( ¬ p ) \cr 28 & . \quad . \quad ( ¬ q ) \cr 29 & . \quad . \quad [ ( ¬ p ) ∧ ( ¬ q ) ] \cr 30 & . \quad . \quad [ ¬ ( p ∨ q ) ] \cr 31 & . \quad \{ ( ¬ r ) ⊃ [ ¬ ( p ∨ q ) ] \} \cr 32 & . \quad [ ( p ∨ q ) ⊃ r ] \cr 33 & \{ [ ( p ⊃ r ) ∧ ( q ⊃ r ) ] ⊃ [ ( p ∨ q ) ⊃ r ] \} \end{array} \] \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 34 & ( \{ [ ( ¬ p ) ⊃ r ] ∧ ( q ⊃ r ) \} ⊃ \{ [ ( ¬ p ) ∨ q ] ⊃ r \} ) \cr 35 & \{ [ ( ¬ p ) ⊃ r ] ∧ ( q ⊃ r ) \} ⊃ [ ( p ⊃ q ) ⊃ r ] \end{array} \] \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 36 & ( \{ [ ( ¬ p ) ⊃ r ] ∧ [ ( ¬ q ) ⊃ r ] \} ⊃ \{ [ ( ¬ p ) ∨ ( ¬ q ) ] ⊃ r \} ) \cr 37 & ( \{ [ ( ¬ p ) ⊃ r ] ∧ [ ( ¬ q ) ⊃ r ] \} ⊃ \{ [ ¬ ( p ∧ q ) ] ⊃ r \} ) \end{array} \] \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 38 & . \quad \{ ( ¬ p ) ⊃ [ q ∧ ( ¬ q ) ] \} \cr 39 & . \quad \{ [ q ∧ ( ¬ q ) ] ⊃ p \} \cr 40 & . \quad ( \{ ( ¬ p ) ⊃ [ q ∧ ( ¬ q ) ] \} ∧ \{ [ q ∧ ( ¬ q ) ] ⊃ p \} ) \cr 41 & . \quad \{ ( \{ ( ¬ p ) ⊃ [ q ∧ ( ¬ q ) ] \} ∧ \{ [ q ∧ ( ¬ q ) ] ⊃ p \} ) ⊃ [ ( ¬ p ) ⊃ p ] \} \cr 42 & . \quad [ ( ¬ p ) ⊃ p ] \cr 43 & . \quad \{ [ ( ¬ p ) ⊃ p ] ⊃ p \} \cr 44 & . \quad p \cr 45 & ( \{ ( ¬ p ) ⊃ [ q ∧ ( ¬ q ) ] \} ⊃ p ) \end{array} \] The point of the preamble is to have lines 9, 14, 20, 33, 35, 37, and 45 available for appropriate substitutions. Most of these are previously used examples.

After copying the preamble our strategy will be to work our way through the tableau statement by statement, with each statement in the tableau eventually appearing, either as is, in the case of statements to the left of the vertical line, or negated, in the case of statements to the right of the line. To start things off we introduce the negation of the statement we want to prove as a hypothesis. If we ignore branches for the moment then everything is fairly straightforward. Every statement in the tableau is derived from an earlier one using a tableau rule, of which there is one for each of the Boolean operators. When it’s \({ ¬ }\), either there’s nothing to do or we use our double negation rule to peel off two \({ ¬ }\)’s. When the operator is \({ ∧ }\) we can use our splitting rule. A slightly more complicated case is \({ ∨ }\). This must be on the right of the vertical line, since we’re still ignoring the branching cases. In this case we use De Morgan’s laws to move the \({ ¬ }\) inside the \({ ∨ }\), making it into an \({ ∧ }\), and then use splitting. The most complicated case is \({ ⊃ }\). For that we use a substitution instance of line 15 or 20, and then use modus ponens.
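
For instance, if the tableau has \({ ( p ∨ q ) }\) to the right of the vertical line then the corresponding statement in the proof is \({ [ ¬ ( p ∨ q ) ] }\). One of De Morgan’s laws turns this into \({ [ ( ¬ p ) ∧ ( ¬ q ) ] }\), and splitting then gives \({ ( ¬ p ) }\) and \({ ( ¬ q ) }\), matching the \({ p }\) and \({ q }\) which the tableau rule places to the right of the line.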

Branches are somewhat trickier. Whenever we branch we introduce the two possibilities as new hypotheses in turn, i.e. discharging the first one before introducing the second. We start from a tableau proof, meaning a closed tableau, so we will eventually reach a contradiction in each branch. This contradiction will be a pair of statements of the form \({ P }\) and \({ ( ¬ P ) }\) for some expression \({ P }\). Once this happens we join the statements to get \({ [ P ∧ ( ¬ P ) ] }\), use a substitution instance of line 9 to turn this into a fixed contradiction, \({ [ s ∧ ( ¬ s ) ] }\) in the example below, and then discharge the hypothesis. After discharging the second hypothesis, i.e. the one from the right branch, what we do next depends on whether there are further undischarged hypotheses. If there are then we use a substitution instance of one of lines 33, 35, or 37, depending on whether the operator which caused the branching was an \({ ∨ }\), \({ ⊃ }\), or \({ ∧ }\), to get a contradiction in our current scope. Once we are in the global scope, having discharged all hypotheses, we can use a substitution instance of line 45 and modus ponens to get the statement we were trying to prove, which is now in global scope and is not negated, so we have a formal proof in the natural deduction system.

I’ll illustrate this with the tableau for \({ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} }\), which we saw earlier.

An analytic tableau for \({ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} }\)

As outlined above, we start by copying the preamble and then introducing the negation of the statement we want to prove as a hypothesis. \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 46 & . \quad ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) \end{array} \] This is then followed by a substitution instance of line 15, \({ \{ [ ¬ ( p ⊃ q ) ] ⊃ p ] \} }\), specifically the one where we substitute \({ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] }\) for \({ p }\) and \({ ( p ⊃ r ) }\) for \({ q }\), and then modus ponens \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 47 & . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} \cr 48 & . \quad [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \end{array} \] We can then do essentially the same thing, but with line 20 rather than line 15, \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 49 & . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ¬ ( p ⊃ r ) ] \} \cr 50 & . \quad [ ¬ ( p ⊃ r ) ] \end{array} \] We can then apply splitting, twice, to line 48, \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 51 & . \quad ( p ⊃ q ) \cr 52 & . \quad ( q ⊃ r ) \end{array} \]

If you look at lines 46, 48, 50, 51 and 52 you’ll see that they match the first five lines of the tableau, except that statements to the right of the vertical line are negated. As explained above, this is our strategy: to follow the tableau, using our rules of inference, and sometimes substitution instances of statements from the preamble, to derive each statement from earlier statements. We continue by deriving \({ p }\) and \({ ( ¬ r ) }\), which will take us up to the first branch point. \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 53 & . \quad \{ [ ¬ ( p ⊃ r ) ] ⊃ p \} \cr 54 & . \quad p \cr 55 & . \quad \{ [ ¬ ( p ⊃ r ) ] ⊃ ( ¬ r ) \} \cr 56 & . \quad ( ¬ r ) \end{array} \]

As described earlier, the strategy for dealing with branches is to introduce a hypothesis for each branch in turn and derive a contradiction under that hypothesis. We start the left branch by introducing the hypothesis \({ ( ¬ p ) }\) \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 57 & . \quad . \quad ( ¬ p ) \end{array} \] At this point we have both \({ p }\) and \({ ( ¬ p ) }\) available within the current scope, and can join them, and use a substitution instance of line 9, before discharging the hypothesis. \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 58 & . \quad . \quad [ p ∧ ( ¬ p ) ] \cr 59 & . \quad . \quad \{ [ p ∧ ( ¬ p ) ] ⊃ [ s ∧ ( ¬ s ) ] \} \cr 60 & . \quad . \quad [ s ∧ ( ¬ s ) ] \cr 61 & . \quad \{ ( ¬ p ) ⊃ [ s ∧ ( ¬ s ) ] \} \end{array} \]

Having dealt with the left branch we now proceed to the right branch. Again we introduce a hypothesis. \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 62 & . \quad . \quad q \end{array} \] In the tableau there’s another branch at this point, which we deal with in the same way as the previous one. The first step is to introduce a further hypothesis. \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 63 & . \quad . \quad . \quad ( ¬ q ) \end{array} \] Within the current scope we have a \({ q }\) and a \({ ( ¬ q ) }\), from the previous two lines of the proof, which we can treat in the same way we treated \({ p }\) and \({ ( ¬ p ) }\) earlier. \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 64 & . \quad . \quad . \quad [ q ∧ ( ¬ q ) ] \cr 65 & . \quad . \quad . \quad \{ [ q ∧ ( ¬ q ) ] ⊃ [ s ∧ ( ¬ s ) ] \} \cr 66 & . \quad . \quad . \quad [ s ∧ ( ¬ s ) ] \cr 67 & . \quad . \quad \{ ( ¬ q ) ⊃ [ s ∧ ( ¬ s ) ] \} \end{array} \]

That was the left subbranch of the right branch of the tableau. The right subbranch is handled similarly. \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 68 & . \quad . \quad . \quad r \cr 69 & . \quad . \quad . \quad [ r ∧ ( ¬ r ) ] \cr 70 & . \quad . \quad . \quad \{ [ r ∧ ( ¬ r ) ] ⊃ [ s ∧ ( ¬ s ) ] \} \cr 71 & . \quad . \quad . \quad [ s ∧ ( ¬ s ) ] \cr 72 & . \quad . \quad \{ r ⊃ [ s ∧ ( ¬ s ) ] \} \end{array} \]

Now we need to combine the two subbranches. We do this by joining lines 67 and 72, substituting in line 35, applying modus ponens twice, and then discharging the hypothesis \({ q }\). \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 73 & . \quad . \quad ( \{ ( ¬ q ) ⊃ [ s ∧ ( ¬ s ) ] \} ∧ \{ r ⊃ [ s ∧ ( ¬ s ) ] \} ) \cr 74 & . \quad . \quad [ ( \{ ( ¬ q ) ⊃ [ s ∧ ( ¬ s ) ] \} ∧ \{ r ⊃ [ s ∧ ( ¬ s ) ] \} ) \cr & . \quad \quad \quad ⊃ \{ ( q ⊃ r ) ⊃ [ s ∧ ( ¬ s ) ] \} ] \cr 75 & . \quad . \quad \{ ( q ⊃ r ) ⊃ [ s ∧ ( ¬ s ) ] \} \cr 76 & . \quad . \quad [ s ∧ ( ¬ s ) ] \cr 77 & . \quad \{ q ⊃ [ s ∧ ( ¬ s ) ] \} \end{array} \] Then we combine the two branches in a similar way. \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 78 & . \quad ( \{ ( ¬ p ) ⊃ [ s ∧ ( ¬ s ) ] \} ∧ \{ q ⊃ [ s ∧ ( ¬ s ) ] \} ) \cr 79 & . \quad [ ( \{ ( ¬ p ) ⊃ [ s ∧ ( ¬ s ) ] \} ∧ \{ q ⊃ [ s ∧ ( ¬ s ) ] \} ) ⊃ \{ ( p ⊃ q ) \cr & \quad \quad {} ⊃ [ s ∧ ( ¬ s ) ] \} ] \cr 80 & . \quad \{ ( p ⊃ q ) ⊃ [ s ∧ ( ¬ s ) ] \} \cr 81 & . \quad [ s ∧ ( ¬ s ) ] \cr 82 & \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ s ∧ ( ¬ s ) ] \} \end{array} \] Now we apply a substitution instance of line 45 and then modus ponens to complete the proof. \[ \begin{array}{l@{\quad}l} \hphantom{99} & \hphantom{ . \quad \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \} } \cr 83 & ( \{ ( ¬ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) ⊃ [ s ∧ ( ¬ s ) ] \} \cr & {} ⊃ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} ) \cr 84 & \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} \end{array} \]

After 84 lines we’ve finally proved \({ \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} }\). We proved it earlier in 8 lines, so why give a proof which is more than ten times longer? The point is that this proof was constructed in a purely mechanical way from the tableau. We can use the same method on any closed tableau to get a formal proof. The preamble will always be the same, as will the strategies for dealing with branches and contradictions, and the use of line 45 to complete the proof. This means tableaux can now be regarded as semiformal proofs rather than informal proofs, since a tableau tells us not just that a statement is true but also that it’s a theorem in our formal system.