Natural deduction

Some people find proving theorems in formal systems like Nicod’s or Łukasiewicz’s an entertaining sort of puzzle. Other people do not. What’s undeniable is that such proofs have a very different flavour from those of the rest of mathematics. There was a reaction against these and similar axiomatic systems which led to what’s known as ``natural deduction’’. One of the most important people behind this reaction was Łukasiewicz himself. Natural deduction systems are still formal systems, but their rules better reflect the way mathematicians typically think.

A formal system for natural deduction

There are a wide variety of natural deduction systems. We’ll use one whose language includes only the operators \({ ∧ }\), \({ ∨ }\), \({ ¬ }\), and \({ ⊃ }\). It has no axioms! In contrast it has a lot of rules of inference:

From statements \({ P }\) and \({ Q }\) we can deduce the statement \({ ( P ∧ Q ) }\). Also, from any statement of the form \({ ( P ∧ Q ) }\) we can deduce the statement \({ P }\) and the statement \({ Q }\).
From the statement \({ P }\) we can deduce the statement \({ ( P ∨ Q ) }\), where \({ Q }\) is any expression.
The expressions \({ [ ¬ ( ¬ P ) ] }\) and \({ P }\) are freely interchangeable. In other words, anywhere an expression of one of these forms appears in a statement we may deduce the statement where it has been replaced by the other.
From \({ P }\) and \({ ( P ⊃ Q ) }\) we can deduce \({ Q }\).
The expressions \({ ( P ⊃ Q ) }\) and \({ [ ( ¬ Q ) ⊃ ( ¬ P ) ] }\) are freely interchangeable.
The expressions \({ [ ( ¬ P ) ∧ ( ¬ Q ) ] }\) and \({ [ ¬ ( P ∨ Q ) ] }\) are freely interchangeable.
The expressions \({ [ ( ¬ P ) ∨ ( ¬ Q ) ] }\) and \({ [ ¬ ( P ∧ Q ) ] }\) are freely interchangeable.
The expressions \({ ( P ∨ Q ) }\) and \({ [ ( ¬ P ) ⊃ Q ] }\) are freely interchangeable.
The “Rule of Fantasy”, to be described below.
The “Rule of Substitution”, subject to restrictions to be discussed below.

The first four rules specify the behaviour of the four logical operators. They are closely related to our tableaux rules. Half of the first rule, which is called the rule of joining and separation, can be thought of an equivalent to the tableau rule \[ \begin{array}{r|l} ( P ∧ Q ) & \cr \hline P & \cr Q & \end{array} \] for example. It reflects the fact that if \({ ( P ∧ Q ) }\) is true then \({ P }\) and \({ Q }\) are true. It’s important to remember though that that’s a property of the intended interpretation, or rather interpretations, of the system, while the rule above is a rule of inference within the system. Similarly the third rule above is related to the tableau rule \[ \begin{array}{r|l} ( ¬ P ) & \cr \hline & P \end{array} \qquad \begin{array}{r|l} & ( ¬ P ) \cr \hline P & \end{array} \] with the first one applied with \({ ( ¬ P ) }\) in place of \({ P }\). It reflects the “fact” that if \({ P }\) is not not true then it is true. The quotation marks reflect the reality that not everyone accepts this a logical principle. This is one of the dividing lines between “classical” and “intuitionist” logic. The fourth rule is one we’ve met before, under the name modus ponens. It’s tableau counterpart is more complicated. It consists of following both branches from a \({ ( P ⊃ Q ) }\) but then using the \({ P }\) to immediately close off the left branch, which would have a \({ P }\) to the right of the bar.

The fifth rule is called the rule of the contrapositive, it is the basis of proofs by contradiction. It is another dividing line between “classical” and “intuitionist” logic. The sixth and seventh rules are two known as De Morgan’s laws. The eighth rule of inference is really just the observation we made when discussing Łukasiewicz’s system that \({ ∨ }\) is expressible in terms of \({ ¬ }\) and \({ ⊃ }\).

Introducing and discharging hypotheses

The ninth rule, the “rule of fantasy” in Hofstadter’s terminology, is more complicated to explain, but reflects a common practise in informal proofs. We often say “Suppose \({ P }\)”. We then reason for a while under that assumption and reach a conclusion \({ Q }\). We then conclude “So if \({ P }\) then \({ Q }\).” There are two common circumstances in which we do this. One is proof by contradition, where we then immediately apply the rule of the contrapositive. The other is case by case analysis, which was the basis of our tableau method. In such applications there will be a separate application of the rule of fantasy for each possible case. Writers of informal proofs are under no obligation to tell you which of these two uses they have in mind and sometimes you have to read the whole proof to find out but often a clue is is in the verb forms used. In a proof by contradiction people are more likely to write “Suppose … were true” rather than “Suppose … is true”. But this is not something you can entirely rely on.

I’ve just described what the rule of fantasy is intended to do, but not the precise rules governing it. They’re just a formalised version of the rules which mathematicians follow in informal arguments. We need some terminology. The step of saying “Suppose \({ P }\)” is called introducing the assumption or hypothesis \({ P }\). The step of saying “So if \({ P }\) then \({ Q }\)” is called discharging this hypothesis or assumption. Everything in between is called the scope of the hypothesis. It’s possible, and indeed common, to introduce further hypotheses within the scope of an existing one, and so have nested scopes. In arguments based on tableax this corresponds to branches within branches.

Scope determines which statements are accessible for use by the other rules at any point in a proof. When you enter a new scope by introducing a hypothesis you retain access to everything in the scope you were in. When you leave that scope by discharging that hypothesis you lose access to all statements since you entered it. The only trace of any of the reasoning which took place within that scope is the single statement \({ ( P ⊃ Q ) }\) generated by discharging the hypothesis. This restriction on the accessible statements is needed to ensure that you can’t deduce a statement by introducing \({ P }\) as a hypothesis unless it’s of the form \({ ( P ⊃ Q ) }\). Otherwise you could prove all statements, true or false, by introducing them as a hypothesis and then using them outside of the scope of that hypothesis. All the other rules are to be interpreted as implicitly subject to this restriction. So when we say that from \({ ( P ∧ Q ) }\) we can deduce \({ P }\) and \({ Q }\) we mean that in a scope where \({ ( P ∧ Q ) }\) is accessible we can deduce \({ P }\) and \({ Q }\). It doesn’t allow us to deduce \({ P }\) or \({ Q }\) if the statement \({ ( P ∧ Q ) }\) appeared after some hypothesis was introduced and before it was discharged.

Statements outside the scope of any hypotheses are said to have “global” scope. Only such statements are theorems.

In informal proofs it can be difficult to spot where hypotheses are introduced and where they’re discharged and therefore difficult to know which ones are accessible. This is particularly problematic in proofs by contradiction. The whole point of a proof by contradiction is that the hypothesis which is introduced will later be shown to be false. Everything in the scope of that hypothesis could, and usually does, depend on that false hypothesis and therefore should never be used outside that particular proof by contradiction argument. This is a common source of error for students. If you scan through a textbook looking for things which might be useful in problem you’re attempting then you may find useful statements in the proof of a theorem. They’ll typically depend on various hypotheses which have been introduced though and can’t safely be used in a context where those hypotheses aren’t known to be true.

In a formal system we need some way to indicate the introduction and discharging of hypotheses.

The safest way to do it would be to include a list of all active, i.e. not yet discharged, hypotheses before each statement. That solves the problem described above of using statements outside of their scope, but at the cost of making proofs very long and repetitive.

Jaśkowski, one of the founders of the theory of natural deduction, used boxes. A box encloses all the statements within a give scope and it’s straightforward to see which statements are available within it. Starting from wherever we currently are we have available any statement we can reach by crossing zero or more box boundaries from inside to outside, but we are not allowed to cross any from outside to inside. This notationational convention would probably have been more popular if it weren’t a nightmare to typeset.

A popular alternative is to use indentation. Every time we introduce a hypothesis we increase the indentation and every time we discharge one we restore it to its previous value. The first statement after the indentation is increased is the newly introduced hypothesis. The first statement after the indentation has been restored is the result of discharging the hypothesis, i.e. the statement \({ ( P ⊃ Q ) }\) where \({ P }\) is the hypothesis and \({ Q }\) is the last statement before the indentation was restored. This is a very compact notation but using spaces for indentation can cause problems. Screen readers will generally ignore spaces. Even for sighted readers judging the number of spaces at the start of a line is error-prone. It’s better to use a non-whitespace character. We’ll use dots.

The “rule of fantasy” is Hofstadter’s terminology. It accurately reflects the use of the rule, to explore the consequences of a statement not known to be true, but don’t expect anyone to understand you if you use the term outside of this module.

There is some redundancy in the rules above, in the sense that there are proper subsets of those rules with the property that any statement which has a proof using the full set also has a proof using only the subset. But the point of a natural deduction system is to formalise something close to the way mathematicians actually reason rather than to have an absolutely minimal system. If you like minimal systems then you’re better off with Nicod.

There are, as we’ll see some restrictions needed on the rule of substitution but I’ll get to those once we have some examples of proofs. In the interim I will be careful not to use that rule.

Some proofs

As an example, consider this proof of the statement \({ \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} }\). \[ \begin{array}{l} . \quad p \cr . \quad . \quad q \cr . \quad . \quad ( p ∧ q ) \cr . \quad [ q ⊃ ( p ∧ q ) ] \cr \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} \end{array} \] Our first step is to introduce a hypothesis. In this system the first step is always to introduce a hypothesis. There are no axioms and every other rule deduces a statement from previous statements, of which we have none. It’s not hard to guess which hypothesis to introduce. The statement we want to prove is \({ \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} }\) and it starts with \({ p ⊃ }\) so if we introduce \({ p }\) and then manage to prove \({ [ q ⊃ ( p ∧ q ) ] }\) then we will be done. So we introduce \({ p }\). What next? We want to prove \({ [ q ⊃ ( p ∧ q ) ] }\). It starts with \({ q ⊃ }\) so try the same thing, introducing \({ q }\) as a further hypothesis. If we can prove \({ ( p ∧ q ) }\) within the scope of this hypothesis then we will have proved \({ [ q ⊃ ( p ∧ q ) ] }\) within the scope of the hypothesis \({ p }\) and therefore will have proved \({ \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} }\) within the global scope, i.e. in the absence of any hypotheses. At this point we have two statements available within our current scope \({ p }\) and \({ q }\). We just introduced \({ q }\). We inherited \({ p }\) from the outer scope. This is what I meant when I said that when you enter a new scope by introducing a hypothesis you retain access to everything in the scope you were in. So we have \({ p }\) and \({ q }\) and want \({ ( p ∧ q ) }\). Fortunately our first rule of inference, the rule of joining and separation, does exactly this. So the proof may look strange at first but really at each stage we do the only thing we can and it works.

As another example, consider \({ [ p ∨ ( ¬ p ) ] }\), which is often called the “law of the excluded middle”. In this case it’s less obvious how to start. For the reasons discussed above we must start by introducing a hypothesis, but what hypothesis. The statement isn’t of the form \({ ( P ⊃ Q ) }\) for any choice of expressions \({ P }\) and \({ Q }\). Looking at our rules though we see that one of them allows us to derive \({ [ p ∨ ( ¬ p ) ] }\) from \({ [ ( ¬ p ) ⊃ ( ¬ p ) ] }\). That is easily proved with the fantasy rule. We introduce the hypothesis \({ ( ¬ p ) }\) and then immediately discharge it. The complete proof is \[ \begin{array}{l} . \quad ( ¬ p ) \cr [ ( ¬ p ) ⊃ ( ¬ p ) ] \cr [ p ∨ ( ¬ p ) ] \end{array} \]

Substitution

This is where we should talk about the rule of substitution. A very common style of mathematical argument, and one which we formalised in the tableau method, is case by case analysis. The simplest type of case by case analysis is where we have only two cases, one where a certain statement is true and one where it’s false. If we can prove a certain conclusion in both of those cases then that conclusion must be true. Or at least it must be if you accept the law of the excluded middle. Intuitionists don’t.

But to apply the case by case analysis above we need the law of the excluded middle for expressions and not just for variables. In other words we need \({ [ P ∨ ( ¬ P ) ] }\) to be a theorem for every expression \({ P }\). There are three ways of accomplishing this. One is to run exactly the same argument as above with \({ p }\) replaced everywhere by \({ P }\). The rules we used, of which there were only two, refer to expressions rather to variables and so no change is needed. We could do the same with \({ \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} }\), by the way. For any expressions \({ P }\) and \({ Q }\) we could replace every \({ p }\) by the expression \({ P }\) and every \({ q }\) by the expression \({ Q }\) in the proof of \({ \{ p ⊃ [ q ⊃ ( p ∧ q ) ] \} }\) and obtain a proof of \({ \{ P ⊃ [ Q ⊃ ( P ∧ Q ) ] \} }\). A disadvantage of this approach is that we need to repeat the argument for each \({ P }\) and \({ Q }\) we need to result for. We can’t just do it with the letters \({ P }\) and \({ Q }\) in place of the letters \({ p }\) and \({ q }\) because \({ P }\) and \({ Q }\) aren’t even elements of our language, just elements of the language we use to talk about our language. We have to substitute the actual expressions, and so we’ll need to redo that work whenever we need the result for a different pair of expressions.

Another option is to leave the realm of formal proof and enter the realm of semiformal proof. The argument above shows that for any expressions \({ P }\) and \({ Q }\) the statements \({ [ P ∨ ( ¬ P ) ] }\) and \({ \{ P ⊃ [ Q ⊃ ( P ∧ Q ) ] \} }\) are theorems. Anything you can derive from them using our rules of inference is also a theorem. But now we’re not giving proofs of statements but rather proofs that statements have proofs. That’s a semiformal proof rather than a formal one.

The third option is to bring in the rule of substitution, which was a rule of inference in both the Nicod and Łukasiewicz systems. This seems redundant, since we’ve just seen how one can get around the lack of a rule of substitution by just substituting expressions for variables within a proof, but it’s convenient to have and we already decided we aren’t trying for a minimal system.

There’s a subtle danger here though. Consider the following proof: \[ \begin{array}{l} . \quad p \cr . \quad q \cr ( p ⊃ q ) \end{array} \] The first step is to introduce the hypothesis \({ p }\). The second is to apply the rule of substitution, replacing the variable \({ p }\) with the expression \({ q }\). Variables are expressions so this is okay. The third step is to discharge the hypothesis. But \({ ( p ⊃ q ) }\) is not a tautology. There is an interpretation under which it is false, namely the one where \({ p }\) is assigned the value true and \({ q }\) is assigned the value false, and therefore it should not be a theorem.

What went wrong? There’s a difference in interpretation between statements in the Nicod or Łukasiewicz systems and in a natural deduction system. In the Nicod or Łukasiewicz systems every statement is meant to be unconditionally true. We could stop a proof at any point and have a proof of the last statement before we stopped. This is not the case for natural deduction systems. Only statements in global scope are meant to be unconditionally true. All other statements are meant to be true if all the active hypotheses for their scope are true. In other words they’re only conditionally true. That’s why I had to specify that only statements in global scope are theorems. The problem with the argument above is that once we’ve introduced \({ p }\) as a hypothesis it’s no longer just any variable. It’s the specific variable whose truth everything will be dependent upon until we discharge that hypothesis. Replacing it with some other variable, or some other expression, can’t safely be allowed.

How can we repair this? We could sacrifice the rule of fantasy but the rule of fantasy is the foundation of our system. It is literally impossible to prove anything without it. We could limit our rule of substitution by saying that only statements with global scope are available for substitution. This is a sound rule of inference. We know it’s sound because the other rules are sound and this one is redundant. Anything we could prove with it we could also prove without it, by the technique we discussed earlier of repeating the proof but with expressions substituted in for the variables. It’s unnecessarily restrictive though, since it can sometimes be safe to substitute for some variables in a statement within the scope of a hypothesis. The precise rule is that in any available statement we may substitute any expression for any variable which does not appear in any hypothesis which was active in its scope. In the global scope no hypotheses are active and so we can substitute for any variable, but elsewhere certain variables will not be substitutable. Note that which variables are substitutible depends on the scope of the statement into which we’re substituting, not on our current scope at the point where we want to make the substitution.

This is the first example we’ve seen of a phenomenon where not all variables are equally variable. Some have special status in a particular context which restricts what we can do with them. It won’t be the last such example. Something similar will happen when we move on to first order logic.

Although I’ve put in some effort above to ensure that you can substitute into statements within the scope of a hypothesis, in those cases where it’s safe, you shouldn’t generally structure your proofs in a way which makes that necessary. If you’re going to need multiple substitution instances of a statement then you should prove that statement in global scope so that it’s available everywhere. Often that means writing your proofs out of order. Once you release you’ll need multiple instances of a statement you need to go back and insert a proof of that statement at the start of your argument.

When reading proofs it’s useful to know that people do this. If an author starts by proving a bunch of random facts whose usefulness isn’t immediately apparent and which don’t reappear for several pages that’s not necessarily just bad exposition. It may well be that their being proved there to make it clear that they’re in global scope, i.e. that their truth is not contingent on hypotheses which will be made later.

More proofs

Here’s another fairly short proof. Try going through each line and seeing if you can identify which rule is being used. \[ \begin{array}{l} . \quad ( p ⊃ q ) \cr . \quad [ ( ¬ q ) ⊃ ( ¬ p ) ] \cr . \quad [ q ∨ ( ¬ p ) ] \cr \{ ( p ⊃ q ) ⊃ [ q ∨ ( ¬ p ) ] \} \end{array} \]

The next proof is a bit trickier to find but is very useful. \[ \begin{array}{l} . \quad [ p ∧ ( ¬ p ) ] \cr . \quad p \cr . \quad ( ¬ p ) \cr . \quad . \quad ( ¬ q ) \cr . \quad . \quad [ ¬ ( ¬ p ) ] \cr . \quad \{ ( ¬ q ) ⊃ [ ¬ ( ¬ p ) ] \} \cr . \quad [ ( ¬ p ) ⊃ q ] \cr . \quad q \cr \{ [ p ∧ ( ¬ p ) ] ⊃ q \} \end{array} \] The theorem \({ \{ [ p ∧ ( ¬ p ) ] ⊃ q \} }\) is known as the “principle of explosion”. Unlike other fanciful names, like the “rule of fantasy”, this name is quite standard and people should understand what you’re talking about if you use it. This theorem shows that from a contradiction it’s possible to derive anything at all. In a theory with contradictions all statements are theorems. A useful substitution instance of this \({ \{ [ p ∧ ( ¬ p ) ] ⊃ p \} }\).

Here are two more complicated examples. \[ \begin{array}{l} . \quad [ ¬ ( p ⊃ q ) ] \cr . \quad [ ¬ ( \{ ¬ q \} ⊃ \{ ¬ p \} ) ] \cr . \quad [ ¬ ( q ∨ \{ ¬ p \} ) ] \cr . \quad [ ( ¬ q ) ∧ ( ¬ \{ ¬ p \} ) ] \cr . \quad [ ( ¬ q ) ∧ p ] \cr \{ [ ¬ ( p ⊃ q ) ] ⊃ [ ( ¬ q ) ∧ p ] \} \end{array} \] and \[ \begin{array}{l} . \quad [ p ∧ ( ¬ p ) ] \cr . \quad p \cr . \quad ( ¬ p ) \cr . \quad . \quad ( ¬ q ) \cr . \quad . \quad [ ¬ ( ¬ p ) ] \cr . \quad \{ ( ¬ q ) ⊃ [ ¬ ( ¬ p ) ] \} \cr . \quad [ ( ¬ p ) ⊃ q ] \cr . \quad q \cr \{ [ p ∧ ( ¬ p ) ] ⊃ q \} \cr \{ [ r ∧ ( ¬ r ) ] ⊃ r \} \cr . \quad \{ ( p ∨ q ) ∧ [ ( p ⊃ r ) ∧ ( q ⊃ r ) ] \} \cr . \quad ( p ∨ q ) \cr . \quad [ ( p ⊃ r ) ∧ ( q ⊃ r ) ] \cr . \quad ( p ⊃ r ) \cr . \quad ( q ⊃ r ) \cr . \quad [ ( ¬ p ) ⊃ q ] \cr . \quad . \quad ( ¬ p ) \cr . \quad . \quad q \cr . \quad . \quad r \cr . \quad [ ( ¬ p ) ⊃ r ] \cr . \quad [ ( ¬ r ) ⊃ p ] \cr . \quad . \quad ( ¬ r ) \cr . \quad . \quad p \cr . \quad . \quad r \cr . \quad [ ( ¬ r ) ⊃ r ] \cr . \quad r \cr ( \{ ( p ∨ q ) ∧ [ ( p ⊃ r ) ∧ ( q ⊃ r ) ] \} ⊃ r ) \end{array} \] You might notice that the first part simply repeats the proof given earlier for \({ \{ [ p ∧ ( ¬ p ) ] ⊃ q \} }\). One annoying feature of formal systems is that they start the proof of every theorem from scratch. No mechanism is provided for reusing previously proved theorems other than repeating their proofs.

As a final example we consider the tautology \({ \{ [ (p ⊃ q) ∧ (q ⊃ r) ] ⊃ (p ⊃ r) \} }\) we encountered earlier. A proof is \[ \begin{array}{l} . \quad [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] \cr . \quad . \quad p \cr . \quad . \quad ( p ⊃ q ) \cr . \quad . \quad ( q ⊃ r ) \cr . \quad . \quad q \cr . \quad . \quad r \cr . \quad ( p ⊃ r ) \cr \{ [ ( p ⊃ q ) ∧ ( q ⊃ r ) ] ⊃ ( p ⊃ r ) \} \end{array} \]

Soundness, consistency and completeness

Because natural deduction systems were developed to mimic exactly the sort of reasoning we formalised with the method of analytic tableau it is possible, though somewhat tedious, to give an algorithm for converting a tableau into a proof in this system. The construction of a tableau for a tautology can also be automated. Combining these we see that there is an algorithm which takes tautologies and generates formal proofs for them. So every tautology is a theorem and therefore the system is complete. It’s also sound because all the axioms are true and all the rules of inference preserve truth. It’s easy to see that all the axioms are true because there aren’t any! Showing that the rules of inference are sound is fairly straightforward until we get to the rule of fantasy and the rule of substitution. You can read through the descriptions of those rules and convince yourself that they can’t be used to prove true statements from untrue ones. There’s a good chance you could have convinced yourself even if I’d left out the restrictions on substitution though and neglected to mention that the unrestricted version makes \({ p ⊃ q }\) into a theorem, so I’m not sure how far you can trust your intuition on these things. Unfortunately I can’t give a formal proof of soundness though because we don’t have a formal language in which we can state it. Soundness, as before, implies consistency.

It’s also possible for the standard axiomatic systems, like that of Nicod or Łukasiewicz, to give algoritms for converting proofs in a natural deduction system to proofs within the axiomatic system. This is generally the easiest way of showing that those systems, which we already know to be consistent, are complete.