% Nb: This file must be passed through cweave before LaTeX-ing.
%    cweave -j Universal.w
%    latex Universal
% To create the Universal Java class:
%    ctangle -j Universal.w

\documentclass[12pt,a4paper]{cweb}
% \usepackage[hypertex]{hyperref}
\usepackage{url}
\usepackage{amsmath}
\usepackage{amsfonts}
%\usepackage{mathrsfs}
\usepackage{moreverb}

\usepackage{amsthm}
\newtheorem{theorem}{Theorem}
\newtheorem{proposition}{Proposition}
\newtheorem{corollary}{Corollary}
\newtheorem{lemma}{Lemma}
\newtheorem{program}{Program}

\usepackage{drop}
\font\largefont=yinitas

\addtolength{\textheight}{20mm}
\addtolength{\topmargin}{-10mm}

\newtheorem{definition}{Definition}
\newcommand{\norm}[1]{\Vert #1 \Vert}
\newcommand{\code}[1]{\langle #1 \rangle}
\newcommand{\Code}[1]{[#1]}
\newcommand{\vv}{|}

\def\defin{define} % As in webmac.tex
\def\C{\textsc{C}}
\def\Java{\textsc{Java}}
\def\D{\defin{define}} % macro definition

\newcommand{\BB}{\mathbb{B}}
\renewcommand{\S}{\mathbb{S}}

%\newcommand{\TL}{\mathscr{T}}
%\newcommand{\SL}{\mathscr{S}}
\newcommand{\TL}{\mathcal{T}}
\newcommand{\SL}{\mathcal{S}}

\newcommand{\action}{\operatorname{action}}
\newcommand{\transition}{\operatorname{transition}}

\newenvironment{preamble}{\begin{center}\begin{tabular}{|p{0.8\textwidth}|}%
\hline\\[-5pt]}%
{\\[3pt]\hline\end{tabular}\end{center}}

\catcode`\"=\active
\def"#1"{\ifmmode\mathtt{#1}\else\texttt{#1}\fi}

\def\CwebRankNoEject{1}

\title{A Universal Machine}

\author{Timothy Murphy\\
\texttt{<tim@@maths.tcd.ie>}\\
School of Mathematics, Trinity College Dublin}

\date{\today}

\begin{document}

\maketitle

\tableofcontents

\begin{preamble}
\drop{C}{{\scshape ertain Turing machines}} have the remarkable property
that they can emulate, or imitate, every other Turing machine.
These \emph{universal} machines play a central r\^ole in our theory.
From our previous work, we can construct such a machine
in any of four ways: as a Turing machine,
as a program in the language $\TL$,
as a stack machine,
or as a program in the language $\SL$.
We opt for the last,
writing a program for a 4-stack machine $U$
with the required universal property.
\end{preamble}

@* Pre-requisites.

We have previously introduced the notion of a \emph{Turing machine} $T$
(following Chaitin's model),
and the \emph{Turing function}
\[
T: \S^\ast \to \S^\ast
\]
which it implements.
Here $\S$ denotes the set of all finite strings of 0's and 1's,
while $\S^\ast = \S \cup \bot$,
where $T(s) = \bot$ means that $T(s)$ is undefined ---
either $T$ enters an infinite loop when $s$ is input,
or $T$ tries to read past the end of $s$,
or $T$ halts before reading in the whole of $s$.
(By convention we set $T(\bot) = \bot$.)

We have also considered 3 other ways of implementing Turing functions:
\begin{itemize}
\item
by a program in the language $\TL$
(which is just another way of defining a Turing machine);
\item
by a stack machine $S$ 
(using a number of stacks rather than a tape);
\item
by a program in the language $\SL$
(which is just another way of defining a stack machine).
\end{itemize}

We shall define our machine by a program in $\SL$.
Recall that such a program consists of a number of statements,
each of which takes one of the forms
\[
"put0", "put1", "read", "write", "push"i, "pop"i, "jump"\pm j
\]
followed by `$;$'.
Here $i$ is the number of a stack (in our case 0--3),
while $"jump"\pm j$ invokes a jump from statement number $n$
to statement number $n \pm j$.

We also allow labelled statements, eg
\[
|loop|: "read";
\]
with a corresponding "jumpto", eg
\[
"jumpto"\ |loop|;
\]
However, we do not regard this as part of the language $\SL$ (or $\TL$).
It is understood that a `pre-processor' will change the "jumpto"
to an appropriate "jump".

We also assume it known that
\begin{enumerate}
\item
Turing machines (ala Chaitin) and 2-stack machines are equivalent
in the sense that to each Turing machine $T$ there corresponds a 2-stack machine $S$ such that
\[
S(s) = T(s) \quad \forall s \in \S^\ast;
\]
while conversely to each to each 2-stack machine $S$
there corresponds a Turing machine $T$ such that
\[
T(s) = S(s) \quad \forall s \in \S^\ast.
\]

\item
For each $n \ge 2$, $n$-stack machines and 2-stack machines
are equivalent, ie to each to each n-stack machine $S$
there corresponds a 2-stack machine $S'$ such that
\[
S'(s) = S(s) \quad \forall s \in \S^\ast.
\]
\end{enumerate}

@* Universal Turing machines.

\begin{definition}
The Turing machine $U$ is said to \emph{emulate} the Turing machine $T$
if there exists a string $t$ such that
\[
U(ts) = f(s)
\]
for every string $s\in\S$.

The Turing machine $U$ is said to be \emph{universal}
if it implements every Turing machine $T$.
\end{definition}

@* Existence.

\begin{theorem}
There exists a universal Turing machine.
\end{theorem}

\emph{Proof.}
We have seen that each Turing function $f(s)$
can be defined by a 2-stack machine $S$:
\[
f(s) = S(s)
\]
for all $s \in \S$.

We are going to construct a 4-stack machine $U$
such that
\[
U(\code{S}s) = S(s)
\]
for every 2-stack machine $S$
and every $s \in \S$,
where $\code{S}$ is the code for $S$ described below.

This will establish the result,
since we know that every stack machine
implements a Turing function.

We shall define the stack machine $U$
through a program in our language $\SL$.

@* A code for 2-stack machines.

Recall that a stack machine $S$ is defined by a map
\[
S: Q \times \BB \to A \times Q,
\]
where $Q$ is the set of states $\{q_0,\dots,q_n\}$,
$\BB = \{0,1\}$ are the bits that can appear in the accumulator,
and $A$ is the set of possible actions.
Thus $S(q_i,b) = (a,q_j)$ means: if the machine is in state $q_i$
and bit $b$ is in the accumulator
then action $a$ is taken and the machine moves into state $q_j$.

In the case of a 2-stack machine there are 8 possible actions,
which we code by 3 bits according to the following table:
\begin{center}
\begin{tabular}{\vv l \vv l \vv}
\hline
"put0" & 000\\
"put1" & 001\\
"read" & 010\\
"write" & 011\\
"push0" & 100\\
"push1" & 101\\
"pop0" & 110\\
"pop1" & 111\\
\hline
\end{tabular}
\end{center}

Equivalently, if the machine has $n$ states
then it can be defined by $2n$ `rules'
\[
R[0,0], R[0,1], R[1,0], R[1,1], \dots ,R[n-1,0], R[n-1,1],
\]
where the rule $R[i,b]$ specifies the action $a = A(i,b)$ to be taken
if the machine is in state $i$ with $b$ in the accumulator,
as well as the new state $j = Q(i,b)$ it moves into.

We code this rule by
\[
\code{R} = \code{a}\Code[j] = \code{a}1^j0,
\]
where $\code{a}$ is the code for the action defined above;
and we code the 2-stack $n$-state machine $S$
\[
\code{S} = \code{R[0,0]}1\code{R[0,1]}1\code{R[1,0]}1\cdots\code{R[n-1,1]}0.
\]

Note that we mark the end of a rule with a $1$,
allowing us to signal the end of the rules with a $0$.

Our universal machine $U$ will have the property that
\[
U(\code{S}s) = S(s)
\]
for every 2-stack machine $S$ and every string $s \in \S$.
Note that this implies in particular
that $U(\code{S}s)$ is defined if and only if $S(s)$ is defined.

@* The stack convention.

We follow the (usual) convention
of denoting the contents of a stack
by a string $s$,
with \emph{the top of the stack
corresponding to the end of the string}.
Thus if
\[
"stack"0 = 011,
\]
then bit 1 is on the top of the stack;
so $"pop"0$ would pop the 1 from the top of the stack
and place it in the accumulator, eg
\[
"pop"0: (|accumulator|,"stack"0) = (0,011) \mapsto (1,01),
\]
while
\[
"push"0: (0,011) \mapsto (0,0110).
\]

@* Winding and rewinding.

When we read a string into a stack ---
or transfer a string from one stack to another ---
the order of its bits is reversed:
\[
s \mapsto \overline{s}.
\]
Thus, if we want the string in its original order,
we must transfer it twice.

We shall use the term \emph{winding}
to describe the transfer of a rule (or other string)
from reverse order on one stack (normally stack 2)
to correct order on another stack (normally stack 3);
and we shall use the term \emph{rewinding}
for the converse operation ---
transfer of a string in correct order on stack 3
to reverse order on stack 2. 

When a rule appears in reverse order,
we need to modify its code slightly, to
\[
\overline{\code{R}} = \Code[j] \overline{\code{a}} = 1^j0 \overline{\code{a}};
\]
and we modify the code for the machine similarly to
\[
\overline{\code{S}} = 1\overline{\code{R[n-1,1]}}0\cdots0\overline{\code{R[0,0]}}.\]


@* The 4 stacks and their functions.

Each of the 4 stacks in $U$
is assigned a specific function, as follows: 
\begin{description}
\item[Stack 0]
will hold the contents of $S$'s stack 0.
\item[Stack 1]
will hold the contents of $S$'s stack 1.
\item[Stack 2]
(the \emph{rule stack}) 
will hold the ---
or part of the code ---
$\overline{\Code{S}}$
for the stack machine $S$ begin emulated,
in reverse order
(so that rule $R[0,0]$ is at the top of the stack).
\item[Stack 3]
(the \emph{auxiliary stack}) 
will store part of $\code{S}$
while we examine the appropriate rule on stack 2.
\end{description}

Stacks have the nice property that material can be stored temporarily
on the top of the stack,
provided we ensure that the material is cleared
when we want to access the `real' data at the bottom of the stack.

We use the tops of stacks 0 and 1 in this way.
For example, we store the current $S$-state $i$ temporarily
as a string $01^i$ on top of stack 0;
and we store the current $S$-bit on top of stack 1.
Of course we must remove these temporary strings
before emulating the action of $S$,
which might involve pushing onto or popping from
stacks 0 or 1.

@* Literate programming.

While developping \TeX, 
Donald Knuth introduced what he called ``literate programming'',
a system in which a single file (called a \textsc{Web} file)
contains both documentation and program code.
The \textsc{Web} file can be processed in two ways;
by the program |weave| to produce the documentation
(as for example this document);
or by the program |tangle| to produce the program code.

The \textsc{Web} file consists of a number of \emph{chunks},
each of which has a \TeX\ part and a program part
(either of which may be empty).
One program chunk can contain references to other program chunks,
with the understanding that the latter are to be incorporated
into the former when writing the program code.

The present document illustrates this system in practice.
The program code produced by |tangle| is listed at the end
of the document.

@* Program structure.

Our program for $U$ starts by reading the rules $\Code{S}$ 
for the 2-state machine $S$ that we are emulating
into the `rule-stack', stack 2.
We want the rules to appear on this stack \emph{in reverse order}, 
so that rule $R[0,0]$ is on top of the stack.
To achieve this we first read the rules onto the auxiliary stack, stack 3,
and then rewind them onto the rule stack, stack 2.

The machine $U$ then enters the main loop,
each circuit of which corresponds to emulation of
a single step of $S$.

@(UniversalMachine.S@>=
@<Initialize stacks@>@;
@<Read the rules of $S$ onto the auxiliary stack@>@;
@<Rewind rules onto rule stack@>@;
MainLoop:
  @<Wind through rule stack to correct rule@>@;
  @<Take specified action, and save $S$-bit on stack 1@>@;
  @<Check for end of program@>@;
  @<Save output $S$-state on stack 0@>@;
  @<Rewind rules onto rule stack@>@;
  put1;@+ jumpto MainLoop;

@* Initialization.

As explained above,
we make temporary use of stacks 0 and 1 ---
between emulating steps of $S$ ---
to store the current $S$-state and $S$-bit.

Of course these `appendices' must be removed
before we emulate each $S$-action.

Initially $S$ is in state $q_0$,
with 0 in its accumulator.
To this end we place 0 initially at the bottom
of stacks 0 and 1.
We also place 0 at the bottom of stacks 2 and 3,
to mark the end of the rules.
(This is not actually necessary in the case of stack 2,
the rule stack,
since we assume that the rules never specify an output state
outside the range for which rules are given.)

@<Initialize stacks@>=
put0;@+ push0;@+ push1;@+ push2;@+ push3;

@* Reading in the rules.

Recall that we start by reading the rules
into the auxiliary stack (stack 3),
before rewinding them onto the rule stack (stack 2).

After we have read into stack 3, stacks 0, 1 and 2 contain a single 0,
while stack 3 holds
\[
0\code{R[0,0]}'1\code{R[0,1]}'1\code{R[1,0]}'1\cdots
1\code{R[n-1,1]}'1,
\]
where
\[
\code{R[i,b]}' = \code{A[i,b]}01^{Q[i,b]},
\]
with the 0 on the other side of the $1^q$
since now we shall be reading from the top of the stack,
ie from the end of the string.

@<Read the rules of $S$ onto the auxiliary stack@>=
  @<Read action@>@;
  @<Read |outState|@>@;
  @<Read bit; if 1 read in next rule@>@;

@*1 Reading the action.

Recall that we code the action in 3 bits.

@<Read action@>=
  read;@+ push3;
  read;@+ push3;
  read;@+ push3;

@*1 Reading the out-state.

We read the out-state $\Code{j} = 1^j0$,
and push it (onto stack 3) as $01^j$.
To simplify the code, we push the final 0 onto the stack,
and then pop it.

@<Read |outState|@>=
  put0;@+ push3;
  read;@+ push3;@+ jump -2;
  pop3;

@*1 Checking for the last rule.

Recall that the last rule is signalled by a 0,
earlier rules being succeeded by a 1.
On the stack we replace the final 0 by a 1.

@<Read bit...@>=
  read;@+ push3;@+ jump -14;
  pop3;

@* Rewinding the rules.

It is convenient to express this in a slightly more general form
for later use;
we assume that a certain number of rules
$R[0,0], R[0,1], \dots, R[i,b]$
are on stack 3
(with $R[0,0]$ on the bottom),
while the remaining rules are already on stack 2.

At the end of this phase, 
all the rules have been transferred to stack 2;
stacks 0, 1 and 3 contain a single 0,
while stack 2 holds
\[
0\code{R[n-1,1]}^\ast1\code{R[n-1,0]}^\ast1\code{R[n-2,1]}^\ast1\cdots
1\code{R[0,0]}^\ast,
\]
where
\[
\code{R[i,b]}^\ast = 01^{Q]i,b]}\overline{\code{A[i,b]}},
\]
$\overline{s}$ denoting the reverse of string $s$.

Note that each of the rules on stack 3 is preceded by a 1
(reading from the top of the stack),
with the end of the rules signalled by a 0.
For simplicity, we start near the end of our cycle,
so that we can drop out when we see a 0.

@<Rewind rules...@>=
@<Rewind state@>@;
@<Rewind action@>@;
@<Check for bottom of stack, otherwise read in new rule@>@;

@ @<Rewind state@>=
put0;@+ push2;
pop3;@+ push2;@+ jump -2;
pop2;

@ @<Rewind action@>=
pop3;@+ push2;
pop3;@+ push2;
pop3;@+ push2;

@ @<Check for bottom of stack...@>=
pop3;@+ push2;@+ jump -14;
pop2;@+ push3;

@* The main cycle.

At the start of the cycle,
if $S$ is in state $q_i$
with $b$ in the accumulator
then the top of stack 0 holds $01^{2i+b}$.

Our first task is to wind through $2i+b$ rules ---
the number of 1's on top of stack 0 ---
to reach the appropriate rule $R[i,b]$.

Note that this process will remove
the $01^{2r+b}$ on top of stack 0.

@ @<Wind through rule stack to correct rule@>=
put1;@+ jumpto Test;
NextRule: @<Wind single rule onto auxiliary stack@>@;
Test: pop0; jumpto NextRule;

@ @<Wind single rule onto auxiliary stack@>=
@<Wind action@>@;
@<Wind state@>@;
@<Wind marker bit@>@;

@ @<Wind action@>=
pop2;@+ push3;
pop2;@+ push3;
pop2;@+ push3;

@ @<Wind state@>=
put0;@+ push3;
pop2;@+ push3;@+ jump -2;
pop3;

@ @<Wind marker bit@>=
pop2;@+ push3;

@* Taking action.

The 3-bit code specifying the action
is now on top of stack 2.
Essentially, we have to implement a simple jump-table ---
remembering to save the action-bits on stack 3.

@ @<Take specified action...@>=
pop2;@+ push3;@+ jumpto PushAndPop;
@<Put and IO@>@;
PushAndPop: @<Push and pop@>@;
Done: 

@ @<Put and IO@>=
pop2;@+ push3;@+ jump 7;
@<Put 0 or 1@>@;
@<Read or write@>@;

@ @<Put 0 or 1@>=
pop1;
pop2;@+ push3;
push1;
put1;@+ jumpto Done;

@ @<Read or write@>=
pop2;@+ push3;@+ jump 6;
pop1;@+ read;@+ push1;
put1;@+ jumpto Done;
pop1;@+ write;@+ push1;
put1;@+ jumpto Done;

@ @<Push and pop@>=
pop2;@+ push3;@+ jump 14;
@<Push onto stack 0 or 1@>@;
@<Pop from stack 0 or 1@>@;

@ @<Push onto stack 0 or 1@>=
pop2;@+ push3;@+ jump 6;
pop1;@+ push0;@+ push1;
put1;@+ jumpto Done;
pop1;@+ push1;@+ push1;
put1;@+ jumpto Done;

@ @<Pop from stack 0 or 1@>=
pop2;@+ push3;@+ jump 6;
pop1;@+ pop0;@+ push1;
put1;@+ jumpto Done;
pop1;@+ pop1;@+ push1;
put1;@+ jumpto Done;

@* Checking if the program has ended.

Recall that the $S$-program ends
when the machine enters state 0
with 1 in the accumulator.
This is signalled by a 0 on the top of stack 2,
and a 1 on the top of stack 1.

@<Check for end of program@>=
pop2;@+ push2;@+ jump 7;
pop1;@+ push1;@+ jump 3;
put1;@+ jump 2;
halt;

@* Saving the output state.

The output state $j$ is now at the top of stack 2,
while the accumulator bit $b$ is on top of stack 1.
We want to store $01^{2r+b}$ on top of stack 0,
to indicate the number of the rule we need to access
during the next cycle.

@<Save output $S$-state...@>=
put0;@+ push0;@+ push3;
pop2;@+ push3;@+ push0;@+ push0;@+ jump -4;
pop0;@+ pop0;@+ pop3;
pop1;@+ push1;@+ push0;@+ jump 2;
pop0;

@* The program in $\SL$.

\verbatiminput{UniversalMachine.S}

@* Example.

The following code $\code{S}$ defines a 2-stack machine $S$
which implements the function
\[
[m] [n] \mapsto [mn],
\]
where
\[
[n] = \underbrace{1\dots1}_{\text{$n$ 1's}}0.
\]

\verbatiminput{multiply.stackcode}

Now let us append the line
\begin{verbatim}
1110111110
\end{verbatim}
to this file.
Then the command
\begin{verbatim}
java StackProgram UniversalMachine < multiply.stackcode
\end{verbatim}
will result in the output
\begin{verbatim}
1111111111111110
\end{verbatim}

@* Summary.

We have constructed a universal machine $U$
which performs exactly like the 2-state machine $S$,
provided the program is prefixed by
the code $\code{S}$ for the machine:
\[
U\left(\code{S}s\right) = S(s).
\]

\end{document}