20170523

The Mersenne Twister

The Mersenne Twister is a 623-dimensionally equidistributed uniform pseudorandom number generator. This is how it works.

The paper first defines a $k$-distribution as a very reasonable definition of randomness. Read the paper first.

The Linear Recurrence

Equation 2.1 in M&N 1998 is the linear recurrence $$ \mathbf{x}_{k + n} := \mathbf{x}_{k + m} \oplus (\mathbf{x}_k^u | \mathbf{x}^l_{k + 1}) A $$ ($k = 0, 1, \ldots$). The paper says the following.

We have several constants: an integer $n$, which is the degree of the recurrence, an integer $r$ (hidden in the definition of $\mathbf{x}_k^u$), $0 \leq r \leq w - 1$, an integer $m$, $1 \leq m \leq n$, and a constant $w \times w$ matrix $A$ with entries in $\mathbb{F}_2$.

From the paper, $w$ is the “word length”, so $\mathbf{x}_i \, \in \, \mathbb{F}_2^w$ are “word vectors” of length $w$.

On the right, we have this $\mathbf{x}_k^u | \mathbf{x}^l_{k + 1}$ term. This is the concatonation of $\mathbf{x}_k^u$ with $\mathbf{x}^l_{k + 1}$. $\mathbf{x}_k^u$ is the “upper $w - r$ bits” of $\mathbf{x}_k$ and $\mathbf{x}^l_{k + 1}$ is the “lower $r$ bits” of $\mathbf{x}_{k + 1}$, i.e. denoting $$ \mathbf{x}_k = (x_{w - 1}, \ldots, x_{r}, x_{r - 1}, \ldots, x_0), $$ then $$ \mathbf{x}_k^u = (x_{w - 1}, \ldots, x_{r}), \qquad \mathbf{x}_k^l = (x_{r - 1}, \ldots, x_0). $$

Given $n$ initial seeds $\mathbf{x}_0, \ldots, \mathbf{x}_{n - 1}$, we can generate $$ \mathbf{x}_{n} := \mathbf{x}_{m} \oplus (\mathbf{x}_0^u | \mathbf{x}^l_{1}) A $$ (where $k = 0$). The only constants here are $n$, $m$, $r$ and $A$.

$A$ is the sparse matrix $$ A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & & & 1 \\ a_{w - 1} & a_{w - 2} & a_{w - 3} & \cdots & a_0 \end{pmatrix} $$ where we denote the bottom row as $\mathbf{a} = (a_{w - 1}, \ldots, a_0)$.

For $\mathbf{x} = (x_{w - 1}, \ldots, x_0)$, $$ \mathbf{x}A = (x_0 a_{w - 1}, x_{w - 1} + x_0 a_{w - 2}, \ldots, x_1 + x_0 a_0). $$

Notice that if $x_0 = 0$, this simplifies to $$ \mathbf{x}A = (0, x_{w - 1}, \ldots, x_1) $$ which amounts to a right bitshift, i.e. $\mathbf{x}A = \mathbf{x} \gg 1$. If $x_0 = 1$ instead, then $$ \mathbf{x}A = (a_{w - 1}, x_{w - 1} + a_{w - 2}, \ldots, x_1 + a_0) $$ and this reduces to an addition of $\mathbf{a}$, i.e. $\mathbf{x}A = (\mathbf{x} \gg 1) \oplus \mathbf{a}$. (Recall that this is an xor, since we are working in $\mathbb{F}_2$.)

For MT19937, the parameters are $w = 32$, $n = 624$, $m = 397$, $r = 31$ and $\mathbf{a} = \mathtt{0x9908B0DF}$ (M&N 1998, Table II).

Initialisation

There is an undocumented parameter $f$ which is used to initialise the internal state as $$ \mathbf{x}_i = f \cdot (\mathbf{x}_{i - 1} \oplus (\mathbf{x}_{i - 1} \gg (w - 2))) + i. $$ This is given the value $f = 1812433253$.

Tempering

The linear operator $T$ is a $w \times w$ invertible matrix. We define “tempering” to be the operation $$ \mathbf{x} \mapsto \mathbf{z} = \mathbf{x} T. $$ $T$ consists of the following bitwise operations. $$\begin{aligned} \mathbf{y} &:= \mathbf{x} \oplus (\mathbf{x} \gg u), \\ \mathbf{y} &:= \mathbf{y} \oplus \bigl((\mathbf{y} \ll s) \otimes \mathbf{b}\bigr), \\ \mathbf{y} &:= \mathbf{y} \oplus \bigl((\mathbf{y} \ll t) \otimes \mathbf{c}\bigr), \\ \mathbf{z} &:= \mathbf{y} \oplus (\mathbf{y} \gg l) \end{aligned}$$ (equations 2.2 to 2.5). These parameters are $u = 11$, $s = 7$, $\mathbf{b} = \mathtt{0x9D2C5680}$, $t = 15$, $\mathbf{c} = \mathtt{0xEFC60000}$ and $l = 18$ (M&N 1998, Table II).

An Implementation

This is the Mersenne Twister in Python. The linear recurrence occurs in _twist and the tempering in temper.


#!/usr/bin/env python3
"""The Mersenne Twister"""

class MT19937:
    u = 11
    s, b = 7, 0x9d2c5680
    t, c = 15, 0xefc60000
    l = 18

    def __init__(self, seed: int) -> None:
        self.index = 624
        self.state = [0] * 624
        self.state[0] = seed & 0xffffffff
        for i in range(1, 624):
            self.state[i] = 0x6c078965 \
                            * (self.state[i - 1] ^ self.state[i - 1] >> 30) \
                            + i & 0xffffffff

    @staticmethod
    def temper(y: int) -> int:
        y ^= y >> MT19937.u
        y ^= y << MT19937.s & MT19937.b
        y ^= y << MT19937.t & MT19937.c
        y ^= y >> MT19937.l
        return y

    def random(self) -> int:
        if self.index >= 624:
            self._twist()
        y = self.temper(self.state[self.index])
        self.index += 1
        return y

    def _twist(self) -> None:
        for i in range(624):
            y = (self.state[i] & 0x80000000) \
                + (self.state[(i + 1) % 624] & 0x7fffffff)
            self.state[i] = self.state[(i + 397) % 624] ^ y >> 1
            if y % 2 != 0:
                self.state[i] ^= 0x9908b0df
        self.index = 0

def main() -> None:
    x = MT19937(5489)      # Seed from mt19937ar.c
    for _ in range(1000):
        print(x.random())

if __name__ == "__main__":
    main()

Untempering

The Mersenne Twister is not cryptographically secure since the transformation $T$ is bijective. Due to the existence of $T^{-1}$, we can create an “untempering” function. If we can record $n$ consecutive outputs of the Mersenne Twister and untemper them, this will give us the internal state of the random number generator and allow us to predict all future values.


#!/usr/bin/env python3
"""Clone an MT19937 RNG from its output"""

from mt import MT19937

def untemper(y: int) -> int:
    y ^= y >> MT19937.l
    y ^= y << MT19937.t & MT19937.c
    for _ in range(7):
        y ^= y << MT19937.s & MT19937.b
    for _ in range(3):
        y ^= y >> MT19937.u
    return y

def clone_mt(mt: MT19937) -> MT19937:
    mt_clone = MT19937(0)
    for i in range(624):
        mt_clone.state[i] = untemper(mt.random())
    return mt_clone

def main() -> None:
    mt = MT19937(5489)
    mt_clone = clone_mt(mt)

    for _ in range(2000):
        assert mt_clone.random() == mt.random()

    print(mt_clone)

if __name__ == "__main__":
    main()

See m21.py and m23.py on my GitHub account.