## Sunday, 31 October 2010

### BSDNT - v0.21 PRNGs

In this update we are going to replace the el cheapo random generator in
bsdnt with something of higher quality.

Some time ago, Brian Gladman brought to my attention the fact that on 32
bit machines, the test code for bsdnt actually caused Windows to hang!

The issue required some sleuthing work on Brian's part to track down.
He eventually discovered the cause of the problem, and it was, oddly
enough, the pseudo-random number generator (PRNG) I had used.

Brian suspected the PRNG immediately because of his past experience as a
professional cryptographer. In fact, it turns out that PRNG's of the
kind that I had used, aren't particularly good even if they don't have
bugs!

The particular kind of PRNG I had used is called a linear congruential
PRNG. They start with the PRNG initialised to some random seed value,
n = seed. Then each time they are called, they apply the transformation
n = (n*c1 + c2) % p for some explicit constants c1, c2 and a large
enough "prime" p.

One can take c2 = 0 in the transformation and it is also common to see
p = 2^b for some b (e.g. b = WORD_BITS, and yes, I know p = 2^b is not
usually prime). When p = 2^b it is usually the case that the top half of
the bits output have reasonable random properties, but the bottom half
do not. In this case it is acceptable to stitch two LC PRNG's together to
give the full number of bits.

When p is an actual prime, these PRNG's are called prime modulus linear
congruential PRNG's and they aren't too bad when implemented properly.
They still fail to meet some standards of random quality.

To return a whole word of random bits, one either needs to use a prime
p that is larger than a word, which is usually impractical, or again
stitch the output of two prime modulus LC PRNG's together.

However, one needs to be careful, as the period of the generator is p-1,
so if one is on a 32 bit machine, it doesn't do to use a prime p just
over half the size of a word (the first mistake I made), otherwise the
period is just over 65536. That isn't too good for generating random
values for test code!

But how was my LC PRNG causing Windows to crash!? The reason was that
in some of the test functions we required bignums that didn't overflow
when summed together. This of course depends almost entirely on the top
few bits of the bignums being added together.

The problem was that in the expression n = (n*c1 + c2) % p, I was using
values of c1 and c2 which were not reduced mod p. It turns out that this
is crucial to correct operation. It might seem that the result ends up
being reduced mod p anyway, and indeed it would be if n*c1 fit in a word.
However, because it doesn't you actually get ((n*c1 + c2) % 2^32) % p
which causes the binary representation of the value generated to be quite
regular.

Anyhow, on Windows (and probably on other 32 bit machines) the test code
generates length 90 bignums over and over at some point, looking in vain
to find pairs of such bignums which when added do not overflow. As these
are garbage collected at the end of the test function, memory starts
filling up with the orphaned bignums that are discarded by the test code
as it looks for appropriate values. This eventually overwhelms the heap
allocator on Windows and causes the entire OS to crash!

The problem of writing decent PRNG's has been studied extensively, and one
expert in the subject is George Marsaglia. He famously turned up on a
usenet forum in January of 1999 and dumped not one, but piles of fast, high
quality PRNG's which do not suffer from the problems that other PRNG's do.

Amazingly, many of the PRNG's in common usage today are either written by
George, or based on ones he wrote. So he's some kind of legend!

Anyhow, we will make use of two of his PRNG's, Keep It Simple Stupid (KISS)
and Super KISS (SKISS) and a third PRNG called Mersenne Twister, due to
Makoto Matsumoto and Takuji Nishimura in 1997.

George's generators are in turn based on some simpler PRNG's. He begins by
defining a linear congruential generator, with c1 = 69069 and c2 = 1234567.
This is taken p = mod 2^32 (yes, it's not prime). This has good properties
on its top 16 bits, but not on its bottom 16 bits, and for this reason had
been widely used before George came along. This generator has period 2^32.

Next he defines a pair of multiply with carry (MWC) generators. These are
of the form n = c1*lo(n) + hi(n) where lo(n) is the low 16 bits of n, hi(n)
is the high 16 bits and c1 is an appropriately chosen constant.

He stitches together a pair of these MWC PRNG's mod 2^16 to give 32 random
bits. For simplicity we'll refer to this combined random generator as MWC.
This has a period of about 2^60.

Thirdly he defines a (3-)shift-register generator (SHR3). This views the
value n as a binary vector of 32 bits and applies linear transformations
generated from 32 x 32 matrices L and R = L^T according to
n = n(I + L^17)(I + R^13)(I + L^5) where I is the 32 x 32 identity matrix.

In order to speed things up, special transformations are chosen that can
be efficiently implemented in terms of XOR and shifts. This is called an
Xorshift PRNG. We'll just refer to it as SHR3.

Now given appropriate seed values for each of these PRNG's Marsaglia's
KISS PRNG is defined as MWC ^ CONG + SHR3. This generator passes a whole
slew of tests and has a period of 2^123. In this update we make it the
default random generator for bsdnt.

Super KISS is a random generator defined by George later in 2009. It gives
immense periods by adding together the output of three PRNG's, one with a
massive order. It is basically defined by SKISS = SUPR + CONG + SHR3.

Here, the new generator SUPR is based on a prime p = a*b^r + 1 such that
the order of b mod p has magnitude quite near to p - 1.

It starts with a seeded vector z of length r, all of whose entries are
less than b and an additional value c which is less than a.

One then updates the pair (z, c) by shifting the vector z to the left by
one place and setting the right-most entry to (b - 1) - ((a*z1 + c) mod b)
where z1 is the entry shifted out at the left of z. Then c is set to t/b.

Naturally in practice one uses b = 2^32 so that all the intermediate
reductions mod b are trivial.

As with most generators which have massive periods the "state" held by this
generator is large. It requires data mod p for a multiprecision p.

Note the similarity with the MWC generator except for the "complement" mod
b that occurs. This is called a CMWC (Complemented-Multiply-With-Carry)
generator.

George proposed using the prime p = 640*b^41265+1, where the order of b is
5*2^1320481. The period of the CMWC generator is then greater than
2^1300000.

Of course, at each iteration of the algorithm, 41265 random words are
generated in the vector. Once these are exhausted, the next iteration of
the algorithm is made.

The algorithm SUPR in the definition of SKISS is thus just a simple
array lookup to return one of the words of the vector z. Each time SKISS
is run, the index into the array is increased until all words of the array
are exhausted, at which point the CMWC algorithm is iterated to refill the
array.

We now come to describing the Mersenne twister.

It is based on the concept of a feedback shift register (FSR). An FSR shifts
its value left by 1 bit, feeding at the right some linear combination of the
bits in its original value. The Mersenne twister is conceptually a
generalisation of this.

The difference with the Mersenne twister is that the "feedback" is effected
by a certain "twist". This is effected by applying a "linear transformation"
A of a certain specific form, with multiplication by A having addition
replaced by xor in the matrix multiplication. The twist can be described
more straightforwardly, and we give the more straightforward description
below.

One sets up a Mersenne twister by picking a recurrence degree n, a "middle
word" 1 <= m <= n and a number of bits for a bitmask, 0 <= r <= 32. One
picks these values so that p = 2^(n*w - r) - 1 is a Mersenne prime (hence
the name of this PRNG). Given a vector of bits a = [a0, a1, ..., a{w-1}] of length
w and a sequence x of words of w bits, the Mersenne twister is defined by a
recurrence relation x[k+n] = x[k+m] ^ ((upper(x[k]) | lower(x[k+1])) A)
where upper and lower return the upper w - r and lower r bits of their
operands, and where A is the "twist" spoken of and defined below, in terms of
a. Of course ^ here is the xor operator, not exponentiation. For a vector X of w
bits, XA is given by X>>1 if X[0] == 0 otherwise it is given by (X>>1) ^ a.

Some theory is required to find an A such that the Mersenne twister will have
maximum theoretical period 2^(n*w - r) - 1.

To finish off, the Mersenne twister is usually "tempered". This tempering
simply mangles the bits in a well understood way to iron out some of the
known wrinkles in the MT algorithm.

Only a couple of sets of parameters are in common use for Mersenne twisters.
These are referred to as MT19937 for 32 bit words and MT19937-64 for 64 bit
words.

As with all PRNG's, there is a whole industry around "cracking" these things.
This involves starting with a short sequence of values from a PRNG and
attempting to find the starting constants and seed values.

Obviously, in crytographic applications, there is not much point generating
"secure" keys with a PRNG with a single source of entropy. Even if your key
is generated by multiplying primes of many words in length, if those words
were generated from a PRNG seeded from the current time, it may only take
a few iterations and a good guess as to which PRNG you used, to determine
the constants used in the PRNG and thus your entire key. And that's
irrespective of which constants you chose in your PRNG!

So if you are doing crypto, you need to take additional precautions to
generate secure keys. Just seeding a PRNG from the time probably won't cut
it!

Some PRNG's are more "secure" than others, meaning that knowing a
few output values in a row doesn't give terribly much information about
which values may follow. But if you rely on a PRNG to be secure, you
are essentially betting that because you don't know how to get the
next few values and nor does anyone else that has written about the
subject, then no one at all knows. Of course one needs to ask oneself
if they would tell you if they did.

Another assumption one should never make is that no one has the computing
power to brute force your PRNG.

Some PRNG's are designed for cryptographic applications, and maybe one can
believe that these are "safe" to use, for some definition of safe.

Anyhow, we only care about random testing at this point. In today's update
32 and 64 bit KISS, SKISS and MT PRNG's are added in the directory rand.
Our randword, randinit, and randclear functions are all replaced with
appropriate calls to KISS functions.

There is also an option to change the default PRNG used by bsdnt. Is it my
imagination or does the test code now run faster, even on a 64 bit machine!

At some point we will add some tests of the new PRNG's. These will compare
the outputs with known or published values to check that they are working as
designed for a large number of iterations.

Brian Gladman contributed to this article and also did most of the work
in implementing Marsaglia's PRNG's in bsdnt. The Mersenne twisters were
originally written by Takuji Nishimura and Makoto Matsumoto and made available
under a BSD license. Brian did most of the work in adapting these for bsdnt.

The code for today's update is here: v0.21

Previous article: v0.20 - redzones