The next division function we'll introduce is so-called Hensel division, or

right-to-left division. This starts at the least significant word and applies

the usual division algorithm back towards the most significant word.

Of course, now any "remainder" will be at the most significant word end and

have a completely different meaning.

Hensel division is actually just division mod B^m. Thus if q is the Hensel

quotient, you have a = q * d mod B^m. In fact, if r is the Hensel remainder,

you have a = q*d + r*B^m.

If a fits in m words and the division is exact (i.e. a = q*d), the Hensel

division and ordinary (euclidean) division return the same quotient. If the

division is not exact, this is no longer the case.

However, it may be that chaining Hensel divisions together does produce an

exact division, even when division over the bottom part of the chain

wouldn't produce an exact division. Thus, the ability to chain Hensel

divisions, and thus the ability to return the Hensel remainder, is

important.

Well, now we come to the first issue if we want to implement this. What is

the C operator for Hensel division? Actually, there doesn't seem to be one.

We have to implement it ourselves.

If a0 is a word from our dividend, and d is our single word divisor, then

we require q0 such that d * q0 = a0 mod B. In other words, we want q0 and

r0 such that d * q0 = a0 + r0*B.

Suppose we can find a value dinv such that dinv * d = 1 mod B. Then

we have that dinv * d * q0 = dinv * a0 mod B, i.e. q0 = dinv * a0 mod B,

which is a single word-by-word mullow.

Once we have q0 we can compute d * q0, the high word of which is the Hensel

"remainder" r0. We need to subtract this from the next limb of our divisor

before continuing.

We'll make the convention that carry-ins and carry-outs from our Hensel

division are positive rather than negative. We'll just remember to subtract

them. This way, if q is our Hensel quotient and r our Hensel remainder,

then d * q = a + r * B^n.

So how do we compute dinv? Firstly, we better require that d is odd, else

dinv * d = 1 mod B is an impossibility. With this restriction, we have

gcd(d, B) = 1 and therefore the extended euclidean algorithm lets us find

s, t such that s*d + t*B = 1. Then modulo B we have s*d = 1 mod B, i.e.

dinv = s is the precomputed inverse that we are after.

Another way to solve dinv * d = 1 mod B is to use Hensel lifting. This

works by first solving the equation mod 2, then use that solution to

solve it mod 4 and so on. This can all be done with nothing more than

multiplications.

Firstly, we know a solution mod 2, namely 1 (as d is odd), i.e. if v1 = 1

then we have v1 * d = 1 mod 2. Now suppose we have a solution vk mod 2^k,

i.e. vk * d = 1 mod 2^k. We'd like to find a value wk such that

(vk + 2^k * wk) * d = 1 mod 2^2k. This is always possible -- I omit the

easy proof -- so we only need to solve this equation for wk, i.e.

(d * vk - 1) = -2^k * wk * d. We can get rid of the d on the right hand

side by multiplying by vk, i.e. 2^k * w = ((1 - d * vk) * vk) (mod 2^2k).

In other words, two multiplications will suffice to compute

2^k * wk (mod 2^2k) from vk. Then v{k+1} = vk + 2^k * wk is the inverse

of d mod 2^2k.

As a further optimisation of all this, one can just work mod 2^64 all

the way through the computation. In other words, start with v1 = 1

and compute v{k+1} = vk + ((1 - d * vk) * vk) (mod 2^64), six times.

To get from a solution mod 2 to a solution mod 2^64 clearly only takes

six Hensel lifting steps. Of course, with a lookup table mod 2^8 to start

from, instead of starting from a solution mod 2, one can do it in three

steps, requiring just six word-by-word multiplications (mullow only).

However, today I am feeling very lazy and I leave it as an exercise for

someone to implement the Hensel lifting lookup table method.

We add a simple type to hold our precomputed Hensel inverse, and a function

for computing it.

The actual nn_divrem_hensel_1_preinv function is again a straightforward

loop. We ensure our remainder is positive at each step and ensure that

we propagate any borrows from subtractions at each step.

The test code for the Hensel division is very similar to that for the

ordinary euclidean division except that d must be odd and the chaining

test will proceed right-to-left.

OK, that is enough brain strain for one day!

Here v0.9 is the github repository for today's post.

Previous post: v0.8 - divrem1_preinv

Next post: v0.10 - mod1_preinv

Does Blogger not support LaTeX math?

ReplyDelete