The next division function we'll introduce is so-called Hensel division, or
right-to-left division. This starts at the least significant word and applies
the usual division algorithm back towards the most significant word.
Of course, now any "remainder" will be at the most significant word end and
have a completely different meaning.
Hensel division is actually just division mod B^m. Thus if q is the Hensel
quotient, you have a = q * d mod B^m. In fact, if r is the Hensel remainder,
you have a = q*d + r*B^m.
If a fits in m words and the division is exact (i.e. a = q*d), the Hensel
division and ordinary (euclidean) division return the same quotient. If the
division is not exact, this is no longer the case.
However, it may be that chaining Hensel divisions together does produce an
exact division, even when division over the bottom part of the chain
wouldn't produce an exact division. Thus, the ability to chain Hensel
divisions, and thus the ability to return the Hensel remainder, is
important.
Well, now we come to the first issue if we want to implement this. What is
the C operator for Hensel division? Actually, there doesn't seem to be one.
We have to implement it ourselves.
If a0 is a word from our dividend, and d is our single word divisor, then
we require q0 such that d * q0 = a0 mod B. In other words, we want q0 and
r0 such that d * q0 = a0 + r0*B.
Suppose we can find a value dinv such that dinv * d = 1 mod B. Then
we have that dinv * d * q0 = dinv * a0 mod B, i.e. q0 = dinv * a0 mod B,
which is a single word-by-word mullow.
Once we have q0 we can compute d * q0, the high word of which is the Hensel
"remainder" r0. We need to subtract this from the next limb of our divisor
before continuing.
We'll make the convention that carry-ins and carry-outs from our Hensel
division are positive rather than negative. We'll just remember to subtract
them. This way, if q is our Hensel quotient and r our Hensel remainder,
then d * q = a + r * B^n.
So how do we compute dinv? Firstly, we better require that d is odd, else
dinv * d = 1 mod B is an impossibility. With this restriction, we have
gcd(d, B) = 1 and therefore the extended euclidean algorithm lets us find
s, t such that s*d + t*B = 1. Then modulo B we have s*d = 1 mod B, i.e.
dinv = s is the precomputed inverse that we are after.
Another way to solve dinv * d = 1 mod B is to use Hensel lifting. This
works by first solving the equation mod 2, then use that solution to
solve it mod 4 and so on. This can all be done with nothing more than
multiplications.
Firstly, we know a solution mod 2, namely 1 (as d is odd), i.e. if v1 = 1
then we have v1 * d = 1 mod 2. Now suppose we have a solution vk mod 2^k,
i.e. vk * d = 1 mod 2^k. We'd like to find a value wk such that
(vk + 2^k * wk) * d = 1 mod 2^2k. This is always possible -- I omit the
easy proof -- so we only need to solve this equation for wk, i.e.
(d * vk - 1) = -2^k * wk * d. We can get rid of the d on the right hand
side by multiplying by vk, i.e. 2^k * w = ((1 - d * vk) * vk) (mod 2^2k).
In other words, two multiplications will suffice to compute
2^k * wk (mod 2^2k) from vk. Then v{k+1} = vk + 2^k * wk is the inverse
of d mod 2^2k.
As a further optimisation of all this, one can just work mod 2^64 all
the way through the computation. In other words, start with v1 = 1
and compute v{k+1} = vk + ((1 - d * vk) * vk) (mod 2^64), six times.
To get from a solution mod 2 to a solution mod 2^64 clearly only takes
six Hensel lifting steps. Of course, with a lookup table mod 2^8 to start
from, instead of starting from a solution mod 2, one can do it in three
steps, requiring just six word-by-word multiplications (mullow only).
However, today I am feeling very lazy and I leave it as an exercise for
someone to implement the Hensel lifting lookup table method.
We add a simple type to hold our precomputed Hensel inverse, and a function
for computing it.
The actual nn_divrem_hensel_1_preinv function is again a straightforward
loop. We ensure our remainder is positive at each step and ensure that
we propagate any borrows from subtractions at each step.
The test code for the Hensel division is very similar to that for the
ordinary euclidean division except that d must be odd and the chaining
test will proceed right-to-left.
OK, that is enough brain strain for one day!
Here v0.9 is the github repository for today's post.
Previous post: v0.8 - divrem1_preinv
Next post: v0.10 - mod1_preinv
Does Blogger not support LaTeX math?
ReplyDelete