main application is in doing exact division.

For that reason, we might want to focus on Hensel division without

remainder first.

This would take an m word integer and divide it by an n word integer,

division proceeding from right to left. Essentially it gives us division

modulo B^m.

However, before we implement this, it makes sense to think about what its

"inverse" operation might be.

If q is the Hensel quotient of a by d mod B^m, then the inverse operation

can be thought of as multiplication of q by d mod B^m.

Hensel division gives an m word quotient q, so this would imply that its

inverse should be a multiplication of {d, n} by {q, m} mod B^m. We might call

this inverse operation mullow, as it returns the low m words of the product

d*q.

However, we need to be careful with this kind of multiplication. We'd also

like to have a mulhigh which returns the high part of the multiplication,

and we'd like the sum of mullow and mulhigh to be the same as a full mul.

However, there is a problem if mullow merely returns the product mod B^m. Any

carries out of mullow will have been lost. Also, all the word by word

multiplications that contribute to the high word of the product mod B^m

will be thrown away.

To rectify the problem we accumulate an "overflow" out of the mullow

corresponding to the sum of all these high words and carries. As this

overflow is essentially the sum of arbitrary words it may take up two words.

Thus, instead of mullow yielding m words it will yield m + 2 words. We'd

like to pass the extra two words as an "overflow-in" to mulhigh, thus the

logical thing is to return these two words from mullow separately from the

product mod B^m itself.

Hensel division will also return two overflow words. After all, what it

essentially does to the dividend is subtract a mullow of the quotient by the

divisor. So, the overflow from Hensel division will be defined as precisely

the overflow from mullow(q*d).

We manage the overflow by accumulating it in an dword_t. However, as we don't

wish the user to have to deal with dword_t's (these are used in our internal

implementations only), we split this dword_t into two separate words at the

end and return them as an array of two words representing the "overflow".

Today we shall only implement mullow and mulhigh. The first of these is a lot

like a full multiplication except that the addmul1's become shorter as the

algorithm proceeds and the carry-out's have to be accumulated in two words, as

explained.

At the same time we implement mulhigh. This takes two "overflow-in" words and

computes the rest of the product, again in a fashion similar to a full

multiplication.

Our test code simply stitches a mullow and mulhigh together to see that the

chain is the same as a full multiplication.

we have to be careful in that if one does an n by n mullow, the mulhigh that

we wish to chain with this must start with an n-1 by 1 multiplication,

not an n by 1, otherwise the sum of the mullow and mulhigh would contain the

cross product of certain terms twice.

We also have to be careful in the case where the full multiplication is only

a single word by a single word. Here the overflow out of the mullow part is

only a single word and there is no mulhigh to speak of. It merely passes

the overflow from the mullow straight through.

Both mullow and mulhigh must accept all nonzero lengths, as per full

multiplication. This causes a few cases to deal with in mulhigh. This doesn't

seem particularly efficient or elegant, but there seems to be little we

can do about that.

An interesting question is what the inverse of mulhigh is. Essentially,

this is our euclidean divapprox.

There's something slightly unsatisfying here though. Recall that the divapprox

algorithm proceeds by subtracting values q1*d' where q1 is the current quotient

word and d' is what is currently left of the divisor. We throw away one word of

the divisor at each iteration until finally we are left with just two words.

It would be wholly more satisfying if we didn't require this extra word of the

divisor throughout. We'd then be working right down to a single word in the

divisor so that we will really have subtracted a mulhigh by the time the

algorithm completes.

Any modification I can think of making to the euclidean division to make this

seem more natural also makes it much less efficient.

Perhaps some further thought will lead to a more satisfying way of thinking about

these things, which isn't also less efficient in practice.

The code for today is here: v0.16

Previous article: v0.15 - divapprox_classical

Previous article: v0.15 - divapprox_classical

Next article: v0.17 - div_hensel

## No comments:

## Post a Comment