During the past few weeks, Brian Gladman has been doing

some tremendous updates, including some random number generators and

making bsdnt work on Windows (MSVC).

We discuss all matters related to bsdnt on our development list:

Anyhow, now for today's update.

We'd now like to implement a variant of our divrem_classical algorithm. This

time we'd like to just return a quotient, with no remainder. The question is,

can this be done in less time than a full divrem?

At first sight, the answer seems to be no. As we saw in the post on the divrem

algorithm, every word of both the dividend and divisor counts, and we need

to keep the dividend completely updated as the algorithm proceeds, otherwise

we will get the wrong quotient.

So, with the exception of perhaps the final update (which is only needed to

determine the remainder), there doesn't seem to be much we can save.

But what if we allowed the quotient to be approximate, say within 1 of the actual

quotient? In fact, let's demand that if q' is our approximate quotient, that |a -

d*q'| < d. In other words, we allow the quotient to be too large by 1, but not

too small by 1.

Ideally, what we would like is to be doing about half the updating work.

Specifically, we'd like to be truncating both the dividend and divisor as we

go.

Ordinarily we start with something like

AAAAAAAAAAAAAAA /

DDDDDDDDD

To get the first quotient word we shift the divisor and pad with zeroes, thus

AAAAAAAAAAAAAAA /

DDDDDDDD000000

After one iteration, a word of the dividend has been removed, and we then

shift the divisor again.

AAAAAAAAAAAAAA /

DDDDDDDD00000

We continue until we have

AAAAAAAAA /

DDDDDDDD

Now there is only one quotient word left. But we notice that we don't use

most of the remaining dividend or the divisor to determine this quotient

word. In fact, we could almost truncate to

AA /

D

What if we had truncated at this same point all along?

In fact, if we truncate so that the final division is a two word by one

word division (here we have to be careful, in that we are talking about the

number of words *after* normalisation), then clearly our quotient could be

out by as much as two on that final division, by what we have said in an

earlier post. That is of course ignoring any accumulated error along the way.

As we don't wish to multiply the entire thing out to see what we have to do

to correct it, it is clear that this amount of truncation is too great.

So let's truncate one further word to the right in both the dividend and

divisor, so that the final division (to get the final quotient word) is a

three word by two word division.

In fact, in the example above, as there will be five quotient words, there

will be five iterations of the algorithm, after which we want two words of

the divisor remaining. So, we will start with

AAAAAAA.A /

DDDDDD.D

(The decimal points I have inserted are arbitrary, and only for notational

purposes in what follows.)

After five iterations, throwing away one more word of the divisor each time,

we'll be left with

AA.A /

D.D

The first thing to notice is that our previous divrem algorithms, with the

adjustments they made as they went, gave the precise quotient given the

data they started with.

The second thing to notice is that truncating both the dividend and the

divisor at the same point, as above, will not yield a quotient that is too

small. In fact, the quotient we end up with will be the same as what we would

have obtained if we had not truncated the dividend at all, and only truncated

the divisor. Additional places in the dividend can't affect the algorithm.

Truncating the divisor, on the other hand, may result in a different quotient

than we would have obtained without truncation. In fact, as we end up

subtracting less at each update than we would if all those words were still

there, we may end up with a quotient which is too large. The divisor may also

divide more times, because of the truncation, than it would have if it had not

been truncated.

However, it is not enough to merely consider how the quotient changes with

truncation in order to see how far we can be out. We'll likely end up with a

very pessimistic estimate if we do this, because we may suppose that the

quotient can be one too large at each iteration, which is not true.

Instead, the quantity to keep track of is the original dividend minus the

product of the full divisor d and the computed quotient q. At the end of the

algorithm, this is the actual remainder we'll end up with, and we'd like to

keep track of how much our *computed* remainder (what's left of the dividend

after the algorithm completes) differs from this actual remainder.

Essentially, we accumulate an error in the computed remainder due to the

truncation.

Clearly, at each iteration, the word after the decimal point in what remains

of the dividend may be completely incorrect. And we may miss a borrow out of

this place into the place before the decimal point. So after n iterations of

the algorithm, the dividend may become too large by n. Of course n.0 is much

smaller than our original (normalised) divisor d (also considered as a decimal

D.DD...).

At the final step of the algorithm, we will have a dividend which is too large

by at most this amount, and we'll be using a divisor which is truncated to

just two words. However, the latter affects the computed remainder by an

amount much less than the original d (if Q is the final quotient word, it is

as though we added q*0.0DDDDDD to our divisor, so that the full divisor would

go Q times where it otherwise would only go Q-1 times).

So these two sources of error only increase the computed value q'*d + r (where

q' is the computed quotient and r is the computed remainder) by an amount less

than d. Thus, the computed quotient q' can be at most one larger than the

actual quotient q.

This is equivalent to the required |a - d*q'| < d.

So it seems that if we truncate out dividend at the start of the algorithm,

and our divisor after each iteration, we can get an approximate quotient q'

within the required bounds.

We'll leave it to another time to describe the usefulness of an algorithm

which computes a quotient which may be out by one. What we will note is that

we've done computations with much smaller integers. It therefore costs us

significantly less time than a full divrem.

In this week's branch we implement this algorithm. In the test code, we check

the required quotient is the same as the one returned by divrem, or at most

one too large.

The trickiest part is ensuring we truncate at the right point. We want to

finish on the last iteration with two words *after* normalisation of the

divisor.

Actually, if we are really smart, we realise that if d does not need

to be shifted by much to normalise it, we can get away with finishing with

just two *unnormalised* words in the divisor. The error will still be much

less than d.

To be safe, if the number of quotient words is to be n, I check if the

leading word of the unnormalised divisor is more than 2*n. If not, too much

normalisation may be required, and I set up the algorithm to finish with

three unnormalised words instead of two. Otherwise it is safe to finish

with two words in the unnormalised divisor.

The algorithm begins by computing the number of words s the divisor needs to

be to start. This is two more than the number of iterations required to

get all but the final quotient word, since we should have two words in the

divisor at this point. If normalisation is going to be a problem, we add one

to this so that we compute the final quotient word with three unnormalised

words in the divisor.

Now the number of words required to start depends only on the size of the

quotient, and thus it may be more than the number of words d actually has.

Thus we begin with the ordinary divrem algorithm until the number of words

required is less than the number of words d actually has.

Now we truncate d to the required number of words and the dividend to one

more than that. The remaining part of the algorithm proceeds in the same

manner, throwing away a divisor word every iteration.

time we'd like to just return a quotient, with no remainder. The question is,

can this be done in less time than a full divrem?

At first sight, the answer seems to be no. As we saw in the post on the divrem

algorithm, every word of both the dividend and divisor counts, and we need

to keep the dividend completely updated as the algorithm proceeds, otherwise

we will get the wrong quotient.

So, with the exception of perhaps the final update (which is only needed to

determine the remainder), there doesn't seem to be much we can save.

But what if we allowed the quotient to be approximate, say within 1 of the actual

quotient? In fact, let's demand that if q' is our approximate quotient, that |a -

d*q'| < d. In other words, we allow the quotient to be too large by 1, but not

too small by 1.

Ideally, what we would like is to be doing about half the updating work.

Specifically, we'd like to be truncating both the dividend and divisor as we

go.

Ordinarily we start with something like

AAAAAAAAAAAAAAA /

DDDDDDDDD

To get the first quotient word we shift the divisor and pad with zeroes, thus

AAAAAAAAAAAAAAA /

DDDDDDDD000000

After one iteration, a word of the dividend has been removed, and we then

shift the divisor again.

AAAAAAAAAAAAAA /

DDDDDDDD00000

We continue until we have

AAAAAAAAA /

DDDDDDDD

Now there is only one quotient word left. But we notice that we don't use

most of the remaining dividend or the divisor to determine this quotient

word. In fact, we could almost truncate to

AA /

D

What if we had truncated at this same point all along?

In fact, if we truncate so that the final division is a two word by one

word division (here we have to be careful, in that we are talking about the

number of words *after* normalisation), then clearly our quotient could be

out by as much as two on that final division, by what we have said in an

earlier post. That is of course ignoring any accumulated error along the way.

As we don't wish to multiply the entire thing out to see what we have to do

to correct it, it is clear that this amount of truncation is too great.

So let's truncate one further word to the right in both the dividend and

divisor, so that the final division (to get the final quotient word) is a

three word by two word division.

In fact, in the example above, as there will be five quotient words, there

will be five iterations of the algorithm, after which we want two words of

the divisor remaining. So, we will start with

AAAAAAA.A /

DDDDDD.D

(The decimal points I have inserted are arbitrary, and only for notational

purposes in what follows.)

After five iterations, throwing away one more word of the divisor each time,

we'll be left with

AA.A /

D.D

The first thing to notice is that our previous divrem algorithms, with the

adjustments they made as they went, gave the precise quotient given the

data they started with.

The second thing to notice is that truncating both the dividend and the

divisor at the same point, as above, will not yield a quotient that is too

small. In fact, the quotient we end up with will be the same as what we would

have obtained if we had not truncated the dividend at all, and only truncated

the divisor. Additional places in the dividend can't affect the algorithm.

Truncating the divisor, on the other hand, may result in a different quotient

than we would have obtained without truncation. In fact, as we end up

subtracting less at each update than we would if all those words were still

there, we may end up with a quotient which is too large. The divisor may also

divide more times, because of the truncation, than it would have if it had not

been truncated.

However, it is not enough to merely consider how the quotient changes with

truncation in order to see how far we can be out. We'll likely end up with a

very pessimistic estimate if we do this, because we may suppose that the

quotient can be one too large at each iteration, which is not true.

Instead, the quantity to keep track of is the original dividend minus the

product of the full divisor d and the computed quotient q. At the end of the

algorithm, this is the actual remainder we'll end up with, and we'd like to

keep track of how much our *computed* remainder (what's left of the dividend

after the algorithm completes) differs from this actual remainder.

Essentially, we accumulate an error in the computed remainder due to the

truncation.

Clearly, at each iteration, the word after the decimal point in what remains

of the dividend may be completely incorrect. And we may miss a borrow out of

this place into the place before the decimal point. So after n iterations of

the algorithm, the dividend may become too large by n. Of course n.0 is much

smaller than our original (normalised) divisor d (also considered as a decimal

D.DD...).

At the final step of the algorithm, we will have a dividend which is too large

by at most this amount, and we'll be using a divisor which is truncated to

just two words. However, the latter affects the computed remainder by an

amount much less than the original d (if Q is the final quotient word, it is

as though we added q*0.0DDDDDD to our divisor, so that the full divisor would

go Q times where it otherwise would only go Q-1 times).

So these two sources of error only increase the computed value q'*d + r (where

q' is the computed quotient and r is the computed remainder) by an amount less

than d. Thus, the computed quotient q' can be at most one larger than the

actual quotient q.

This is equivalent to the required |a - d*q'| < d.

So it seems that if we truncate out dividend at the start of the algorithm,

and our divisor after each iteration, we can get an approximate quotient q'

within the required bounds.

We'll leave it to another time to describe the usefulness of an algorithm

which computes a quotient which may be out by one. What we will note is that

we've done computations with much smaller integers. It therefore costs us

significantly less time than a full divrem.

In this week's branch we implement this algorithm. In the test code, we check

the required quotient is the same as the one returned by divrem, or at most

one too large.

The trickiest part is ensuring we truncate at the right point. We want to

finish on the last iteration with two words *after* normalisation of the

divisor.

Actually, if we are really smart, we realise that if d does not need

to be shifted by much to normalise it, we can get away with finishing with

just two *unnormalised* words in the divisor. The error will still be much

less than d.

To be safe, if the number of quotient words is to be n, I check if the

leading word of the unnormalised divisor is more than 2*n. If not, too much

normalisation may be required, and I set up the algorithm to finish with

three unnormalised words instead of two. Otherwise it is safe to finish

with two words in the unnormalised divisor.

The algorithm begins by computing the number of words s the divisor needs to

be to start. This is two more than the number of iterations required to

get all but the final quotient word, since we should have two words in the

divisor at this point. If normalisation is going to be a problem, we add one

to this so that we compute the final quotient word with three unnormalised

words in the divisor.

Now the number of words required to start depends only on the size of the

quotient, and thus it may be more than the number of words d actually has.

Thus we begin with the ordinary divrem algorithm until the number of words

required is less than the number of words d actually has.

Now we truncate d to the required number of words and the dividend to one

more than that. The remaining part of the algorithm proceeds in the same

manner, throwing away a divisor word every iteration.

Only one thing can go wrong, and that is the following: because we are

truncating the divisor at each point, we may end up subtracting too little from

the dividend. In fact, what can happen is that the top word of the dividend

does not become zero after we subtract q*d' (where d' is the truncated divisor).

When this happens, the top word of the dividend may be 1 after the subtraction.

We know that the least significant word of the dividend could be completely

wrong, and the next word may be too large by about as many iterations as

we've completed so far.

Thus, in order to fix our overflow, we subtract from the dividend as much as we

need to in order for the overflow to disappear. We don't mind our dividend

being too large, as we adjust for that in the algorithm. But we cannot allow it to

become too small. Thus we must only subtract from the dividend precisely the

amount required to make the overflow vanish.

We can safely subtract away whatever is in the bottom two words of the

dividend, as this is not even enough to remove the overflow. And then we can

subtract 1 from the whole dividend. This must remove the overflow and is

clearly the least we can get away with subtracting to do so.

The code for today's post is here: v0.15

Previous article: v0.14 - divrem_classical

Previous article: v0.14 - divrem_classical

Next article: v0.16 - mullow_classical

## No comments:

## Post a Comment