Reading, Writing and Arithmetic: BSDNT

At last we come to addmul1 and submul1. After implementing these, we would

already be able to implement a full classical multiplication routine, using the

naive O(n^2) multiplication algorithm.

It turns out that addmul1 and submul1 are quite similar to mul1. Instead of

just writing out the result, we have to add it to, or subtract it from the

first operand.

These are really combined operations, i.e. combined mul1 and add_m or sub_m

respectively.

The reason we introduce these is that the processor has multiple pipelines

and merging two operations like this gives us the chance to push more

through those pipelines. We'll add more combined operations later on.

The fact that we can add b[i]*c, a[i] and ci and not overflow a double word

needs some justification.

Clearly b[i] and c are at most B-1. Thus b[i]*c is at most B^2-2B+1. And

clearly a[i] and ci are at most B-1. Thus the total is at most

B^2-2B+1 + (B-1) + (B-1) = B^2-1, which just fits into a double word.

The dword_t type allows us to get away with adding all these in C as though

we don't care about any overflow (in fact we know there is none). Of course

that doesn't make it efficient. This function is still a candidate for

assembly optimisation and loop unrolling later on.

The submul function is essentially the same, so long as we have taken care to

perform operations in the right order.

The naming of our functions causes us slight difficulties here. We'd ideally

like versions of addmul1 and submul1 which write the carry out to the high

limb and versions which add/sub it from the high limb. We opt for the latter

for now. After all, if one were accumulating addmuls, this is what one would

require most often.

Maybe someone reading this has a better idea how to handle these.

We add a test which checks that doing two addmuls in a row is the same as

doing a single addmul with the multiply constant equal to the sum of the

original two. We also add a test to check that chaining addmuls works. We

add similar tests for submul.

We delay coding up a quadratic multiplication basecase as there are a few more

linear functions to work on, most notably the various kinds of division and

remainder functions. These are all interesting functions to write.

The github repo for this version is here: v0.6

Reading, Writing and Arithmetic