Friday, 24 September 2010

BSDNT - v0.12 mul_classical

I've been on holidays for a couple of days, hence the break in transmission.
(For academics: the definition of a holiday is a day where you actually
stop working and do nothing work related for that whole day. I realise the
definition is not widely known, nor is it well-defined.)

Note, due to accidentally including some files in today's release that I
intended to hold over till next time, you have to do ./configure before
make check. This detects your CPU and links in any assembly code
that is relevant for it. More on this when I do the actual blog post about it.

It's time to start implementing quadratic algorithms (i.e. those that
take time O(n^2) to run, such as classical multiplication and division).

Before we do this, we are going to reorganise slightly. The file which
is currently called nn.c we will rename nn_linear.c to indicate that it
contains our linear functions. We'll also rename t-nn.c to t-nn_linear.c.

The macros in that file will be moved into test.h.

We also modify the makefile to build all .c files in the current directory
and to run all tests when we do make check. A new directory "test" will
hold our test files.

Finally, we add an nn_quadratic.c and test/t-nn_quadratic.c to hold the
new quadratic routines and test code that we are about to write.

The first routine we want is nn_mul_classical. This is simply a mul1
followed by addmul1's. Of course this will be horrendously slow, but once
again we defer speeding it up until we start adding assembly routines,
which will start happening shortly.

In a departure from our linear functions, we do not allow zero lengths.
This dramatically simplifies the code and means we do not have to check
for special cases.

There does not appear to be any reason to allow a carry-in to our
multiplication routine. An argument can be made for it on the basis of
consistency with mul1. However, the main use for carry-in's and carry-out's
thus far has been for chaining.

For example, we chained mul1's s*c and t*c for s and t of length m and
c a single word in order to compute (s*B^m + t)*c.

But the analogue in the case of full multiplication would seem to be
chaining to compute (s*B^m + t)*c where s, t and c all have length m. But
it probably doesn't make sense to chain full multiplications in this way
as it would involve splitting the full product t*c say, into two separate
parts of length m, which amongst other things, would be inefficient.

It might actually make more sense to pass a whole array of carries to our
multiplication routine, one for every word of the multiplier. However it is
not clear what use this would be. So, for now at least, we pass no carry-ins
to our multiplication routine.

We settle on allowing a single word of carry out. This may be zero if the
leading words of the multiplicands are small enough. Rather than write this
extra word out, we simply return it as a carry so that the caller can decide
whether to write it.

The classical multiplication routine is quite straightforward, but that plus the
rearrangement we've done is more than enough for one day.

The github branch for this release is here: v0.12

Previous article: v0.11 - generic test code

No comments:

Post a Comment