Saturday, 5 August 2017

Oscar:Antic Update

This week we had the first OSCAR:Antic workshop and coding sprint [1]. We had about 10 people turn up, though unfortunately we clashed with big meetings in Atlanta and Leiden. Still, we got a lot done, so we'll probably have another one of these soon!

For those who don't know, OSCAR is the computer algebra system being developed as part of a large DFG (German Research Council) transregional grant. The Antic subsystem of OSCAR is the Number Theory subsystem, consisting of Julia packages such as Hecke.jl, Nemo.jl and making use of Flint, Arb, the Antic C library, etc.

The workshop and expenses for invited speakers were funded by DFG TRR 195, but I should also note that a number of the participants are currently employed by OpenDreamKit, an EU Horizon 2020 grant.

Fp bar

The focus of the workshop more or less ended up being finite fields and their embeddings. Luca De Feo gave a really fantastic set of talks on algorithms for embedding finite fields with the aim of eventually reaching an implementation of Fp bar (the algebraic closure of the finite field of p elements).

We learned about all the recent breakthroughs in using elliptic curves over a finite field to efficiently do such constructions, including the papers of Luca and his coauthors [3].

Luca put all this work in the context of the older work of Bosma-Cannon-Steel and Lenstra-deSmit and the algorithms of Allombert and Plesken.

One of the main aims of the coding sprints was to implement a naive embedding method in Nemo, which just relies on finding roots of one finite field in another. This is still work in progress, but I think it is fair to say we made some good progress towards this.

The eventual aim is to have a modern sophisticated and highly performant implementation in Nemo, based on the code of Luca and his student Jean Kieffer. Some of the basic building blocks for this will be in Flint, of course.

The basic steps were:
  • Add some additional helper functions to Flint
  • Add root finding, based on equal degree factorisation, to Flint
  • Port the Hecke code for maps into Nemo
  • Add the code for handling embeddings, going up and down along embedding maps
  • Thinking about how to handle compatibility
  • Designing an interface for Nemo

Elliptic curve cryptography in Nemo

Luca and Jean presented a poster at ISSAC, which was held the previous week in Kaiserslautern. I think they even won an award for their poster.

Following on from this, Jean gave a talk at our Antic workshop on the work he and Luca have been doing on a Nemo module for elliptic curves over a finite field [4]. You can find the slides on the workshop homepage [1].

One remark that I found interesting was a note Jean made on performance. One of his graphs showed that Sage was beating Nemo for a generic implementation of Weierstrass elliptic curves over a finite field of cryptographic sized characteristic (e.g. p = 2^502 + 49), especially as the degree of the field grows.

We eventually traced this down to two specific things:

  • Sage was using one division instead of two in its implementation of the addition law on an elliptic curve (this was easily fixed in Jean's code)
  • Sage was using Pari for finite fields, and Pari is really fast at finite field arithmetic
My contribution was to speed up arithmetic for finite fields of smallish degree for large characteristic. Addition is still faster in Pari, due to their stack based memory management, but we now have multiplication that is faster than Pari, inversion about the same speed (sometimes faster, sometimes slower, to within about 1-2%), division and powering faster than Pari.

The tricks involved in speeding up inversion (and hence division) were as follows:

  • Improvements to the memory manager in Flint to block allocate mpz_structs
  • implement a gcdinv function for the Flint fmpz_mod_poly module (polynomials over Z/pZ for large p), instead of using the xgcd implementation, which does additional work
  • use temporary allocations (on the stack where possible) throughout the gcdinv and divrem code
  • delay reduction mod p in the divrem_basecase code in fmpz_mod_poly
  • inplace divrem_basecase
  • change crossover to divrem_divconquer to a higher value, in light of new basecase improvements
These improvements dropped the time for inversion in a finite field of degree 5 in large characteristic from about 12.7s for 100,000 inversions to about 8.7s.

There's still one potential improvement we could make, which would speed things up quite a bit:

  • implement a dedicated gcdinv function that doesn't call out to divrem, but does division itself, delaying reduction all the way through the entire Euclidean algorithm, except when actually needed
This would be quite a lot of work and so we didn't implement this yet. Contributions welcome.

An interesting observation

Initially we believed that Pari has special code for quadratic extensions of Fp. This is because we were constructing Fp^2 and then inverting f = x^1234 as a benchmark for inversion.

After investigating this, I realised that Pari was using cyclotomic polynomials for at least degree 2 and 4 extensions, where possible. This meant that in the degree 2 case, x^3 = 1 and in the degree 4 case x^5 = 1.

Of course x^1234 is not a really random looking element in such extensions, and it just so happens that the extended gcd computation that is necessary for inversion really goes fast for such computations.

I don't see how to make use of this in general, but it was interesting that it made such a huge difference (a factor of 10 or so in Pari).

Actually, we don't yet get such an improvement in Flint (it's more like 5x), so I will look into that today. It's almost certainly due to not terminating the xgcd early enough, or not handling a leading coefficient of 1 optimally, or something like that. It'll be something trivial, but worth optimising for, I think.

Update 06 Aug: this is now fixed in flint/trunk. It was indeed due to not dealing with length 1 remainders in a special case in the gcdinv code.

Other topics

Many other topics were covered at the workshop.

Ball arithmetic

Fredrik Johansson gave a very nice presentation on Arb for ball arithmetic over R and C. The slides are on the workshop website [1].

Generic polynomial testing

We decided to write test code for generic polynomials in Nemo. Three of us (me, Luca and Remi) wrote test functions, with particular emphasis on:

  • towers of univariate polynomial rings
  • generic polynomials where the base ring is built over a characteristic p field
  • randomised tests
As expected, this turned up numerous corner cases and failures, most of which are fixed (the rest will be fixed before the next Nemo release).

Multivariate polynomial arithmetic over Z

We finally merged all the new multivariate code in Flint/trunk that I have been working on (recently with the help of Daniel Schultz). I've spoken about this code in a recent blog post, and there will be another blog post on this topic fairly soon. Lots of things happened since the last update, including some speedups.

Memory leak in Flint matrices

Fredrik fixed a serious memory leak in Flint matrices, which was slowing things down.

A 2500 times speedup

Fredrik sped up some Flint code by a factor of 2500. The slow Flint code was pointed out by the Sage developers, who were using it for modular symbols. The code is still in a branch, and could be improved even further. But since the Flint code is also used in Nemo, this will speed up some things in Nemo too.

Flint on FreeBSD

Dima Pasechnik helped with getting Flint working on FreeBSD.

Gap-Julia interface

Thomas Breuer and Sebastian Gutsche made progress in the OSCAR Gap-Julia interface. They worked on:
  • Extended the GAP-Package JuliaInterface (Type conversions, e.g., of GAP cyclotomics into Nemo cyclotomics, access to more functionality, improved error handling)
  • Translating shortest vector GAP function to Julia to look to test how speed improves.
  • Tried to access compiled Julia methods directly (unsucessfully so far)

There is an upcoming Gap days in Siegen early next month [2]. 

Register for the Gap conference, or the subsequent coding sprint if you are interested in general Gap development, or a Gap-Julia interface.

Moving finite fields out of Flint

We discussed the future possibility of moving finite fields out of Flint, into a separate library, similar to the way Antic (for number field arithmetic) is separate from Flint.

The reason for this is the templated code (using C #defines) that is used in the finite fields module. This suggests we should be using C++ for finite fields (with a C interface to maintain compatibility with existing code).

Some of the work Luca wants to do on finite field extensions, which requires bivariate polynomial arithmetic and so on, would be fiddly to develop using the Flint templating system. C++ would make this easier, and we'd still get good performance if we restricted to a C-style interface and only used templates to make the code easier to write.

The other possibility is to reimplement finite fields directly in Julia. But we decided to delay a project like that until we see how the Julia JIT compiler will be improved. Putting too many layers of code in Julia is asking for trouble right now. When JIT compilation is improved after the language matures, this would be a sensible choice. Each time Julia improves, the sweet spot enlarges, and we can do more in Julia. For example, traits are a highly anticipated feature of Julia, which should materialise over the next couple of years.

Rewriting the finite fields code in C++ (with a C-style interface, for performance) is certainly a worthwhile project. Volunteers welcome.

References and links