Tuesday, 27 August 2019

Update on Trip timings for multivariate arithmetic

New Trip timings


In my last blog post I wrote about the timings we did for the OpenDreamKit project for parallel multivariate arithmetic.

After publishing those timings, Mickael Gastineau of the Trip project let me know that there is an undocumented feature for modular arithmetic, using the Mod(., p) command. He also told me that they had not linked in a fast parallel memory allocator. He released a patch update version of the publicly available Trip with this fix.

You can see the new timings, including those for mod p arithmetic, below.



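| Cores | Trip n=16 | ODK n=16 | Trip n=20 | ODK n=20 |
|-------|-----------|----------|-----------|----------|
| 1     | 24.0s     | 10.5s    | 140s      | 120s     |
| 2     | 11.9s     | 5.55s    | 69.9s     | 60.0s    |
| 3     | 8.06s     | 3.80s    | 46.7s     | 41.0s    |
| 4     | 6.13s     | 2.95s    | 35.2s     | 31.4s    |
| 6     | 4.19s     | 2.09s    | 25.0s     | 21.1s    |
| 8     | 3.19s     | 1.60s    | 18.7s     | 16.1s    |
| 10    | 2.60s     | 1.30s    | 15.1s     | 13.0s    |
| 12    | 2.19s     | 1.09s    | 12.6s     | 10.9s    |
| 14    | 1.99s     | 0.96s    | 10.8s     | 9.4s     |
| 16    | 1.96s     | 0.86s    | 9.7s      | 8.3s     |
| 20    | 1.42s     | 0.72s    | 7.98s     | 6.7s     |
| 24    | 1.30s     | 0.64s    | 6.72s     | 5.6s     |
| 28    | 1.17s     | 0.55s    | 6.01s     | 4.9s     |
| 32    | 1.10s     | 0.50s    | 5.36s     | 4.9s     |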
Sparse Multiplication over Z

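| Cores | Trip n=16 | ODK n=16 | Trip n=20 | ODK n=20 |
|-------|-----------|----------|-----------|----------|
| 1     | 27.9s     | 9.15s    | 175s      | 53.9s    |
| 2     | 14.0s     | 4.73s    | 97.5s     | 28.1s    |
| 3     | 9.38s     | 3.29s    | 59.1s     | 18.7s    |
| 4     | 7.18s     | 2.49s    | 48.5s     | 14.3s    |
| 6     | 5.14s     | 1.68s    | 31.8s     | 9.48s    |
| 8     | 3.80s     | 1.29s    | 23.3s     | 7.26s    |
| 10    | 3.02s     | 1.04s    | 19.0s     | 5.96s    |
| 12    | 2.60s     | 0.86s    | 15.6s     | 4.99s    |
| 14    | 2.25s     | 0.74s    | 13.5s     | 4.22s    |
| 16    | 1.97s     | 0.66s    | 11.7s     | 3.76s    |
| 20    | 1.70s     | 0.56s    | 9.72s     | 3.13s    |
| 24    | 1.44s     | 0.49s    | 8.50s     | 2.65s    |
| 28    | 1.29s     | 0.43s    | 7.35s     | 2.27s    |
| 32    | 1.21s     | 0.38s    | 6.79s     | 2.04s    |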
Sparse Multiplication over Z/pZ

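| Cores | Trip Z | ODK Z |
|-------|--------|-------|
| 1     | 25.6s  | 5.08s |
| 2     | 12.8s  | 2.56s |
| 3     | 8.55s  | 1.71s |
| 4     | 6.44s  | 1.29s |
| 6     | 4.38s  | 0.86s |
| 8     | 3.36s  | 0.67s |
| 10    | 2.81s  | 0.52s |
| 12    | 2.34s  | 0.46s |
| 14    | 1.98s  | 0.40s |
| 16    | 1.79s  | 0.34s |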
Dense Multiplication over Z

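| Cores | Trip Z/pZ | ODK Z/pZ |
|-------|-----------|----------|
| 1     | 30.5s     | 3.64s    |
| 2     | 15.3s     | 1.83s    |
| 3     | 10.2s     | 1.22s    |
| 4     | 7.65s     | 0.92s    |
| 6     | 5.21s     | 0.62s    |
| 8     | 3.97s     | 0.46s    |
| 10    | 3.17s     | 0.37s    |
| 12    | 2.86s     | 0.31s    |
| 14    | 2.51s     | 0.28s    |
| 16    | 2.08s     | 0.24s    |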
Dense Multiplication over Z/pZ

Conclusion


As you can see, compared with the timings in my last post, Trip is now really impressively faster from 16 to 32 cores. Below 16 cores its times seem to have gone up a little, but overall its scaling with the number of cores is much better.
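If you want to put a number on the scaling, here is a rough Python sketch (not part of the benchmark code) that computes the speedup over one core and the parallel efficiency from a few rows of the sparse n=16 table over Z. The names `speedup` and `efficiency` are just ones I made up for illustration, and the timings are copied by hand from the table above.

```python
# Timings in seconds, copied from the "Sparse Multiplication over Z" table (n=16).
# Only a subset of the core counts is included to keep the sketch short.
trip = {1: 24.0, 2: 11.9, 4: 6.13, 8: 3.19, 16: 1.96, 32: 1.10}
odk = {1: 10.5, 2: 5.55, 4: 2.95, 8: 1.60, 16: 0.86, 32: 0.50}


def speedup(times, cores):
    """Speedup relative to the single-core time: T(1) / T(cores)."""
    return times[1] / times[cores]


def efficiency(times, cores):
    """Parallel efficiency: speedup divided by the number of cores."""
    return speedup(times, cores) / cores


for cores in sorted(trip):
    print(
        f"{cores:>2} cores: "
        f"Trip {speedup(trip, cores):5.1f}x ({efficiency(trip, cores):4.0%}), "
        f"ODK {speedup(odk, cores):5.1f}x ({efficiency(odk, cores):4.0%})"
    )
```

An efficiency close to 100% means the time is dropping almost in proportion to the number of cores. Values slightly above 100% at low core counts are just the rounding of the published timings.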

Trip is often slower over Z/pZ than over Z, whereas for us it is the other way around: our Z/pZ times are faster than our Z times.
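To make that comparison concrete, here is a small Python sketch that takes the single-core rows of the tables above and computes the ratio of the Z/pZ time to the Z time for each system. The dictionary layout and the helper name `modp_ratio` are my own choices for illustration; a ratio above 1 means mod p arithmetic was slower than arithmetic over Z.

```python
# Single-core timings in seconds as (time over Z, time over Z/pZ),
# copied by hand from the tables above.
single_core = {
    "sparse n=16": {"Trip": (24.0, 27.9), "ODK": (10.5, 9.15)},
    "sparse n=20": {"Trip": (140.0, 175.0), "ODK": (120.0, 53.9)},
    "dense": {"Trip": (25.6, 30.5), "ODK": (5.08, 3.64)},
}


def modp_ratio(z_time, zp_time):
    """Ratio of the Z/pZ time to the Z time; > 1 means mod p was slower."""
    return zp_time / z_time


for benchmark, systems in single_core.items():
    for system, (z_time, zp_time) in systems.items():
        ratio = modp_ratio(z_time, zp_time)
        print(f"{benchmark:<12} {system:<4} Z/pZ time / Z time = {ratio:.2f}")
```

This only looks at one core; the same ratio can be computed for any other row that appears in both the Z and Z/pZ tables.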

By the way, as the ODK project is coming to a close and my other work-related things now have a blog elsewhere, this blog will probably now revert to just being a private blog again. I will no doubt be unable to resist doing a post here when Oscar is finally ready for its first prerelease/release, and there may be some new timings for giac, but otherwise it'll be mostly personal projects now.

I may post some articles on speeding up graphics on the CGA, EGA and VGA adapters. So stick around if that interests you.
