New Trip timings
In my last blog post I wrote about the timings we did for the OpenDreamKit project for parallel multivariate arithmetic.
After publishing those timings, Mickael Gastineau of the Trip project let me know that there is an undocumented feature for modular arithmetic, using the Mod(., p) command. He also told me that they had not linked in a fast parallel memory allocator. He released a patch update version of the publicly available Trip with this fix.
You can see the new timings below, including those for mod p arithmetic.
Cores | Trip n=16 | ODK n=16 | Trip n=20 | ODK n=20 |
---|---|---|---|---|
1 | 24.0s | 10.5s | 140s | 120s |
2 | 11.9s | 5.55s | 69.9s | 60.0s |
3 | 8.06s | 3.80s | 46.7s | 41.0s |
4 | 6.13s | 2.95s | 35.2s | 31.4s |
6 | 4.19s | 2.09s | 25.0s | 21.1s |
8 | 3.19s | 1.60s | 18.7s | 16.1s |
10 | 2.60s | 1.30s | 15.1s | 13.0s |
12 | 2.19s | 1.09s | 12.6s | 10.9s |
14 | 1.99s | 0.96s | 10.8s | 9.4s |
16 | 1.96s | 0.86s | 9.7s | 8.3s |
20 | 1.42s | 0.72s | 7.98s | 6.7s |
24 | 1.30s | 0.64s | 6.72s | 5.6s |
28 | 1.17s | 0.55s | 6.01s | 4.9s |
32 | 1.10s | 0.50s | 5.36s | 4.9s |
Cores | Trip n=16 | ODK n=16 | Trip n=20 | ODK n=20 |
---|---|---|---|---|
1 | 27.9s | 9.15s | 175s | 53.9s |
2 | 14.0s | 4.73s | 97.5s | 28.1s |
3 | 9.38s | 3.29s | 59.1s | 18.7s |
4 | 7.18s | 2.49s | 48.5s | 14.3s |
6 | 5.14s | 1.68s | 31.8s | 9.48s |
8 | 3.80s | 1.29s | 23.3s | 7.26s |
10 | 3.02s | 1.04s | 19.0s | 5.96s |
12 | 2.60s | 0.86s | 15.6s | 4.99s |
14 | 2.25s | 0.74s | 13.5s | 4.22s |
16 | 1.97s | 0.66s | 11.7s | 3.76s |
20 | 1.70s | 0.56s | 9.72s | 3.13s |
24 | 1.44s | 0.49s | 8.50s | 2.65s |
28 | 1.29s | 0.43s | 7.35s | 2.27s |
32 | 1.21s | 0.38s | 6.79s | 2.04s |
Cores | Trip Z | ODK Z |
---|---|---|
1 | 25.6s | 5.08s |
2 | 12.8s | 2.56s |
3 | 8.55s | 1.71s |
4 | 6.44s | 1.29s |
6 | 4.38s | 0.86s |
8 | 3.36s | 0.67s |
10 | 2.81s | 0.52s |
12 | 2.34s | 0.46s |
14 | 1.98s | 0.40s |
16 | 1.79s | 0.34s |
Cores | Trip Z/pZ | ODK Z/pZ |
---|---|---|
1 | 30.5s | 3.64s |
2 | 15.3s | 1.83s |
3 | 10.2s | 1.22s |
4 | 7.65s | 0.92s |
6 | 5.21s | 0.62s |
8 | 3.97s | 0.46s |
10 | 3.17s | 0.37s |
12 | 2.86s | 0.31s |
14 | 2.51s | 0.28s |
16 | 2.08s | 0.24s |
Conclusion
As you can see, there are some really impressive speedups for Trip from 16 to 32 cores. Below 16 cores the times seem to have gone up a little, but overall the scaling with the number of cores is much better.
The Trip Z/pZ times are often slower than its Z times, which is the opposite of what we see with ODK.
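To put a number on "scaling", here is a minimal sketch that computes the speedup and parallel efficiency from the 1-core and 32-core n=16 times in the first table above. The function names and the choice of rows are mine, purely for illustration; the timings themselves are copied from the table.

```python
def speedup(t1, tn):
    """Ratio of the single-core time to the time on n cores."""
    return t1 / tn


def efficiency(t1, tn, cores):
    """Speedup divided by core count; 1.0 would be perfect linear scaling."""
    return speedup(t1, tn) / cores


# (1-core time, 32-core time) in seconds, taken from the first table, n=16.
timings = {
    "Trip n=16": (24.0, 1.10),
    "ODK n=16": (10.5, 0.50),
}
CORES = 32

for name, (t1, tn) in timings.items():
    s = speedup(t1, tn)
    e = efficiency(t1, tn, CORES)
    print(f"{name}: speedup {s:.1f}x, efficiency {e:.0%}")
```

On these two columns both systems come out at a speedup of roughly 21x to 22x on 32 cores, so the remaining gap between them is mostly the single-core time rather than the scaling.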
By the way, as the ODK project is coming to a close and my other work-related things now have a blog elsewhere, this blog will probably now revert to just being a private blog again. I will no doubt be unable to resist doing a post here when Oscar is finally ready for its first prerelease/release, and there may be some new timings for giac, but otherwise it'll be mostly personal projects now.
I may post some articles on speeding up graphics on the CGA, EGA and VGA adapters. So stick around if that interests you.