Reading, Writing and Arithmetic: Update on Trip timings for multivariate arithmetic

New Trip timings

In my last blog post I wrote about the timings we did for the OpenDreamKit project for parallel multivariate arithmetic.

After publishing those timings, Mickael Gastineau of the Trip project let me know that there is an undocumented feature for modular arithmetic, using the Mod(., p) command. He also told me that they had not linked in a fast parallel memory allocator. He has since released a patch update of the publicly available Trip that includes this fix.

You can see the new timings, including those for mod p arithmetic, below.

**Sparse Multiplication over Z**
Cores	Trip n=16	ODK n=16	Trip n=20	ODK n=20
1	24.0s	10.5s	140s	120s
2	11.9s	5.55s	69.9s	60.0s
3	8.06s	3.80s	46.7s	41.0s
4	6.13s	2.95s	35.2s	31.4s
6	4.19s	2.09s	25.0s	21.1s
8	3.19s	1.60s	18.7s	16.1s
10	2.60s	1.30s	15.1s	13.0s
12	2.19s	1.09s	12.6s	10.9s
14	1.99s	0.96s	10.8s	9.4s
16	1.96s	0.86s	9.7s	8.3s
20	1.42s	0.72s	7.98s	6.7s
24	1.30s	0.64s	6.72s	5.6s
28	1.17s	0.55s	6.01s	4.9s
32	1.10s	0.50s	5.36s	4.9s

**Sparse Multiplication over Z/pZ**
Cores	Trip n=16	ODK n=16	Trip n=20	ODK n=20
1	27.9s	9.15s	175s	53.9s
2	14.0s	4.73s	97.5s	28.1s
3	9.38s	3.29s	59.1s	18.7s
4	7.18s	2.49s	48.5s	14.3s
6	5.14s	1.68s	31.8s	9.48s
8	3.80s	1.29s	23.3s	7.26s
10	3.02s	1.04s	19.0s	5.96s
12	2.60s	0.86s	15.6s	4.99s
14	2.25s	0.74s	13.5s	4.22s
16	1.97s	0.66s	11.7s	3.76s
20	1.70s	0.56s	9.72s	3.13s
24	1.44s	0.49s	8.50s	2.65s
28	1.29s	0.43s	7.35s	2.27s
32	1.21s	0.38s	6.79s	2.04s

**Dense Multiplication over Z**
Cores	Trip Z	ODK Z
1	25.6s	5.08s
2	12.8s	2.56s
3	8.55s	1.71s
4	6.44s	1.29s
6	4.38s	0.86s
8	3.36s	0.67s
10	2.81s	0.52s
12	2.34s	0.46s
14	1.98s	0.40s
16	1.79s	0.34s

**Dense Multiplication over Z/pZ**
Cores	Trip Z/pZ	ODK Z/pZ
1	30.5s	3.64s
2	15.3s	1.83s
3	10.2s	1.22s
4	7.65s	0.92s
6	5.21s	0.62s
8	3.97s	0.46s
10	3.17s	0.37s
12	2.86s	0.31s
14	2.51s	0.28s
16	2.08s	0.24s

Conclusion

As you can see, compared with the previous timings there are some really impressive speedups for Trip from 16 to 32 cores. Below 16 cores the times seem to have gone up a little, but overall the scaling with the number of cores is much better.
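To put a number on the scaling, here is a minimal Python sketch that computes speedup and parallel efficiency from a few of the Trip sparse n=16 timings over Z in the table above. The function names are my own for illustration; nothing here comes from Trip itself.

```python
# Trip timings in seconds for sparse multiplication over Z, n=16,
# copied from a subset of rows in the table above.
trip_n16 = {1: 24.0, 2: 11.9, 4: 6.13, 8: 3.19, 16: 1.96, 32: 1.10}


def speedup(timings, cores):
    """Speedup relative to the single-core time."""
    return timings[1] / timings[cores]


def efficiency(timings, cores):
    """Parallel efficiency: speedup divided by the number of cores."""
    return speedup(timings, cores) / cores


for cores in sorted(trip_n16):
    print(f"{cores:2d} cores: speedup {speedup(trip_n16, cores):5.2f}, "
          f"efficiency {efficiency(trip_n16, cores):4.2f}")
```

An efficiency of 1.0 would mean perfect linear scaling, so the closer the last column stays to 1.0 as the core count grows, the better the scaling.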

The Trip Z/pZ times are often slower than the corresponding Z times, whereas for our code it is the other way around: the Z/pZ times are the faster ones.
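As a rough illustration of that difference, the following Python sketch takes the single-core rows from the tables above and computes how long the Z/pZ run takes relative to the run over Z. The dictionary layout and the `modp_ratio` helper are just my choices for this example.

```python
# (time over Z, time over Z/pZ) in seconds on one core,
# copied from the first row of each table above.
one_core = {
    "Trip sparse n=16": (24.0, 27.9),
    "ODK sparse n=16": (10.5, 9.15),
    "Trip sparse n=20": (140.0, 175.0),
    "ODK sparse n=20": (120.0, 53.9),
    "Trip dense": (25.6, 30.5),
    "ODK dense": (5.08, 3.64),
}


def modp_ratio(z_time, zp_time):
    """Time over Z/pZ as a multiple of the time over Z."""
    return zp_time / z_time


for name, (z_time, zp_time) in one_core.items():
    print(f"{name}: Z/pZ takes {modp_ratio(z_time, zp_time):.2f}x the time over Z")
```

A ratio above 1 means the modular run is slower than the run over Z, and a ratio below 1 means it is faster.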

By the way, as the ODK project is coming to a close and my other work-related things now have a blog elsewhere, this blog will probably now revert to just being a private blog again. I will no doubt be unable to resist doing a post here when Oscar is finally ready for its first prerelease/release, and there may be some new timings for giac, but otherwise it'll be mostly personal projects now.

I may post some articles on speeding up graphics on the CGA, EGA and VGA adapters. So stick around if that interests you.

Tuesday, 27 August 2019
