Speeding up the GENGA N-body integrator on consumer-grade graphics cards
GPU computing is popular because of the computational power available on a single card. The N-body integrator GENGA is built for this, but it suffers a performance penalty on consumer-grade GPUs because of their reduced double-precision (FP64) performance. We aim to speed up GENGA on consumer-grade cards by exploiting their high single-precision (FP32) performance. We modified GENGA so that it can compute the long-distance forces between bodies in FP32 precision and tested this with five experiments. First, we ran simulations with similar initial conditions of 6600 planetesimals in both FP32 and FP64 precision. We also ran simulations that i) began with a mixture of planetesimals and planetary embryos, ii) modelled planetesimal-driven giant planet migration, and iii) modelled terrestrial planet formation with a gas disc. Second, to measure the performance gain from FP32 computing, we ran the same simulation, beginning with 40 000 planetesimals, with both FP32 and FP64 precision forces on a variety of consumer-grade and Tesla GPUs. There are no statistical differences between runs in FP32 and FP64 precision that can be attributed to the force prescription rather than to stochastic effects. The energy uncertainties are almost identical for both precisions. The uncertainty in angular momentum, however, is about two orders of magnitude greater with FP32 than with FP64 long-range forces, although it remains very low. Running the simulations in single precision on consumer-grade cards reduces the running time by a factor of three, bringing it within a factor of three of a Tesla A100 GPU. Additional tuning speeds up the simulation by a further factor of two across all types of cards. The option to compute the long-range forces in single precision in GENGA when using consumer-grade GPUs dramatically improves performance at little cost in accuracy. It also has an environmental benefit because it reduces energy usage.
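The core idea, evaluating the pairwise long-range gravitational terms in FP32 instead of FP64, can be illustrated with a small sketch. It is not GENGA's CUDA kernel: it assumes plain direct-summation accelerations with G = 1, ignores GENGA's close-encounter handling and its actual accumulation scheme, and emulates FP32 on the CPU by rounding every intermediate result to single precision with Python's `struct` module. The helper names `f32` and `accel_on` and the toy positions and masses are hypothetical.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accel_on(i, pos, mass, single=False, G=1.0):
    """Direct-summation gravitational acceleration on body i (hypothetical helper).

    If single is True, every intermediate result is rounded to FP32,
    emulating a force kernel that runs in single precision; otherwise
    ordinary double precision (FP64) is used throughout.
    """
    r = f32 if single else (lambda v: v)
    ax = ay = az = 0.0
    xi, yi, zi = (r(c) for c in pos[i])
    for j, (xj, yj, zj) in enumerate(pos):
        if j == i:
            continue
        # Separation vector from body i to body j.
        dx = r(r(xj) - xi)
        dy = r(r(yj) - yi)
        dz = r(r(zj) - zi)
        d2 = r(r(r(dx * dx) + r(dy * dy)) + r(dz * dz))
        inv_d3 = r(1.0 / r(d2 * r(math.sqrt(d2))))
        s = r(r(G * r(mass[j])) * inv_d3)
        # Accumulate G m_j (r_j - r_i) / |r_j - r_i|^3 in the chosen precision.
        ax = r(ax + r(s * dx))
        ay = r(ay + r(s * dy))
        az = r(az + r(s * dz))
    return ax, ay, az


# Hypothetical toy system: a unit central mass plus four planetesimals (G = 1 units).
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.3, 0.1),
       (-0.7, 0.4, -0.05), (0.2, -1.1, 0.02)]
mass = [1.0, 1e-9, 2e-9, 1.5e-9, 1e-9]

a64 = accel_on(1, pos, mass)               # FP64 reference
a32 = accel_on(1, pos, mass, single=True)  # emulated FP32 forces
rel = math.dist(a32, a64) / math.hypot(*a64)
print(f"relative difference FP32 vs FP64: {rel:.1e}")
```

Only the rounding behaviour is reproduced here, not the speed-up: on a consumer-grade GPU the FP32 variant would run on the card's much faster single-precision units, which is where the performance gain described above comes from.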