New communication-hiding conjugate gradient variants
The conjugate gradient algorithm suffers from communication bottlenecks on parallel architectures, due to the two global reductions required at each iteration. In this paper, we introduce a new communication-hiding conjugate gradient variant, which requires only one global reduction per iteration. We then introduce pipelined versions of this variant and of a variant due to Meurant, which fully overlap the matrix-vector product with all inner products. These variants exhibit convergence rates and final accuracies better than those of previously known pipelined variants. The convergence of these variants is improved using a predict-and-recompute scheme, whereby recursively updated quantities are first used as a predictor for their true values and then recomputed exactly later in the iteration. Numerical tests indicate that, on difficult problems, applying this technique to our pipelined variants often yields both a convergence rate and an ultimately attainable accuracy far better than those of previously studied pipelined conjugate gradient variants.
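To make the communication pattern concrete, below is a minimal sequential sketch (assuming NumPy and a symmetric positive definite matrix A). The first routine is standard Hestenes-Stiefel conjugate gradient, where the two inner products per iteration each become a separate global reduction on a parallel machine. The second is a simplified predict-and-recompute-style illustration, not the paper's exact algorithms: nu_k = r_k^T r_k is first predicted via the recurrence nu_{k+1} = alpha_k^2 * (s_k^T s_k) - nu_k (valid in exact arithmetic, since r_k^T s_k = p_k^T A p_k), used immediately to form beta, and then recomputed exactly later in the iteration. The name pr_cg and the grouping of reductions are illustrative assumptions.

    import numpy as np

    def cg(A, b, x0, tol=1e-10, maxiter=1000):
        # Standard Hestenes-Stiefel CG: two separate inner products per
        # iteration (p^T s and r^T r), each a global reduction in parallel.
        x = x0.copy()
        r = b - A @ x
        p = r.copy()
        nu = r @ r                      # nu_k = r_k^T r_k
        for _ in range(maxiter):
            s = A @ p
            alpha = nu / (p @ s)        # global reduction 1: p^T A p
            x = x + alpha * p
            r = r - alpha * s
            nu_new = r @ r              # global reduction 2: r^T r
            if np.sqrt(nu_new) < tol:
                break
            beta = nu_new / nu
            p = r + beta * p
            nu = nu_new
        return x

    def pr_cg(A, b, x0, tol=1e-10, maxiter=1000):
        # Simplified predict-and-recompute-style sketch (hypothetical, for
        # illustration only): p^T s and s^T s share one grouped reduction,
        # nu is predicted by the recurrence nu' = alpha^2 * (s^T s) - nu and
        # used right away for beta, then recomputed exactly; in a pipelined
        # implementation the exact recomputation would be overlapped with
        # other work rather than forming a second synchronization point.
        x = x0.copy()
        r = b - A @ x
        p = r.copy()
        nu = r @ r
        for _ in range(maxiter):
            s = A @ p
            mu, sigma = p @ s, s @ s    # one grouped global reduction
            alpha = nu / mu
            nu_pred = alpha**2 * sigma - nu   # predictor for r_{k+1}^T r_{k+1}
            x = x + alpha * p
            r = r - alpha * s
            beta = nu_pred / nu         # predicted value used immediately
            p = r + beta * p
            nu = r @ r                  # recomputed exactly for the next step
            if np.sqrt(nu) < tol:
                break
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n = 200
        Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
        A = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T   # SPD test matrix
        b = rng.standard_normal(n)
        for solver in (cg, pr_cg):
            x = solver(A, b, np.zeros(n))
            print(solver.__name__, np.linalg.norm(b - A @ x))

On a distributed-memory machine, the comments marking reductions are where MPI_Allreduce-style synchronizations would occur; the sketch mimics only the arithmetic of the predict-and-recompute idea, not the overlap of communication with the matrix-vector product that the pipelined variants achieve.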