On the Optimal Recovery Threshold of Coded Matrix Multiplication
We provide novel coded computation strategies for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the number of successful workers required to recover the product. When each worker node can store a 1/m fraction of each matrix, Polynomial codes require m^2 successful workers, while our MatDot codes require only 2m-1 successful workers, albeit at a higher communication cost from each worker to the fusion node. Further, we propose "PolyDot" coding, which interpolates between Polynomial codes and MatDot codes. Finally, we demonstrate an application of MatDot codes to multiplying more than two matrices.
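The 2m-1 threshold can be illustrated with a small sketch. The abstract does not spell out the construction, so the code below assumes the usual MatDot encoding: A is split into m column blocks and B into m row blocks, worker k receives the evaluations p_A(x_k) = sum_i A_i x_k^i and p_B(x_k) = sum_j B_j x_k^(m-1-j), and the product AB is the coefficient of x^(m-1) in p_A(x) p_B(x), a polynomial of degree 2m-2. All function names, the evaluation points, and the choice of exact rational arithmetic are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def poly_eval(blocks, exponents, x):
    """Evaluate sum_i blocks[i] * x**exponents[i] entrywise."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    out = [[0] * cols for _ in range(rows)]
    for blk, e in zip(blocks, exponents):
        for r in range(rows):
            for s in range(cols):
                out[r][s] += blk[r][s] * x ** e
    return out


def matdot_workers(A, B, m, points):
    """Return {x: p_A(x) p_B(x)} for every evaluation point (one per worker)."""
    w = len(B) // m  # inner dimension is assumed divisible by m
    A_blocks = [[row[i * w:(i + 1) * w] for row in A] for i in range(m)]  # column blocks
    B_blocks = [B[i * w:(i + 1) * w] for i in range(m)]                   # row blocks
    results = {}
    for x in points:
        pA = poly_eval(A_blocks, range(m), x)                      # exponents 0..m-1
        pB = poly_eval(B_blocks, [m - 1 - j for j in range(m)], x)  # exponents m-1..0
        results[x] = matmul(pA, pB)
    return results


def lagrange_coeff(xs, k, power):
    """Coefficient of x**power in the k-th Lagrange basis polynomial over nodes xs."""
    coeffs = [Fraction(1)]  # lowest degree first
    denom = Fraction(1)
    for j, xj in enumerate(xs):
        if j == k:
            continue
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):  # multiply by (x - xj)
            nxt[d + 1] += c
            nxt[d] -= c * xj
        coeffs = nxt
        denom *= xs[k] - xj
    return coeffs[power] / denom


def matdot_decode(results, m):
    """Recover AB from any 2m-1 worker results (coefficient of x**(m-1))."""
    xs = list(results)[: 2 * m - 1]
    if len(xs) < 2 * m - 1:
        raise ValueError("need at least 2m-1 successful workers")
    rows, cols = len(results[xs[0]]), len(results[xs[0]][0])
    out = [[Fraction(0)] * cols for _ in range(rows)]
    for k, x in enumerate(xs):
        c = lagrange_coeff(xs, k, m - 1)
        for r in range(rows):
            for s in range(cols):
                out[r][s] += c * results[x][r][s]
    return out


# Hypothetical example: m = 2, five workers, two of them straggle.
A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [0, 1], [2, 1], [1, 3]]
all_results = matdot_workers(A, B, m=2, points=[1, 2, 3, 4, 5])
survivors = {x: all_results[x] for x in (2, 4, 5)}  # any 2m-1 = 3 suffice
decoded = matdot_decode(survivors, m=2)
```

In this sketch each worker returns a matrix as large as the full product AB, which reflects the higher communication cost to the fusion node that the abstract mentions.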