Variable Coded Batch Matrix Multiplication
In this paper, we introduce the Variable Coded Distributed Batch Matrix Multiplication (VCDBMM) problem, which tasks a distributed system with performing batch matrix multiplication where matrices are not necessarily distinct among batch jobs. Most coded matrix-matrix computation work has broadly focused on two directions: matrix partitioning for computing a single computation task, and batch processing of multiple distinct computation tasks. While these works provide codes with good straggler resilience and fast decoding for their problem spaces, they cannot take advantage of the natural redundancy that arises from re-using matrices across batch jobs. Inspired by Cross-Subspace Alignment codes, we develop Flexible Cross-Subspace Alignment (FCSA) codes that are flexible enough to utilize this redundancy. We provide a full characterization of FCSA codes, which allow for a wide variety of system complexities, including good straggler resilience and fast decoding. We theoretically demonstrate that, under certain practical conditions, FCSA codes are within a factor of two of the optimal solution when it comes to straggler resilience; our simulations demonstrate that our codes achieve even better optimality gaps in practice.