ELFISH: Resource-Aware Federated Learning on Heterogeneous Edge Devices
In this work, we propose ELFISH, a resource-aware federated learning framework that tackles computation stragglers in federated learning. In ELFISH, the training consumption of neural network models is first profiled in terms of different computation resources. Guided by this profiling, a "soft-training" method is proposed for straggler acceleration, which partially trains the model by masking a particular number of resource-intensive neurons. Rather than generating a deterministically optimized model with a diverged structure, different sets of neurons are dynamically masked in every training cycle and are recovered and updated during parameter aggregation, ensuring comprehensive model updates over time. A corresponding parameter aggregation scheme is also proposed to balance the contributions from soft-trained models and guarantee collaborative convergence. As a result, ELFISH overcomes the computational heterogeneity of edge devices and achieves synchronized collaboration without computational stragglers. Experiments show that ELFISH can provide up to 2x training acceleration with soft-training in various straggler settings. Furthermore, benefiting from the proposed parameter aggregation scheme, ELFISH improves model accuracy by 4% with better collaborative convergence robustness.
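The soft-training idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pick_mask`, `soft_train_step`, `aggregate`) are hypothetical, each "neuron" is reduced to a single scalar weight, masked neurons are chosen at random rather than by resource profiling, and the aggregation rule (average each neuron only over the clients that trained it) is an assumed stand-in for the aggregation scheme the abstract mentions.

```python
import random

def pick_mask(num_neurons, num_masked, rng):
    """Pick the neuron indices a straggler skips in this training cycle.
    Assumption: random choice; the paper selects resource-intensive neurons."""
    return set(rng.sample(range(num_neurons), num_masked))

def soft_train_step(weights, grads, masked, lr=0.1):
    """Apply one gradient step to unmasked neurons; masked ones are left as-is."""
    return [w if i in masked else w - lr * g
            for i, (w, g) in enumerate(zip(weights, grads))]

def aggregate(global_weights, client_weights, client_masks):
    """Assumed rule: average each neuron over the clients that trained it.
    A neuron masked on every client keeps its previous global value."""
    merged = []
    for i, g in enumerate(global_weights):
        trained = [w[i] for w, m in zip(client_weights, client_masks)
                   if i not in m]
        merged.append(sum(trained) / len(trained) if trained else g)
    return merged

# One cycle: client 0 trains fully, client 1 (a straggler) masks 2 of 4 neurons.
rng = random.Random(0)
global_w = [1.0, 1.0, 1.0, 1.0]
grads = [1.0, 1.0, 1.0, 1.0]
masks = [set(), pick_mask(4, 2, rng)]
clients = [soft_train_step(global_w, grads, m) for m in masks]
global_w = aggregate(global_w, clients, masks)
```

Because a fresh mask is drawn every cycle, each neuron is eventually trained on the straggler as well, which is the intuition behind the abstract's claim of comprehensive model updates over time.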