Robust Inference for Federated Meta-Learning
Synthesizing information from multiple data sources is critical to ensure knowledge generalizability. Integrative analysis of multi-source data is challenging due to heterogeneity across sources and to data-sharing constraints arising from privacy concerns. In this paper, we consider a general robust inference framework for federated meta-learning of data from multiple sites, enabling statistical inference for the prevailing model, defined as the one matching the majority of the sites. Statistical inference for the prevailing model is challenging since it requires a data-adaptive mechanism to select eligible sites and then account for the uncertainty of that selection. We propose a novel sampling method to address the additional variation arising from the selection. Our confidence interval (CI) construction does not require sites to share individual-level data and is shown to be valid without requiring the selection of eligible sites to be error-free. The proposed robust inference for federated meta-learning (RIFL) methodology is broadly applicable and illustrated with three inference problems: aggregation of parametric models, high-dimensional prediction models, and inference for average treatment effects. We use RIFL to perform federated learning of mortality risk for patients hospitalized with COVID-19 using real-world EHR data from 16 healthcare centers representing 275 hospitals across four countries.
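To make the high-level idea concrete, the following is a minimal, hypothetical sketch (not the authors' RIFL algorithm): each site contributes only a summary estimate and standard error, sites are voted eligible when they agree with a majority of the others, and resampling of the site-level summaries is used to propagate the uncertainty of that data-driven selection into the confidence interval. The function name `prevailing_ci`, the tolerance rule, and the inverse-variance aggregation are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def prevailing_ci(estimates, std_errors, n_resamples=500, alpha=0.05, tol=2.0):
    """Illustrative sketch: aggregate site-level estimates that agree with the
    majority, and use resampling to reflect the extra variability introduced
    by the data-driven eligibility selection.

    estimates, std_errors: length-K arrays of site summary statistics
    (no individual-level data is shared across sites).
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    K = len(estimates)

    def majority_average(est):
        # Vote: site k is deemed "eligible" if it lies within tol combined
        # standard errors of more than half of the sites (self-vote included).
        diffs = np.abs(est[:, None] - est[None, :])
        scale = np.sqrt(std_errors[:, None] ** 2 + std_errors[None, :] ** 2)
        votes = (diffs <= tol * scale).sum(axis=1)
        eligible = votes > K / 2
        if not eligible.any():
            eligible = np.ones(K, dtype=bool)  # fall back to all sites
        w = 1.0 / std_errors[eligible] ** 2    # inverse-variance weights
        return np.sum(w * est[eligible]) / np.sum(w)

    # Resample the site-level summaries so the interval captures both the
    # estimation error and the variation of the selection step itself.
    draws = np.array([
        majority_average(rng.normal(estimates, std_errors))
        for _ in range(n_resamples)
    ])
    point = majority_average(estimates)
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return point, (lower, upper)

# Example: 6 sites, one of which deviates from the prevailing model.
est = np.array([0.52, 0.48, 0.50, 0.55, 0.49, 1.20])
se = np.full(6, 0.05)
print(prevailing_ci(est, se))
```

In this toy setup the deviating sixth site receives too few votes to be eligible, so the point estimate and CI are driven by the five sites that match the prevailing model, while the resampling step widens the interval to account for selection not being error-free.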