Data-dependent Confidence Regions of Singular Subspaces
Matrix singular value decomposition (SVD) is a popular tool in statistical data analysis, offering an efficient way to extract unknown low-rank singular subspaces embedded in noisy matrix observations. This article concerns statistical inference for the singular subspaces when the noise matrix has i.i.d. entries; our goal is to construct data-dependent confidence regions for the unknown singular subspaces from a single noisy matrix observation. We derive an explicit representation formula for the empirical spectral projectors. The formula is concise and holds for deterministic matrix perturbations. Based on this representation formula, we compute, up to fourth-order approximations, the expected joint projection distance between the empirical singular subspace and the true singular subspace. We then prove normal approximation of the joint projection distance with an explicit normalization factor and several levels of bias correction. In particular, with fourth-order bias corrections, we show that asymptotic normality holds under the signal-to-noise ratio (SNR) condition O((d_1 + d_2)^{9/16}), where d_1 and d_2 denote the matrix sizes. We propose a shrinkage estimator of the singular values by borrowing recent results from random matrix theory. Equipped with these estimators, we introduce data-dependent centering and normalization factors for the joint projection distance between the empirical singular subspace and the true singular subspace. As a result, we can construct data-dependent confidence regions for the true singular subspaces that attain pre-determined confidence levels asymptotically. Convergence rates for the asymptotic normality are also presented. Finally, we provide comprehensive simulation results to illustrate our theoretical findings.