Learning Latent Factors from Diversified Projections and its Applications to Over-Estimated and Weak Factors
Estimation and application of factor models often rely on the crucial condition that the latent factors are consistently estimated, which in turn requires that the factors be relatively strong, that the data be stationary with weak serial dependence, and that the sample size be fairly large. In practical applications, one or several of these conditions may fail, and in such cases it is difficult to analyze the eigenvectors of the original data matrix. To address this issue, we propose simple estimators of the latent factors based on cross-sectional projections of the panel data, formed as weighted averages with pre-determined weights. These weights are chosen to diversify away the idiosyncratic components, resulting in "diversified factors". Because the projections are conducted cross-sectionally, they are robust to serial conditions, easy to analyze because the weights are data-independent, and work even when the time series has finite length. We formally prove that this procedure is robust to over-estimating the number of factors, and illustrate it in several applications. We also recommend several choices for the diversified weights. When the weights are randomly generated from a known distribution, we show that the estimated factor components are nearly optimal in terms of recovering the low-rank structure of the factor model.
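As a rough illustration of the idea of cross-sectional weighted averages with pre-determined weights, the following minimal sketch simulates a small panel and forms R = 4 projections when the true number of factors is r = 2. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `diversified_factors`, the 1/N scaling, the simulated loadings, and the particular weight columns (constant, half-sample sign, alternating sign, random sign).

```python
import numpy as np


def diversified_factors(X, W):
    """Cross-sectional weighted averages of an (N, T) panel.

    X : (N, T) data matrix, N cross-sectional units, T time periods.
    W : (N, R) weights fixed in advance (they do not depend on X).
    Returns an (R, T) matrix with rows W[:, k]' X / N.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    return W.T @ X / X.shape[0]


rng = np.random.default_rng(0)
N, T, r, R = 500, 60, 2, 4  # R > r: the number of factors is over-estimated

# Hypothetical sign patterns used for both loadings and weights.
half = np.where(np.arange(N) < N // 2, 1.0, -1.0)
alt = np.where(np.arange(N) % 2 == 0, 1.0, -1.0)

# Simulated factor model X = B F + U (illustrative design, not from the paper).
B = np.column_stack([np.ones(N), half]) + 0.5 * rng.standard_normal((N, r))
F = rng.standard_normal((r, T))
U = rng.standard_normal((N, T))
X = B @ F + U

# Pre-determined weights: chosen without looking at X.
W = np.column_stack([np.ones(N), half, alt, rng.choice([-1.0, 1.0], size=N)])

F_hat = diversified_factors(X, W)        # (R, T) diversified projections
noise_part = diversified_factors(U, W)   # idiosyncratic part, W'U / N

# Check how well the true factors lie in the row space of F_hat
# by regressing each true factor on the R projections (no intercept).
coef, *_ = np.linalg.lstsq(F_hat.T, F.T, rcond=None)
resid = F.T - F_hat.T @ coef
r2 = 1.0 - (resid ** 2).sum(axis=0) / (F.T ** 2).sum(axis=0)

print("shape of F_hat:", F_hat.shape)
print("std of idiosyncratic part:", noise_part.std())
print("uncentered R^2 per true factor:", r2)
```

In this simulated design the averaged idiosyncratic term shrinks as N grows, while the projections still span the two true factors even though four projections are used. The sketch only illustrates the mechanics under the stated assumptions; it does not reproduce the paper's estimators, weight recommendations, or theoretical guarantees.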