Extrapolating the profile of a finite population

05/21/2020
by Soham Jana, et al.

We study a prototypical problem in empirical Bayes. Namely, consider a population consisting of k individuals each belonging to one of k types (some types can be empty). Without any structural restrictions, it is impossible to learn the composition of the full population having observed only a small (random) subsample of size m = o(k). Nevertheless, we show that in the sublinear regime of m = ω(k/log k), it is possible to consistently estimate in total variation the profile of the population, defined as the empirical distribution of the sizes of each type, which determines many symmetric properties of the population. We also prove that in the linear regime of m = ck for any constant c the optimal rate is Θ(1/log k). Our estimator is based on Wolfowitz's minimum distance method, which entails solving a linear program (LP) of size k. We show that there is a single infinite-dimensional LP whose value simultaneously characterizes the risk of the minimum distance estimator and certifies its minimax optimality. The sharp convergence rate is obtained by evaluating this LP using complex-analytic techniques.
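To make the notion of a profile concrete, here is a minimal Python sketch. It assumes the profile is the distribution over type sizes taken across all k types, with empty types counted as size 0, which is one reading of the definition in the abstract. The function names `profile` and `total_variation` and the toy population are hypothetical, chosen only for illustration. This is not the minimum distance estimator from the paper; it only computes the quantity being estimated and the distance used to measure error.

```python
from collections import Counter
import random


def profile(type_labels, num_types):
    """Map each type size s to the fraction of the num_types types having exactly s members."""
    sizes = Counter(type_labels)            # type label -> number of individuals
    size_counts = Counter(sizes.values())   # size -> number of non-empty types of that size
    size_counts[0] = num_types - len(sizes)  # assumption: empty types count as size 0
    return {s: c / num_types for s, c in sorted(size_counts.items()) if c > 0}


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


# Toy population: k = 8 individuals, k = 8 possible types (labels 0..7).
population = [0, 0, 0, 1, 1, 2, 3, 3]
pi = profile(population, num_types=8)

# A random subsample of size m = 4, drawn without replacement.
rng = random.Random(0)
subsample = rng.sample(population, 4)
observed = profile(subsample, num_types=8)

print("population profile:", pi)
print("subsample profile: ", observed)
print("TV distance:       ", total_variation(pi, observed))
```

In this toy population, four of the eight types are empty, one has a single member, two have two members, and one has three, so the profile puts mass 1/2, 1/8, 1/4, and 1/8 on sizes 0, 1, 2, and 3. The profile of the subsample generally differs from that of the population, which is why the raw observed profile cannot simply be reported as the estimate.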

research
02/18/2017

Sample complexity of population recovery

The problem of population recovery refers to estimating a distribution b...
research
05/04/2019

Quantitative Group Testing in the Sublinear Regime

The quantitative group testing (QGT) problem deals with efficiently iden...
research
03/27/2011

Fast Learning Rate of lp-MKL and its Minimax Optimality

In this paper, we give a new sharp generalization bound of lp-MKL which ...
research
03/13/2019

The Log-Concave Maximum Likelihood Estimator is Optimal in High Dimensions

We study the problem of learning a d-dimensional log-concave distributio...
research
08/05/2022

Malliavin calculus for the optimal estimation of the invariant density of discretely observed diffusions in intermediate regime

Let (X_t)_t ≥ 0 be solution of a one-dimensional stochastic differential...
research
02/26/2020

Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Discrete Distributions

The profile of a sample is the multiset of its symbol frequencies. We sh...
research
01/05/2020

Decline of war or end of positive check? Analysis of change in war size distribution between 1816-2007

This study examines whether there has been a decline in the risk of deat...
