Sample Size Calculations in Simple Linear Regression: Trials and Tribulations
The problem tackled in this paper is the determination of sample size for a given significance level and power in the context of a simple linear regression model. At a technical level, the simple linear regression model is a five-parameter model. It is natural to base sample size calculations on the least squares estimator of the slope parameter of the model. Nuisance parameters such as the variance of the predictor X and the conditional variance of the response Y create problems in the calculations. The current approaches in the literature are not illuminating. One approach is based on the conditional distribution of the estimator of the slope parameter given the data on the predictor X. Another approach is based on the sample correlation coefficient. We overcome the problems by determining the exact unconditional distribution of the test statistic built on the estimator of the slope parameter. The exact unconditional distribution alleviates, to some extent, the difficulties in computing sample sizes. On the other hand, the test based on the sample correlation coefficient of X and Y avoids the problems besetting the test based on the slope parameter. However, we lose the intuitive interpretation that comes with the slope parameter. Surprisingly, we see that the sample size that comes from the correlation test is in close agreement with the one that comes from the test built upon the slope parameter in a broad array of settings.
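To make the correlation-based route concrete, here is a minimal sketch in Python. It is not the exact unconditional method described above: it substitutes the standard large-sample Fisher z approximation for a two-sided test of zero correlation, and it assumes X is random with variance sigma_x^2 and that Y given X has constant variance sigma_eps^2. The function names and parameter names are hypothetical, chosen for illustration only.

```python
import math
from statistics import NormalDist


def slope_to_correlation(beta, sigma_x, sigma_eps):
    """Correlation of X and Y implied by slope beta.

    Assumes Var(X) = sigma_x**2 and Var(Y | X) = sigma_eps**2, so that
    Var(Y) = beta**2 * sigma_x**2 + sigma_eps**2.
    """
    return beta * sigma_x / math.sqrt(beta**2 * sigma_x**2 + sigma_eps**2)


def n_for_correlation(rho, alpha=0.05, power=0.80):
    """Approximate sample size for a two-sided test of rho = 0.

    Uses Fisher's z-transform (a large-sample approximation),
    not an exact distribution of the test statistic.
    """
    if not 0 < abs(rho) < 1:
        raise ValueError("rho must be nonzero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the level
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(((z_alpha + z_power) / math.atanh(rho)) ** 2 + 3)


# Example: translate a slope into a correlation, then size the study.
rho = slope_to_correlation(beta=0.5, sigma_x=1.0, sigma_eps=1.0)
n = n_for_correlation(rho, alpha=0.05, power=0.80)
```

The first function shows why the nuisance parameters matter: the same slope corresponds to different correlations, and hence different sample sizes, depending on the variance of X and the conditional variance of Y.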