The consequences of checking for zero-inflation and overdispersion in the analysis of count data

10/31/2019
by Harlan Campbell, et al.

Count data are ubiquitous in ecology, and the Poisson generalized linear model (GLM) is commonly used to model the association between counts and explanatory variables of interest. When fitting this model, one typically begins by checking that the data are not overdispersed and that there is no excess of zeros. If the data appear to be overdispersed or zero-inflated, key assumptions of the Poisson GLM may be violated, and researchers will then typically consider alternatives to the Poisson GLM. An important question is whether the potential model selection bias introduced by this data-driven multi-stage procedure merits concern. In this paper, we conduct a large-scale simulation study to investigate the potential consequences of model selection bias that can arise in the simple scenario of analyzing a sample of potentially overdispersed, potentially zero-heavy, count data.
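The multi-stage procedure described above can be illustrated with a minimal Python sketch. It is not the authors' simulation code. The function names, the intercept-only setting, and the cutoffs `disp_threshold` and `zero_threshold` are hypothetical choices made for illustration. In practice, formal tests for overdispersion and zero-inflation would typically replace these fixed cutoffs.

```python
import numpy as np


def dispersion_ratio(counts):
    """Pearson dispersion statistic for an intercept-only Poisson model.

    Values near 1 are consistent with a Poisson model; values well above 1
    suggest overdispersion. Assumes the sample mean is positive.
    """
    y = np.asarray(counts, dtype=float)
    mu = y.mean()
    # Pearson chi-square divided by the residual degrees of freedom (n - 1).
    return np.sum((y - mu) ** 2 / mu) / (y.size - 1)


def zero_excess(counts):
    """Observed proportion of zeros minus the proportion expected under
    a Poisson distribution with the same mean, exp(-mean)."""
    y = np.asarray(counts, dtype=float)
    return np.mean(y == 0) - np.exp(-y.mean())


def choose_model(counts, disp_threshold=1.5, zero_threshold=0.05):
    """Data-driven model choice; both thresholds are illustrative only."""
    overdispersed = dispersion_ratio(counts) > disp_threshold
    zero_heavy = zero_excess(counts) > zero_threshold
    if zero_heavy and overdispersed:
        return "zero-inflated negative binomial"
    if zero_heavy:
        return "zero-inflated Poisson"
    if overdispersed:
        return "negative binomial"
    return "Poisson"


if __name__ == "__main__":
    sample = [0, 0, 0, 0, 0, 0, 10, 10]
    print(dispersion_ratio(sample), zero_excess(sample), choose_model(sample))
```

Because the final model depends on checks computed from the same data, any inference reported from the selected model is conditional on that selection, which is the source of the potential bias the abstract refers to.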

research
01/30/2023

A Simulation Study of the Performance of Statistical Models for Count Outcomes with Excessive Zeros

Background: Outcome measures that are count variables with excessive zer...
research
05/29/2020

Multiresolution Decomposition of Areal Count Data

Multiresolution decomposition is commonly understood as a procedure to c...
research
06/05/2023

Modeling Tor Network Growth by Extrapolating Consensus Data

Since the Tor network is evolving into an infrastructure for anonymous c...
research
12/06/2017

On overfitting and post-selection uncertainty assessments

In a regression context, when the relevant subset of explanatory variabl...
research
09/16/2020

A semi-analytical solution to the maximum likelihood fit of Poisson data to a linear model using the Cash statistic

[ABRIDGED] The Cash statistic, also known as the C stat, is commonly use...
research
12/22/2018

Bi-clustering for time-varying relational count data analysis

Relational count data are often obtained from sources such as simultaneo...
research
06/19/2023

On some pitfalls of the log-linear modeling framework for capture-recapture studies in disease surveillance

In epidemiological studies, the capture-recapture (CRC) method is a powe...
