Count-Based Regression Models

David Weisburd, David B. Wilson, Alese Wooditch, Chester Britt


2022

Count-based data are common in criminological research, including outcomes such as crime counts for geographic areas or the number of rearrests over a given time period for a group of individuals. Counts are by nature discrete, non-negative whole numbers. When we want to model counts as a dependent variable in a regression model, ordinary least squares (OLS) regression is generally a poor choice. Count-based regression approaches such as Poisson, quasi-Poisson, and negative binomial models appropriately handle these characteristics of counts as a dependent variable and do so by using a log-link, thus modeling the log of the expected count. The difference between a Poisson model and both quasi-Poisson and negative binomial models is that the latter two adjust for over-dispersion in the count data. Count data in which the variance is greater than the mean are said to be over-dispersed. The quasi-Poisson model produces regression coefficients that are identical to those of a Poisson model but with standard errors that are adjusted for any observed over-dispersion. Negative binomial regression models adjust for over-dispersion differently, and the regression coefficients may differ from those of a Poisson model, but usually only slightly. Another complication with count data is the possibility that the distribution has an excess of zeros relative to what would be expected from a Poisson process. These are called zero-inflated distributions and can be modeled with zero-inflated versions of either the Poisson or negative binomial models. Historically, particularly in ecology where count data are also very common, using OLS regression on log-transformed counts was a common approach to handling this type of data (O’Hara and Kotze. Nature Proceedings 1(2):118-122, 2010). However, with the ready availability of statistical software able to perform the count-based regression models discussed in this chapter, there is little reason not to use these modeling methods. They better reflect the nature of the data and are less likely to violate key assumptions of the regression model.
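As a rough illustration of the two diagnostics described above, the following sketch compares the variance of a set of counts to their mean and compares the observed number of zeros to the number a Poisson distribution with the same mean would predict. The data are invented for illustration and are not from the chapter. The check is also a simplification: it looks only at the marginal distribution and ignores covariates, whereas a fitted regression model would assess dispersion conditional on the predictors.

```python
import math
from statistics import mean, variance

# Hypothetical rearrest counts for 20 individuals (invented, illustrative data).
counts = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 7, 9, 12]

m = mean(counts)          # sample mean of the counts
v = variance(counts)      # sample variance (n - 1 in the denominator)

# Under a Poisson process the variance equals the mean, so this ratio
# should be close to 1; values well above 1 suggest over-dispersion.
dispersion_ratio = v / m

# For a Poisson distribution with mean m, P(Y = 0) = exp(-m).
expected_zeros = len(counts) * math.exp(-m)
observed_zeros = counts.count(0)

print(f"mean = {m:.2f}, variance = {v:.2f}, variance/mean = {dispersion_ratio:.2f}")
print(f"zeros: observed = {observed_zeros}, expected under Poisson = {expected_zeros:.2f}")
```

A variance-to-mean ratio well above 1 would point toward a quasi-Poisson or negative binomial model, and a clear surplus of observed zeros would point toward a zero-inflated variant.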
