Overdisp: A Stata (and Mata) Package for Direct Detection of Overdispersion in Poisson and Negative Binomial Regression Models
Abstract
Stata has several procedures that can be used to analyze count-data regression models and, more specifically, to study the behavior of the dependent variable conditional on explanatory variables. Identifying overdispersion in count-data models is one of the most important of these procedures, because it allows researchers to choose correctly between estimations such as Poisson or negative binomial, given the distribution of the dependent variable. The main purpose of this paper is to present a new command for identifying overdispersion in the data as an alternative to the procedure presented by Cameron and Trivedi, since it identifies overdispersion directly, without the need to first estimate a specific type of count-data model. When estimating Poisson or negative binomial regression models in which the dependent variable is quantitative, with discrete and non-negative values, the new Stata package overdisp helps researchers to directly propose more consistent and adequate models. As a second contribution, we also present a simulation to show the consistency of the overdispersion test using the overdisp command. Findings show that, if the test indicates equidispersion in the data, there is consistent evidence that the distribution of the dependent variable is, in fact, Poisson. If, on the other hand, the test indicates overdispersion in the data, researchers should investigate more deeply whether or not the dependent variable actually adheres better to the Poisson-Gamma distribution.
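To make the idea of an overdispersion test concrete, the following is a minimal Python sketch of the auxiliary-regression check commonly associated with Cameron and Trivedi, restricted to the simplest case of a model with no explanatory variables. It is not the overdisp implementation and makes no claim about that command's internals; the function names (`overdispersion_t`, `draw_poisson`, `draw_poisson_gamma`) and the simulation settings are assumptions chosen for illustration only.

```python
import math
import random


def overdispersion_t(y):
    """Auxiliary-regression overdispersion check, intercept-only case.

    Assumption: the fitted Poisson mean is the sample mean for every
    observation. The auxiliary variable ((y - mu)^2 - y) / mu is then
    regressed on mu without a constant; the slope estimates alpha in
    Var(y) = mu + alpha * mu^2, and its t statistic is returned.
    """
    n = len(y)
    mu = sum(y) / n
    z = [((yi - mu) ** 2 - yi) / mu for yi in y]
    z_bar = sum(z) / n
    alpha = z_bar / mu                      # OLS slope, no intercept
    s2 = sum((zi - z_bar) ** 2 for zi in z) / (n - 1)
    se = math.sqrt(s2 / (n * mu ** 2))      # standard error of the slope
    return alpha, alpha / se


def draw_poisson(lam, rng):
    """Draw one Poisson variate (Knuth's method, fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def draw_poisson_gamma(shape, scale, rng):
    """Draw one Poisson-Gamma (negative binomial) variate."""
    return draw_poisson(rng.gammavariate(shape, scale), rng)


# Hypothetical simulation settings: both samples have mean 3.
rng = random.Random(12345)
poisson_sample = [draw_poisson(3.0, rng) for _ in range(2000)]
mixed_sample = [draw_poisson_gamma(1.0, 3.0, rng) for _ in range(2000)]

for label, sample in [("Poisson", poisson_sample),
                      ("Poisson-Gamma", mixed_sample)]:
    alpha, t_stat = overdispersion_t(sample)
    print(f"{label}: alpha = {alpha:.3f}, t = {t_stat:.2f}")
```

A large positive t statistic is read as evidence of overdispersion, while a value near zero is compatible with equidispersion. With explanatory variables, the sample mean would be replaced by fitted values from a Poisson regression.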
Z. Y. Algamal, Variable selection in count data regression model based on firefly algorithm, Statistics, Optimization and Information Computing, vol. 7, pp. 520–529, 2019.
E. Avci, Flexiblity of using Com-Poisson regression model for count data, Statistics, Optimization and Information Computing, vol. 6, pp. 278–285, 2018.
A. C. Cameron, and P. K. Trivedi, Econometric models based on count data: comparisons and applications of some estimators and tests, Journal of Applied Econometrics, vol. 1, no. 1, pp. 29–53, 1986.
A. C. Cameron, and P. K. Trivedi, Microeconometrics: Methods and Applications, Cambridge University Press, New York, 2005.
A. C. Cameron, and P. K. Trivedi, Microeconometrics using Stata, Stata Press, College Station, 2010.
A. C. Cameron, and P. K. Trivedi, Regression Analysis of Count Data, Cambridge University Press, Cambridge, 2013.
L. P. Fávero, and P. Belfiore, Data Science for Business and Decision Making, Academic Press Elsevier, Cambridge, 2019.
L. P. Fávero, and P. Belfiore, overdisp: module to detect overdispersion in count-data models using Stata, Statistical Software Components, Boston College Department of Economics. https://ideas.repec.org/c/boc/bocode/s458496.html, 2018.
L. P. Fávero, M. A. Santos, and R. G. Serra, Cross-border branching in the Latin American banking sector, International Journal of Bank Marketing, vol. 36, no. 3, pp. 496–528, 2018.
W. Gardner, E. P. Mulvey, and E. C. Shaw, Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models, Psychological Bulletin, vol. 118, no. 3, pp. 392–404, 1995.
S. Gurmu, Tests for detecting overdispersion in the positive Poisson regression model, Journal of Business & Economic Statistics, vol. 9, no. 3, pp. 215–222, 1991.
J. A. Hausman, B. H. Hall, and Z. Griliches, Econometric models for count data with an application to the patents-R&D relationship, Econometrica, vol. 52, no. 4, pp. 909–938, 1984.
G. Leckie, runmixregls: a program to run the mixregls mixed-effects location scale software from within Stata, Journal of Statistical Software, vol. 59, pp. 1–41, 2014.
J. S. Long, and J. Freese, Regression Models for Categorical Dependent Variables using Stata, Stata Press, College Station, 2006.
M. Manjón, and O. Martínez, The chi-squared goodness-of-fit test for count-data models, Stata Journal, vol. 14, pp. 798–816, 2014.
R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/, 2016.
S. Rabe-Hesketh, and A. Skrondal, Multilevel and Longitudinal Modeling using Stata: Categorical Responses, Counts, and Survival, Stata Press, College Station, 2012.
SAS Institute Inc, SAS/STAT Software, Version 9.22, Cary, NC, http://www.sas.com/, 2018.
StataCorp, Stata Data Analysis Statistical Software, Release 15, College Station, TX, http://www.stata.com/, 2018.
H. Zhang, Y. Liu, and B. Li, Notes on discrete compound Poisson model with applications to risk theory, Insurance: Mathematics and Economics, vol. 59, pp. 325–336, 2014.
M. L. Zwilling, Negative binomial regression, The Mathematica Journal, vol. 15, pp. 1–18, 2013.