Addressing the zeros problem: Regression models for outcomes with a large proportion of zeros, with an application to trial outcomes


MeSH Major

  • Antibodies, Monoclonal
  • Antineoplastic Agents
  • Protein Kinase Inhibitors
  • Receptor, Epidermal Growth Factor
  • Skin Neoplasms


  • © 2015, Cornell Law School and Wiley Periodicals, Inc. In law-related and other social science contexts, researchers need to account for data with an excess number of zeros. In addition, dollar damages in legal cases also often are skewed. This article reviews various strategies for dealing with this data type. Tobit models are often applied to deal with the excess number of zeros, but these are more appropriate in cases of true censoring (e.g., when all negative values are recorded as zeros) and less appropriate when zeros are in fact often observed as the amount awarded. Heckman selection models are another methodology that is applied in this setting, yet they were developed for potential outcomes rather than actual ones. Two-part models account for actual outcomes and avoid the collinearity problems that often attend selection models. A two-part hierarchical model is developed here that accounts for both the skewed, zero-inflated nature of damages data and the fact that punitive damage awards may be correlated within case type, jurisdiction, or time. Inference is conducted using a Markov chain Monte Carlo sampling scheme. Tobit models, selection models, and two-part models are fit to two punitive damage awards data sets and the results are compared. We illustrate that the nonsignificance of coefficients in a selection model can be a consequence of collinearity, whereas that does not occur with two-part models.
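The two-part idea described in the abstract can be illustrated with a minimal sketch. This is not the article's hierarchical model and it does not use MCMC; it assumes a single covariate, a logistic model for whether an award is nonzero, and an ordinary least-squares fit to the log of the positive awards. All function names, parameter values, and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_logistic(x, y, steps=500, lr=0.5):
    """Part 1: fit P(y = 1 | x) = sigmoid(a + b*x) by gradient ascent
    on the mean log-likelihood (illustrative, not a production optimizer)."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += yi - p
            grad_b += (yi - p) * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def fit_ols(x, y):
    """Part 2 helper: closed-form simple linear regression y = c + d*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    d = sxy / sxx
    return mean_y - d * mean_x, d


def fit_two_part(x, awards):
    """Fit both parts separately: zero vs. nonzero, then log(amount) given nonzero."""
    nonzero = [1 if a > 0 else 0 for a in awards]
    part1 = fit_logistic(x, nonzero)
    positive = [(xi, math.log(a)) for xi, a in zip(x, awards) if a > 0]
    part2 = fit_ols([p[0] for p in positive], [p[1] for p in positive])
    return part1, part2


def simulate(n, a=-1.0, b=1.5, c=10.0, d=0.8, sigma=0.5):
    """Hypothetical data: mostly zeros, with right-skewed positive amounts."""
    xs, awards = [], []
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        p_nonzero = 1.0 / (1.0 + math.exp(-(a + b * x)))
        if random.random() < p_nonzero:
            awards.append(math.exp(c + d * x + random.gauss(0.0, sigma)))
        else:
            awards.append(0.0)
        xs.append(x)
    return xs, awards


random.seed(0)
xs, awards = simulate(1000)
(a_hat, b_hat), (c_hat, d_hat) = fit_two_part(xs, awards)
print("share of zeros:", sum(1 for v in awards if v == 0) / len(awards))
print("part 1 (logit) intercept, slope:", a_hat, b_hat)
print("part 2 (log-amount) intercept, slope:", c_hat, d_hat)
```

Because the two parts are fit to different aspects of the observed data (whether an award is made, and how large it is when made), the same covariate can appear in both without the collinearity that the abstract attributes to selection models.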

publication date

  • January 2015



  • Academic Article


Digital Object Identifier (DOI)

  • 10.1111/jels.12068

Additional Document Info

start page

  • 161

end page

  • 186


volume

  • 12


issue

  • 1