
Errors in Retrospective Data on Smoking: Comparing Maximum Likelihood and Ad Hoc Approaches

Kenkel, Don; & LeCates, Joseph. (2010). Errors in Retrospective Data on Smoking: Comparing Maximum Likelihood and Ad Hoc Approaches. Journal of Applied Econometrics.



ABSTRACT A number of longitudinal and cross-sectional surveys include retrospective questions about the timing of smoking initiation and cessation. These data appear to offer health economists the opportunity to explore the dynamics of smoking over relatively long time periods. For example, the current prevalence of U.S. adult smoking reflects smoking initiation and cessation decisions made over multiple decades and over very wide ranges of cigarette taxes. However, this research opportunity might be partly illusory because of errors in retrospectively reported data on smoking. As in other types of retrospectively reported data, retrospective data on smoking show heaping on round numbers: a smoker is much more likely to report having quit a round number of years ago (such as 20) than a non-round number of years ago (such as 19 or 21). Smoking cessation is a fairly rare event, with the annual rate of smoking cessation typically averaging below five percent. As a result, even a modest amount of heaping creates substantial rates of misclassification error in observed measures of smoking cessation. We compare alternative approaches to estimating a discrete time hazard model of smoking cessation with misclassification error in the dependent variable due to heaping. Our first approach uses adjusted maximum likelihood estimation (Hausman et al., Journal of Econometrics, 1998). Although the model is technically identified through non-linearities, we exploit exclusion restrictions based on patterns of heaping. We create indicators for years in which heaping is more or less likely. Identification is based on the argument that heaping changes the probabilities of misclassification error in predictable ways, but should not change the probability of true cessation. We compare the maximum likelihood approach to several ad hoc approaches with intuitive appeal.
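As a rough illustration of the adjusted likelihood described above, the following is a sketch of the standard misclassification-adjusted binary response model associated with Hausman et al. (1998). The notation is assumed here rather than taken from the paper: y_{it} is the observed cessation indicator for person i in year t, x_{it} are covariates, F is the hazard link (for example logit), and z_t are the heaping indicators. Letting the misclassification probabilities depend on z_t while F does not is one reading of the exclusion restriction stated in the abstract, not necessarily the authors' exact specification.

```latex
% Sketch only; symbols are illustrative assumptions.
% \alpha_0(z_t): probability a cessation is recorded in year t when none truly occurred
% \alpha_1(z_t): probability a true cessation in year t is not recorded in that year
\Pr(y_{it} = 1 \mid x_{it}, z_t)
  = \alpha_0(z_t) + \bigl[\,1 - \alpha_0(z_t) - \alpha_1(z_t)\,\bigr]\, F(x_{it}'\beta)

\ln L = \sum_{i,t} \Bigl\{ y_{it} \ln \Pr(y_{it} = 1 \mid x_{it}, z_t)
      + (1 - y_{it}) \ln \bigl[\,1 - \Pr(y_{it} = 1 \mid x_{it}, z_t)\,\bigr] \Bigr\}
```

In models of this type the condition \alpha_0 + \alpha_1 < 1 is usually imposed so that the observed outcome remains positively related to the true one.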
The first ad hoc approach simply introduces the heaping indicators as additional covariates in the discrete time hazard model of smoking cessation. The next ad hoc approach “coarsens” the data by changing the unit of analysis from an annual basis to the period of five years around each heaping point. The last ad hoc approach “decimates” the data by eliminating all observations from respondents who report cessation in a heaped year. While this approach distorts the observed rate of smoking cessation, it eliminates all misclassification error due to heaping and so might reduce bias in the estimated model coefficients. We conduct Monte Carlo simulations to compare the relative performance of the adjusted maximum likelihood model to the ad hoc approaches. We also compare the alternative approaches when we estimate discrete time hazard models of smoking cessation using retrospective data from the Tobacco Use Supplements of the Current Population Survey. We conclude with a discussion of when heaping might be mild enough to warrant ad hoc approaches or severe enough to warrant abandoning the use of retrospectively reported data on smoking.
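To make the heaping mechanism and the "decimation" idea concrete, here is a minimal simulation sketch. It is not the authors' Monte Carlo design: the function names, the constant 4 percent annual quit hazard, the 30 percent chance of rounding to the nearest multiple of five, and the 30-year horizon are all illustrative assumptions.

```python
import random


def simulate_heaping(n_smokers=20000, horizon=30, hazard=0.04,
                     heap_prob=0.3, seed=0):
    """Return (true_years_ago, reported_years_ago) pairs for simulated quitters.

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_smokers):
        # Step from `horizon` years ago toward the present; each year the
        # smoker quits with probability `hazard`.
        for years_ago in range(horizon, 0, -1):
            if rng.random() < hazard:
                reported = years_ago
                nearest = 5 * round(years_ago / 5)
                # Heaping: some respondents report the nearest multiple of 5
                # (never "0 years ago") instead of the true value.
                if nearest not in (0, years_ago) and rng.random() < heap_prob:
                    reported = nearest
                records.append((years_ago, reported))
                break
    return records


def misclassification_summary(records):
    """Share of quitters whose report moved, and the share of reports in
    heaped years (multiples of 5) that are misdated."""
    moved = sum(1 for true, rep in records if true != rep)
    in_heaped_years = sum(1 for _, rep in records if rep % 5 == 0)
    return moved / len(records), moved / in_heaped_years


def decimate(records):
    """Drop every respondent who reports cessation in a heaped year."""
    return [(true, rep) for true, rep in records if rep % 5 != 0]


if __name__ == "__main__":
    data = simulate_heaping()
    share_moved, share_false_in_heaps = misclassification_summary(data)
    kept = decimate(data)
    print(f"quitters: {len(data)}, kept after decimation: {len(kept)}")
    print(f"share of reports moved by heaping: {share_moved:.3f}")
    print(f"share of heaped-year reports that are misdated: {share_false_in_heaps:.3f}")
```

In this sketch every misdated report lands on a multiple of five, so the decimated sample contains only correctly dated quits. The price is that it also discards respondents who truly quit in a heaped year, which is the distortion of the observed cessation rate noted above.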



