Doug Rivers
About
Professor of political science at Stanford, senior fellow at the Hoover Institution, polling expert, affiliated with YouGov
Claims by Doug Rivers (20)
Random digit dialing (RDD) became the dominant phone-polling method because roughly 30% of the population had unlisted numbers, which made sampling from phone listings infeasible. It worked well for 20-30 years but broke down over the last 10-15 years, as marketing calls, cell phones, and distrust drove cooperation rates from about 70% down to about 20% or less, hurting accuracy.
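In its simplest form, RDD reaches unlisted numbers because the final digits are generated at random rather than copied from a directory, so listed and unlisted numbers within a prefix are equally likely to be drawn. The sketch below assumes hypothetical area-code/exchange prefixes; production designs add refinements such as restricting draws to number banks known to contain working lines.

```python
import random

def rdd_sample(prefixes, n, seed=None):
    """Draw n distinct phone numbers by appending random 4-digit suffixes
    to known area-code/exchange prefixes (prefixes here are hypothetical)."""
    if n > len(prefixes) * 10_000:
        raise ValueError("n exceeds the number of possible numbers")
    rng = random.Random(seed)
    drawn = set()
    while len(drawn) < n:
        prefix = rng.choice(prefixes)   # pick an exchange at random
        suffix = rng.randrange(10_000)  # random last four digits, 0000-9999
        drawn.add(f"{prefix}-{suffix:04d}")
    return sorted(drawn)

# Illustrative usage with made-up prefixes.
print(rdd_sample(["650-555", "415-555"], 5, seed=1))
```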
Pollsters typically select 10 to 20 times as many phone numbers as the completed interviews they need, so the people who actually respond are a small, non-random slice that skews toward women and toward people with more education and higher income; correcting for this requires weighting and adjustment.
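One common form of the weighting mentioned above is raking (iterative proportional fitting): weights are adjusted one variable at a time until the weighted sample matches known population margins for every variable. This is a minimal sketch, assuming categorical variables and population shares that sum to 1 and cover every category in the sample; the variable names and target shares in the usage lines are hypothetical, not any pollster's actual targets.

```python
import numpy as np

def rake(sample_cols, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting (raking).

    sample_cols: dict var -> array of category labels, one per respondent
    targets:     dict var -> {category: population share} (shares sum to 1)
    Returns weights summing to n whose weighted margins match the targets.
    """
    n = len(next(iter(sample_cols.values())))
    w = np.ones(n)
    for _ in range(max_iter):
        max_change = 0.0
        for var, labels in sample_cols.items():
            labels = np.asarray(labels)
            total = w.sum()  # unchanged by a full pass over one variable
            for cat, share in targets[var].items():
                mask = labels == cat
                current = w[mask].sum() / total
                if current == 0:
                    raise ValueError(f"no respondents in {var}={cat}")
                factor = share / current
                w[mask] *= factor  # scale this category to its target share
                max_change = max(max_change, abs(factor - 1))
        if max_change < tol:  # all margins already match
            break
    return w * n / w.sum()

# Hypothetical sample that over-represents women and college graduates.
sex = np.array(["F"] * 60 + ["M"] * 40)
edu = np.array(["college"] * 35 + ["non"] * 25 + ["college"] * 15 + ["non"] * 25)
weights = rake({"sex": sex, "edu": edu},
               {"sex": {"F": 0.52, "M": 0.48},
                "edu": {"college": 0.35, "non": 0.65}})
```

Raking only fixes the variables it is given; it cannot correct skews on characteristics that are not among the targets.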
The clean statistical results (the law of large numbers and the central limit theorem) hold only for perfectly executed random sampling. In practice, sampling plans are rarely executed perfectly because cooperation never comes close to 100%, and non-responders differ systematically from responders in unobservable ways, which produces skews rather than mere random noise.
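A small simulation illustrates why such skews behave differently from random noise. It assumes a hypothetical unobserved trait that raises both support for a position and willingness to respond (30% response above the trait's median, 10% below); these numbers are illustrative, not estimates from any real survey. As the number of contacts grows, the poll settles on a stable but wrong value instead of converging to the truth.

```python
import numpy as np

def simulate(n_contacts, rng):
    """Return (true support, respondents' support, number of respondents)."""
    trait = rng.normal(size=n_contacts)             # unobserved by the pollster
    p_support = 1 / (1 + np.exp(-trait))            # opinion depends on the trait
    supports = rng.random(n_contacts) < p_support
    p_respond = np.where(trait > 0, 0.3, 0.1)       # so does willingness to respond
    responds = rng.random(n_contacts) < p_respond
    return supports.mean(), supports[responds].mean(), int(responds.sum())

rng = np.random.default_rng(42)
for n in (1_000, 10_000, 100_000, 1_000_000):
    truth, poll, k = simulate(n, rng)
    print(f"contacts={n:>9,} respondents={k:>7,} truth={truth:.3f} poll={poll:.3f}")
```

Because the trait is never measured, demographic weighting in this setup would not remove the gap either.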
Weighting creates two distinct problems: it roughly doubles the real variability compared with a simple random sample, and it leaves residual skews (unobserved or uncorrected biases) that do not shrink as sample size grows. Conventional sampling-error estimates therefore describe how the procedure varies from sample to sample, not how systematically wrong it is.
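Both problems can be written down directly. The variance inflation from unequal weights is commonly summarized by Kish's design effect, deff = 1 + CV² of the weights, so "roughly doubles" corresponds to deff ≈ 2. Total error then combines the inflated sampling variance with the squared residual bias. The sketch below uses a hypothetical 2-point bias to show that error falls toward the bias, not toward zero, as n grows.

```python
import numpy as np

def kish_deff(weights):
    """Kish design effect from unequal weights: n * sum(w^2) / (sum w)^2,
    which equals 1 + (coefficient of variation of the weights)^2."""
    w = np.asarray(weights, dtype=float)
    return len(w) * np.sum(w ** 2) / np.sum(w) ** 2

def total_error(p, n, deff, bias):
    """Root-mean-square error of a weighted proportion: sampling variance
    (inflated by deff, shrinks with n) plus squared bias (does not shrink)."""
    variance = deff * p * (1 - p) / n
    return np.sqrt(variance + bias ** 2)

# Hypothetical bias of 0.02 (2 points) with deff = 2.
for n in (500, 5_000, 50_000, 5_000_000):
    print(n, round(float(total_error(0.5, n, 2.0, 0.02)), 4))
```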
The 2008 New Hampshire primary polling miss likely came not from racism (the Bradley effect) but from two other sources: over-representation of voters with college and graduate degrees, who favored Obama and were over-represented by 2-3x without sufficient correction, and unreliable self-reports of voting intention.
Self-reported voting intention is unreliable: about 90% of people say they will vote in a primary, but actual turnout is much lower (US presidential turnout has hovered in the mid-50% range and congressional turnout under 40%). Likely-voter screens cannot fully fix this because people answer based on what they think they should say.
Of about 1,300 polls from the 2008 presidential primaries that Rivers examined, fewer than 50 reported anything other than the margin of error for an unweighted simple random sample. The reported margins of error are therefore systematically misleading, which Rivers calls scandalous.
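The gap between the textbook figure and a weighting-aware one is easy to compute. This sketch assumes the design effect is already known (here deff = 2, matching the "roughly doubles" figure above); for n = 600 at p = 0.5 it gives about ±4.0 points for the simple-random-sample formula versus about ±5.7 points with deff = 2. Neither number accounts for bias.

```python
import math

def margin_of_error(p, n, deff=1.0, z=1.96):
    """95% margin of error for a proportion. deff=1 is the usual
    simple-random-sample figure; deff>1 reflects variance added by weighting."""
    return z * math.sqrt(deff * p * (1 - p) / n)

n = 600
srs = margin_of_error(0.5, n)
weighted = margin_of_error(0.5, n, deff=2.0)
print(f"reported (SRS): ±{100 * srs:.1f} pts; with deff=2: ±{100 * weighted:.1f} pts")
```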
Final pre-election polls tend to be more similar to one another than their known sampling variability would allow. This suggests that pollsters herd: among several defensible weighting schemes, they choose the one that places them in the pack rather than one that yields a divergent answer. Rivers attributes this to risk aversion and the art of weighting rather than to dishonesty.
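One way to check for this kind of clustering is to compare the observed spread of final polls with the spread that pure sampling would produce. The sketch below uses a chi-square-style dispersion statistic and a Monte Carlo reference distribution under binomial sampling from the pooled estimate; this is one possible test, not necessarily the analysis Rivers used. Because it ignores design effects, the expected spread is if anything understated, so a small "share tighter" value is conservative evidence of herding. The poll numbers in the usage line are made up.

```python
import numpy as np

def herding_check(estimates, sizes, n_sims=100_000, seed=0):
    """Return (dispersion statistic, share of simulated poll sets at least as
    tightly clustered). A very small share means the polls agree more closely
    than binomial sampling error alone would allow."""
    p_hat = np.asarray(estimates, dtype=float)
    n = np.asarray(sizes, dtype=np.int64)
    pooled = np.sum(p_hat * n) / np.sum(n)
    se2 = pooled * (1 - pooled) / n  # sampling variance of each poll

    def dispersion(p):
        # Sum of squared standardized deviations from the size-weighted mean.
        center = np.sum(p * n, axis=-1, keepdims=True) / np.sum(n)
        return np.sum((p - center) ** 2 / se2, axis=-1)

    observed = dispersion(p_hat)
    rng = np.random.default_rng(seed)
    sims = rng.binomial(n, pooled, size=(n_sims, len(n))) / n
    return float(observed), float(np.mean(dispersion(sims) <= observed))

# Hypothetical final-week polls (illustrative only, not real data).
stat, share = herding_check([0.48, 0.49, 0.48, 0.49, 0.48], [600, 800, 500, 700, 900])
print(f"dispersion={stat:.2f}, share of simulations at least as tight={share:.3f}")
```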
A pollster would seem to have an incentive to trust divergent weights and look like a genius as the lone correct caller of an upset. In practice, however, small subsamples (e.g. a New Hampshire poll with about 100 Republicans) make divergent results look like noise that gets discounted, which is why 30 polls all being wrong in New Hampshire was an extraordinary wake-up call.
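A quick calculation shows how noisy a subsample of about 100 is. For a lead p1 - p2 between two candidates measured in the same sample, the multinomial variance is (p1 + p2 - (p1 - p2)²) / n. With hypothetical shares of 35% and 30%, the 95% margin on the 5-point lead is roughly ±16 points at n = 100, so even a correct divergent result would be hard to distinguish from noise. This assumes simple random sampling; weighting would widen the margin further.

```python
import math

def lead_moe(p1, p2, n, z=1.96):
    """95% margin of error on the lead p1 - p2 for two candidates measured
    in the same simple random sample of size n (multinomial variance)."""
    var = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(var)

# Hypothetical 35% vs 30% race in a 100-person subsample and a 600-person sample.
for n in (100, 600):
    print(n, f"lead 5 pts ± {100 * lead_moe(0.35, 0.30, n):.1f} pts")
```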
Massive consumer and voter databases that did not exist 25 years ago now provide detailed information (income, home value, etc.) on most people. Pollsters can draw a randomly selected target sample from a voter list and then match each target person to a closely similar respondent from a large opt-in internet panel, creating a sample that mimics a random sample across many dimensions and removes skews that simple demographic weighting cannot.
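The matching step can be sketched as nearest-neighbour matching without replacement: for each person in the random target sample, pick the most similar unused panelist. The choices here (Euclidean distance on standardized numeric covariates, greedy matching in target order) are assumptions for illustration; the actual distance metric and matching procedure used in practice are not described in this claim.

```python
import numpy as np

def match_panel_to_target(target_X, panel_X):
    """Greedy nearest-neighbour matching without replacement.

    target_X: (m, d) covariates for a random draw from the voter/consumer file
    panel_X:  (k, d) the same covariates for opt-in panelists, with k >= m
    Returns one panel row index per target row.
    """
    target_X = np.asarray(target_X, dtype=float)
    panel_X = np.asarray(panel_X, dtype=float)
    if len(panel_X) < len(target_X):
        raise ValueError("panel must be at least as large as the target sample")
    # Standardize each covariate so no single scale dominates the distance.
    mu = panel_X.mean(axis=0)
    sd = panel_X.std(axis=0)
    sd[sd == 0] = 1.0
    t = (target_X - mu) / sd
    p = (panel_X - mu) / sd
    available = np.ones(len(p), dtype=bool)
    matches = np.empty(len(t), dtype=int)
    for i, row in enumerate(t):
        d = np.sum((p - row) ** 2, axis=1)  # squared distance to every panelist
        d[~available] = np.inf              # each panelist used at most once
        j = int(np.argmin(d))
        matches[i] = j
        available[j] = False
    return matches
```

The matched panelists are then interviewed in place of the target individuals. Matching can only balance covariates that appear in both the database and the panel profile, so differences on unmeasured traits can remain.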
In 2006, the matched internet-panel method produced election forecasts whose average error was substantially smaller than that of the average reported telephone poll, mainly because it removed biases better (not because its sampling variability was smaller). Rivers argues that telephone polling could be improved the same way: when a selected person is missed, substitute a respondent who resembles that person rather than drawing another number from the same skewed population.
Interactive voice response (IVR, or robocall) polls made up roughly 80-90% of polling done in 2008 because newspapers cut back on expensive live-interviewer polling. Despite low response rates and not knowing who is actually answering, their record is not bad, largely because IVR organizations pay closer attention to weighting than traditional phone pollsters do.
Some organizations, such as the Associated Press and the New York Times, refuse to report polls that use nonprobability sampling (respondents selected without known probabilities of selection). Rivers argues they are in denial, because their own low-response telephone polls also lack known selection probabilities.