The Uncanny Accuracy of Polling Averages, Part IV: Are the Polls Getting Worse?

This is the fourth article in a series on the accuracy of polls and polling averages. In the first two installments, we demonstrated that polls have been extremely accurate at forecasting the winners of governors’ and Senate races in recent years — much more so than you might expect based on intuition alone.

However, the fact that polls have been strong predictors in the past does not necessarily imply they will continue to be so in the future. In Part III, we took up one type of critique that I encounter frequently — that 2010 is an unusual political cycle, and that its idiosyncrasies may render the polling less accurate. While this is not an unreasonable hypothesis, we found it does not have any grounding in the evidence: the polls have done no worse in “unusual” political cycles like 1992, nor in “wave” years like 1994 and 2006, than in routine-seeming ones like 1996 and 1998.

There is another type of argument, however, that is potentially more troubling. It could be that, irrespective of the character of this political cycle, polling itself is in decline. This is a widely held view among political elites and many polling professionals — and quite a few of the readers of this blog, I might add.

There are some sound theoretical reasons to think that this is indeed the case. We’ll take these up today, in Part IV of the series. Tomorrow, in Part V, we’ll look at what the empirical evidence says — and make an effort to diagnose just how serious these problems are from the standpoint of our forecasting models. Finally, in Part VI, we’ll address some additional concerns related to statistical modeling more generally.

(For those of you scoring at home: this series was originally supposed to consist of three articles — and now it’s going to be at least six. But the previous articles have drawn a favorable response, so I’m opting to go into more detail rather than less. From my standpoint, it’s more important to examine these big-picture issues than to give you the play-by-play on every new poll that comes over the news wires.)

Part IV

There are several good reasons to be concerned about the state of the polling industry.

1. Response rates to all types of polls are decreasing, as Americans become more aggressive about screening their phone calls. One academic study found roughly a 30 percent decline in survey response rates (from 36 percent to 25 percent) from 1997 to 2003 — although rates of decline were slightly smaller for surveys that used more rigorous methodology. The downward trajectory has almost certainly continued since 2003.

Moreover, the effects are especially large among certain demographic groups, like young voters. Pew has found, for instance, that only 7 percent of the adults who respond to a typical landline poll are 18 to 29 years old. This compares to 21 percent in 1995, a figure that is far more representative of young adults’ share of the population.

This is the top-end case, by the way. Pew leaves their surveys in the field for days, and goes to great lengths to “convert” people who initially decline to cooperate. They also call back numbers when people do not answer the first time around. Pollsters who don’t take these steps may get even lower participation rates among young voters. A series of polls conducted this year by the automated polling firm SurveyUSA, for instance, had only 1 percent to 5 percent of their samples made up of people aged 18 to 34.

2. Many young Americans — and an increasing number of older Americans — rely primarily or entirely on their mobile phones, which many pollsters do not call. About one in four Americans live in cellphone-only households, and that fraction is increasing every year. In addition, another 15 percent of Americans have land lines installed, but rely principally on their cellphones, and many of them rarely or never accept incoming calls on their land lines, especially from strangers.

On top of that, about 2 percent of Americans don’t have personal telephone service of any kind. That isn’t much of a problem right now, but the fraction may grow as more Americans switch to online substitutes for telephone service, like Skype or Google Voice.

Pollsters can combat this problem by including cellphone numbers in their samples, of course. An increasing number of national polling houses, including Pew, Gallup and The New York Times — as well as some local pollsters like Quinnipiac and Marist — are doing just that. But this adds to the cost, and in an era of austerity for traditional news media companies, not all pollsters are willing to bear it.

Meanwhile, firms like SurveyUSA and Rasmussen Reports, which use automated programs rather than live interviewers to conduct their polling, rarely or never include cellphones in their samples. In part this is because federal law requires that calls placed to cellphones be dialed manually, which would undermine the cost-competitiveness of the automated polling firms.

There is some evidence that excluding cellphones may bias the polls — in particular, cellphone-only adults may be more liberal than landline users who share most of the same demographic characteristics. Therefore, polls that exclude cellphones may tend to underestimate support for Democratic candidates and liberal causes, even when demographic weighting is applied.

3. The proliferation of “robopoll” firms like SurveyUSA and Rasmussen Reports may in and of itself be a problem, or may exacerbate the other problems. About 60 percent of the polls in our database this year were conducted by automated polling firms.  Some of them have achieved decent results in recent years. Rasmussen Reports and Public Policy Polling, for instance, have somewhat above-average track records, as measured by the accuracy of polls conducted close to Election Day. And SurveyUSA has had a considerably above-average performance.

But, automated polls are also associated with lower response rates. And some of the firms, like Rasmussen Reports, take other types of shortcuts, like conducting all of their polling in a single evening. Also, as we have mentioned, they almost never include cellphones in their samples. Therefore, it is open to question whether these firms can continue to perform on par with traditional pollsters.

4. An increasing number of voters are most comfortable speaking a language other than English, and are unlikely to complete surveys unless the interviewers are trained in their native tongue. Research sent to me by the firm Latino Decisions indicates that about 40 percent of Hispanic voters in California expressed a preference for speaking Spanish to pollsters. In 2008, about 10 percent of voters nationwide were Hispanic: the Latino Decisions data might imply, therefore, that about 4 percent of voters will be excluded from polling because of language barriers (possibly a bit more if we also account for non-Hispanic immigrant groups like Chinese Americans). The problem is biggest, of course, in states that have large immigrant populations, like California, Florida, New York and Texas.

5. Internet polling has yet to really mature. For every online polling firm like YouGov, which thinks carefully about its polling and has had some encouraging results, there are others like Zogby Interactive that make no real effort to ensure that they have drawn anything resembling a scientific sample. (FiveThirtyEight does not use Zogby Interactive surveys in its forecasting.)

I tend toward the optimistic side about the future of Internet polling. But there may be a period, possibly lasting for a couple of political cycles, when neither telephone polls — particularly those that do not dial cellphones — nor Internet polls produce wholly satisfactory results.

* * *

All of these problems boil down, more or less, to response rates. Just what fraction of Americans is both able and willing to participate in a survey? And are the Americans who do take part in surveys representative of those who don’t?

Let’s take up the case of one rather prolific firm: Rasmussen Reports. If low response rates produce grave problems with the quality of polling, they are liable to show up in Rasmussen Reports polls before most others. This is because the response rates in Rasmussen Reports surveys are almost certainly exceptionally low.

We’ve already noted that Rasmussen does not call cellphones, so they will never reach about 30 percent of the adult population. Of the rest, about half will not realistically be able to accept a phone call on their land lines between 5 p.m. and 9 p.m. on a given weekday evening, which is the only time that Rasmussen calls. Because Rasmussen does not call back when they miss respondents the first time around, these voters are effectively excluded from their surveys. That reduces the fraction of Americans who could potentially take one of their surveys down to about 35 percent.

Another issue is that Rasmussen simply talks to the first person they reach on the phone, rather than employing a randomized selection procedure within the household (“Can I speak to the person with the next birthday?”), as some traditional pollsters do. Therefore, anyone who is not always the first in their household to answer a land line is effectively excluded from Rasmussen polls (I’m sure many of our readers live in such households). It’s somewhat difficult to quantify this effect, but conservatively speaking, it probably brings Rasmussen’s potential reach down to no more than 30 percent of adults.

But, this is only half the battle. That 30 percent is what survey professionals call the contact rate. What percentage of them will be willing to take the survey? This is the cooperation rate.

Last year, I commissioned a large telephone survey on behalf of a nonpolitical consulting client. This client was willing to pay good money for survey work, and we used a reputable and professional call center with live operators.

Both the client and I were pleased with the call center’s work. Nevertheless, for each interview that we completed, there were roughly eight people who answered the telephone but refused to take the survey — in a few cases because of language problems, but mostly because they simply didn’t feel it was worth their time. In other words, only about one in nine people that we were able to reach on the phone were willing to complete the survey — a cooperation rate of about 11 percent.

The response rate to a survey is the contact rate multiplied by the cooperation rate. If you multiply our estimate of Rasmussen’s contact rate by the cooperation rate that my client experienced, you get a response rate of only about 3 percent or 4 percent of the population.

It’s possible that the cooperation rate that my client achieved was unusually low — ours was a rather long survey, for instance (although few people discontinued the survey once they’d begun it). But, more likely than not, it’s an optimistic benchmark for a firm like Rasmussen. For one thing, I’ve counted only those people who picked up the phone but then said no or hung up, not those who deliberately ignored the call in the first place. Also, Rasmussen’s automated script may feel more “spammy” to the typical respondent than a call from a human operator — so their refusal rates might be higher than what my client experienced. So perhaps their cooperation rate is about half as good as the one my client saw, or around 6 percent.

If you multiply a 30 percent contact rate by a 6 percent cooperation rate, you wind up at a response rate of only about 2 percent. By this estimate, then, only about one in 50 adults will be both able and willing to take part in a given Rasmussen Reports poll.
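If it helps to see that arithmetic in one place, here is a quick back-of-the-envelope sketch. Every number in it (the share of adults a landline-only pollster can never reach, the evening-availability haircut, the assumed cooperation rate) is a rough assumption carried over from the discussion above, not a measured quantity.

```python
# Back-of-the-envelope response-rate arithmetic from the discussion above.
# Every figure here is a rough assumption for illustration, not a measurement.

unreachable_by_landline = 0.30      # cellphone-only adults, adults with no phone, etc.
unavailable_on_one_evening = 0.50   # of the rest: cannot answer 5-9 p.m. on a given weeknight
first_answerer_haircut = 0.05       # rough penalty for single-call, first-person-to-answer designs

contact_rate = (1 - unreachable_by_landline) * (1 - unavailable_on_one_evening) \
               - first_answerer_haircut        # works out to about 0.30

cooperation_rate = 0.06             # assume roughly half the ~11 percent my client's survey saw
response_rate = contact_rate * cooperation_rate

print(f"contact rate:     {contact_rate:.0%}")      # about 30%
print(f"cooperation rate: {cooperation_rate:.0%}")  # about 6%
print(f"response rate:    {response_rate:.1%}")     # about 1.8%, i.e. roughly one adult in 50
```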

Perhaps the real number is a little higher than our estimate, but I doubt that it is much higher. It may even be a little lower, in fact.

Fortunately, this is an extreme case. Pollsters can improve their response rates, perhaps by an order of magnitude, by doing any of these things:

  • Including cellphones in their sample.
  • Conducting their survey over several days and calling back people who don’t respond initially.
  • Working to “convert” those who refuse to take the survey initially.
  • Using in-household selection procedures.
  • Employing bilingual operators.

Many pollsters do some of these things and some do all of them.

But, in spite of a response rate that is probably in the low single digits, and may even be 1 percent or 2 percent, Rasmussen’s polls have performed about as well as the traditional ones when it comes to predicting the outcome of elections. How are they able to achieve this?

Mostly through the “magic” of demographic weighting. Say that you’re only getting one-third as many young people as you “should” have in your survey. You can counteract this by counting each of the young people that you do get at triple weight. This is essentially what pollsters do.

There are some potential problems with this. For instance, you’re necessarily giving disproportionate weight to small subsamples, which increases the overall margin of error associated with the poll. The more critical issue, however, is that the people you do get on the phone may not be representative of those you missed.
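To make the weighting mechanics more concrete, here is a minimal sketch. The age groups, sample counts and population targets are invented, and the Kish design-effect formula is my own illustrative choice rather than a description of any particular pollster’s procedure.

```python
# Minimal sketch of demographic weighting. The sample counts and population
# targets below are invented for illustration.

sample = {"18-29": 70, "30-49": 330, "50-64": 330, "65+": 270}       # respondents reached
target = {"18-29": 0.21, "30-49": 0.35, "50-64": 0.26, "65+": 0.18}  # assumed population shares
n = sum(sample.values())

# Weight each respondent so the weighted sample matches the target shares.
weights = {g: target[g] / (sample[g] / n) for g in sample}
# e.g. 18-to-29-year-olds: 0.21 / 0.07 = 3.0 -> each one counts at triple weight

# Kish design effect: mean of the squared weights divided by the squared mean weight.
w = [weights[g] for g in sample for _ in range(sample[g])]
mean_w = sum(w) / len(w)
deff = (sum(x * x for x in w) / len(w)) / (mean_w ** 2)
effective_n = n / deff

print({g: round(weights[g], 2) for g in sample})
print(f"design effect ~{deff:.2f}, effective sample size ~{effective_n:.0f} of {n}")
```

With these made-up numbers, tripling the weight on the youngest group pushes the design effect to roughly 1.3, shrinking a nominal sample of 1,000 to an effective sample of about 750. That is the price of the weighting, before we even ask whether the young people who did answer resemble the ones who didn’t.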

As I mentioned, for example, there is some evidence that young voters who rely entirely on cellphones are more politically liberal than those who have land lines in their homes.

Voters who have more pointed political views may also be more inclined to complete a survey than those with weaker opinions. In this cycle, it is Republicans who appear more enthusiastic, so a survey that doesn’t make much effort to elicit a high response rate may wind up with too many of them. Rasmussen Reports, for instance, finds only a 1.5-point gap between Democratic and Republican party identification in what they describe as a sample of all American adults. By contrast, surveys by live interviewers find, on average, a 6.6-point party identification gap.

The good news is that this is potentially much less of a problem on surveys of likely voters, since those surveys specifically seek people who are engaged enough by politics that they can be counted on to vote. In other words, the response bias that may be present for pollsters with low response rates may act as a de facto likely voter model, which means that automated firms may have less work to do in pruning out “unlikely voters” later on. SurveyUSA, for instance, generally deems 55 percent to 70 percent of the adults it contacts to be likely voters — whereas voter turnout rates in midterm elections are generally on the order of 40 percent. While this seems at first to reflect an overly lax likely-voter screen, it may instead be about the right number given that “unlikely voters” are also unlikely to have completed the survey in the first place.
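A bit of toy arithmetic shows why the numbers can work out that way. The turnout share and the two response rates below are assumptions of mine, chosen only to illustrate how differential nonresponse inflates the likely-voter share among completed interviews.

```python
# Toy illustration of response bias acting as a de facto likely-voter screen.
# The turnout share and the relative response rates are assumptions.

likely_voter_share = 0.40     # rough share of adults who will vote in a midterm
response_rate_likely = 0.03   # assumed response rate among likely voters
response_rate_other = 0.01    # assumed (lower) response rate among everyone else

share_of_respondents = (likely_voter_share * response_rate_likely) / (
    likely_voter_share * response_rate_likely
    + (1 - likely_voter_share) * response_rate_other
)
print(f"{share_of_respondents:.0%} of completed interviews come from likely voters")
# About 67 percent: inside the 55-to-70 percent range, even though only about
# 40 percent of adults will actually turn out to vote.
```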

Rasmussen also does something that is somewhat less common: weighting their samples based on party identification. For each of their polls, they have a formula to determine how many Democrats, Republicans and independents there “should” be in the sample, and they weight the sample they actually get accordingly.

Weighting based on party identification may work fairly well provided that (i) you indeed have a fairly good idea of what party identification really is at any given time, and (ii) party identification is fairly reliable as an indicator of voting preferences.
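Here is a stylized sketch of why the first proviso matters. The within-party vote margins and both sets of targets are invented; the point is simply how directly the choice of party-identification targets propagates into the topline number.

```python
# Sketch of party-identification weighting, with invented numbers, showing how
# the topline shifts when the assumed party-ID targets change.

def weighted_margin(dem_margin_by_party, target_share):
    """Dem-minus-Rep margin after reweighting the sample to the party-ID targets."""
    return sum(target_share[p] * dem_margin_by_party[p] for p in dem_margin_by_party)

# How each party-ID group in the raw sample says it will vote, expressed as a
# Democratic margin in percentage points (made-up figures).
dem_margin_by_party = {"Dem": +85, "Rep": -85, "Ind": -5}

targets_a = {"Dem": 0.36, "Rep": 0.33, "Ind": 0.31}   # one assumed electorate
targets_b = {"Dem": 0.33, "Rep": 0.36, "Ind": 0.31}   # a slightly more Republican one

print(round(weighted_margin(dem_margin_by_party, targets_a), 1))   # about +1.0
print(round(weighted_margin(dem_margin_by_party, targets_b), 1))   # about -4.1
```

A 3-point change in the assumed party-identification split moves the projected margin by about 5 points, which is why getting the targets right matters so much.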

Let’s take up the first of those two provisos: how does Rasmussen set its party identification targets? In contrast to something like racial or gender identity, partisan identification is not fixed — it may change over time, and may also vary based on things like the wording of the survey question (asking about someone’s registration is different from asking about their identification, for instance). There is no Census Bureau data on the subject to use as a yardstick, as there is for other demographic characteristics. Instead, Rasmussen establishes its party identification targets by looking at other polls.

In particular — although their methodology is not very transparent — it appears that they use some combination of exit polls and national Rasmussen Reports surveys that track changes in the number of adults belonging to each political party each month. You can probably see the issue here: if local Rasmussen Reports surveys are producing biased samples, then weighting them based on other Rasmussen Reports surveys that suffer from most or all of the same flaws will not really solve the problem. Exit polls may be a little better, but they are also subject to sampling and other types of error and have had their issues in the past.

Rasmussen could also get into trouble in an era when partisan identification becomes more fluid, or one where more Americans don’t identify with either party (evidence suggests that a greater number are indeed identifying themselves as independent).

Even now, these sorts of things are important to keep in mind when you evaluate the performance of their surveys across different states. I would tend to trust Rasmussen’s results more in a state like Oregon — where the Democrats are pretty darned Democratic and the Republicans pretty darned Republican — than in one like West Virginia, where a lot of Democrats vote Republican for federal office, or Alaska, where a lot of voters identify themselves as independent.

Nevertheless, for better or worse, Rasmussen has consistently been able to make lemons into lemonade.

Some of the reason may be that, however poor their raw data is, they have gotten used to working with it. Say, for instance, that their raw data underestimates the performance of Democratic candidates by 5 points. So long as that figure remains roughly constant over time, they might be able to achieve satisfactory results by calibrating their data to correct for it.

Another thing that a firm with a low response rate can do is to look for guidance at the results obtained by other pollsters who have higher response rates. If an automated firm saw its results diverging grossly from Gallup’s, for instance, the firm could tweak its weighting formulas until they produced results that were more in line.

I have no idea whether Rasmussen, or any other polling firm, “cheats” that way by looking at the results that other pollsters produce. But it is probably naïve to assume that polls are truly independent of one another. It can be shown, for instance, that the variation in polls decreases when more of them are in the field in any given state. (See the chart below.) This could reflect any number of things: that pollsters “peek” at one another’s results, that they decline to publish surveys that appear to be outliers, or something else.

[FiveThirtyEight chart: the variation among polls decreases as more polls are conducted in a given state]
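One way to see how this kind of interdependence could compress the spread of published polls is a toy simulation. The herding rule below, in which each pollster shrinks its raw estimate halfway toward the average of already-published polls, and every parameter in it are assumptions of mine, meant to illustrate the mechanism rather than to model any actual firm’s behavior.

```python
# Toy simulation of "herding": each pollster nudges its raw result toward the
# average of previously published polls. All parameters are assumptions chosen
# only to illustrate how herding compresses the spread of published numbers.

import random
import statistics

def simulate_state(n_polls, herd, true_margin=4.0, poll_sd=3.0, rng=None):
    """Return the list of published poll margins for one hypothetical state."""
    rng = rng or random.Random()
    published = []
    for _ in range(n_polls):
        raw = rng.gauss(true_margin, poll_sd)         # what the survey alone says
        if published:                                 # shrink toward the prior consensus
            raw = (1 - herd) * raw + herd * statistics.mean(published)
        published.append(raw)
    return published

rng = random.Random(538)
for herd in (0.0, 0.5):
    spreads = [statistics.pstdev(simulate_state(10, herd, rng=rng)) for _ in range(5000)]
    print(f"herding weight {herd}: average spread {statistics.mean(spreads):.2f} points")
# With herd=0.0 the spread among ten polls reflects sampling error alone (about
# 2.8 points here); with herd=0.5 the published numbers agree with one another
# much more closely, even though the underlying sampling error is unchanged.
```

If this sort of behavior were stronger in heavily polled states, it would produce the kind of narrowing the chart shows, although, as noted above, suppressing outliers or other mechanisms could leave a similar signature.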

This, however, creates another potential problem. Say that the polling average in a particular state has the Republican ahead by 5 points — but we have knowledge that the Democrat will in fact win by 1 point. A pollster goes into the field and finds the Democrat 2 points ahead: he’s getting the results about right when others are getting it wrong, and including his poll would improve the accuracy of our average. But, the pollster is concerned about his “outlier” result, and declines to publish it. Now, our problems have been made worse. Pollsters usually take heavy criticism when they publish a result that is inconsistent with the consensus about a race. Although coming up with outlier results as a matter of course may suggest problems, publishing an occasional odd-seeming result may indicate that the pollster is being objective and is not putting his thumb on the scale.

This post has now gotten long enough to warrant an executive summary:

  • For a variety of reasons, response rates are declining throughout the survey industry. This has the potential to create problems, including both reduced accuracy and the introduction of bias.
  • These problems are made worse by firms that take various other types of shortcuts, usually in the name of cost containment.
  • Pollsters have various defense mechanisms to combat low response rates — particularly weighting based on demographic and sometimes political variables.
  • There is some evidence that pollsters do not behave independently from one another — that the results obtained by one may influence others. For an individual pollster, this may be a defense mechanism of sorts — a weaker pollster can look toward a stronger one for guidance. But it may create additional risks for forecasters if the “consensus” view is wrong.

In Part V, we’ll look at whether pollsters are winning or losing their battle against declining response rates. Are there signs that the accuracy of polling has begun to deteriorate?