Wednesday, August 27, 2008

Why you shouldn't trust polls

People talk about polls a lot these days. Particularly presidential polls, where we're constantly told that Obama and McCain are "neck and neck."

This makes for more drama in the race, which journalists love. Nothing more boring than a runaway victory, that's for sure.

But is it really neck and neck? Maybe, but you really can't draw that conclusion from all these sucky-ass polls!

Remember the two days after the New Hampshire Democratic primary when everybody in the media suddenly realized that all their polls were incredibly wrong? Those were good days! Then they went right back to believing that their polls were valid and have been beating us over the head with them ever since.

Before I get into why the current polls are so incredibly flawed, let's talk about history a little. You've no doubt seen the famous picture of Harry Truman up there holding up a "Dewey Defeats Truman" headline after the 1948 election. This actually wasn't a polling error, it's just a nice picture. In fact, this screwup was due to a bad extrapolation of early voting returns. Which brings Fox News to mind, for some reason.

More relevant to what I'm talking about is the 1936 presidential race, in which the highly respected Literary Digest poll predicted that Alf Landon would win in a landslide.

Remember President Landon? Of course not. FDR beat his brains in. Landon won Maine and Vermont and that's it; FDR got the other 98.5% of the electoral votes. In the popular vote, Roosevelt got 60.8% and Landon trailed almost 25 points behind with 36.5%, one of the biggest victories in modern history. Yikes!

What went wrong?

Well, Literary Digest (which was folded into Time magazine a couple of years later) used idiotic sampling procedures. They only polled their own readers, owners of registered automobiles, and people with telephones.

They did this during the Great Depression, of course. When the only people with cars and phones and with enough disposable income to subscribe to Literary Digest were far wealthier than the average citizen, and therefore more likely to vote Republican.

In modern terms, it would be like the Wall Street Journal running a poll that only counted their subscribers, people who drive Mercedes, and people who eat caviar at least once a week. Totally worthless.

Incidentally, George Gallup had his own poll of 5,000 random people that year and correctly predicted that FDR would win. And we haven't been able to shut up the Gallup polls ever since.

So, how does this relate to our current situation, when Gallup's more statistically-worthwhile polls are commonplace and pollsters have hopefully learned from their prior really stupid mistakes?

Simple: It all comes down to phones once again.

Most political polls are conducted by telephone. Many pollsters (but not all) just call landlines.

It shouldn't come as a shock to anyone that in the current presidential race Obama does much better with younger voters and McCain fares better with older voters. This is true not just of these two, but of their political parties. The young mostly vote for Democrats and the old mostly vote for Republicans. It's just the way it goes.

It also shouldn't come as a shock that young people are far more likely to be cell-only while old people love their landlines. Hello, sampling bias!

Pew Research recently released a study looking at the cell phones vs landlines issue. It's pretty telling:

Let me particularly point you to the third part of that chart, where they look at Obama vs McCain preference.

Among the landline-only people, you get 46% for Obama and 41% for McCain. Okay, that's not far off what the other polls around this time showed. You get slightly more for Obama in the two groups that have both (landline-mostly and cell-mostly), but it's not really a significant difference.

But among the cell-only group (which is apparently just under 15% of the country) you get a massive shift. It goes up to 61% for Obama and drops to 32% for McCain. That's a big difference!
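To get a feel for how much that could move a topline number, here's a rough back-of-the-envelope sketch in Python. It uses the percentages quoted above and makes a few simplifying assumptions of my own: the cell-only group is exactly 15% of the sample, everyone else looks like the landline-only group (the chart says the in-between groups lean slightly more toward Obama), and registration and turnout differences are ignored. The function name is just something I made up for the example.

```python
# Shares quoted above from the Pew chart: (percent Obama, percent McCain).
LANDLINE = (46.0, 41.0)   # landline-only respondents
CELL_ONLY = (61.0, 32.0)  # cell-only respondents


def blended_margin(cell_only_share, reachable=LANDLINE, missed=CELL_ONLY):
    """Obama-minus-McCain margin once the cell-only group is mixed back in.

    cell_only_share is the assumed fraction of the sample that is cell-only
    (0.0 means the poll never reaches them at all).
    """
    w = cell_only_share
    obama = (1 - w) * reachable[0] + w * missed[0]
    mccain = (1 - w) * reachable[1] + w * missed[1]
    return obama - mccain


landline_margin = blended_margin(0.0)   # poll that only reaches landlines
full_margin = blended_margin(0.15)      # assumption: 15% cell-only

print(f"landline-only poll:   Obama +{landline_margin:.1f}")
print(f"cell-only folded in:  Obama +{full_margin:.1f}")
```

Under those assumptions the lead goes from 5 points to about 8.6 points, which is bigger than the margin of error most polls advertise. Treat it as an illustration of the mechanism, not an estimate of the real race.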

Yes, you'll also see that the cell-only group is less likely to be registered to vote (still time!) and also slightly less likely to actually vote (maybe not enough time?), but that's not unexpected among the relatively unreliable "youth vote" and is what candidates hope to address with GOTV campaigns.

Polling organizations are aware of these discrepancies, of course. Hell, Public Opinion Quarterly (which I assume most pollsters read) even had a whole issue devoted to the cell phone vs landline problem.

But it's not just about cell-phone only and landline-only people. It's a bit more complex.

The bigger polling firms (like Gallup) use random dialers that don't exclude cell phones. Let's take a look at the methods section of their most recent poll. It's pretty sparse, unfortunately:

For the Gallup Poll Daily tracking survey, Gallup is interviewing no fewer than 1,000 U.S. adults nationwide each day during 2008.

The general-election results are based on combined data from Aug. 24-26, 2008. For results based on this sample of 2,724 registered voters, the maximum margin of sampling error is ±2 percentage points.

Interviews are conducted with respondents on land-line telephones (for respondents with a land-line telephone) and cellular phones (for respondents who are cell-phone only).

In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls.

Okay, it's good to have a large sample size, and that's a nice low margin of sampling error.
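If you're wondering where a number like ±2 comes from, the textbook formula gets you there. This is a minimal sketch that assumes a simple random sample, a 95% confidence level, and the worst-case 50/50 split. Gallup doesn't say exactly how they compute theirs, so take this as the standard classroom version rather than their actual method.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Margin of sampling error in percentage points.

    Assumes a simple random sample of size n; p=0.5 is the worst case
    and z=1.96 corresponds to a 95% confidence level.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(2724)
print(f"n = 2724: about +/-{moe:.1f} points")
```

That comes out a bit under 2 points, which matches what Gallup reports after rounding. Notice what the formula doesn't contain: anything about who was never reached or who refused to answer. It only measures the luck of the draw.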

And as you can see, they do call people with cell phones. Though they don't mention how they know that it's cell-phone only. Do they ask anyone with a cell number if they have a landline and dump those people into the landline group? Maybe so. It's tough to know from this tiny little methods section.

The problem goes beyond cell-only and landline, though. It goes into a feature that's present on virtually every cell phone and only present on a relatively small (though growing) number of landlines: Caller ID.

I'd put Caller ID into the "practical difficulties in conducting surveys" category as mentioned in Gallup's disclaimer up there. For a large number of people (myself included), if a number comes up on the Caller ID that I don't recognize, I don't bother answering the phone. This is even more true if I'm out somewhere and my phone rings. Why should I take the time to answer some stranger's call when I know they can just leave me a message if it's important?

So who does answer the phone to talk to pollsters? Lonely people. People with nothing better to do. Invalids. People who are desperate to talk to someone -- anyone -- now that all their friends are dead.

In other words, old people.

Obviously it's not just old people talking to pollsters, but I'd be willing to bet good money that if you look at the median age of people who respond to telephone polling and compare it to the median age of voters, you're going to find at least a 5-10 year discrepancy.

By one estimate, the likely median age of voters in the 2008 election will be about 44. I've been unable to find good data showing what the median age of responders to telephone polls is, but if anyone can point me to some I'd be mighty grateful.

Even if we give big firms like Gallup the benefit of the doubt and assume they go to great lengths to make their sample match the median age of the voting population (which they should if they want a truly representative sample), we still have a problem! They could do absolutely everything in their power to make the demographics of their sample match the electorate, but they can't do anything about people who just won't take a telephone poll. There are plenty of those people, and there's no reason to assume they have the same views as those who do respond to the polls.
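Here's a toy example of that last point in Python. Every number in it is invented purely for illustration (the age split, the response rates, the support levels), and the function names are mine. The only thing it's meant to show is the mechanism: weighting by age fixes the age mix, but not the gap between people who answer and people who don't.

```python
# Hypothetical numbers, made up for illustration only.
# share:        fraction of the electorate in this age group
# answers:      fraction of the group that will take a phone poll
# a_if_answers: support for candidate A among those who answer
# a_if_silent:  support for candidate A among those who never answer
GROUPS = {
    "under 45":  {"share": 0.5, "answers": 0.2, "a_if_answers": 0.55, "a_if_silent": 0.65},
    "45 and up": {"share": 0.5, "answers": 0.4, "a_if_answers": 0.45, "a_if_silent": 0.45},
}


def true_support(groups):
    """Support for A across the whole electorate, responders or not."""
    total = 0.0
    for g in groups.values():
        r = g["answers"]
        total += g["share"] * (r * g["a_if_answers"] + (1 - r) * g["a_if_silent"])
    return total


def raw_poll(groups):
    """What an unweighted poll shows: responders only, in whatever mix they come."""
    responders = sum(g["share"] * g["answers"] for g in groups.values())
    for_a = sum(g["share"] * g["answers"] * g["a_if_answers"] for g in groups.values())
    return for_a / responders


def age_weighted_poll(groups):
    """Responders reweighted so each age group counts at its true share."""
    return sum(g["share"] * g["a_if_answers"] for g in groups.values())


print(f"raw poll:          {raw_poll(GROUPS):.1%}")
print(f"age-weighted poll: {age_weighted_poll(GROUPS):.1%}")
print(f"actual electorate: {true_support(GROUPS):.1%}")
```

With these made-up inputs the raw poll lands around 48%, the age-weighted poll at 50%, and the electorate at 54%. Weighting moves things in the right direction but can't close the gap, because the younger people who pick up aren't the same as the younger people who don't.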

A few more problems with cell phones I'll just touch on...

First, unlike landlines, cell phones are not tied to one location. Your cellphone number can follow you around for years, so you can quite easily have a number that corresponds to another part of the country entirely.

This is a problem because polls rely on extrapolating your one point of data into a representation of a large number of people living in your area. If you live in Texas but have a Delaware phone number, you're counted as being from Delaware. Not such a big deal if you're doing a nationwide poll, but for anything more local than that, goodbye representative sample!

Second, cell phones generally don't represent a "household" like landlines do. This is perhaps a minor quibble, because polls probably shouldn't be counting the views of whoever answers a landline as being representative of anything beyond that individual, but it happens. My household has three phone numbers (two cells and a landline). Can that skew things? Yeah, but it's probably not huge.

Third, there are indeed still people who don't have a phone number they're reachable at. They may have a limited cell phone that they'll only use for outgoing calls. Or one of those cheapo convenience store phones that they certainly won't waste time talking to a pollster on. How do you think they vote? Probably not for Alf Landon!

There are plenty of problems with polling beyond the ones I've raised here, but I'm also not saying that polls are totally useless (indeed, I often refer to them even in this post). Tracking polls can show trends reasonably well. Standard polls can still give a rough idea of where things stand, and if this election weren't so divided by voter age maybe they'd even be reasonably accurate. And more focused polls can be quite good. But I'm talking about the presidential election here.

And since choice of candidate is divided so much by voter age in this election, most of the polls you're seeing now are likely to be wrong. Don't pay much attention to that ±2% margin of error, because with so many other factors involved the real error is likely to be considerably higher.
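To put a rough number on how fast a stated margin falls apart, here's a small Python sketch. It assumes poll results are normally distributed around the truth plus some fixed bias, and that the ±2 is a 95% margin. The bias values are hypothetical; I have no idea what the real one is, and the function names are mine.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def coverage(moe, bias, z=1.96):
    """Chance that a poll lands within +/- moe points of the truth.

    Assumes normally distributed sampling noise whose 95% margin is moe,
    shifted by a fixed systematic bias (both in percentage points).
    """
    sd = moe / z
    return normal_cdf((moe - bias) / sd) - normal_cdf((-moe - bias) / sd)


for bias in (0.0, 1.0, 3.0):  # hypothetical bias sizes, in points
    print(f"bias of {bias:.0f} pts: within +/-2 about {coverage(2.0, bias):.0%} of the time")
```

With no bias you get the advertised 95%. A 1-point bias knocks that down to roughly 83%, and a 3-point bias means the poll is inside its own stated margin only about one time in six.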

Now, maybe I'm wrong about all this. Maybe the polls are spot-on and they've already done a flawless job correcting for all these factors. I guess we'll find out in November. But for now, don't read too much into the polls.