Tuesday, November 18, 2008

Why you can't trust research

Science reporting in the popular press is a terrible thing. You have reporters with no understanding of science regurgitating press releases about studies that may or may not have any validity. Only dramatic or unusual studies get reported on, and if further investigation later proves those studies false, the correction is unlikely to make the news.

So it's really no wonder that many people don't trust scientific studies. Why should they? Most of the studies that get mentioned in the popular press are wrong anyway. That's why they get printed! Ninety-nine studies confirming that there's no link between vaccines and autism aren't as newsworthy as one study showing there is, for instance.

Unfortunately, the problem here isn't just with the popular press. I've just stumbled across a very interesting paper that purports to prove that "most claimed research findings are false" even in the scientific literature!

Incidentally, you can read the entire paper at that link above. No lame abstracts for us! If you're not in the mood for wading through it (it can be a bit technical at times), I'll basically go over it here anyway.

The paper is actually titled Why Most Published Research Findings Are False, which is a pretty dramatic claim. Many, sure. But "most"? We'll see about that!

Let's begin with some basics, though. To be considered good scientific research there are a few things you need to do. Your tests should ideally be double-blinded, have a good control, have a large sample size, have a solid theoretical grounding, avoid bias, and so forth. Most studies don't live up to all of these, but the good ones at least try to.

Let's say you do a great study in which you attempt to see whether or not masturbation causes blindness. All appropriate controls are in place, and you have a nice big sample size. Now you have to figure out if the results mean anything.

So you compare your non-masturbating control group to your masturbating group and see a statistically significant difference in blindness. Since I'm making this all up anyway, let's say that you discover that masturbation does cause blindness. Interesting result!

You use various statistical techniques to determine that your p-value is below the standard 0.05 mark. What that 0.05 means is that if masturbation actually had no effect on eyesight, there'd be less than a 5% chance of seeing a difference this big just by luck. So surely you're 95% likely to be right!

That 95% confidence is the standard. Sure, a higher confidence (lower p-value) is even better, but it's at 95% that people will start to take your results seriously.
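If you're curious where a number like that actually comes from, here's a minimal sketch in Python. The group sizes and blindness counts are completely invented for illustration, and scipy's chi-square test of independence is just standing in for whatever analysis a real study would use:

    # A rough sketch of how a p-value gets computed, using invented numbers
    # for the hypothetical masturbation-and-blindness study above.
    from scipy.stats import chi2_contingency

    # Rows: control group, masturbating group. Columns: not blind, blind.
    # These counts are made up purely for illustration.
    table = [[990, 10],   # control group: 10 out of 1,000 blind
             [970, 30]]   # masturbating group: 30 out of 1,000 blind

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"p-value: {p_value:.4f}")
    # If p_value is below 0.05, the difference counts as statistically
    # significant: a gap this big would be unlikely to show up by luck
    # if there were no real effect.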

So now you have a well-done study that shows statistical significance to 95% confidence! Great! The media picks up on your story, religious groups start citing your research to try to get people to stop masturbating, and old wives feel happy with their tales.

The only problem is that even if you did absolutely everything right in carrying out your research (which in a study like this would be nearly impossible), you could still be totally wrong. If masturbation actually has no effect on eyesight, a perfectly run study will still hand you a "significant" result about one time in twenty.

Even if every study out there were done flawlessly, one out of every twenty studies of an effect that isn't real would come back "significant" anyway, reaching a conclusion that's total crap. And unfortunately for you, you just did that study.
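If that one-in-twenty figure sounds too neat to be true, a quick simulation makes it concrete. This is just a sketch with made-up data and no real effect anywhere; it runs a few thousand "perfect" studies of something that does absolutely nothing and counts how many come out significant:

    # Simulate many flawless studies of an effect that doesn't exist.
    # Roughly 5% of them still come out "significant" at p < 0.05.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n_studies, n_per_group = 5000, 100
    false_positives = 0

    for _ in range(n_studies):
        control = rng.normal(0, 1, n_per_group)   # both groups drawn from
        treated = rng.normal(0, 1, n_per_group)   # the exact same distribution
        stat, p = ttest_ind(control, treated)
        if p < 0.05:
            false_positives += 1

    print(f"{false_positives / n_studies:.1%} of the null studies were 'significant'")
    # Prints something close to 5%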

So does that mean 1/20th of the studies out there are wrong? That's disheartening, but still not too terrible!

Sadly, no. It's a lot worse than that.

Most research is not well-done. Biases are introduced that screw everything up. Numbers are cherry-picked to support preconceived ideas. Things go wrong. These bad studies may or may not be easy to spot, especially if you only have access to an abstract and not the full paper.

Wait, it gets worse. I've barely delved into the paper mentioned above.

You see, there are other elements at play here. For instance, the likelihood of something being true in the first place. For every hypothesis that's correct, there's a potentially infinite number of hypotheses that are wrong.

Let's say I have five hypotheses:
  • Cigarettes cause cancer
  • Potatoes cause cancer
  • Watching television causes cancer
  • Rain causes cancer
  • Leprechauns cause cancer
Of these theories, only the first is likely to be true (based on a large body of previous research). But I do studies on each of them.

Chances are that I'll discover that cigarettes cause cancer and that the rest don't. Yay!

But instead of just five studies, let's make it a thousand. We'll assume the same ratio of good to bad ideas. So I'd have 200 studies of things that actually do cause cancer and 800 studies of things that don't.

This is where things get really ugly.

Because I am a wonderfully skilled researcher, I do all 1,000 studies perfectly, testing each one at my beloved 0.05 significance level. Nobody can find a single flaw with my methodology. Even with the leprechaun one!

What do I end up with? Well, thanks to that 5% false positive rate, 40 of my 800 studies of harmless things come back "significant" anyway. I've just produced 40 well-done studies "proving" that things like unicorns, bowties, and the letter Q cause cancer.

I also have my 200 studies of things that actually do cause cancer, and since I am flawless I've confirmed that they all do so.

This leaves me with about 17% of my positive findings being totally wrong: 40 of the 240 things I "proved" cause cancer actually don't. And that's assuming I did everything right. In the real world I'd likely have a much higher rate of both false positives and false negatives. It's hard to guess exactly how high they'd be.
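Spelled out, that idealized arithmetic looks like this (same made-up scenario as above):

    # The idealized 1,000-study scenario, spelled out.
    true_causes = 200                      # things that really do cause cancer
    non_causes = 800                       # things that don't
    false_positives = non_causes * 0.05    # 40 flukes that look "significant"
    true_positives = true_causes * 1.0     # perfect detection, for now

    wrong = false_positives / (false_positives + true_positives)
    print(f"{wrong:.0%} of my positive findings are wrong")   # prints 17%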

Chances are the false negative rate would be fairly high, though. It can be tough to prove that something causes cancer even when it does. So let's say I only manage to prove that half of the things that cause cancer actually do. (The other half still cause cancer, I just can't prove it.) That leaves 100 real causes alongside the same 40 flukes, and now out of the 140 things I think cause cancer, I'm wrong about nearly 30% of them.

If I wasn't such a good researcher and screwed up when studying things that don't cause cancer, I'd probably get a higher false positive rate than the idealized 5% too. Even a piddling little 10% false positive rate would leave me wrong about 44% of my positive findings.

Add to that the possibility that I might suck even worse at choosing things to study than in this example. Maybe only one out of ten things I chose actually do cause cancer. Maybe one out of a hundred. Now we're in a situation where the vast majority of my results are total crap.
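If you want to play with these numbers yourself, here's the whole calculation wrapped up as a little function. This is my own sketch, not anything lifted from the paper (the paper frames the equivalent algebra in terms of pre-study odds, which it calls R); the parameter names and scenarios are the made-up ones from above:

    def fraction_of_findings_that_are_wrong(prior, power, alpha, studies=1000):
        """What fraction of 'significant' results are actually false positives?

        prior: fraction of hypotheses tested that are really true
        power: chance a real effect gets detected (1 minus the false negative rate)
        alpha: chance a non-effect comes out 'significant' (the false positive rate)
        """
        true_positives = studies * prior * power
        false_positives = studies * (1 - prior) * alpha
        return false_positives / (true_positives + false_positives)

    # The scenarios from above:
    print(fraction_of_findings_that_are_wrong(0.20, 1.0, 0.05))   # ~0.17
    print(fraction_of_findings_that_are_wrong(0.20, 0.5, 0.05))   # ~0.29
    print(fraction_of_findings_that_are_wrong(0.20, 0.5, 0.10))   # ~0.44
    print(fraction_of_findings_that_are_wrong(0.10, 0.5, 0.10))   # ~0.64
    print(fraction_of_findings_that_are_wrong(0.01, 0.5, 0.10))   # ~0.95

Crank the prior down or the error rates up and the fraction of wrong findings climbs fast, which is basically the paper's argument in miniature.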

This is pretty bleak. Science is the best way we have of figuring out the world, and there are all these flaws! I haven't even gone into all the problems mentioned in the paper (and won't, since this is getting long). How the hell are we to know what's true and what's not?

Rest assured, there are ways to deal with the problems here.

First off, only pay attention to studies that are done well in the first place. It's hard enough trying to figure out the facts from good studies; bad ones are just a waste of time.

If a study has a small sample size, poor controls, or an obvious bias, there's a higher probability that it's wrong. Similarly, if it finds only a weak effect (say, something that raises your risk of cancer by a couple of percent), there's a decent chance it doesn't actually mean anything.
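That weak-effect warning is really the power problem again in disguise: for a fixed sample size, a small effect means low statistical power, and low power drives up the fraction of positive findings that are flukes. With made-up numbers:

    # Made-up numbers: a 1-in-5 prior, a weak effect that gives only 20% power,
    # and a clean 5% false positive rate.
    true_positives = 1000 * 0.2 * 0.2      # 40 real effects actually detected
    false_positives = 1000 * 0.8 * 0.05    # 40 flukes
    print(false_positives / (true_positives + false_positives))   # 0.5, half are flukes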

Keep in mind that, given enough time and a large enough number of studies, every crackpot theory in the world will have a research paper to back it up.

Don't jump to conclusions based on a single study, especially if its result strikes you as incompatible with what you've seen before. If something is interesting enough, other people will try to replicate the research. Once you get a fair number of studies into the same thing, a clear consensus should emerge. One bad study is easy to come by; a hundred bad studies of the same thing all reaching the same wrong conclusion is much less likely (but by no means impossible).

Some fields lend themselves to more false conclusions than others. In physics you're probably not going to find a huge number of totally wrong conclusions. In medicine you're going to find a lot. In the social sciences you're going to find a ridiculous amount. Basically, the more complicated things get, the less likely you are to get any useful conclusions. Few things are more complicated than human psychology.

Don't ever take the press release version of a study as the final word, and definitely don't trust the further bastardization of that press release as presented by the popular news media. Get the original research paper if you can. Try to find a review of it by a knowledgeable source. Even if you can get the original research paper, you may not be knowledgeable enough about its topic to really understand it. Even if you understand it, the paper itself could be wrong.

Everything is worth questioning, and if something really seems wrong then there's a good chance it is.

And please, please, support science education. There's way too much crap out there, and way too few people who know how to interpret it.