Wednesday, July 08, 2009

In search of a credible search engine

The internet is obviously a wonderful source of information. Unfortunately, a lot of that information isn't actually true.

The misinformation isn't usually intentionally malicious. It's just the byproduct of every dumbass with a computer having the ability to put their own mistaken beliefs on the web. Some of the resulting misinformation is more widely-believed than the actual truth, and this leads to problems.

Google is really a very good search engine. I use it all the time, and generally the first few results you get from a Google search are decent enough. But not always.

Recently, I've been contemplating how much better searching would be if the results for certain kinds of searches were sorted not by popularity, but by credibility. Far too often have I encountered search results where the misinformation heavily outranked the truth.

Well, as it turns out, there are at least two search engines out there that claim to promote credible information on searches related to health care (but not much else at this point). Since bad healthcare information dominates the internet, and since that misinformation can actually have pretty severe consequences for people who believe it, health information seems like a great place to start.

Today I'm going to examine these two new search engines and compare their results to Google on a number of health issues. Let's see how they work.

The following search engines will be used:
  • Bing - this is Microsoft's recently renamed search engine, which they stupidly chose to call a "decision engine." I don't know what that means either. Wikipedia (itself not exactly the most credible source) describes its "Bing Health" search service as follows:
    Bing Health (previously Live Search Health) is a health-related search service as part of Microsoft's Bing search engine. It is a search engine specifically for health-related information through a variety of trusted and credible sources, including Medstory, Mayo Clinic, National Institutes of Health's MedlinePlus, as well as from Wikipedia.
  • hakia - I just discovered this one last night. It's still in beta, but it does make a very concerted effort to provide credible information. As they describe it, hakia gets librarians to submit credible sources, and requires a source to be peer-reviewed, lack commercial bias, have current information, and not be controlled by outside parties before it is considered credible. At this point, their credible search element is limited to health and environmental information.
  • Google - really needs no introduction.

I'll be using a few different health-related search terms to see how these three compare. They are:
  • diabetes - basic, easy one to start with.
  • vaccine ingredients - simple enough, just give me accurate information on what's in vaccines.
  • swine flu - because big stories draw scammers.
I'll submit each of those search terms to the three engines, and we'll see what we end up with. I'm only going to deal with the top five results from each, because that's where the vast majority of people are going to end up.
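For anyone who wants to repeat this kind of comparison, here is a rough sketch of the bookkeeping, assuming you've already copied each engine's results into a list by hand. Everything in it is hypothetical: the result format, the function names, and the example domains are made up for illustration, and the "credible" set is just whatever you personally decide to trust. It keeps the top five results (skipping ads, news items, and repeat domains) and reports what share of them come from your credible list.

```python
from urllib.parse import urlparse

TOP_N = 5  # only the top five results are considered


def domain_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def top_results(results, n=TOP_N):
    """Keep the first n results, skipping ads, news items, and repeat domains.

    Each result is a hypothetical dict: {"url": str, "kind": "web" | "ad" | "news"}.
    """
    seen = set()
    kept = []
    for r in results:
        if r["kind"] != "web":
            continue  # ignore ads and news results
        d = domain_of(r["url"])
        if d in seen:
            continue  # ignore duplicates from the same domain
        seen.add(d)
        kept.append(r["url"])
        if len(kept) == n:
            break
    return kept


def credible_share(urls, credible_domains):
    """Fraction of urls whose domain is in a hand-picked credible set."""
    if not urls:
        return 0.0
    hits = sum(1 for u in urls if domain_of(u) in credible_domains)
    return hits / len(urls)


# Made-up example data; these are placeholder domains, not real search results.
example = [
    {"url": "https://ads.example/buy", "kind": "ad"},
    {"url": "https://www.agency.example/a", "kind": "web"},
    {"url": "https://agency.example/b", "kind": "web"},
    {"url": "https://scam.example/x", "kind": "web"},
    {"url": "https://news.example/y", "kind": "news"},
]
top = top_results(example)
share = credible_share(top, {"agency.example"})
```

Note that this only matches exact domains, and it can't judge credibility for you. Deciding which domains go in the credible set is still the hard part.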

Test #1 - diabetes

Google - The first five results (discarding ads, duplicates from the same domain, and news results) are pretty good. At the top is the American Diabetes Association, with WebMD, a journal of diabetes, and NIH information in the top five.

Unfortunately, this site ranks third, and it's little more than a marketing site for the pharmaceutical company GlaxoSmithKline. Pretty blatant conflict of interest there.

Bing - Bing separates its results out into different categories (like "Diabetes Symptoms," "Diabetes Prevention," etc). I'm just going to use the top-level stuff that doesn't have a subheading, and again ignore the ads at the top.

Bing does okay here. The top result is from the Mayo Clinic, which has clearly partnered with Microsoft here. Second is the ADA. Third is Wikipedia's article on diabetes, which is not exactly what I'd consider a credible source (Wikipedia articles are a great example of popular ideas winning out over true ones). GSK's little propaganda site is up there too.

hakia - A search at the default domain brings up a special diabetes page. All the information seems pretty good, with the ADA, Mayo Clinic, and some other fairly respectable sources of information available. Notably absent are GSK's site and Wikipedia, which is nice.

Clicking on the "credible" tab for results gives us several PubMed articles, as well as a couple of top results for the Northwest Coalition for Alternatives to Pesticides. Which is a decently reality-based page, but probably not what you're actually looking for if you just search for "diabetes."

Verdict! - They're all pretty decent, actually. hakia is maybe a tiny bit ahead for keeping out GSK and Wikipedia, but loses points for relatively useless "credible" results. Let's call this one a tie.

Test #2 - vaccine ingredients

This one is going to be interesting. There's a large antivaccination contingent on the internet, so getting accurate information about what's actually in vaccines is tricky.

Google - It's not pretty. The top two results are both antivaccine sites, publishing lists of dubious quality to try to scare people away from vaccines. Lots of talk of thimerosal, which crazy people insist causes autism despite tons of evidence to the contrary.

The third result is the CDC, which is the only credible source in the top five. Then it's back to the antivaccine propaganda. Just below the CDC, we even get Rense. One look at that site should be enough to show just how credible their information is.

Bing - Holy crap. Google's results were already horrible, but Bing's are even worse. Rense has moved up to the top spot, with an even more batshit-insane site taking up third place. The entire top 10 is utter crap, except for the CDC at number 8. Horrible results.

hakia - Well, the "Web Results" are still incredibly bad. But hakia does put the "Credible Sites" right next to them.

So we have batshit insanity right next to very good stuff. It would be nice if the credible stuff actually showed up in the general "web results" too, but perhaps that's something that will come about at a later stage (note that hakia is still in beta). Additionally, hakia's credible results are poorly-ranked. They skew towards overly-specific when people are looking for general information. Again, this is something that might be worked out later.

Verdict! - They all suck, but hakia sucks slightly less than the others. Bing is actually even worse than Google, despite their claim to provide credible health information.

Test #3 - swine flu

Google - Not too bad. The CDC's excellent swine flu portal is in the top position, with the rest of the top five rounded out by Wikipedia, a mostly-reasonable but ad-heavy medical portal, the World Health Organization, and Medline. Wikipedia's presence is probably okay in this case, because of H1N1's newsworthiness.

Bing - Again we get the Mayo Clinic at the top, with Wikipedia following close after. Then some stuff from the Guardian which is pretty reasonable.

But right after that we get a really stupid blog, which has a lot of bad information (like telling you to disinfect your shoes, which is just dumb) and appears to be a veiled attempt at selling you an herbal supplement. It's a scam site, part of a network of identical scam sites, plain and simple.

Overall, the Bing results for swine flu appear to be about 20% credible and 80% scams and bullshit.

hakia - Pretty good! The credible results are indeed very credible, though they once again suffer from not being particularly relevant. The normal results are also pretty credible, and there's nary a scam to be found. The normal results do skew a little newsy at times, but overall not too bad.

Verdict! - Google and hakia both win this round. They avoided nonsense and provided good information. hakia's "credible search" is still giving good but often irrelevant results, though their normal search made up for it. Bing was fine for two links and then turned to crap.


I had intended to do a couple more test searches, but this is getting lengthy and there's a pretty clear pattern emerging.

Basically, when it comes to health information your best options are either Google or hakia. Google has a lot of bad stuff mixed in, but the top few results are usually decent. hakia has a way to go before it's truly useful, but the credible results it delivers are certainly credible. They're just not necessarily very relevant. If it works out the kinks, maybe by the time it's out of beta it'll be something quite good.

Bing, on the other hand, sucks really hard. I'm not sure how a search engine that goes out of its way to provide credible health information managed to provide worse information than Google (which makes no such attempt), but it did. Avoid this one.

Also notable is that all our engines did pretty badly with the vaccine ingredient search. This was not the least bit surprising, and is actually the reason why I chose that term. Faced with the tons of stupid out there, hakia seemed to do the best. Dealing with these sorts of controversial topics may be the most difficult test of a search engine, so hakia's relatively good performance may signal good things on the way.

Of course, it's necessary with any search engine to use your own critical thinking skills and actually sort out what's true from what's not. Even the most credible and well-respected sites can contain errors, and no search engine is going to be able to make the final call for you.

Hopefully we can reach a point where a simple health-related search can bring back mostly-credible results instead of mostly-nonsense results. There's still a way to go, but it's good to see some people trying.