Jonathan Leger – SEO And Internet Marketing Blog

28 Jan 2008

Why off-theme links have to be counted by Google.

Of all of the myths discussed in my Search Engine Myths Exposed report, the one that leaves people scratching their heads the most seems to be Myth #3, that you must get links from sites related to the same subject as yours in order to rank well in Google. I completely dispel this myth in the report.

People often walk away from the report convinced that the need for so-called "themed links" is a myth, since in it I demonstrate Google's glee in ranking sites with purely off-theme links. Even so, this reality is still confusing to many.

Why? Because from a purely logical point of view it appears to make sense that links from pages related to the same subject should hold more weight in Google. After all, a link counts as an endorsement, and an endorsement from an expert in the same field surely counts for more than one from a nobody in a different field.

How many medical products use the slogan "9 out of 10 doctors agree..." in their advertisements? Do you ever see them say "9 out of 10 plumbers agree..." when advertising the latest pharmaceutical? Of course not. That endorsement simply wouldn't hold much weight. So logically it seems that Google should be doing the same.

I think some people have the mistaken idea that I feel off-theme links count as much as on-theme links. I'll be honest: I really don't know if they do. What I do know is that you don't have to get on-theme links to rank well in Google.

Why not? Why can't Google only count the "votes" from on-theme sites, since in theory they should be more valuable?

If you're thinking that it's not technically possible to do so, you're not completely wrong. It is possible, but there are problems. I personally have written code that breaks a page down into the keywords and phrases that are most represented on the page. It wouldn't be difficult to extend that out to discover the primary subjects of an entire site.
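To make the idea concrete, here is a minimal sketch in Python of that kind of keyword breakdown. It is not the code mentioned above, just a simple word-frequency count. The stop-word list, the function name top_keywords, and the sample page text are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "for", "on", "with", "that", "this", "your"}

def top_keywords(text, n=5):
    """Return the n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]

# Made-up sample page text.
page = ("Gardening tips: good soil makes gardening easy. "
        "Water the soil and the garden grows.")
print(top_keywords(page, 2))
```

Running the same function over every page of a site and merging the counts would give a rough list of that site's primary subjects.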

Store that information in a database and you can check outgoing links against it to see if the linking site relates to the category of the page it is linking to. If it does, count it as a vote. If not, ignore it.
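As a rough sketch of that check, assuming the "database" is just a Python dictionary mapping each site to a set of theme keywords, it might look like the following. The site names, the keywords, and the functions is_on_theme and count_votes are hypothetical, and nothing here is meant to describe how Google actually does it.

```python
# Hypothetical "database": site -> set of theme keywords.
SITE_THEMES = {
    "gardening-blog.example": {"gardening", "soil", "plants"},
    "plant-nursery.example": {"plants", "seeds", "gardening"},
    "daily-news.example": {"politics", "sports", "weather"},
}

def is_on_theme(source, target, themes=SITE_THEMES):
    """True if the two sites share at least one theme keyword."""
    return bool(themes.get(source, set()) & themes.get(target, set()))

def count_votes(links, target, only_on_theme=False, themes=SITE_THEMES):
    """Count links pointing at target, optionally ignoring off-theme ones."""
    votes = 0
    for source, linked_to in links:
        if linked_to != target:
            continue
        if only_on_theme and not is_on_theme(source, target, themes):
            continue
        votes += 1
    return votes

# Made-up link data: (linking site, linked-to site).
links = [
    ("plant-nursery.example", "gardening-blog.example"),
    ("daily-news.example", "gardening-blog.example"),
    ("personal-page.example", "gardening-blog.example"),
]
print(count_votes(links, "gardening-blog.example"))
print(count_votes(links, "gardening-blog.example", only_on_theme=True))
```

With the filter switched on, the links from the news site and the personal page are thrown away, even though both are perfectly natural votes.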

Sounds simple, right? Wrong.

Let's take a look at a few search queries at Google to demonstrate why this is not so simple:

Keywords         Results
personal page    133,000,000
newspaper        239,000,000
blog             2,330,000,000

 
The most recent study I've seen that tried to estimate the size of the web was from the University of Iowa in the USA. That study estimated the number of indexable web pages at 11.5 billion.

But that was 3 years ago (February of 2005), so let's "guesstimate" that there are about 20 billion indexable pages on the web now. The three search queries listed above constitute 2.7 billion pages, or about 13.5% of our 20 billion "guesstimate." Each of those queries represents a kind of page that, by its very nature, will very often link out to completely unrelated sites and pages.
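The arithmetic is easy to check with a few lines of Python. The result counts come from the queries above, and the 20 billion figure is the same "guesstimate." Note that the sum assumes no page shows up in more than one of the three result sets, which is a simplification.

```python
# Result counts from the three Google queries.
results = {
    "personal page": 133_000_000,
    "newspaper": 239_000_000,
    "blog": 2_330_000_000,
}

# The "guesstimate" of indexable pages on the web.
estimated_web = 20_000_000_000

total = sum(results.values())
share = total / estimated_web

print(f"{total:,} pages, {share:.1%} of the estimate")
# prints: 2,702,000,000 pages, 13.5% of the estimate
```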

After all, how often will one newspaper link out to another newspaper? How often will a personal blog link out to another personal blog, or a personal page to another personal page? It happens, but the percentages are very small. In fact, just about every kind of site will often need to link to sites that are not related (directly or indirectly) to their own subject matter.

If Google were to ignore those links in favor of only links from "on theme" sites or pages, a very large percentage of the natural "votes" would be ignored. That would be diametrically opposed to Google's premise that what other sites are voting for with their links is how the web should be ranked.

If you start ignoring links because the linking site does not appear to be related to the linked-to site, you start descending into the quagmire of determining the keyword relevance of a site, such as a news site, which reports on every kind of subject imaginable. With so many subjects, the list of keywords it relates to would be huge, making such comparisons computationally expensive.

One potential way to deal with this problem would be to only compare outgoing links against the keywords that appear on the page containing the link. But that is not without difficulties either, since it is very possible that the specific link is not semantically related, but still falls within the same category as the site (or a related category).

For example, the word "fertilizer" is not semantically related to the word "gardening" (semantics deal with the different meanings of a word or phrase, and fertilizer does not mean gardening or vice versa). However, the two words are obviously associated. What garden doesn't need fertilizer?

It's completely natural for a web page about gardening to link to a site or page about fertilizer. That's an easy association for the human brain to make, but those kinds of looser associations are much more difficult for machines to figure out.
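Here is a small sketch of how a naive keyword comparison misses that association. It scores two pages by the share of keywords they have in common (Jaccard similarity), which is just one simple measure chosen for illustration. The keyword sets and the function name keyword_overlap are made up.

```python
def keyword_overlap(a, b):
    """Jaccard similarity: shared keywords divided by all keywords."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical keyword sets for two obviously associated pages.
gardening_page = {"gardening", "garden", "plants", "soil", "watering"}
fertilizer_page = {"fertilizer", "nitrogen", "phosphorus", "potassium", "compost"}

print(keyword_overlap(gardening_page, fertilizer_page))
# prints: 0.0
```

A score of zero would mark the link as off-theme, even though any gardener would call it relevant.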

It gets even more difficult if a site about being environmentally conscious links out to a page about fertilizer. The web is full of loose connections like that which cannot be ignored if you want an accurate index.

Added: A reader posted a GREAT example of the difficulties that arise trying to match up themes in this way. It was so good I felt it should be included directly in the post:
 
"Although I write primarily about arthritis pain relief, I find myself diverging into various other topics, such as swimming, cycling, weight loss, vitamins, different types of fat, comfortable furniture etc. Some might think that they have nothing to do with arthritis, but I have plenty of links from sites dealing with those topics."
--Donnie, tipsarthritispainrelief.com
 
So at what point is a link "off theme"? That's a very tough question to answer accurately.

 
If Google relied on current technology to determine on-page and on-site relevance, and ignored all links that didn't make the grade, it would lose much of what makes it better than its competition: its democratic approach to ranking the web.

Computers are not people, and the number of associations that the search engine would have to be able to make in order to accurately count and ignore links based on relevance is astronomical. This does not rule out Google giving on-theme links more weight based on whatever associations it is currently capable of making, but I think it makes it pretty clear why Google is not presently capable of ignoring off-theme links.

Besides, just because somebody works at the grocery store doesn't mean that their vote for the best MP3 player has no merit at all. It may not have quite as much merit as that of a professional reviewer for PC Magazine (though some would say it has more), but it still has merit.

So really, Google simply cannot ignore off-theme links if it wants to stick to its guiding principle of letting links work as "votes" for ranking the web. Considering how much easier it is to get off-theme links than on-theme ones, I for one will stick to ranking my sites the easy way.

Please leave your thoughts in a comment below.