In my last blog post I talked about how to rank in Google after their Penguin update. One of the things I pointed out is that Google is penalizing sites for "anchor text over-optimization." That is, if you get too many links with the exact keywords you want to rank for in the anchor text, you might get penalized.
Notice that I said you might get penalized. That's because there are some circumstances in which sites aren't being penalized. More on that in a bit.
After reading about Google going after over-optimized sites, I decided to do some data mining and see if I could figure out exactly how much exact-match anchor text is too much. Here's what I did:
1. I gathered 1,500 keywords from 30 very diverse markets. Everything from business to technology to transportation to food and beverage to chemicals. A wide range of markets.
2. I ran those keywords through Google to get all of the top-level domains ranking on the first page for the keywords. I removed all ranking inner page urls from the data set.
3. I checked the link profiles of all of those domains to see what percentage of their links had anchor text matching the keywords they were ranking for.
Before we continue, let me explain why I removed the inner page urls from the data set. As I also pointed out in my previous blog post about Penguin, Google is favoring authority ("big box") sites in their search results more heavily than ever. That means that an authority site can get an inner page ranked with very few backlinks, or none at all. Wikipedia is a big example, as is Amazon.com. They're all over the search results after Penguin, as are other authority sites.
Because those authority sites would skew the results of my data mining (since they would typically have tiny exact-match anchor text percentages), and because I'm pretty sure most of my blog readers don't run huge authority sites, I only logged the data for the top-level domains whose home pages were ranking for the keywords. That is, if mydomain.com was on page one, I kept it, but if mydomain.com/innerpage.html was ranking, I dropped it. So the results you're seeing reflect sites that are more typical of what the average webmaster would be able to achieve.
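The home page filter described above can be sketched in a few lines of Python. This is only an illustration, not the script I actually ran: the function name `is_home_page` and the rule "empty path and no query string" are assumptions made for the example.

```python
from urllib.parse import urlparse


def is_home_page(url: str) -> bool:
    """Return True when the URL points at a site's root, not an inner page."""
    # Assumption: results listed without a scheme (e.g. "mydomain.com/page")
    # get one added so urlparse can separate the host from the path.
    if "://" not in url:
        url = "http://" + url
    parsed = urlparse(url)
    # Treat an empty path or a bare "/" with no query string as the home page.
    return parsed.path in ("", "/") and not parsed.query


# Hypothetical page-one results for one keyword
ranking_urls = ["mydomain.com", "http://mydomain.com/", "mydomain.com/innerpage.html"]
home_pages = [u for u in ranking_urls if is_home_page(u)]
print(home_pages)
```

With the sample list, the inner page is dropped and the two home page entries are kept.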
Okay, on to the numbers. Some of what I learned was quite enlightening.
(For the purposes of this blog post, I'll refer to the percentage of links whose anchor text exactly matches the keywords the site is ranking for as its EMA. Also, all of the queries were done via Google.com, so the ranking sites generally favor the USA. Lastly, all of the linking data was gathered using KeywordCanine.com.)
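To make the EMA definition concrete, here is a minimal sketch of the calculation. It assumes you already have a list of anchor texts for a site (the sample list below is made up), and it assumes matching ignores case and extra whitespace. How KeywordCanine.com handles those details is not something this sketch claims to reproduce.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'Work  From Home' equals 'work from home'."""
    return " ".join(text.lower().split())


def ema(anchor_texts: list[str], keywords: str) -> float:
    """Percentage of links whose anchor text exactly matches the keywords."""
    if not anchor_texts:
        return 0.0
    target = normalize(keywords)
    matches = sum(1 for anchor in anchor_texts if normalize(anchor) == target)
    return 100.0 * matches / len(anchor_texts)


# Hypothetical link profile: two of the four links use the exact keywords
anchors = ["work from home", "Work From Home", "click here", "example.com"]
print(ema(anchors, "work from home"))  # 50.0
```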
1. The Average EMA Is Pretty Low
It probably won't come as much of a surprise for me to tell you that the average EMA for a site is pretty low -- just 10% across all of the markets. So, on average, only 10% of the links to a ranking site contain the exact keywords the site is ranking for. But don't take that as the standard to aim for, because it varies a lot across markets.
Here's the full list of the markets, their average EMA, and the Maximum EMA found for any site that ranks for its keywords in that market. For the Maximum EMA, only sites with at least 50 unique domains linking to them were considered.
Average EMA across all markets: 10%
|Market||Avg EMA||Max EMA|
|home based business||17%||57%|
|work from home||14%||53%|
|virtual server hosting||8%||11%|
|food and beverage||8%||8%|
From these figures you can see that the average EMAs are really low, but that there are usually sites with a much higher EMA than the average ranking on page one for their keywords. So who's getting away with having a higher EMA, and how are they doing it? Read on.
2. The Get Out Of Jail Free Card - Exact Match Domain Names
In pretty much every market I tested, Google is ranking one or more exact-match domains (EMDs). What I mean by "exact match domain" is a domain whose name is made up of all of the terms in the keywords it's ranking for. That is, if workfromhomeonline.us is ranking for the keywords "work from home online", that's an exact match domain for the keywords. That site is ranking for those keywords, by the way, despite having a 53% EMA.
However (and this discovery goes against much of what's said about exact match domains), domain names that have dashes in between the terms also appear to be benefiting from this exception to the over-optimization penalty. That is, work-from-home.biz is ranking on page one for "work from home" despite having a 70% EMA.
Those two examples also bring to light something else that goes against common SEO knowledge -- Google is not penalizing EMDs even if they aren't one of the "big four": .com, .net, .org and .edu. The "lesser" domain name extensions are also being exempted: .info, .biz, .us, .ie, .ro, etc. They all appear to be averting punishment despite having very high EMAs.
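If you want to flag exact-match domains in your own data, a rough check might look like the sketch below. It assumes "exact match" means the keyword terms, in order, make up the whole domain name once dashes are removed, and that any extension counts. The function name is hypothetical, and the sketch does not handle subdomains other than www.

```python
import re
from urllib.parse import urlparse


def is_exact_match_domain(url: str, keywords: str) -> bool:
    """Return True when the domain name equals the keywords run together."""
    if "://" not in url:
        url = "http://" + url
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Assumption: the name is everything before the first dot, so the
    # extension (.com, .biz, .us, ...) plays no part in the comparison.
    name = host.split(".")[0].replace("-", "")
    squashed_keywords = re.sub(r"[^a-z0-9]", "", keywords.lower())
    return bool(squashed_keywords) and name == squashed_keywords


print(is_exact_match_domain("workfromhomeonline.us", "work from home online"))  # True
print(is_exact_match_domain("http://www.work-from-home.biz/", "work from home"))  # True
```

Both examples from this section pass, dashes and "lesser" extension included. Keyword variations such as singular versus plural are not treated as a match here.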
Here's a list of some of the exact-match domains ranking in Google in the markets I tested. Each of the ones in the list has a 40% EMA or higher:
|Keywords||Rank*||Domain||EMA|
|internet business expert||6||http://internetbusinessexpert.co/||97%|
|work from home||4||http://www.work-from-home.biz/||70%|
|work from home online||2||http://workfromhomeonline.us/||53%|
|work from home india||6||http://www.workfromhomeindia.biz/||50%|
|work from home ideas||6||http://www.workfromhomeideas.us/||67%|
|business management solutions||6||http://www.businessmanagementsolutions.ie/||100%|
|music management software||8||http://www.music-management-software.com/||100%|
|network management software||3||http://www.networkmanagementsoftware.com/||40%|
|la crosse technology||1||http://www.lacrossetechnology.com/||80%|
|web design company||1||http://www.webdesigncompany.net/||88%|
|affordable web design||5||http://www.affordablewebdesign.net/||53%|
|new web design||1||http://newwebdesign.com/||57%|
|top web hosting||8||http://topwebhosting.com/||50%|
|web hosting review||6||http://www.web-hosting-review.net/||41%|
|web hosting canada||2||http://webhostingcanada.org/||51%|
|food beverage canada||2||http://www.foodbeveragecanada.com/||48%|
|exchange email hosting||6||http://www.exchangeemailhosting.com/||100%|
|plumbing how to||6||http://www.plumbinghowto.biz/||100%|
|home improvements maryland||10||http://homeimprovementsmaryland.net/||100%|
|computer internet security||8||http://computerinternetsecurity.org/||92%|
* Due to Google's localization, the rank you see the site at may be different than what's shown in the table.
Just look at those EMAs! Google is clearly letting those domains get past any anchor text over-optimization penalty. There's one more case worth noting: for the keywords "plumbing supplies", Google is giving a pass to plumbingsupply.com even though its exact match anchor text is for "plumbing supplies", not "plumbing supply." So it seems Google is also exempting exact match domains which it determines have a variation of the keywords in the domain (in this case "supply" instead of "supplies").
So why is Google letting these guys get by without a penalty? It makes sense, really.
If Google penalized sites with a high EMA even if their domain name was an exact match for the keywords, they would end up dropping all kinds of brand-name sites out of the rankings. Think about it: what anchor text do most people use when linking out to a brand-name site (like Ford or Adobe or Amazon, etc.)? Their name, of course! So Google has to give those sites a pass despite having a high EMA. It just makes sense that Google won't penalize you for links with anchor text matching the keywords in your domain name. That's often the site's brand, and it will naturally have a much higher EMA for those keywords.
Is Google giving a pass to all EMDs with very high EMAs? The data can't answer that question. But clearly they are giving a pass to a lot of them.
3. The Ranking Exact-Match Domains Have A Lot Fewer Links
Another important point about the ranking exact-match domains versus all of the other ranking top-level domains: they have a lot fewer links aimed at them. In fact, on average the EMDs only have about 15% as many links as the other ranking sites.
One stand-out example is pharmaceutical-jobs.com, which is ranking on page one for (of course) "pharmaceutical jobs." It only has about two dozen external domains linking to the entire site. The other top ranking results typically have many hundreds or thousands of domains linking to them. Clearly Google is highly favoring the EMD in this case.
Here's a breakdown of the number of unique external domains linking to the non-EMD sites versus the EMD sites:
|Market||Avg Links||Avg EMD Links||EMD vs. Avg|
|home based business||156||225||44% more|
|work from home||204||85||42%|
|web design||435||475||9% more|
|food and beverage||430||40||9%|
|email hosting||143||278||94% more|
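For anyone checking the comparison against their own numbers, the percentage is just the EMD average divided by the overall average. Here's a small sketch (the function name is my own) using two rows from the table:

```python
def emd_link_share(avg_links: float, avg_emd_links: float) -> float:
    """EMD average expressed as a percentage of the overall average."""
    if avg_links <= 0:
        raise ValueError("avg_links must be positive")
    return 100.0 * avg_emd_links / avg_links


print(round(emd_link_share(204, 85)))  # work from home: 42
print(round(emd_link_share(430, 40)))  # food and beverage: 9
```

A result above 100 means the EMDs in that market actually have more linking domains than the other ranking sites.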
Some Non-EMD Sites Are Also Getting Away With It
The data also shows other sites with very high EMA values getting a pass from Google even if they don't have an exact-match domain name and aren't a brand. Why Google is giving those sites a pass isn't clear. For example, addonchat.com is ranking for "chat software" with a 60% EMA, and plimun.com is ranking for "web design" with a whopping 98% EMA. If I figure out why Google is letting these sites get away with that, I'll definitely be blogging about that, too.
So what can you take away from all of this data and these numbers? In short, exact-match domain names are your friend! They can be ranked with a lot fewer links and apparently have a much better chance of not getting penalized for anchor-text over-optimization. This includes the "lesser" domains (.info, .biz, etc.), as well as domain names with the keywords separated by dashes (e.g. work-from-home.biz).
Also, if you're not using EMDs, it's important to diversify your anchor text a lot. How much is "a lot" really depends on your market. So do the research. Check out the link profiles of other ranking sites in your market and see what their anchor text looks like.
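One simple way to look at a competitor's anchor text is to count how often each phrase appears and turn the counts into percentages. The sketch below assumes you've exported the anchor texts from whatever link tool you use. The sample list is invented for illustration.

```python
from collections import Counter


def anchor_text_breakdown(anchor_texts):
    """Return (anchor text, percentage of links) pairs, most common first."""
    # Lowercase and collapse whitespace so near-identical anchors are grouped
    counts = Counter(" ".join(a.lower().split()) for a in anchor_texts)
    total = sum(counts.values())
    return [(text, 100.0 * n / total) for text, n in counts.most_common()]


# Hypothetical anchor texts pulled from a ranking site's link profile
sample = ["web design", "Web Design", "click here", "acme studio"]
for text, pct in anchor_text_breakdown(sample):
    print(f"{text}: {pct:.0f}%")
```

Run that against a handful of page-one sites in your market and you'll get a feel for how spread out their anchor text really is.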
If you have any questions or comments, or would like to suggest other post-Penguin ranking factors for me to dig into in a blog post, please leave a comment below. Your feedback is always welcome!