Egg on my face about Alexa stats

UPDATE MARCH 13: Well, it happens to every blogger sometime. Yesterday, when I posted this article (originally titled “Alexa’s ‘reach’ stats: More like ‘stretch’”), I got something pretty significant wrong. I was misinterpreting what the Alexa graphs actually communicate. Mea culpa, and apologies to Alexa and my readers. If you read the article below, don’t miss the comments thread, where a couple of my readers kindly clarified my error. Fortunately, I view mistakes as an important part of learning. 🙂

Here is the original article…


Over the past few months I’ve seen articles, postings, and discussions concerning various aspects of online media tout site statistics offered by Alexa.

I’ve gotta tell you: I think something’s really wacky with Alexa’s stats – especially their “reach” benchmark.

Check this out. Here are the Alexa “daily reach” results for Contentious over the past few months:

Alexa stats: Contentious daily reach as of March 12, 2006

OK, so according to Alexa, this humble little weblog you’re reading right now has recently had a “daily reach” as high as 58% of web users!!!

That’s flattering, but let’s get real. I see my server logs. There is no way I am reaching that many people. I do well with traffic to this blog, but Alexa’s figure is in the realm of utter fantasy. And it’s not even my fantasy!

So if you see people citing or touting Alexa statistics, here are some things to bear in mind…

Here’s some information from Alexa about how they calculate their “reach” stats:

What is Reach?

Reach measures the number of users. Reach is typically expressed as the percentage of all Internet users who visit a given site. So, for example, if a site like yahoo.com has a reach of 28%, this means that if you took random samples of one million Internet users, you would on average find that 280,000 of them visit yahoo.com. Alexa expresses reach as number of users per million.
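To make the arithmetic in that definition concrete, here is a minimal sketch in Python. The function names are my own and purely illustrative; this is not anything Alexa publishes. It just converts between the two ways of stating reach: as a percentage, and as users per million.

```python
def percent_to_per_million(percent: float) -> float:
    """Convert a reach percentage to users per million (1% = 10,000 per million)."""
    return percent * 10_000


def per_million_to_percent(per_million: float) -> float:
    """Convert a users-per-million reach figure back to a percentage."""
    return per_million / 10_000


# Alexa's own example: a 28% reach, restated as users per million.
yahoo_per_million = percent_to_per_million(28)

# A graph value of 58, read on the per-million scale, restated as a percentage.
small_site_percent = per_million_to_percent(58)

print(f"28% reach = {yahoo_per_million:,.0f} users per million")
print(f"58 users per million = {small_site_percent}% reach")
```

The point of the sketch is that the same number means wildly different things depending on the unit: 58 as a percentage and 58 per million differ by a factor of 10,000.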

OK, but how do they know that, say, 58% of web users visit Contentious daily?

Here’s the rub: Alexa’s data comes only from people who use the Alexa browser toolbar. Compared to the overall population of web users, that’s a pretty small and skewed sample.

All the way at the bottom of their page “About Alexa traffic rankings” you’ll find this disclaimer:

Some Important Disclaimers

The traffic data are based on the set of Alexa users, which may not be a representative sample of the global Internet population. Known biases include (but are likely not limited to) the following:

  • Our users are disproportionately likely to visit alexa.com, amazon.com and archive.org, and traffic to these sites may be substantially overcounted.

  • The Alexa Toolbar works only with the Internet Explorer browser. Sites frequented mainly by users of other browsers will be undercounted. For example, the AOL/Netscape browser is not supported, which means that Alexa collects little data from AOL users, and our traffic to aol.com is likely lower than it would be for a more representative sample.

  • The Alexa Toolbar works only on Windows operating systems. Although a large majority of the Internet population currently uses Windows, traffic to any sites which are disproportionately visited by users of other operating systems will be undercounted.

  • The rate of adoption of Alexa software in different parts of the world may vary widely due to advertising locality, language, and other geographic and cultural factors. For example, to some extent the prominence of Korean sites among our top-ranked sites reflects known high rates of general Internet usage in South Korea, but there may also be a disproportionate number of Korean Alexa users.

  • In some cases traffic data may also be adversely affected by our “site” definitions. With tens of millions of hosts on the Internet, our automated procedures for determining which hosts are serving the “same” content may be incorrect and/or out-of-date. Similarly, the determinations of domains and home pages may not always be accurate. When these determinations change (as they do periodically), there may be sudden artificial changes in the Alexa traffic rankings for some sites as a consequence.

  • The Alexa Toolbar turns itself off on secure pages (https:). Sites with secure page views will be under-represented in the Alexa traffic data.

In addition to the biases above, the Alexa user base is only a sample of the Internet population, and sites with relatively low traffic will not be accurately ranked by Alexa due to the statistical limitations of the sample. Alexa’s data come from a large sample of several million Alexa Toolbar users; however, this is not large enough to accurately determine the rankings of sites with fewer than roughly 1,000 total monthly visitors. Generally, Traffic Rankings of 100,000+ should be regarded as not reliable because the amount of data we receive is not statistically significant. Conversely, the more traffic a site receives (the closer it gets to the number 1 position), the more reliable its Traffic Ranking becomes.
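Here is a rough sketch of why a small site's numbers get noisy in a sample like this. It is my own illustration, not Alexa's method: I'm assuming a panel of 3 million toolbar users (Alexa says only "several million"), treating each panel member as an independent coin flip, and ignoring all the biases listed above. Under those assumptions, the relative error of a measured reach figure follows from the ordinary binomial standard deviation.

```python
import math

# Assumption for illustration only: Alexa says "several million" toolbar users.
ASSUMED_PANEL_SIZE = 3_000_000


def relative_standard_error(panel_size: int, reach_per_million: float) -> float:
    """Binomial standard deviation of the visitor count, divided by its expected value."""
    p = reach_per_million / 1_000_000   # chance that one panel member visits the site
    expected_visitors = panel_size * p  # average number of panel members who visit
    std_dev = math.sqrt(panel_size * p * (1 - p))
    return std_dev / expected_visitors


# Compare a tiny site, a modest blog, and a very large site.
for reach in (1, 58, 280_000):
    expected = ASSUMED_PANEL_SIZE * reach / 1_000_000
    rse = relative_standard_error(ASSUMED_PANEL_SIZE, reach)
    print(f"reach {reach:>7,} per million: about {expected:,.0f} panel visitors, "
          f"relative error roughly {rse:.1%}")
```

With only a handful of panel members visiting a small site, one or two people more or less swings the figure a lot, which fits Alexa's own warning that rankings for low-traffic sites are unreliable.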

So I’m gratified that apparently 58% of Internet Explorer users who access the net via computers running the Windows operating system, and who also have the Alexa toolbar installed, check out this weblog. But still, doesn’t even that sound rather unrealistic to you, even with all those caveats? It does to me.

The bottom line is, you don’t really know what’s going on with a site’s traffic unless you have access to the server logs. So if you’re citing traffic statistics drawn from any other source (and especially if you don’t get to see the source logs or log reports yourself), assume some level of potential uncertainty or inaccuracy.

More importantly, communicate that uncertainty to your audience. Let them know what they’re really getting.

And when you see something as obviously wacky or skewed as Alexa’s reach stats, don’t even bother. The graphs may look temptingly definite, but ultimately they can undermine your credibility if you appear to accept them uncritically.

7 thoughts on Egg on my face about Alexa stats

  1. thanks for the info amy. i have to admit, i was pretty surprised when i looked at my stats on that site. it’s clearer now why they are the way they are.

  2. The graph says reach per million users, not all internet users. So, for every million users, about 58 will see your site. That sounds small, but it’s probably reasonable

  3. Amy, you have a reach of 58 out of 1m Alexa bar users–not 58 percent of all web users… It would correlate to about 0.0058 percent of all web users if Alexa were a typical sample–so your blog remains humble (and so does mine 🙂

  4. Ok, Ok, I’ve got egg on my face here. I was thrown by this line: “Reach is typically expressed as the percentage of all Internet users who visit a given site.”

    58 per million would be reasonable for my blog — maybe even a bit understated.

    I’ll update the header of this posting to issue a correction.

  5. Amy, your analysis of the data is not entirely correct. I have posted deeper analysis here if you are interested.

    Thanks for digging into this topic. I have been seeing more and more people using Alexa graphs to justify some claim or another and have been skeptical. Now I know why.

  6. I found your blog doing some research for a book. Interesting blog chock full of information.

    Alexa has one other caveat that they fail to mention. They look at the domain with all its possible subdomains as one site. This simply means that Yahoo.com is #1 because of mail., finance., stores., shopping., games., etc.

    We can ask the one unasked question: why isn’t Google #1, given that they are the #1 search engine and offer so many services?

  7. The graph says reach per million users, not all internet users. So, for every million users, about 58 will see your site. That sounds small, but it’s probably reasonable?

    Not always.
