Blocking an abusive spider from AllResearch
Checking my blog’s visitor statistics, I noticed one IP address standing out from the crowd. It had accumulated more hits than the second, third and fourth top visitors combined, and had eaten up an impressive amount of traffic. A quick check revealed that the IP address resolves to rss.allresearch.com.
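The post doesn’t say which statistics tool flagged the address. As a rough sketch of the same check done by hand, the Python below tallies hits and bytes per client IP from Apache access-log lines and tests whether the busiest address outweighs the next three combined. It assumes the standard Apache “common” or “combined” log format; the function names and the sample addresses (taken from documentation-only ranges) are hypothetical. Resolving the top address to a hostname, as described above, could be added with Python’s socket.gethostbyaddr; it is left out here because it needs network access.

```python
import re
from collections import Counter

# Matches the start of an Apache "common" or "combined" log line:
# client IP, identd, user, [timestamp], "request", status, bytes sent.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) (\d+|-)')

def tally(log_lines):
    """Return (hits, bytes_sent) Counters keyed by client IP."""
    hits, sent = Counter(), Counter()
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines written in some other format
        ip, _status, size = m.groups()
        hits[ip] += 1
        # Apache logs "-" when no body bytes were sent (e.g. a 304).
        sent[ip] += 0 if size == "-" else int(size)
    return hits, sent

def top_outweighs_next(hits, n=3):
    """True if the busiest IP has more hits than the next n IPs combined."""
    ranked = hits.most_common(n + 1)
    if not ranked:
        return False
    return ranked[0][1] > sum(count for _, count in ranked[1:])

if __name__ == "__main__":
    # Hypothetical sample lines; real input would be the server's access log.
    sample = (
        ['192.0.2.10 - - [01/Mar/2005:10:00:00 +0000] "GET /feed/ HTTP/1.1" 200 50000'] * 5
        + ['198.51.100.1 - - [01/Mar/2005:10:05:00 +0000] "GET / HTTP/1.1" 200 8000'] * 2
        + ['198.51.100.2 - - [01/Mar/2005:10:06:00 +0000] "GET / HTTP/1.1" 304 -']
        + ['203.0.113.7 - - [01/Mar/2005:10:07:00 +0000] "GET / HTTP/1.1" 200 8000']
    )
    hits, sent = tally(sample)
    print(hits.most_common(1), sent["192.0.2.10"], top_outweighs_next(hits))
```

Counting bytes alongside hits matters here: a spider that re-downloads full pages can dominate the traffic figures even more than the hit count suggests.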
Who is AllResearch? It is a company that provides web statistics to paying customers. One of its specialties is finding out how many people are talking about certain brands or products online and reporting the numbers to its customers. Mention Coca-Cola in your blog, and you’ll become part of the AllResearch statistics sent to Coca-Cola. Another activity is providing “clippings” from blogs that mention a given keyword: your post on Coca-Cola will be fed to customers who want to read what’s new on the web about that keyword. This is done through a subscription-based site, webclipping.com, another AllResearch product. Basically, your blog is a source of revenue for AllResearch.
To gather this information, AllResearch sends out “spiders” (automated programs) to index the millions of blogs out there and find new material it can use. Google, Yahoo and MSN do this as well, with one difference: AllResearch’s spiders flood your blog with requests, as often as once per hour. Search engines’ spiders are more considerate and more intelligent: they only check your site for changes and index it when there’s new information. AllResearch disregards this and re-reads everything you’ve posted on every visit. This makes your site’s traffic skyrocket, and you’re most likely paying for the extra traffic.
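The post doesn’t say how search engine spiders detect changes. One standard mechanism is the HTTP conditional GET: the crawler sends an If-Modified-Since date or an If-None-Match ETag remembered from its last visit, and if nothing has changed the server answers “304 Not Modified” with no page body. Below is a minimal Python sketch of that server-side decision, assuming header names have already been lower-cased; the function name conditional_status is hypothetical, and the sketch skips details a real server handles, such as weak ETags.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_status(request_headers, etag, last_modified):
    """Decide between 304 Not Modified and 200 OK for a conditional GET.

    request_headers: dict mapping lower-case header names to values.
    etag: current entity tag of the page, including quotes, e.g. '"v42"'.
    last_modified: timezone-aware datetime of the page's last change.
    """
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        # Simplified: exact tag comparison, no handling of weak W/ tags.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in tags or "*" in tags else 200

    if_modified_since = request_headers.get("if-modified-since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore it and send the full page
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds first.
        if last_modified.replace(microsecond=0) <= since:
            return 304  # nothing new: headers only, no body
    return 200  # no usable validator, or the page changed: send it all

if __name__ == "__main__":
    changed = datetime(2005, 3, 1, 12, 0, tzinfo=timezone.utc)
    polite = {"if-modified-since": format_datetime(changed, usegmt=True)}
    print(conditional_status(polite, '"v1"', changed))  # polite re-check: 304
    print(conditional_status({}, '"v1"', changed))      # blind re-download: 200
```

The second call is the pattern complained about above: a client that never sends a validator always gets a 200 and the full page, so every visit costs the whole page’s worth of traffic.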
In short, AllResearch piggybacks on the blogosphere to make a nice buck, with complete indifference to the unnecessary load it puts on other people’s sites. In my opinion, this is abusive, and I’m not the only one who has noticed. Daniel Bowen at GeekRant went through the same experience in February 2005, and Larry Snider had a conversation with AllResearch in December 2004. Google AllResearch and you’ll find more.
So today I blocked their spider’s IP address from accessing my blog. This is done simply with a “deny” rule added to the .htaccess file of my site. My web host’s Apache server checks this file on every request and, from now on, will refuse any request from the IP I’ve blacklisted, answering with a “403 Forbidden” error instead of serving the page.
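The post doesn’t quote the rule it used, and the exact syntax depends on the Apache version. As a hedged illustration, the small Python helper below renders a deny block for a list of addresses in either the older Apache 2.2 style (Order/Allow/Deny) or the Apache 2.4 style (Require not ip inside RequireAll). The helper name htaccess_deny and the address 192.0.2.10 (a documentation-only address, not AllResearch’s) are hypothetical; check which style your host’s Apache accepts before pasting the output into .htaccess.

```python
import ipaddress

def htaccess_deny(ips, apache24=True):
    """Render .htaccess lines that block the given client IP addresses.

    apache24=True uses Apache 2.4's "Require" syntax;
    apache24=False uses the older Apache 2.2 "Order/Allow/Deny" syntax.
    """
    # Validate each address so a typo doesn't end up in the server config;
    # ipaddress.ip_address raises ValueError for anything malformed.
    addrs = [str(ipaddress.ip_address(ip)) for ip in ips]
    if apache24:
        # Everyone is granted access except the listed addresses.
        lines = ["<RequireAll>", "    Require all granted"]
        lines += [f"    Require not ip {a}" for a in addrs]
        lines.append("</RequireAll>")
    else:
        # With Order Allow,Deny, Deny rules are evaluated last and win.
        lines = ["Order Allow,Deny", "Allow from all"]
        lines += [f"Deny from {a}" for a in addrs]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(htaccess_deny(["192.0.2.10"], apache24=False))
```

For example, htaccess_deny(["192.0.2.10"], apache24=False) returns three lines: Order Allow,Deny, then Allow from all, then Deny from 192.0.2.10. Generating the block rather than hand-editing it mainly guards against a mistyped address silently blocking the wrong visitor.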
This is the second time I’ve encountered abusive behaviour towards my blog. The first was some time ago, when someone installed Google Desktop Search on their computer and the application indexed my blog even more often than once an hour. In a few days, my site used more traffic than it normally does in several months. I’m not the kind of guy who will tolerate such abuse, even from an automated computer system.
Farewell, AllResearch: you’re not going to extend your business through my blog’s content any longer. May you and the disgusting likes of you disappear miserably into oblivion.