Oct 9, 2007
Can you afford to be in Yahoo? I'm not talking about the directory here, but the organic search results.
Yesterday, our server was having some serious load problems because one of the sites was getting hit hard by the Yahoo Slurp spider. The site in question has about 150,000 pages and uses a lot of dynamically generated images.
Due to Yahoo running all over the site, the server was almost unresponsive to user requests, which is a fairly poor situation to be in.
It occurred to me that Yahoo is probably the reason our bandwidth bill is so high each month (bandwidth in New Zealand is very expensive, at NZ$10 per gigabyte). This would possibly be OK if Yahoo actually sent us any significant volume of traffic - on this particular site it accounts for 16% of all visitors, which is actually much higher than normal.
I'm loath to use robots.txt to block 16% of the traffic to the site, but it's costing enough for me to want to get out my calculator and go through the same conversion metrics that I apply to AdWords. If a particular AdWords campaign isn't producing the conversions, the responsible thing to do is to axe it and spend the money on something else instead.
Yahoo is now going to need to provide value for money in terms of conversions vs the amount spent on bandwidth each month, just like any other channel. If it can't provide a satisfactory conversion ratio, I'll be blocking the hungry spider from eating all my bandwidth.
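A rough way to put numbers on this is to work out what each Yahoo-referred visit and conversion costs in bandwidth. The sketch below is only illustrative: the NZ$10 per gigabyte rate is the one quoted above, while the function name and the monthly figures in the example are made-up placeholders, not measurements from this site.

```python
def crawler_cost(gb_crawled, visits, conversion_rate, price_per_gb=10.0):
    """Return (total cost, cost per visit, cost per conversion) in NZ$."""
    total = gb_crawled * price_per_gb
    conversions = visits * conversion_rate
    # Avoid dividing by zero when the crawler sends no traffic at all.
    per_visit = total / visits if visits else float("inf")
    per_conversion = total / conversions if conversions else float("inf")
    return total, per_visit, per_conversion


# Placeholder figures only: 40 GB crawled, 2,000 visits, 1% conversion rate.
total, per_visit, per_conversion = crawler_cost(40, 2000, 0.01)
print(f"NZ${total:.2f} total, NZ${per_visit:.2f} per visit, "
      f"NZ${per_conversion:.2f} per conversion")
```

The cost per conversion can then be compared directly with what an AdWords campaign pays for the same result.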
Apparently, this is the code to use for the job - place it in your robots.txt file:
User-agent: Slurp
Disallow: /
User-agent: Yahoo-MMCrawler
Disallow: /
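Before uploading rules like these, it can be worth checking that they block what you expect and nothing else. Here is a minimal sketch using Python's standard urllib.robotparser, assuming the first block is meant to target Yahoo's main crawler, Slurp. The helper name is_allowed is my own, and Googlebot is only there as an example of a crawler that should stay unaffected.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /

User-agent: Yahoo-MMCrawler
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url="/"):
    """Return True if the given robot token may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("Slurp", "Yahoo-MMCrawler", "Googlebot"):
    print(agent, "allowed" if is_allowed(ROBOTS_TXT, agent) else "blocked")
```

This only tells you how a well-behaved parser reads the file; whether a given spider actually honours it is a separate question.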
If you do some searching for this problem, you will find a lot of other angry punters out there with Yahoo bandwidth problems. The general word on the street is that Yahoo doesn't do a good job of following crawl-delay instructions in robots.txt, which would have been my preferred option.
Perhaps it's worth checking how much bandwidth Yahoo is using on your large websites?
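One way to check is to total the bytes served per crawler from your access logs. The sketch below assumes Apache-style "combined" log lines (status, bytes, referrer, then user-agent in quotes at the end of each line). The function name and the two sample lines are invented for illustration, so adjust the pattern if your server logs in a different format.

```python
import re
from collections import Counter

# Status, byte count, referrer and user-agent at the end of a combined-format line.
LINE_RE = re.compile(r'" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"$')


def bytes_by_crawler(lines, crawlers=("Slurp", "Yahoo-MMCrawler")):
    """Sum response bytes per crawler name; everything else counts as 'other'."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        size, agent = match.group(2), match.group(3)
        sent = 0 if size == "-" else int(size)
        label = next((c for c in crawlers if c.lower() in agent.lower()), "other")
        totals[label] += sent
    return totals


# Invented sample lines; in practice pass an open log file instead.
sample = [
    '192.0.2.1 - - [08/Oct/2007:10:00:00 +1300] "GET /a.png HTTP/1.0" 200 51200 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.2 - - [08/Oct/2007:10:00:01 +1300] "GET / HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0 (Windows; U) Firefox/2.0"',
]
for name, total in bytes_by_crawler(sample).items():
    print(f"{name}: {total / 1024**3:.6f} GB")
```

Multiplying the gigabyte total by your per-gigabyte rate gives the monthly cost of each spider.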
2 Comments
Not really sure if this would help, but you could add a revisit schedule for search engines to the head of your site. An example would be...
<meta name="revisit-after" content="30 days">
Since Google doesn't follow these tags, that won't hurt you there.
Adding the Disallow: / code would drop all of your pages from Yahoo's index, which would cut away a ton of traffic in the future, especially on newly added pages.
Use the following structure so that Yahoo isn't crawling too often:
User-Agent: Yahoo-MMCrawler
Crawl-Delay: 10
User-Agent: Slurp
Crawl-Delay: 10
Obviously you can't make Yahoo go any faster, but you can slow them down, so raise the delay (to 30, say) if you want their visits spaced further apart. Note that Crawl-Delay is normally read as the number of seconds to wait between requests, not days between visits.
This way you will at least still get indexed and not lose a ton of traffic from the Yahoo index on new pages.

Eric - Oct 10, 2007
Yahoo's spiders are absolutely vicious. I have to deal with their greedy CafeKelsa spider pretty much once a quarter. It gets to the point where you have PHP do an "exit" if the user-agent is Yahoo's non-main spider. Sorry I don't have a solution, either - just know you're certainly not alone.