Alexa bot on the ball

A few days ago I wrote a quick and nasty script that emails me a dump of the PHP $_SERVER variables. The purpose was to find out some connection information from a client - for whatever reason his IP address wasn't being detected properly, and I needed some more information to work with.
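For anyone wanting to try the same thing, a script along these lines can be sketched as follows. This is a minimal reconstruction rather than the original script: the recipient address, the subject line and the helper name build_server_dump are all assumptions made for illustration.

```php
<?php
// Hypothetical recipient address (an assumption); replace with your own.
const DEBUG_RECIPIENT = 'me@example.com';

/**
 * Build a plain-text dump of the request variables, one "KEY: value" per line.
 * build_server_dump is an illustrative helper name, not part of PHP.
 */
function build_server_dump(array $server): string
{
    ksort($server); // sort by key so the email is easier to scan
    $lines = [];
    foreach ($server as $key => $value) {
        // Some entries (such as argv) are arrays, so encode non-scalars as JSON.
        $lines[] = $key . ': ' . (is_scalar($value) ? (string) $value : json_encode($value));
    }
    return implode("\n", $lines);
}

// Only send mail when serving a web request, not when run from the command line.
if (PHP_SAPI !== 'cli') {
    mail(DEBUG_RECIPIENT, 'Connection details', build_server_dump($_SERVER));
    echo 'Thanks, details sent.';
}
```

Note that a script like this fires for every visitor, human or bot, which is exactly what happens next.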

I put this script online, tested that it was working properly, and asked the client to visit a specific URL. About 2 days later, the script sent me an email, as expected. Except the details of the connection were for the Alexa crawler, not the client.

Obviously the Alexa widget of my Searchstatus Firefox extension (this extension is a must-have for web developers) phoned home to Alexa and let them know about this new page, and Alexa then sent its bot out to crawl it.

Lessons learned

This whole experience isn't really remarkable, but I was surprised at how quickly Alexa came through and crawled this URL. The take-home lesson is to always password-protect those pages you don't want to be found by random crawlers.

So many developers put pages on their sites and don't link to or publish the URL anywhere - and then consider this "hidden" page to be safe. It's not.
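One way to password-protect a one-off PHP page is HTTP Basic authentication checked inside the script itself. The sketch below is only an illustration: the function names is_authorised and require_login, the realm name and the credentials are assumptions, and it relies on the server populating PHP_AUTH_USER and PHP_AUTH_PW, which some CGI/FastCGI setups do not do without extra configuration.

```php
<?php
/**
 * Check the Basic auth credentials in a $_SERVER-style array.
 * $passwordHash is assumed to have been created once with password_hash().
 */
function is_authorised(array $server, string $user, string $passwordHash): bool
{
    $givenUser = (string) ($server['PHP_AUTH_USER'] ?? '');
    $givenPass = (string) ($server['PHP_AUTH_PW'] ?? '');
    // hash_equals compares in constant time; password_verify checks the hash.
    return hash_equals($user, $givenUser) && password_verify($givenPass, $passwordHash);
}

/**
 * Stop the request with a 401 challenge unless valid credentials were sent.
 */
function require_login(string $user, string $passwordHash): void
{
    if (!is_authorised($_SERVER, $user, $passwordHash)) {
        header('WWW-Authenticate: Basic realm="Private"');
        header('HTTP/1.0 401 Unauthorized');
        exit('Authentication required.');
    }
}
```

Calling require_login('debug', $hash) at the top of the page (with a hypothetical username and a stored hash) means a crawler only ever sees the 401 response. Basic authentication sends the password with every request, so it should be used over HTTPS.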

Tags: alexa

4 Comments

- Jul 10, 2007

Very good article you have released, but I have some other techniques for improving the rank of the site.

- Aug 21, 2007

Wouldn't creating a directory and setting it aside specifically for testing/etc. be as useful if the directory was excluded from crawling by a .htaccess file or had the proper meta-tagging?

- Sep 4, 2007

A passworded directory would be OK for testing, I think. I wasn't really bothered about this until now. Maybe I will be when my sites are hacked ... dunno :)

- Mar 14, 2008

Such crawlers tend to multiply nowadays; you are right in saying that hidden doesn't mean secure. Nice article.

