Spamming your own search results

I was recently faced with the challenge of adding a search facility to a medium-sized directory website. I have written custom search algorithms before, and it is generally a cat-and-mouse game of tweaking and testing to get the right balance of relevance. As soon as you change the balance in one area, search results get less relevant in other areas.
Invariably, as your search algorithm gets more complicated, it starts to get slow, especially when you start adding millions of records to your database.

This particular site is built with Jojo CMS, and I decided to use the built-in search facility (which I wrote about half of). This performs a search across the various plugins that are installed - the page plugin, the blog plugin and the directory plugin. The directory listings *are* the content for this site, and no doubt people paying for a listing want to see their site appear prominently in on-site search results.

SEO steps in

What I found was that we had SEO'ed the page content so well that it was outranking the directory listings. All the keyword-rich titles, headings and body content that had been added to the key pages were blocking the meat-and-potatoes content from ranking where it should be ranking.

It occurred to me that I had just spammed my own search engine.

I tweaked the search algo for a couple of hours, and then began to wonder why this particular job was proving so difficult.
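To make the problem concrete, here is a minimal Python sketch of how it can arise. It is not the Jojo CMS search code: the scoring formula, the field weights, the `Doc` type, the sample content and the `source_boost` parameter are all assumptions made up for illustration. It scores results by counting query terms, with title matches weighted more heavily than body matches, and shows one possible tweak, a per-plugin multiplier.

```python
import re
from dataclasses import dataclass

# Hypothetical field weights; the real search may weigh fields differently.
TITLE_WEIGHT = 3.0
BODY_WEIGHT = 1.0


@dataclass
class Doc:
    source: str  # plugin that produced the result: "page", "blog" or "directory"
    title: str
    body: str


def term_count(text, term):
    """Count whole-word, case-insensitive occurrences of term in text."""
    return re.findall(r"\w+", text.lower()).count(term.lower())


def score(doc, query, source_boost=None):
    """Naive term-frequency score, optionally multiplied by a per-source boost."""
    boost = (source_boost or {}).get(doc.source, 1.0)
    raw = sum(
        TITLE_WEIGHT * term_count(doc.title, term)
        + BODY_WEIGHT * term_count(doc.body, term)
        for term in query.split()
    )
    return boost * raw


def search(docs, query, source_boost=None):
    """Return matching docs, highest score first."""
    scored = [(score(doc, query, source_boost), doc) for doc in docs]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches]


# Made-up sample content: one SEO'ed page and one plain directory listing.
SAMPLE_DOCS = [
    Doc("page", "Auckland plumbers directory",
        "Find Auckland plumbers. Compare plumbers in Auckland and hire local plumbers today"),
    Doc("directory", "Smith Plumbing",
        "Licensed plumbers serving central Auckland"),
]

if __name__ == "__main__":
    query = "auckland plumbers"
    # Unboosted: the keyword-rich page (score 11) beats the listing (score 2).
    print([d.source for d in search(SAMPLE_DOCS, query)])
    # Boosting directory results by 10x puts the listing (score 20) first.
    print([d.source for d in search(SAMPLE_DOCS, query, {"directory": 10.0})])
```

A flat boost like this is exactly the kind of change that shifts the balance elsewhere: a boost large enough to beat one heavily optimised page can push a weak listing above a page that really is the best answer to a different query.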

The benchmark

It's not that my search results were bad or irrelevant; it's that the benchmark has been set so high. We are used to searching for something in Google / Yahoo / MSN and getting high-quality search results.
When we have to use an on-site search that has been programmed by a regular web developer (as opposed to a large team of search experts with massive resources available), I guess the search results can seem a little disappointing.

This is the basis of the argument for including Google results on your website, rather than building a custom search.

Google vs Home-bake

I can't write a search algo as good as Google's. Google can't index my site as deeply or as often as I would like.

I'm left with the choice of customized results that aren't as relevant as what Google can produce, or Google results that don't include the latest content.

For this website, we will likely use the custom results for a couple of months until we are happy Google has indexed the site properly. Unfortunately, the site has very little link power / authority so we don't expect Google to make indexing this site a top priority.

There are third-party crawlers out there that can spider your site more regularly and produce search results for you. I have yet to try one of these solutions, but they may offer the balance of relevance vs completeness that I'm looking for.

Tags: spam search results search results

2 Comments

- Nov 3, 2007

I'm not really sure what you were getting at here. Since the title was about spamming your own search engine, I was expecting a more relevant topic. Instead you seemed to blab on and on about how you were getting your site to rank.

Just as you stated before, your titles need to be relevant to the content of the page, and for some reason, if I searched for Spamming Search Engines, this wouldn't have anything to do with what I was looking for.

- Nov 3, 2007

The point I was trying to illustrate was that I wrote a search engine, then optimised some of the content on the site for Google, then found that I had accidentally made my own search results irrelevant because some pages were that much more keyword-rich.

I think the title sums up the content appropriately, but I'm open to suggestions on something better :)

