Spamming your own search results

I was recently faced with the challenge of adding a search facility to a medium-sized directory website. I have written custom search algorithms before, and it's generally a cat-and-mouse game of tweaking and testing to get the right level of relevance. As soon as you change the balance in one area, search results get less relevant in other areas.
Invariably, as your search algorithm gets more complicated, it starts to get slow, especially when you start adding millions of records to your database.

This particular site is built with Jojo CMS, and I decided to use the built-in search facility (which I wrote about half of). This performs a search across the various plugins that are installed - the page plugin, the blog plugin and the directory plugin. The directory listings *are* the content for this site, and no doubt people paying for a listing want to see their site appear prominently in on-site search results.

SEO steps in

What I found was that we had SEO'ed the page content so well that it was outranking the directory listings. All the keyword-rich titles, headings and body content that had been added to the key pages were blocking the meat-and-potatoes content from ranking where it should.

It occurred to me that I had just spammed my own search engine.

I tweaked the search algo for a couple of hours, and then began to wonder why this particular job was proving so difficult.
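
To give a flavour of the kind of tweak involved, here is a minimal sketch of weighting results by the plugin they came from before merging them. This is not the Jojo CMS code: it is written in Python purely for illustration, and the names (`PLUGIN_WEIGHTS`, `Result`, `merge_results`) and the weight values are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical per-plugin weights; the values are illustrative only.
PLUGIN_WEIGHTS = {"directory": 3.0, "blog": 1.0, "page": 0.5}


@dataclass
class Result:
    plugin: str       # which plugin produced the result
    title: str
    relevance: float  # raw relevance score reported by that plugin


def merge_results(results, weights=PLUGIN_WEIGHTS):
    """Sort results by raw relevance scaled by the weight of their plugin."""
    def weighted(result):
        # Plugins without a configured weight are left unscaled.
        return result.relevance * weights.get(result.plugin, 1.0)

    return sorted(results, key=weighted, reverse=True)


results = [
    Result("page", "Keyword-rich landing page", 0.9),
    Result("directory", "Paid directory listing", 0.4),
    Result("blog", "Related blog post", 0.5),
]
ranked = merge_results(results)
```

With these example weights the directory listing moves ahead of the heavily optimised page, even though its raw score is lower. The catch is the one described at the start: every change to a weight shifts the balance for other queries too.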

The benchmark

It's not that my search results were bad or irrelevant; it's that the benchmark has been set so high. We are used to searching for something in Google / Yahoo / MSN and getting high quality search results.
When we have to use an on-site search that has been programmed by a regular web developer (as opposed to a large team of search experts with massive resources available), I guess the search results can seem a little disappointing.

This is the basis of the argument for including Google results on your website, rather than building a custom search.

Google vs Home-bake

I can't write a search algo as good as Google's. Google can't index my site as deeply or as often as I would like.

I'm left with the choice of customized results that aren't as relevant as what Google can produce, or Google results that don't include the latest content.

For this website, we will likely use the custom results for a couple of months until we are happy Google has indexed the site properly. Unfortunately, the site has very little link power / authority so we don't expect Google to make indexing this site a top priority.

There are third-party crawlers out there that can spider your site more regularly and produce search results for you. I have yet to try one of these solutions, but they may offer the balance of relevance vs completeness that I'm looking for.

Tags: spam, search results, searchresults