Google crawling of flash content "improved"

Google have recently announced that they have improved their algorithm for crawling flash content on websites.

I'd expect this would be a welcome change for many web developers, but I'm going to hold off on the excitement for a few reasons.

Flash as a page element

I have always supported using flash as a page element, rather than as the 'meat and potatoes' of a website. The reasoning behind this is simple - flash breaks browser functionality, and that will piss off all but your most technically-illiterate users.

Flash will do all kinds of nasty things when you use it for the main content.
  • No copy-paste of text (this isn't always the case)
  • No right-click 'save target as' for images
  • No 'open this link in new tab / window'
  • No statusbar text showing where a link will lead
  • No title text on links unless specifically put there by the author
  • No standard browser scrollbar mechanisms, often no mousewheel support either

This all seems really bad, and it is, but most of these things aren't an issue when you use flash as a page element - a header banner, an advertisement, an image rotation or an interactive widget - all great stuff. Make sure your body content and main navigation are in regular HTML, then spice up the non-content areas of the page with some flash.

For accessibility reasons, it's nice to offer non-flash content to browsers without flash. And with swfobject, this process is very easy. Rather than telling the non-flash user to run off and get flash before they can view your site, you show them a static image instead of an animated banner, and the user can still enjoy the site without even realising they are missing out on something.

Which brings me to the main point

Just because Google can now index flash files better doesn't mean we should start developing complete sites in flash. Nothing changes - in almost all cases, flash is still best used as a page element, for both usability and accessibility reasons.

I'm sure better flash indexing will be a very welcome change for those who already own a flash website and aren't thrilled with how it is currently indexed.

So as I'm writing this post...

...it seems I'm not alone. Have a read of the comments on Google's official blog, and you can see some reasonably good questions and points being raised.
  • Google, don't encourage people to start doing splash pages again.
  • Google, what about W3C standards, usability and accessibility? (Basically my objection above.)
  • Google, we have been using flash to hide our low quality content from you, what now?
  • Google, we don't want the word "loading" to be all over our indexed content, and we don't want to replace plain text with images as you recommend.
  • Another user points out what happens when you do a Google search for "loading filetype:swf".
  • Lots of complaints about Google being vague on details (nothing new here).
  • One commenter asks, will this cause a black hat revolution in 1 pixel flash files jam packed with text content, now Google-readable?
  • Good developers have already made sure their sites have text-only representations of their flash content - how do they 'opt out' of this new indexing?
  • Can we expect a big shift in search results as all these flash sites start appearing in the index?
  • Some good questions regarding content in XML: XML is meaningless when it's read out of context, because the application (e.g. flash) needs to import the data before it becomes meaningful. Google says they don't import XML content and will index it separately (meaning it will be indexed without context).

So while this seems like a welcome addition to the feature set of Googlebot, it really does open up a large can of worms. No doubt, indexing SWF files is hugely complicated, and these are early days - but Google's inability to distinguish 'content' text from 'fluff' text in the SWF is a big issue, and could definitely contaminate search results and cause headaches for developers. Combined with the dumbed-down explanation of how Google is indexing flash, developers are left with more questions than answers.

Best practice

Best practice web development for flash websites means providing alternative content to non-flash users, which in the past has included Google. Usually this is done using swfobject, or a similar mechanism.

Has this change just given developers one less reason to 'do things properly'? I can see developers getting to the end of the project and saying "Argh, now that Google can index flash files, there's no need to provide a plain text alternative".

Robots.txt

If you want things to stay the same, and don't want your website's search results polluted with flash listings of questionable quality, now might be a good time to use robots.txt to block Googlebot from your SWF files.
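
As a rough sketch of what that could look like, here is a small Python check using the standard library's robots.txt parser. It assumes (my assumption, not something from Google's announcement) that all of your SWF files live under a single /flash/ directory, and example.com is just a placeholder domain. Google's own documentation also describes wildcard patterns for matching by file extension, but Python's parser only does simple prefix matching, so this sketch sticks to a directory rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes every SWF file is stored under /flash/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /flash/
"""


def can_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above allow user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and the file names are placeholders for illustration.
    for url in (
        "https://example.com/flash/banner.swf",
        "https://example.com/index.html",
    ):
        print(url, can_fetch("Googlebot", url))
```

Running a check like this before deploying is a cheap way to confirm that the SWF files are blocked while the regular HTML pages stay crawlable.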

Tags: Google, googlebot, Flash SEO