seo

Valid HTML is bad SEO

I’ve been amazed at how many SEO firms I’ve seen recently touting “W3C validated HTML” as one of their core SEO recommendations – a fundamental misconception I thought had disappeared years ago.

It’s an easy mistake to make, but linking it directly with SEO is plain wrong, bad advice for SEOs to be giving to their clients, and gives our industry a bad name. Why is it bad advice? Consider the cost to an enterprise client such as Amazon (1,451 home page errors) of implementing W3C validated HTML throughout their website, and then consider the following:

Search engines index the majority of their content by parsing HTML files, so there is a link between parseable code and efficient indexing (and therefore ranking). However, parseable HTML is not the same as W3C validated HTML. This is an important distinction, illustrated in the examples below:

<br> <-- Not valid in xHTML 1.0 Transitional, easily parseable

<a href="http://www.example.com/" Link to example.com></a> <-- Not valid HTML, easily parseable, will not pass any anchor text, you won’t see it in your browser

<META name="keywords" content="useless tag" /> <-- Fully parseable, invalid HTML 4.01, invalid xHTML 1.0

<p <a href={example.com}> <h1 Cheap flights</p> <-- Completely unparseable, completely invalid, you won’t even see the text in your browser.

From the examples above, a general rule of thumb might be: if you can see it in a text browser (such as Lynx), it can more than likely be parsed by search engines, regardless of the HTML’s W3C validity.
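As a rough illustration of "parseable but not valid", here is a minimal sketch using the lenient HTML parser in Python's standard library. It is only a stand-in for whatever parser a search engine or text browser really uses, and the function name `visible_text` is made up for this example. It pulls out the text a reader would see, without caring whether the markup validates.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of a document, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.pieces.append(data)


def visible_text(html):
    """Return the text content of `html`, whitespace-normalised."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return " ".join(p.strip() for p in parser.pieces if p.strip())


if __name__ == "__main__":
    # Unclosed <p> and a bare <br>: invalid XHTML, but the text is recoverable.
    print(visible_text("<p>Cheap flights<br>to New York"))
    # The words sit inside the tag as stray attributes, so there is no anchor text.
    print(repr(visible_text('<a href="http://www.example.com/" Link to example.com></a>')))
```

The first snippet yields its text despite being invalid, while the second parses without complaint yet contributes no visible text at all, which matches the distinction drawn in the examples above.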

Another reason why W3C validated HTML is not an SEO recommendation is shown in the code example below:

<p>Cheap Flights</p>
<h1>We fly to destinations across the globe from London to New York, and offer the best service from check-in to your destination</h1>

Now that’s clearly valid HTML, but what SEO would say that’s a good, optimised snippet of HTML? The W3C validator tool cannot check for semantic validity, which is far more important for SEO.
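A semantic check of this kind has to be written separately from validation. The sketch below is one crude heuristic, not an established SEO rule: it flags any h1 whose text runs longer than a word limit. The limit of 10 words and the function name `long_h1s` are assumptions chosen purely for illustration.

```python
from html.parser import HTMLParser

MAX_H1_WORDS = 10  # hypothetical threshold, not an established guideline


class H1Collector(HTMLParser):
    """Collects the text of every <h1> element in a document."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        # Only text that appears inside an open <h1> is recorded.
        if self.in_h1:
            self.headings[-1] += data


def long_h1s(html, max_words=MAX_H1_WORDS):
    """Return the h1 texts that contain more than `max_words` words."""
    collector = H1Collector()
    collector.feed(html)
    collector.close()
    return [h.strip() for h in collector.headings if len(h.split()) > max_words]


if __name__ == "__main__":
    snippet = (
        "<p>Cheap Flights</p>\n"
        "<h1>We fly to destinations across the globe from London to New York, "
        "and offer the best service from check-in to your destination</h1>"
    )
    for heading in long_h1s(snippet):
        print("Suspiciously long h1:", heading)
```

Run against the snippet above, this flags the 22-word heading, even though the same markup passes validation.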

I knocked up a script to prove this to those who still aren’t convinced. The charts below show the number of HTML errors in the W3C validator on the y-axis, with the Google natural positions along the x-axis.

SEO & Perl: An introduction

It’s generally accepted that it’s useful/invaluable for any SEO to know at least one server-side language. I chose Perl after playing around with PHP and finding it didn’t quite fulfill my requirements. Why Perl and not Ruby/Python/C? Perl is well established (22 years) and supported, has an active community, a great library of extensions (modules), and perhaps most importantly for SEO, was built for processing and reporting on textual data (‘Perl’ is often expanded as Practical Extraction and Report Language.)

Perl has an (undeserved) reputation for being difficult to learn and producing unreadable and ‘ugly’ code. In reality, as with any other programming language, the language never produces ugly code; the programmer does. However, perhaps Perl does lend itself to ugly code, simply because it’s so damn easy to hack together useful scripts quickly :)

So where’s the link with SEO? Well, a few of the reasons below should help to explain:

  • HTML = text. Perl is great at processing text, and by extension, HTML. As search engine optimisers, we deal with HTML on a daily basis – want to see any web page’s meta tags, heading tags, alt text, etc? Stop messing around with the Web Developer toolbar and build a Perl script to do it for you ;)
  • Spidering. With the help of LWP and WWW::Mechanize, powerful spiders can be written with only a few lines of code. The benefits of this should be obvious; suffice to say that if you’ve written a spider or two, maybe you’ll get a slightly better understanding of what Google’s spiders may (or may not) be capable of. Beyond that, the number of competitive intelligence tools that can be built around this is limitless.
  • Regular Expression support. Apologies if this is jargon, but regex support is second to none. It’s built into the programming language and is incredibly powerful (PHP borrows ‘Perl-compatible regular expressions’.) This makes processing complex web pages a very easy thing to do.
  • Interaction with the web. There are probably thousands of modules in the CPAN library that make interacting with the web on many levels very, very easy (and advanced.) Incidentally, modern Perl’s UTF-8 support is leagues ahead of PHP’s.
  • Database interaction. To me, database access through the DBI module is more logical than PHP’s, and it has been said to be far more secure.
  • Building websites. I still mainly use PHP for building the front-end to my SEO tools, but that’s just laziness really. Perl has great support for embedding code in web pages (I just haven’t learned it yet), and the CGI module is great. A lot of the back-end functionality (glue) is written in Perl because it’s quick, reliable, secure, and plays nicely with PHP.
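To make the first point in the list concrete, here is a small sketch of the kind of script it describes: one that pulls the meta tags, heading tags and alt text out of a page. It is written in Python with only the standard library, purely for illustration; it is not the Perl the post recommends, and the names `OnPageSummary` and `summarise` are made up for the example. Fetching the page (the job LWP would do in Perl) is left out, so the function takes the HTML as a string.

```python
from html.parser import HTMLParser


class OnPageSummary(HTMLParser):
    """Collects meta tags, heading text and image alt text from a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.meta = {}       # meta name -> content
        self.headings = []   # [tag, text] pairs, in document order
        self.alts = []       # alt text of each <img>, "" if missing
        self._open_heading = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "img":
            self.alts.append(attrs.get("alt") or "")
        elif tag in self.HEADINGS:
            self._open_heading = tag
            self.headings.append([tag, ""])

    def handle_endtag(self, tag):
        if tag == self._open_heading:
            self._open_heading = None

    def handle_data(self, data):
        # Text inside an open heading, including text in nested inline tags.
        if self._open_heading:
            self.headings[-1][1] += data


def summarise(html):
    """Return the meta tags, headings and alt text found in `html`."""
    parser = OnPageSummary()
    parser.feed(html)
    parser.close()
    return {
        "meta": parser.meta,
        "headings": [(tag, text.strip()) for tag, text in parser.headings],
        "alts": parser.alts,
    }


if __name__ == "__main__":
    page = (
        '<html><head><META name="keywords" content="cheap flights" /></head>'
        '<body><h1>Cheap Flights</h1><img src="plane.jpg" alt="Plane taking off">'
        "<h2>London to <b>New York</b></h2></body></html>"
    )
    print(summarise(page))
```

Because the parser is lenient, the uppercase META tag and the unclosed img are handled the same way as tidy markup would be.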

So, where to start if you’re into SEO and want to learn Perl? My first Perl/programming book was Learning Perl (Win32 version actually) – a very smooth introduction to the language, and even good for complete n00bs to programming (I only knew HTML and a bit of PHP when I started on Perl.)

O’Reilly really has the best library of Perl books I’ve found – Spidering Hacks is the most directly relevant for SEO (a lot of the examples are out of date, but if you know a bit of Perl before you start there are a lot of good ideas here.) The Perl Cookbook is also a fantastic resource.
