Less Google keyword data = more black hat activity

This post reflects my own personal opinion only, not that of my employer or colleagues. A warning for the faint of heart: it contains traces of black hat material.

So Google has taken our keyword data away, probably for good. What next for SEO?

I think we have to assume this rollout will eventually happen everywhere, for all users. Why? What’s in it for Google?

  1. Less competition – the search data provided by firms such as Experian Hitwise, Comscore and Quantcast becomes far less valuable, meaning the only media company that can authoritatively provide keyword data is, you guessed it, Google.
  2. Less spam – from a purely objective point of view, I do think this will result in much less spam. Sites such as Mahalo, Experts Exchange, etc. will all suffer as pages generated purely on the basis of search volume die a death, and Google’s results will get better as a result.
  3. Fewer SEOs – I don’t think it will kill the industry, but measurement and keyword research become a lot harder. The focus will probably get more technical on-site and more social off-site. Google’s not known for its love of the SEO industry so this is probably a nice side benefit for them.

It’s easy to despair and mourn the death of SEO (again) but as long as organic results exist that’s pap.

Here are a few serious/non-serious ideas for workarounds:

1. Deep Content Analysis

Reports on content need to get A LOT better. For example, tying content and technical changes to the organic traffic those pages receive will become a lot more important for judging the success of content strategies.
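To make that concrete, here is a minimal Python sketch of one way to tie a content change to landing-page traffic: compare average daily organic visits before and after the date the page changed. The function name `traffic_change`, the input shape and the numbers in the usage example are all assumptions for illustration, not something pulled from a real analytics export.

```python
from datetime import date
from statistics import mean


def traffic_change(visits, change_date):
    """Compare average daily organic visits before and after a change.

    visits:      dict mapping a date to organic visits for ONE landing page
                 (hypothetical shape; adapt to your own analytics export).
    change_date: the date the content or technical change went live.

    Returns (mean_before, mean_after, percent_change), or None when there
    is no data on one side of the change date.
    """
    before = [v for d, v in visits.items() if d < change_date]
    after = [v for d, v in visits.items() if d >= change_date]
    if not before or not after:
        return None
    mean_before, mean_after = mean(before), mean(after)
    # Avoid dividing by zero for pages that had no traffic before the change
    percent = (mean_after - mean_before) / mean_before * 100 if mean_before else None
    return mean_before, mean_after, percent


# Made-up numbers purely for illustration
example = {
    date(2011, 10, 1): 100,
    date(2011, 10, 2): 100,
    date(2011, 10, 3): 150,
    date(2011, 10, 4): 150,
}
print(traffic_change(example, date(2011, 10, 3)))
```

A before/after average is crude (it ignores seasonality and anything else that changed at the same time), so treat it as a starting point for a report rather than proof that the change caused the difference.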

2. Resurgence of rank tracking (for now)

With less keyword data, SEOs are forced back to old-school rank tracking in order to measure success. This is a very sad thing considering ethical SEOs have been pushing keyword traffic as a key performance metric for a very long time now, and rankings were starting to become less important.

However, I think relying too much on this may well leave SEOs in for a further shock down the line: search results will probably start looking more like a JavaScript-based web app and less like traditional web pages, making them a lot harder (or impossible) to parse with traditional screen-scraping tools (although maybe experts in Selenium could knock something up).

3. Don a black hat

None of these black hat methods is recommended; they are, however, an indication of the direction certain people may choose to go in as a result of Google’s announcement.

Through hacking into a user’s browsing history (possible, though only for a pre-defined set of URLs), or other such nefarious/illegal activity, it’s probably still pretty easy to figure out referring keywords, and send the data in a manipulated request to Google Analytics (though likely to be against GA ToS).

3a. Develop a popular browser plugin

You may be able to regain some keyword data through developing a browser plugin that tracks a user’s browsing behaviour, much as Google did with their toolbar. Obviously to stay on the legal side of the fence you’d need to hide this as well as you can in the plugin’s T&Cs. Much as Google did with their toolbar.

3b. Develop & spread spyware

Clearly straddling the blackhat/illegal arena, tracking browsing activity through spyware is child’s play.

3c. Develop & spread a virus/trojan/worm

Again, frankly illegal, but another easy way to track the keywords that users are searching for.

4. Develop your own search engine(s)

Many in our industry assume Google has an unassailable position as a search engine. This may or may not be true (remember Altavista?), however that doesn’t mean there isn’t room for more search engines on the web. Given the sheer number of open source search engines available all you really have to do is choose one, and customise it for your chosen niche, add some nice features and then start tracking user behaviour.

5. Only allow Google to index your home page

Then put a big search box on your home page and track users’ search behaviour. Clearly has certain drawbacks and would only work if you had an absolute ton of branded searches :)

6. Start using a different search engine

A small protest perhaps, but an important one. Webmasters have dived into Google products as they’ve given us a lot of great products for free. There’s far more in the way of good, free (or cheap) alternatives to services such as Google Analytics today than there was 5 years ago. Personally I’ve set Bing as my default search engine and am relatively happy so far (though the pangs of “I’ll just double check that on Google” have not gone away).

So that’s just a few collected thoughts. Please share your thoughts & ideas below!

Dynamically fetch web page contents in Excel

Excel’s built-in web features are pretty frustrating when you want to do more with the web than import a static HTML table to a predefined set of cells.

I’ve often wanted to be able to update the contents of a cell based on dynamic parameters passed into a URL, and not found a decent, easy way of doing this. The official Office website shows you how to do this the Microsoft way, but lo and behold that doesn’t actually translate to real-world uses very well.

Say, for example, you want to fill a column of cells with the ranking for a given list of keywords. A function that takes its parameters from other cells (where the URL could potentially be defined in the E column) would be very useful.

Provided you had an API into ranking data (see Perl code below), this should be an easy operation, but this doesn’t seem to be something Excel does out of the box.

I’m sure there are tons of XLLs and paid-for solutions out there to do the same thing, but I want something that’s flexible, and ideally free. Hence the user-defined function below can replace Excel’s awkward built-in parameter handling quite easily and for lots of different uses:

' Name the function anything you like - the variable parameters to pass in this instance are keyword, domain, number of results
Public Function getURL(kw As String, domain As String, num As Integer) As String

    ' Build the URL from the parameters passed in
    ' Note: kw is not URL-encoded here; on Excel 2013 and later, WorksheetFunction.EncodeURL(kw) can handle keywords containing spaces
    Dim URL As String
    URL = "http://www.example.com/api/script.pl?kw=" & kw & "&domain=" & domain & "&num=" & num

    ' Uncomment to debug the request URL
    'Debug.Print "Request URL: " & URL

    Dim objhttp As Object
    Set objhttp = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    objhttp.Open "GET", URL, False
    objhttp.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
    ' If your URL uses basic HTTP authentication (like mine did), uncomment the line below and replace the contents of [] with your base 64 encoded credentials
    'objhttp.setRequestHeader "Authorization", "Basic " & "[base64_encode[user:pass]]"
    objhttp.send ("")

    Dim Response As String
    Response = objhttp.responseText

    ' To debug the output into the VBA Immediate window, uncomment the line below
    'Debug.Print "GA data feed response: " & Response

    ' Load contents of URL into the cell that's called the function
    getURL = Response

End Function

I’m no VBA whiz or anything so I’m sure the above code could be tweaked vastly to make it faster, more robust, and more flexible for variable numbers of query parameters, different query types, etc. (open to suggestions :) ), but hey, it’s a quick & easy way to solve a problem.

The Perl code: as mentioned above, the VBA code was written specifically to grab ranking data from an API. Hence there are no fancy XPath expressions in the VBA, or even any attempt to parse the output. This is another potential improvement, but I try to keep Excel’s interactions with the web & text processing to an absolute minimum because Excel + teh internets = sloooow. In this instance, the script output is literally just the numeric data, so any processing is done by the faster Perl code.

The code below relies on a custom scraping library (SearchMarketing::Crawl::GoogleNatural) – I’ll leave that code up to you to re-create ;)

It’ll only check the top 100 results but for most purposes that’s more than enough. If there is more than one result for the given domain, it’ll print the numbers separated by an ampersand and can then be manipulated in Excel if necessary (I’ll just about trust Excel for that…)
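If you would rather post-process the script output somewhere other than Excel, the ampersand-separated format is easy to split. A minimal Python sketch (not part of the original script; `parse_positions` is a hypothetical helper name), assuming the output is either `N/A` or one or more positions joined by `&`:

```python
def parse_positions(output):
    """Turn the ranking script's output into a sorted list of positions.

    "3&17" -> [3, 17]; "N/A" (domain not found) or empty output -> [].
    Sorting is done here so the result does not depend on the order in
    which the script happened to print the positions.
    """
    text = output.strip()
    if not text or text == "N/A":
        return []
    return sorted(int(part) for part in text.split("&"))


positions = parse_positions("17&3")
best = positions[0] if positions else None  # best (lowest) ranking, if any
print(positions, best)
```

Returning an empty list for `N/A` keeps the "not ranking" case distinct from a real position without resorting to a magic number.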

#!/usr/bin/perl -wT
use strict;
use WWW::Mechanize;
use SearchMarketing::Crawl::GoogleNatural;
use CGI qw(:standard);

my $kw          = param("kw");
my $num_results = param("num");
my $domain      = param("domain");

print header;

getPos($kw, $num_results, $domain);

# Print the position(s) at which $domain ranks for $kw, separated by "&", or "N/A" if it isn't found
sub getPos {
  my ($kw, $num_results, $domain) = @_;

  # googleUK() comes from the custom library: position => [title, snippet, destination URL, display URL]
  my %results = googleUK($kw, $num_results);
  my $count = 0;

  # Sort numerically so positions print in ranking order (hash keys come back unordered)
  foreach my $position ( sort { $a <=> $b } keys %results ) {
    my $dest_url = $results{$position}[2];

    # \Q...\E stops the dots in the domain acting as regex wildcards;
    # the trailing group makes sure the hostname ends where the domain ends
    if ($dest_url =~ m{^https?://[^/]*\Q$domain\E(?:[:/]|$)}i) {
      if ($count > 0) {
        print "&";
      }
      print $position;
      $count++;
    }
  }
  if ($count == 0) {
    print "N/A";
  }
}

Why code?

I wish I had a pound for every time another SEO told me they want to learn a programming language. It seems most SEOs are sure they want to learn PHP, Python or another programming language, but when asked the question “to what end?” the answers generally become less clear.

Because of this I think the following is the reason why a lot of SEOs never end up taking that step:

Let’s face it: teaching yourself a new language is never easy, and it becomes much harder if you don’t actually know why you’re doing it. For this reason most people get frustrated and give up before they hit the red line above and get any significant payoff for investing the time in learning a new language.

When I taught myself Perl it wasn’t directly to do with SEO – in my first job I spent a whole day each week manually editing an HTML newsletter template in Notepad++. I hated it so much I figured there was probably a better way to do it, so I bought an O’Reilly book, got up 2 hours early every day until I knew the basics and could build a tool to generate the HTML for me. That saved me 5 hours a week of boring tasks and got me nicely into the payoff zone.

What’s your imperative?

The Real Link Wheel

Link development doesn’t exist in a silo. Sometimes I think the following gets lost in translation in the client/agency relationship:

By offering separate link building/development packages, both we agencies and the SEO industry as a whole are guilty of propagating the perception that you can do really great link development without a good website to support it.

In this context I’m thinking of link “development” as being distinct from link “building”; the former is about developing relationships, getting gold standard links, the latter about building links for the sake of short term rankings or hitting an arbitrary target of quantity of links.

Some people get confused by cliches such as “content is king” and “if you build it well, they will come” (interpreting that to mean they don’t need to engage in SEO), because they haven’t got all three of these elements working together in harmony.

If all three are truly in harmony (arguably very rare without SEO advice), perhaps you don’t need SEO for your company?

Valid HTML is bad SEO

I’ve been amazed at how many SEO firms I’ve seen recently touting “W3C validated HTML” as one of their core SEO recommendations – a fundamental misconception I thought had disappeared years ago.

It’s an easy mistake to make, but linking it directly with SEO is plain wrong, bad advice for SEOs to be giving to their clients, and gives our industry a bad name. Why is it bad advice? Consider the cost to an enterprise client such as Amazon (1,451 home page errors) of implementing W3C validated HTML throughout their website, and then consider the following:

Search engines index the majority of their content by parsing HTML files, so there is a link between parseable code and efficient indexing (and therefore ranking). However, parseable HTML is not the same as W3C validated HTML. This is an important distinction, illustrated in the examples below:

<br> <-- Not valid in XHTML 1.0 Transitional, easily parseable

<a href="http://www.example.com/" Link to example.com></a> <-- Not valid HTML, easily parseable, will not pass any anchor text, you won’t see it in your browser

<META name="keywords" content="useless tag" /> <-- Fully parseable, invalid HTML 4.01, invalid XHTML 1.0

<p <a href={example.com}> <h1 Cheap flights</p> <-- Completely unparseable, completely invalid, you won’t even see the text in your browser.

From the examples above, a general rule of thumb might be, if you can see it in a text browser (such as Lynx), it’s more than likely it can be parsed by search engines, regardless of the HTML’s W3C validity.
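You can try the same idea programmatically with any lenient HTML parser. The sketch below uses Python’s standard-library `html.parser`, which does not validate anything, to pull out the visible text and the links with their anchor text. The class and function names (`TextAndLinks`, `extract`) are my own, and a stdlib parser is only a stand-in here: it says nothing authoritative about how a search engine’s parser actually behaves.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect visible text and (href, anchor text) pairs without validating."""

    def __init__(self):
        super().__init__()
        self.text = []      # text nodes in document order
        self.links = []     # (href, anchor text) tuples
        self._href = None   # href of the <a> currently open, if any
        self._anchor = []   # text seen inside the currently open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._anchor)))
            self._href = None

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.text.append(data)
            if self._href is not None:
                self._anchor.append(data)


def extract(html):
    """Return (list of text nodes, list of (href, anchor text)) for a snippet."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return parser.text, parser.links


# An unclosed <br> is no obstacle to getting at the text
print(extract("<p>Cheap flights<br>from London</p>"))
# The malformed link still yields its href, but with empty anchor text
print(extract('<a href="http://www.example.com/" Link to example.com></a>'))
```

The second call shows the point made above: the stray words end up as meaningless attributes on the tag, so the link is found but carries no anchor text.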

Another reason why W3C validated HTML is not an SEO recommendation is shown in the code example below:

<p>Cheap Flights</p>
<h1>We fly to destinations across the globe from London to New York, and offer the best service from check-in to your destination</h1>

Now that’s clearly valid HTML, but what SEO would say that’s a good, optimised snippet of HTML? The W3C validator tool cannot check for semantic validity, which is far more important for SEO.
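A validator won’t catch that, but a crude semantic check is easy to sketch. The Python below flags a page with no h1, or an h1 that is long enough to read like body copy. The 70-character threshold and the names `H1Collector` and `heading_warnings` are arbitrary assumptions of mine for illustration, not an established SEO rule.

```python
from html.parser import HTMLParser

MAX_H1_CHARS = 70  # arbitrary cut-off, chosen only for this illustration


class H1Collector(HTMLParser):
    """Collect the text content of every <h1> in a document."""

    def __init__(self):
        super().__init__()
        self.h1s = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.h1s.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1s[-1] += data


def heading_warnings(html):
    """Return a list of human-readable warnings about the page's h1 usage."""
    collector = H1Collector()
    collector.feed(html)
    collector.close()
    headings = [h.strip() for h in collector.h1s]
    warnings = []
    if not headings:
        warnings.append("no h1 found")
    for heading in headings:
        if len(heading) > MAX_H1_CHARS:
            warnings.append(f"h1 is {len(heading)} characters; reads like body copy")
    return warnings


snippet = (
    "<p>Cheap Flights</p>"
    "<h1>We fly to destinations across the globe from London to New York, "
    "and offer the best service from check-in to your destination</h1>"
)
print(heading_warnings(snippet))
```

A length check is obviously a blunt instrument, but it illustrates the kind of question (is the markup used for what it means?) that validity alone never answers.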

I knocked up a script to prove this to those who still aren’t convinced. The charts below show the number of HTML errors in the W3C validator on the y-axis, with the Google natural positions along the x-axis.
