Monthly Archives: November 2011

Setting up Nutch – troubleshooting

Downloading the latest version of Nutch is easy, but I had a few issues getting it set up on my Red Hat server. There are a couple of gotchas along the way, and it doesn’t work exactly as specified in the tutorial.

  1. Download & install Java – super simple: yum install java
  2. wget & unzip the latest version of Nutch
  3. I had issues with the JAVA_HOME environment variable when trying to follow the example on crawling a website. The error I got was “/usr/loca/jdk/bin/java: No such file or directory” – the problem here was twofold: 1) I didn’t have JAVA_HOME set in my environment variables, and 2) around line 118 in the bin/nutch file there’s a reference to $JAVA_HOME/bin/java – the bold part of which seems unnecessary and should be deleted.
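For the first half of that problem, here is a minimal sketch of one way to set JAVA_HOME before running bin/nutch. It assumes java is already on your PATH (for example after the yum install above) and that GNU readlink is available; the helper name java_home_from_bin is made up for illustration and is not part of Nutch.

```shell
# Derive a JAVA_HOME candidate from the path of a java binary by
# stripping the trailing /bin/java. (Hypothetical helper, not part of Nutch.)
java_home_from_bin() {
    printf '%s\n' "${1%/bin/java}"
}

# Only export JAVA_HOME if a java binary is actually on the PATH.
if java_path="$(command -v java)"; then
    # Follow symlinks so we end up at the real install directory
    # rather than a link such as /usr/bin/java.
    java_real="$(readlink -f "$java_path")"
    JAVA_HOME="$(java_home_from_bin "$java_real")"
    export JAVA_HOME
    echo "JAVA_HOME=$JAVA_HOME"
else
    echo "java not found on PATH; install it first" >&2
fi
```

To make the setting stick between sessions, you could add the export line to your shell profile. Check that the printed directory really contains bin/java before relying on it.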

 

Jaamit: an SEO Legend

The below is an internal email sent out today at OMD which I thought would be nice to share with the wider SEO community.

————

Today marks the anniversary of the day last year when we lost a highly valued colleague and friend on the OMD SEO team, Jaamit Durrani. The dedication, passion, humour & intelligence he showed while he was with us have had a lasting effect on us personally, and have also been crucial to the SEO team’s growth from a very small team last year to a highly successful team of 15+ this year; a year where we’ve won our first ever major SEO-only client, scored 100% in client feedback and continued to build a better offering month on month.

As a small tribute, and hopefully an insight for those who weren’t lucky enough to meet or work with him, this week’s links highlight some of his best blog posts, as well as some of the tributes posted online:

1.    Seven sensational SEO tips for ecommerce sites | Econsultancy

econsultancy.com

For the first of my guest posts for Econsultancy I wanted to take a step beyond the generic, oft-rehashed ‘SEO tips’ (you know, things like “include keywords in your page titles” and “create great content”) and contribute something based on my experience. Guest post by Jaamit

2.    Nine common SEO campaign mistakes | Econsultancy

econsultancy.com

Running an ongoing SEO campaign is a lot like spinning plates. With so many factors in play in search engine algorithms, you really need to be aware of all of them at once to ensure a successful campaign. Guest post by Jaamit

3.    SEO Insight: Analysis and Rants on Search & Internet Marketing

web.archive.org

Jaamit’s blog is sadly no longer available (however we are looking into how it can be reinstated); this link from the Internet Archive luckily preserves most of his posts fully intact.

4.    SMX London – Top 10 Tips – By Jaamit Durrani – SEOgadget.co.uk

seogadget.co.uk

As an illustration of how open and sharing Jaamit was with his knowledge, this is a guest post he did for a competing agency’s blog :)

5.    PPC vs SEO Showdown: SES London 2010 Recap | Fresh Egg SEO Blog

www.freshegg.com

Before OMD, Jaamit kicked off his career in SEO with a small Brighton digital agency FreshEgg (owned by James Caan). Here’s one of his posts from their blog.

6.    Link building in real life – A practical guide to dominating the SERPS | Fresh Egg SEO Blog

www.freshegg.com

Coverage from FreshEgg of Jaamit’s first conference speaking engagement at Think Visibility in Leeds.

7.    Link Building in Real Life: Think Visibility 2010 Recap – SearchTalk | SearchTalk

searchtalk.co.uk

Coverage of the same talk from OMD’s Jamie Peach.

8.    Jaamit Tribute

explicitly.me

A touching tribute from one of Jaamit’s close friends in the industry.

9.    Jaamit Durrani Tribute – SearchTalk

searchtalk.co.uk

Another very touching tribute from Omnicom Head of Search Mark Mitchell, detailing the truly impressive impact Jaamit had in his short time at OMD.

10. @Jaamit – Twitter

twitter.com

Jaamit’s Twitter account – it’s easy to see just how highly engaged he was with the SEO community and how willing he was to help out others whenever possible.

 

—–
Miss you mate.

Perl’s HTML::TreeBuilder::XPath is a great module for parsing HTML documents without regular expressions. However, it returns text content by default, which is not always what you want when you’re doing advanced HTML processing. The documentation on CPAN doesn’t mention this, but if you want to get the HTML content out, call “findnodes” and chain “->shift->as_HTML” onto the result, as illustrated below:

my $value = $tree->findnodes(q{//div[@class='crumbs']})->shift->as_HTML;