accessibility

Setting up Nutch – troubleshooting

Downloading the latest version of Nutch is easy, but I had a few issues getting it set up on my Red Hat server. It's pretty straightforward really, but there are a couple of gotchas along the way, and it doesn't work exactly as the tutorial describes.

  1. Download & install Java – super simple: yum install java
  2. wget & unzip the latest version of Nutch
  3. I had issues with the JAVA_HOME environment variable when trying to follow the tutorial's example on crawling a website. The error I got was “/usr/loca/jdk/bin/java: No such file or directory”. The problem here was twofold: 1) I didn't have Java set in my environment variables, and 2) around line 118 of the bin/nutch file there's a reference to $JAVA_HOME/bin/java, part of which seems unnecessary and should be deleted.
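Whichever way you fix it, what bin/nutch needs is for $JAVA_HOME/bin/java to point at a real java executable. Here's a minimal sketch of a diagnostic, assuming Python 3 is installed on the server; the function names are hypothetical and not part of Nutch. It builds the same $JAVA_HOME/bin/java path the script uses and, if nothing executable is there, suggests a JAVA_HOME derived from the java on your PATH. On Red Hat /usr/bin/java is usually a symlink, so it resolves that first.

```python
import os
import shutil
from typing import Optional


def nutch_java_path(java_home: str) -> str:
    """The java binary bin/nutch will try to run: $JAVA_HOME/bin/java."""
    return os.path.join(java_home, "bin", "java")


def java_home_from_binary(java_binary: str) -> str:
    """Strip a trailing bin/java to get a JAVA_HOME candidate (pure string work, no disk access)."""
    return os.path.dirname(os.path.dirname(java_binary))


def suggest_java_home() -> Optional[str]:
    """Find java on PATH, resolve any symlinks, and derive a JAVA_HOME from the real binary."""
    found = shutil.which("java")
    if found is None:
        return None
    return java_home_from_binary(os.path.realpath(found))


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME", "")
    target = nutch_java_path(java_home) if java_home else None
    if target and os.access(target, os.X_OK):
        print(f"OK: bin/nutch will run {target}")
    else:
        print(f"Problem: JAVA_HOME={java_home!r} is unset or has no executable bin/java")
        suggestion = suggest_java_home()
        if suggestion:
            print(f"Try: export JAVA_HOME={suggestion}")
        else:
            print("No java found on PATH; install it first (e.g. yum install java)")
```

Save it under any name you like and run it with python3. For the error above, nutch_java_path("/usr/loca/jdk") returns "/usr/loca/jdk/bin/java", the exact path in the message. Exporting a JAVA_HOME that actually contains bin/java is an alternative to editing bin/nutch, and it has the advantage of surviving a Nutch upgrade.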


Report a broken link

Just found a “Page not found” error on the RNIB website. Why don't more 404 pages have this “report a broken link” feature? It shows users you care and gives developers useful information.

Oh, and finding and fixing these broken links quickly, as a matter of routine, probably wouldn't hurt your link profile either.
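To show what “useful information” a 404 page can hand developers, here's a minimal sketch using only Python's standard library. It is not how RNIB built theirs; the route /report-broken-link, the in-memory REPORTS list and the function names are all hypothetical. The 404 page carries a “Report a broken link” button whose form sends back the missing URL plus the page that linked to it (taken from the Referer header, which browsers don't always send).

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Optional, Tuple
from urllib.parse import parse_qs

# Hypothetical store of (missing URL, referring page) pairs; a real site might use a database or ticket queue.
REPORTS: List[Tuple[str, str]] = []


def not_found_page(path: str, referer: Optional[str]) -> str:
    """Build a 404 page whose button submits the missing URL and the page that linked to it."""
    missing = html.escape(path, quote=True)
    source = html.escape(referer or "", quote=True)
    return (
        "<h1>Page not found</h1>"
        "<p>Sorry, we couldn't find that page.</p>"
        '<form method="post" action="/report-broken-link">'
        f'<input type="hidden" name="missing" value="{missing}">'
        f'<input type="hidden" name="referer" value="{source}">'
        '<button type="submit">Report a broken link</button>'
        "</form>"
    )


def record_report(form_body: str) -> Tuple[str, str]:
    """Parse the URL-encoded form body and store the report for developers to review."""
    fields = parse_qs(form_body)
    report = (fields.get("missing", [""])[0], fields.get("referer", [""])[0])
    REPORTS.append(report)
    return report


class Handler(BaseHTTPRequestHandler):
    def _send(self, status: int, body: str) -> None:
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        # This toy server has no real pages, so every GET gets the 404 page.
        self._send(404, not_found_page(self.path, self.headers.get("Referer")))

    def do_POST(self) -> None:
        if self.path != "/report-broken-link":
            self._send(404, not_found_page(self.path, self.headers.get("Referer")))
            return
        length = int(self.headers.get("Content-Length", 0))
        record_report(self.rfile.read(length).decode("utf-8"))
        self._send(200, "<p>Thanks! We'll fix that link.</p>")


def run(port: int = 8000) -> None:
    """Serve on localhost until interrupted with Ctrl+C."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Call run() and visit any URL on http://127.0.0.1:8000 to see the page. Capturing the referring page alongside the missing URL is the design choice that matters: it tells you which link to fix, not just that something is broken.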
