Setting up Nutch – troubleshooting
It’s pretty easy downloading the latest version of Nutch but I had a few issues getting it set up on my Red Hat server; it’s pretty easy really but there are a couple of gotchas along the way and it doesn’t work exactly as specified in the tutorial.
- Download & install Java – super simple: yum install java
- wget & unzip the latest version of Nutch
- I had issues with the JAVA_HOME environment when trying to follow the example onĀ crawling a website. The error I got was “/usr/loca/jdk/bin/java: No such file or directory” – the problem here was twofold: 1) I didn’t have java set in my environment variables, and 2) around line 118 in the bin/nutch file there’s a reference to $JAVA_HOME/bin/java – the bold part of which seems unnecessary and should be deleted
