Running Crunch with CDH4

I mostly followed the instructions in the Apache Crunch Getting Started guide, but had to make a few tweaks to get the example working with CDH4.

I first added a reference to the Cloudera repository in the pom.xml file:
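The original snippet isn't reproduced here. A minimal sketch of what such a repository entry can look like follows. The `id` value is arbitrary, and the URL is Cloudera's public repository as I know it; verify it is still current before relying on it.

```xml
<!-- Sketch: lets Maven resolve CDH artifacts from Cloudera's repository.
     The id is an arbitrary label; verify the URL is still current. -->
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
```

This block goes inside the top-level `<project>` element of the pom.xml generated from the Crunch archetype.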


Then I changed the dependencies to the Cloudera builds compatible with the version of Hadoop we use:
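A hedged sketch of the dependency change follows, not the exact pom I used. The version values are deliberately left as placeholders (`CDH4_HADOOP_VERSION` and `CDH4_CRUNCH_VERSION` are hypothetical names) because the right strings depend on which CDH4 release your cluster runs. The artifact names `crunch-core` and `hadoop-client` are assumed to match what the archetype generates.

```xml
<!-- Sketch only: replace the placeholder values with the CDH4 version
     strings that match your cluster (look them up in the Cloudera repo). -->
<properties>
  <hadoop.version>CDH4_HADOOP_VERSION</hadoop.version>
  <crunch.version>CDH4_CRUNCH_VERSION</crunch.version>
</properties>

<dependencies>
  <!-- Crunch built against the same Hadoop line as the cluster -->
  <dependency>
    <groupId>org.apache.crunch</groupId>
    <artifactId>crunch-core</artifactId>
    <version>${crunch.version}</version>
  </dependency>
  <!-- Hadoop client APIs; "provided" because the cluster supplies them at runtime -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

The important part is that both artifacts come from the same CDH release, so the job is compiled against the same Hadoop APIs it will run on.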



Without these changes, the example job from the getting-started guide (replace hadoop-job with crunch to run it):

hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar <in> <out>

failed with this error:

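Found interface org.apache.hadoop.mapreduce.TaskInputOutputContext, but class was expected

This error typically appears when code compiled against Hadoop 1, where TaskInputOutputContext is a class, runs against Hadoop 2, where it is an interface. Building against the CDH4 artifacts removes that mismatch.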

Looking forward to writing my first Crunch/Scrunch job now!

Update (2013/08/04): I wasn't able to get Scrunch to work. I kept getting the "Found interface" error mentioned above.
