Running Crunch with CDH4

I mostly followed the instructions in the Apache Crunch getting started guide, but had to make a few tweaks to get the example to work with CDH4.

I first added a reference to the Cloudera repository in the pom.xml file:

  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

And then changed the dependencies to the Cloudera ones compatible with the version of Hadoop we use:

  <dependency>
    <groupId>com.cloudera.cdk</groupId>
    <artifactId>crunch-core</artifactId>
    <version>0.6.0-cdh4.2.0</version>
  </dependency>

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.0-mr1-cdh4.1.0</version>
    <scope>provided</scope>
  </dependency>

Without these changes, the example job from the getting-started guide (replace hadoop-job with crunch to run it):

hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar <in> <out>

was failing with this error:

Found interface org.apache.hadoop.mapreduce.TaskInputOutputContext, but class was expected
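
For context, this kind of error usually points to a Hadoop API mismatch: TaskInputOutputContext is a class in Hadoop 1.x but an interface in Hadoop 2.x, so a jar compiled against one fails at runtime against the other. A minimal sketch for checking which flavour is on a given classpath is below. The HadoopApiCheck class and its describe method are my own hypothetical helpers, not part of Crunch or Hadoop.

```java
// Hypothetical diagnostic helper (not from the Crunch guide or Hadoop):
// reports whether a type visible on the current classpath is a class or an interface.
class HadoopApiCheck {

    static String describe(String className) {
        try {
            // initialize=false: load the type without running its static initializers
            Class<?> type = Class.forName(className, false,
                    HadoopApiCheck.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException e) {
            // The type is not on the classpath at all
            return "not found";
        }
    }

    public static void main(String[] args) {
        // Default to the type named in the error message above
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.mapreduce.TaskInputOutputContext";
        System.out.println(name + " -> " + describe(name));
    }
}
```

Assuming the hadoop command is on the PATH, something like java -cp "$(hadoop classpath):." HadoopApiCheck should show which variant the cluster's client libraries provide. If it reports "interface", any library on the job's classpath built against Hadoop 1 would be a likely suspect.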

Looking forward to writing my first Crunch/Scrunch job now!

Update (2013/08/04): I wasn't able to get Scrunch to work. I kept getting the "Found interface" error mentioned above.
