I mainly followed the instructions in the Apache Crunch getting started guide but had to make a few tweaks to get the example to work with a version of CDH.
I first added a reference to the Cloudera repository in the pom.xml file:
    <repositories>
      <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      </repository>
    </repositories>
And then changed the dependencies to Cloudera ones compatible with the version of Hadoop we use:
    <dependency>
      <groupId>com.cloudera.cdk</groupId>
      <artifactId>crunch-core</artifactId>
      <version>0.6.0-cdh4.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.0.0-mr1-cdh4.1.0</version>
      <scope>provided</scope>
    </dependency>
Without these changes, the example job (replace hadoop-job with crunch to run it) from the getting-started guide:
hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar <in> <out>
was failing with this error:
Found interface org.apache.hadoop.mapreduce.TaskInputOutputContext, but class was expected
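This message is what an IncompatibleClassChangeError looks like when code compiled against the older Hadoop API, where TaskInputOutputContext is an abstract class, runs on a Hadoop 2 based distribution such as CDH4, where it is an interface. A quick way to see which flavour is on a given classpath is a small reflection check. The sketch below is not part of the original setup; the class name HadoopApiCheck and the helper kindOf are made up for illustration.

```java
// Diagnostic sketch (not from the original post): report whether a type on the
// current classpath is an interface or a class.
final class HadoopApiCheck {

    // The type named in the error message above.
    static final String CONTEXT_TYPE =
            "org.apache.hadoop.mapreduce.TaskInputOutputContext";

    /** Returns "interface", "class", or "missing" (not found or not loadable). */
    static String kindOf(String className) {
        try {
            // initialize=false: load the type without running static initializers.
            Class<?> type = Class.forName(
                    className, false, HadoopApiCheck.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        System.out.println(CONTEXT_TYPE + " -> " + kindOf(CONTEXT_TYPE));
    }
}
```

Assuming the hadoop command is on the PATH, something like `java -cp "$(hadoop classpath):." HadoopApiCheck` should report which variant the cluster's jars provide. If it reports "interface", every Crunch and Hadoop artifact in the job jar needs to be built against that same API.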
Looking forward to writing my first Crunch/Scrunch job now!
2013/08/04 Update: I wasn't able to get Scrunch to work. I kept getting the "Found interface ... but class was expected" error mentioned above.