Chaining Jobs in Scalding

I'm not sure what the use case for this would be, since you can always use a workflow scheduler such as Oozie to chain jobs. But if you want to launch another Scalding job directly from inside a Scalding job, you just need to override the next method in your job class (which presumably extends Scalding's Job class). When the current job completes successfully, Scalding runs the job that next returns.

override def next : Option[Job] = Some(Job("com.my.company.WordsCount", args))

The first argument is the fully qualified class name of the job you wish to run next. The second is the Args object passed to that job, which you can modify before handing it over.
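To make the point about modifying the arguments concrete, here is a minimal sketch of a complete job that chains to a second one. The Tokenize class and the "input" and "tokens" argument names are made up for illustration, and it assumes WordsCount reads its data from an "input" argument. Treat it as a sketch against the fields-based API rather than a tested recipe for any particular Scalding version.

```scala
import com.twitter.scalding._

// Hypothetical first job: splits lines into words, then chains to WordsCount.
class Tokenize(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => line.split("\\s+") }
    .project('word)
    .write(Tsv(args("tokens")))

  // Build the arguments for the follow-up job: its "input" is this job's output.
  // Assumption: WordsCount looks up its source path under the "input" key.
  override def next: Option[Job] = {
    val nextArgs = args + ("input" -> List(args("tokens")))
    Some(Job("com.my.company.WordsCount", nextArgs))
  }
}
```

Since Args is immutable, the + call returns a new Args with the "input" key replaced, so the first job's own arguments are left untouched.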

The really cool thing to do here would be to run graph algorithms that need to rerun the same job repeatedly until some convergence condition is reached. The PageRank example in the Scalding source code illustrates this.
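A rough sketch of that self-chaining pattern follows. This is not the PageRank example from the Scalding source. The Iterate class, the "workdir", "iteration" and "max-iterations" argument names, and the directory layout are all assumptions made for illustration, and a simple iteration cap stands in for a real convergence test.

```scala
import com.twitter.scalding._

// Hypothetical iterative job: each run reads workdir/iter_N and writes
// workdir/iter_(N+1), then schedules itself again until the cap is hit.
class Iterate(args: Args) extends Job(args) {
  private val iteration = args.getOrElse("iteration", "0").toInt
  private val maxIterations = args.getOrElse("max-iterations", "10").toInt

  private def path(i: Int): String = args("workdir") + "/iter_" + i

  // Placeholder for one step of the real algorithm: just copy the data forward.
  Tsv(path(iteration)).read.write(Tsv(path(iteration + 1)))

  // Stop once the cap is reached; otherwise rerun this same class with the
  // iteration counter bumped. A real job would check a convergence measure here.
  override def next: Option[Job] =
    if (iteration + 1 >= maxIterations) None
    else {
      val nextArgs = args + ("iteration" -> List((iteration + 1).toString))
      Some(Job(getClass.getName, nextArgs))
    }
}
```

Carrying the loop state in the arguments keeps each run stateless: everything the next iteration needs is in the Args it is constructed with.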
