
Workflow tool comparison: Oozie vs Cascading


Cascading and Oozie are not in the same category.

Oozie is a workflow scheduler.

Cascading is an API for creating workflows. It is agnostic about schedulers; that is, it should run under whatever scheduling system you use.

There is perhaps some confusion because the Oozie docs mention a "DAG", and both run atop Hadoop.

Also, Cascading has a notion of "data availability" in its checkpoint support. Oozie supports data availability as well, albeit differently.


Personally, I have played around with both to some extent. What I found interesting about Cascading is:

1) It is concise and expressive, built around a few simple concepts such as flow, tap, and pipe.

2) It has an amazing TDD-based approach for local development and research.

3) It has a nice planner view (a .dot file), which becomes useful once the project grows and makes maintenance easier.

4) It has DSL-based approaches in Groovy, Scala (e.g., Scalding), and Clojure (e.g., Cascalog), so there is no need to learn a new language, or even much of Hadoop itself.

5) Cloud deployment is simple (e.g., Amazon supports deploying it as a plain JAR).

6) You can call existing Pig or Hive jobs, or any other plain MapReduce JAR, as long as they expose a Java API.

7) It is amazing for ML- and NLP-related work.
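To make the flow/tap/pipe vocabulary from points 1 to 3 a little more concrete, here is a small, dependency-free Java toy. It is a hypothetical model written only for illustration and is not Cascading's API: the class `ToyFlow`, its `pipe`, `complete`, and `toDot` methods, and the `wordCount` helper are all invented here. In real Cascading you would wire taps to pipes such as `Each`, `GroupBy`, and `Every` and let a flow connector plan the assembly into Hadoop jobs. The toy only mirrors the shape: a source of records, a chain of named pipes, a result that a sink would receive, and a DOT rendering of the chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical toy model of the tap/pipe/flow vocabulary; NOT the Cascading API.
class ToyFlow {
    // A named processing step: turns one list of records into another.
    static final class Pipe {
        final String name;
        final Function<List<String>, List<String>> op;

        Pipe(String name, Function<List<String>, List<String>> op) {
            this.name = name;
            this.op = op;
        }
    }

    private final List<String> sourceTap; // stand-in for a source tap (in-memory lines)
    private final List<Pipe> pipes = new ArrayList<>();

    ToyFlow(List<String> sourceTap) {
        this.sourceTap = sourceTap;
    }

    // Appends a pipe to the assembly and returns this flow for chaining.
    ToyFlow pipe(String name, Function<List<String>, List<String>> op) {
        pipes.add(new Pipe(name, op));
        return this;
    }

    // Runs the pipes in order; the returned list is what a sink tap would receive.
    List<String> complete() {
        List<String> records = sourceTap;
        for (Pipe p : pipes) {
            records = p.op.apply(records);
        }
        return records;
    }

    // Renders the pipe chain as a Graphviz DOT digraph, loosely like a planner's .dot view.
    String toDot() {
        StringBuilder sb = new StringBuilder("digraph flow {\n");
        String prev = "source";
        for (Pipe p : pipes) {
            sb.append("  \"").append(prev).append("\" -> \"").append(p.name).append("\";\n");
            prev = p.name;
        }
        sb.append("  \"").append(prev).append("\" -> \"sink\";\n}");
        return sb.toString();
    }

    // Hypothetical word-count assembly: tokenize lines, then group identical words and count them.
    static ToyFlow wordCount(List<String> lines) {
        return new ToyFlow(lines)
            .pipe("tokenize", ls -> ls.stream()
                .flatMap(l -> Arrays.stream(l.toLowerCase().split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList()))
            .pipe("count", words -> words.stream()
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()))
                .entrySet().stream()
                .map(e -> e.getKey() + "\t" + e.getValue())
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        ToyFlow wc = wordCount(List.of("the quick fox", "the lazy dog"));
        wc.complete().forEach(System.out::println); // runs in-process, no cluster needed
        System.out.println(wc.toDot());
    }
}
```

Because everything here runs in-process on in-memory data, an assembly like this can be unit-tested without a cluster, which is the spirit of point 2; Cascading itself offers a local mode for the same purpose.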