Exporting a scikit-learn Random Forest for use on a Hadoop platform
You could use Py2PMML to export the model to PMML and then evaluate it on Hadoop using JPMML-Cascading. JPMML is open source, but Py2PMML, from Zementis, appears to be a commercial product. As far as I know, there are no other tools for scoring scikit-learn models exported as PMML on Java/Hadoop, although the core scikit-learn team is planning to implement a PMML exporter. If you don't want a commercial solution and don't want to wait for such a tool to be implemented, you still have some options, though they require some coding:
- Adapt the SKLearn Compiled trees project so it generates Java/MapReduce code instead of C.
- Using the `export_graphviz` function, obtain the DOT representation of each decision tree and write a small Java interpreter.
- Forget about Java and Hadoop, use Apache Spark and evaluate each one of the decision trees in parallel using Python, Scikit and PySpark.
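For the `export_graphviz` option, here is a minimal sketch of how the DOT text for each tree could be collected. The helper name `forest_to_dot` and the iris toy data are my own assumptions for illustration, and it relies on `out_file=None` making `export_graphviz` return the DOT source as a string, which needs a reasonably recent scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz


def forest_to_dot(forest):
    """Return one DOT string per tree of a fitted forest (hypothetical helper)."""
    # estimators_ holds the individual fitted decision trees of the forest.
    # With out_file=None, export_graphviz returns the DOT source as a string.
    return [export_graphviz(tree, out_file=None) for tree in forest.estimators_]


# Toy data and a small forest, only to make the sketch runnable.
X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

dot_sources = forest_to_dot(forest)
print(len(dot_sources), "trees exported")
```

Each string can then be written to its own file and shipped to the cluster. Keep in mind that DOT is a visualization format, so the Java interpreter would have to parse the split conditions and leaf values out of the node labels.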
Hope it helps!