
Export python scikit learn models into pmml


SkLearn2PMML is a thin wrapper around the JPMML-SkLearn command-line application. For a list of supported Scikit-Learn Estimator and Transformer types, please refer to the documentation of the JPMML-SkLearn project.

As @user1808924 notes, it supports Python 2.7 or 3.4+. It also requires Java 1.7+.

Install it via (requires git):

pip install git+https://github.com/jpmml/sklearn2pmml.git

Here is an example of how to export a classification tree to PMML. First, grow the tree:

# example tree & viz from http://scikit-learn.org/stable/modules/tree.html
from sklearn import datasets, tree

iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

There are two parts to an SkLearn2PMML conversion: an estimator (our clf) and a mapper (for preprocessing steps such as discretization or PCA). Our mapper is pretty basic, since we are not doing any transformations.

from sklearn_pandas import DataFrameMapper
from sklearn2pmml import sklearn2pmml

# identity mapping (no transformer) for every feature column plus the target
default_mapper = DataFrameMapper([(i, None) for i in iris.feature_names + ['Species']])

sklearn2pmml(estimator=clf,
             mapper=default_mapper,
             pmml="D:/workspace/IrisClassificationTree.pmml")

It is possible (though not documented) to pass mapper=None, but then the predictor names get lost (you get x1 instead of sepal length, etc.).

Let's look at the .pmml file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
    <Header>
        <Application name="JPMML-SkLearn" version="1.1.1"/>
        <Timestamp>2016-09-26T19:21:43Z</Timestamp>
    </Header>
    <DataDictionary>
        <DataField name="sepal length (cm)" optype="continuous" dataType="float"/>
        <DataField name="sepal width (cm)" optype="continuous" dataType="float"/>
        <DataField name="petal length (cm)" optype="continuous" dataType="float"/>
        <DataField name="petal width (cm)" optype="continuous" dataType="float"/>
        <DataField name="Species" optype="categorical" dataType="string">
            <Value value="setosa"/>
            <Value value="versicolor"/>
            <Value value="virginica"/>
        </DataField>
    </DataDictionary>
    <TreeModel functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
            <MiningField name="Species" usageType="target"/>
            <MiningField name="sepal length (cm)"/>
            <MiningField name="sepal width (cm)"/>
            <MiningField name="petal length (cm)"/>
            <MiningField name="petal width (cm)"/>
        </MiningSchema>
        <Output>
            <OutputField name="probability_setosa" dataType="double" feature="probability" value="setosa"/>
            <OutputField name="probability_versicolor" dataType="double" feature="probability" value="versicolor"/>
            <OutputField name="probability_virginica" dataType="double" feature="probability" value="virginica"/>
        </Output>
        <Node id="1">
            <True/>
            <Node id="2" score="setosa" recordCount="50.0">
                <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/>
                <ScoreDistribution value="setosa" recordCount="50.0"/>
                <ScoreDistribution value="versicolor" recordCount="0.0"/>
                <ScoreDistribution value="virginica" recordCount="0.0"/>
            </Node>
            <Node id="3">
                <SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/>
                <Node id="4">
                    <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.75"/>
                    <Node id="5">
                        <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.95"/>
                        <Node id="6" score="versicolor" recordCount="47.0">
                            <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.6500001"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="47.0"/>
                            <ScoreDistribution value="virginica" recordCount="0.0"/>
                        </Node>
                        <Node id="7" score="virginica" recordCount="1.0">
                            <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.6500001"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="0.0"/>
                            <ScoreDistribution value="virginica" recordCount="1.0"/>
                        </Node>
                    </Node>
                    <Node id="8">
                        <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.95"/>
                        <Node id="9" score="virginica" recordCount="3.0">
                            <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.55"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="0.0"/>
                            <ScoreDistribution value="virginica" recordCount="3.0"/>
                        </Node>
                        <Node id="10">
                            <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.55"/>
                            <Node id="11" score="versicolor" recordCount="2.0">
                                <SimplePredicate field="sepal length (cm)" operator="lessOrEqual" value="6.95"/>
                                <ScoreDistribution value="setosa" recordCount="0.0"/>
                                <ScoreDistribution value="versicolor" recordCount="2.0"/>
                                <ScoreDistribution value="virginica" recordCount="0.0"/>
                            </Node>
                            <Node id="12" score="virginica" recordCount="1.0">
                                <SimplePredicate field="sepal length (cm)" operator="greaterThan" value="6.95"/>
                                <ScoreDistribution value="setosa" recordCount="0.0"/>
                                <ScoreDistribution value="versicolor" recordCount="0.0"/>
                                <ScoreDistribution value="virginica" recordCount="1.0"/>
                            </Node>
                        </Node>
                    </Node>
                </Node>
                <Node id="13">
                    <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.75"/>
                    <Node id="14">
                        <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.8500004"/>
                        <Node id="15" score="virginica" recordCount="2.0">
                            <SimplePredicate field="sepal width (cm)" operator="lessOrEqual" value="3.1"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="0.0"/>
                            <ScoreDistribution value="virginica" recordCount="2.0"/>
                        </Node>
                        <Node id="16" score="versicolor" recordCount="1.0">
                            <SimplePredicate field="sepal width (cm)" operator="greaterThan" value="3.1"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="1.0"/>
                            <ScoreDistribution value="virginica" recordCount="0.0"/>
                        </Node>
                    </Node>
                    <Node id="17" score="virginica" recordCount="43.0">
                        <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.8500004"/>
                        <ScoreDistribution value="setosa" recordCount="0.0"/>
                        <ScoreDistribution value="versicolor" recordCount="0.0"/>
                        <ScoreDistribution value="virginica" recordCount="43.0"/>
                    </Node>
                </Node>
            </Node>
        </Node>
    </TreeModel>
</PMML>
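The schema part of the file (DataDictionary plus MiningSchema) tells a consumer which columns are inputs and which one is the target. As a rough sketch, here is how you could read that back with Python's standard-library `xml.etree.ElementTree`. The embedded string is a verbatim excerpt of the file above, and the helper name `describe_schema` is just an illustration, not part of SkLearn2PMML.

```python
import xml.etree.ElementTree as ET

# Verbatim excerpt (DataDictionary + MiningSchema only) of the PMML file above
PMML_EXCERPT = """
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
    <DataDictionary>
        <DataField name="sepal length (cm)" optype="continuous" dataType="float"/>
        <DataField name="sepal width (cm)" optype="continuous" dataType="float"/>
        <DataField name="petal length (cm)" optype="continuous" dataType="float"/>
        <DataField name="petal width (cm)" optype="continuous" dataType="float"/>
        <DataField name="Species" optype="categorical" dataType="string">
            <Value value="setosa"/>
            <Value value="versicolor"/>
            <Value value="virginica"/>
        </DataField>
    </DataDictionary>
    <TreeModel functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
            <MiningField name="Species" usageType="target"/>
            <MiningField name="sepal length (cm)"/>
            <MiningField name="sepal width (cm)"/>
            <MiningField name="petal length (cm)"/>
            <MiningField name="petal width (cm)"/>
        </MiningSchema>
    </TreeModel>
</PMML>
"""

# Every PMML 4.3 element lives in this namespace
NS = {"pmml": "http://www.dmg.org/PMML-4_3"}


def describe_schema(root):
    """Hypothetical helper: return (input names, target name, target labels)."""
    inputs, target = [], None
    for field in root.findall(".//pmml:MiningSchema/pmml:MiningField", NS):
        # usageType defaults to "active" (an input field) when omitted
        if field.get("usageType", "active") == "target":
            target = field.get("name")
        else:
            inputs.append(field.get("name"))
    labels = []
    for data_field in root.findall("pmml:DataDictionary/pmml:DataField", NS):
        if data_field.get("name") == target:
            labels = [v.get("value") for v in data_field.findall("pmml:Value", NS)]
    return inputs, target, labels


root = ET.fromstring(PMML_EXCERPT.strip())
inputs, target, labels = describe_schema(root)
print(inputs, target, labels)
```

For the real file, `ET.parse("D:/workspace/IrisClassificationTree.pmml").getroot()` gives the same kind of root element.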

The first split (Node 1) is on petal width at 0.8: Node 2 (petal width <= 0.8) captures all 50 setosa samples and nothing else.

You can compare the PMML output to the Graphviz output:
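To make the Node/predicate structure concrete, here is a deliberately simplified scorer that walks the tree the way a PMML TreeModel is evaluated: starting at the root, descend into the first child Node whose predicate is true, and stop when no child matches. It is a sketch, not a PMML engine: it only understands `<True/>` and `SimplePredicate` with the two operators used in this file, and it ignores missing values, ScoreDistribution and the other strategies the spec defines. The embedded tree keeps only Nodes 1, 2, 3, 13 and 17 (copied from the file above, ScoreDistribution elements dropped) to stay short.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_3"}

# Excerpt of the TreeModel above: only Nodes 1, 2, 3, 13 and 17 are kept
TREE_EXCERPT = """
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
    <TreeModel functionName="classification" splitCharacteristic="binarySplit">
        <Node id="1">
            <True/>
            <Node id="2" score="setosa" recordCount="50.0">
                <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/>
            </Node>
            <Node id="3">
                <SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/>
                <Node id="13">
                    <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.75"/>
                    <Node id="17" score="virginica" recordCount="43.0">
                        <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.8500004"/>
                    </Node>
                </Node>
            </Node>
        </Node>
    </TreeModel>
</PMML>
"""

# Only the comparison operators that occur in this file are handled
OPERATORS = {
    "lessOrEqual": lambda x, v: x <= v,
    "greaterThan": lambda x, v: x > v,
}


def predicate_holds(node, record):
    """True if the node's predicate (<True/> or SimplePredicate) matches record."""
    if node.find("pmml:True", NS) is not None:
        return True
    pred = node.find("pmml:SimplePredicate", NS)
    threshold = float(pred.get("value"))
    return OPERATORS[pred.get("operator")](record[pred.get("field")], threshold)


def score(node, record):
    """Follow the first matching child at every level and return a score."""
    for child in node.findall("pmml:Node", NS):
        if predicate_holds(child, record):
            return score(child, record)
    # No child matched: a leaf returns its score attribute; the unscored
    # internal nodes here return None (a "null prediction")
    return node.get("score")


root = ET.fromstring(TREE_EXCERPT.strip())
top = root.find("pmml:TreeModel/pmml:Node", NS)
print(score(top, {"petal width (cm)": 0.2, "petal length (cm)": 1.4}))
```

A flower with petal width 0.2 lands in Node 2 and is scored as setosa, matching the first split described above. A record that would need one of the omitted branches (for example Node 4) falls back to None in this excerpt.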

import pydotplus  # this might be pydot for Python 2.7

# sklearn.externals.six was removed in scikit-learn 0.23; with out_file=None,
# export_graphviz returns the DOT source as a string instead of writing to a file
dot_data = tree.export_graphviz(clf,
                                out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("D:/workspace/iris.pdf")

# for in-line display, you can also do:
# from IPython.display import Image
# Image(graph.create_png())
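If Graphviz or pydotplus is not installed, scikit-learn 0.21+ can print the same tree as indented text with `sklearn.tree.export_text`, which is handy for a line-by-line comparison against the PMML thresholds. This is a minimal sketch; the `random_state=0` argument is an addition (not in the original code) so the printed tree is reproducible, and the exact root feature may differ from the PMML file because petal length and petal width separate setosa equally well.

```python
from sklearn import datasets, tree
from sklearn.tree import export_text

iris = datasets.load_iris()

# random_state is an assumption added here for reproducibility
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Each "|---" line is one split; leaves show the predicted class index
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```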



Feel free to try Nyoka. It exports scikit-learn models, and more.


Nyoka is a Python library that supports Scikit-learn, XGBoost, LightGBM, Keras and Statsmodels.

Besides about 500 Python classes, each of which covers a PMML tag and all constructor parameters/attributes as defined in the standard, Nyoka also provides a growing number of convenience classes and functions that make the data scientist's life easier, for example by reading or writing any PMML file in one line of code from within your favorite Python environment.

It can be installed from PyPI using:

pip install nyoka

Example code

Example 1

import pandas as pd
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# sklearn.preprocessing.Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
from sklearn.impute import SimpleImputer
from sklearn_pandas import DataFrameMapper
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
irisd = pd.DataFrame(iris.data, columns=iris.feature_names)
irisd['Species'] = iris.target

features = irisd.columns.drop('Species')
target = 'Species'

pipeline_obj = Pipeline([
    ("mapping", DataFrameMapper([
        (['sepal length (cm)', 'sepal width (cm)'], StandardScaler()),
        (['petal length (cm)', 'petal width (cm)'], SimpleImputer())
    ])),
    ("rfc", RandomForestClassifier(n_estimators=100))
])

pipeline_obj.fit(irisd[features], irisd[target])

from nyoka import skl_to_pmml
skl_to_pmml(pipeline_obj, features, target, "rf_pmml.pmml")

Example 2

from keras import applications
from keras.layers import Flatten, Dense
from keras.models import Model

model = applications.MobileNet(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
activType = 'sigmoid'
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(2, activation=activType)(x)
model_final = Model(inputs=model.input, outputs=predictions, name='predictions')

from nyoka import KerasToPmml
cnn_pmml = KerasToPmml(model_final, dataSet='image', predictedClasses=['cats', 'dogs'])
# the with block makes sure the file is flushed and closed
with open('2classMBNet.pmml', "w") as pmml_file:
    cnn_pmml.export(pmml_file, 0)

More examples can be found on Nyoka's GitHub page.