Does the SVM in sklearn support incremental (online) learning?

Does the SVM in sklearn support incremental (online) learning?


While online algorithms for SVMs do exist, it is important to specify whether you want a kernel SVM or a linear SVM, since many efficient algorithms have been developed for the special case of linear SVMs.

For the linear case, if you use the SGD classifier in scikit-learn with the hinge loss and L2 regularization you will get an SVM that can be updated online/incrementally. You can combine this with feature transforms that approximate a kernel to get something similar to an online kernel SVM.
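To make that concrete, here is a minimal sketch (not from the original answer; the data and parameter values are made up for illustration) combining scikit-learn's RBFSampler kernel approximation with SGDClassifier using the hinge loss, so the linear model is trained incrementally on approximate kernel features:

    import numpy as np
    from sklearn.kernel_approximation import RBFSampler
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    X_batch1 = rng.randn(100, 5)
    y_batch1 = (X_batch1[:, 0] > 0).astype(int)

    # Approximate an RBF kernel with random Fourier features,
    # then learn a linear SVM on top of them with SGD.
    rbf = RBFSampler(gamma=1.0, n_components=200, random_state=0)
    rbf.fit(X_batch1)                  # the feature map only needs to be fitted once
    clf = SGDClassifier(loss="hinge", penalty="l2")

    # The first partial_fit call must list every class that will ever appear.
    clf.partial_fit(rbf.transform(X_batch1), y_batch1, classes=np.array([0, 1]))

    # Later batches can be streamed in as they arrive.
    X_batch2 = rng.randn(50, 5)
    y_batch2 = (X_batch2[:, 0] > 0).astype(int)
    clf.partial_fit(rbf.transform(X_batch2), y_batch2)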

One of my specifications is that it should continuously update to changing trends.

This is referred to as concept drift, and will not be handled well by a simple online SVM. Using the PassiveAggressive classifier will likely give you better results, as its learning rate does not decrease over time.
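A minimal sketch of that suggestion, assuming scikit-learn's PassiveAggressiveClassifier and made-up streaming data:

    import numpy as np
    from sklearn.linear_model import PassiveAggressiveClassifier

    rng = np.random.RandomState(0)
    clf = PassiveAggressiveClassifier()

    # Stream batches in; the aggressive updates keep reacting to new data
    # instead of fading out like a decaying SGD learning rate would.
    for step in range(10):
        X_batch = rng.randn(32, 5)
        y_batch = (X_batch[:, 0] > 0).astype(int)
        if step == 0:
            clf.partial_fit(X_batch, y_batch, classes=np.array([0, 1]))
        else:
            clf.partial_fit(X_batch, y_batch)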

Assuming you get feedback while training / running, you can attempt to detect decreases in accuracy over time and begin training a new model when the accuracy starts to decrease (and switch to the new one when you believe that it has become more accurate). JSAT has 2 drift detection methods (see jsat.driftdetectors) that can be used to track accuracy and alert you when it has changed.
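JSAT is a Java library, so its drift detectors are not reproduced here; the following is only a rough Python sketch of the general idea (track recent accuracy in a window and start a fresh model when it degrades). The thresholds, the make_model() factory, and the binary 0/1 labels are assumptions for illustration, not JSAT's API:

    from collections import deque

    def drift_aware_loop(stream, make_model, window=200, drop_tolerance=0.10):
        # stream yields (X_batch, y_batch) pairs; labels arrive as feedback.
        # make_model() returns a fresh estimator supporting partial_fit.
        model = None
        recent = deque(maxlen=window)   # rolling record of correct/incorrect
        best_accuracy = 0.0

        for X_batch, y_batch in stream:
            if model is None:
                model = make_model()
                model.partial_fit(X_batch, y_batch, classes=[0, 1])  # assumes binary labels
                continue

            # Evaluate before updating, so the score reflects unseen data.
            preds = model.predict(X_batch)
            recent.extend(int(p == y) for p, y in zip(preds, y_batch))
            accuracy = sum(recent) / len(recent)
            best_accuracy = max(best_accuracy, accuracy)

            if accuracy < best_accuracy - drop_tolerance:
                # Suspected drift: start training a replacement model.
                model = make_model()
                model.partial_fit(X_batch, y_batch, classes=[0, 1])
                recent.clear()
                best_accuracy = 0.0
            else:
                model.partial_fit(X_batch, y_batch)

        return model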

It also has more online linear and kernel methods.

(bias note: I'm the author of JSAT).


Maybe it's me being naive, but I think it is worth mentioning how to actually update the scikit-learn SGD classifier when you present your data incrementally:

    from sklearn import linear_model
    import numpy as np

    clf = linear_model.SGDClassifier()

    x1 = some_new_data
    y1 = the_labels
    # the first partial_fit call must be given every class label that will ever appear
    clf.partial_fit(x1, y1, classes=np.unique(y1))

    x2 = some_newer_data
    y2 = the_labels
    clf.partial_fit(x2, y2)


Technical aspects

The short answer is no. The sklearn implementation (like most existing ones) does not support online SVM training. It is possible to train an SVM incrementally, but it is not a trivial task.

If you want to limit yourself to the linear case, then the answer is yes, as sklearn provides you with Stochastic Gradient Descent (SGD), which has an option to minimize the SVM criterion (the hinge loss).

You can also try out the pegasos library instead, which supports online SVM training.

Theoretical aspects

The problem of trend adaptation is currently very popular in the ML community. As @Raff stated, it is called concept drift, and there are numerous approaches to it, often meta-models that analyze "how the trend is behaving" and change the underlying ML model (for example, by forcing it to retrain on a subset of the data). So you have two independent problems here:

  • the online training issue, which is purely technical and can be addressed with SGD or with libraries other than sklearn
  • concept drift, which is currently a hot topic with no "just works" answer. There are many possibilities, hypotheses and proofs of concept, but no single, generally accepted way of dealing with this phenomenon; in fact, many PhD dissertations in ML are currently based on this issue.