Fastest SVM implementation usable in Python [closed]

The most scalable kernel SVM implementation I know of is LaSVM. It's written in C and hence wrappable in Python if you know Cython, ctypes or cffi. Alternatively you can use it from the command line. You can use the utilities in sklearn.datasets to load and convert data from NumPy or CSR format into svmlight-formatted files that LaSVM can use as training / test sets.
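For instance, assuming X is your feature matrix (NumPy array or CSR matrix) and y your label vector, a minimal sketch of the conversion could look like this (the output filename is just an example):

>>> from sklearn.datasets import dump_svmlight_file
>>> dump_svmlight_file(X, y, "train.svmlight")  # writes X, y in svmlight / libsvm format

The resulting file can then be fed to LaSVM's command-line tools as a training set.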


Alternatively, you can run the grid search on 1000 random samples instead of the full dataset:

>>> from sklearn.model_selection import ShuffleSplit, GridSearchCV
>>> cv = ShuffleSplit(n_splits=3, test_size=0.2, train_size=0.2, random_state=0)
>>> gs = GridSearchCV(clf, param_grid, cv=cv, n_jobs=-1, verbose=2)
>>> gs.fit(X, y)

It's very likely that the optimal parameters for 5000 samples will be very close to the optimal parameters for 1000 samples. So that's a good way to start your coarse grid search.

n_jobs=-1 makes it possible to use all your CPUs to run the individual CV fits in parallel. It uses multiprocessing, so the Python GIL is not an issue.


Firstly, according to scikit-learn's benchmark (here), scikit-learn is already one of the fastest, if not the fastest, SVM packages around. Hence, you might want to consider other ways of speeding up the training.

As suggested by bavaza, you can try to multi-thread the training process. If you are using scikit-learn's GridSearchCV class, you can easily set the n_jobs argument to a value larger than the default of 1 to perform the training in parallel, at the expense of using more memory. You can find its documentation here, and an example of how to use the class here. A rough sketch follows below.
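As a self-contained sketch, assuming a plain SVC and an illustrative parameter grid (the grid values are arbitrary examples, and X, y stand for your training data):

>>> from sklearn.svm import SVC
>>> from sklearn.model_selection import GridSearchCV
>>> param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}  # example values only
>>> gs = GridSearchCV(SVC(), param_grid, n_jobs=-1, verbose=2)  # n_jobs=-1: use all CPUs
>>> gs.fit(X, y)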

Alternatively, you can take a look at the Shogun Machine Learning Library here.

Shogun is designed for large-scale machine learning, with wrappers around many common SVM packages, and it is implemented in C/C++ with bindings for Python. According to scikit-learn's benchmark above, its speed is comparable to scikit-learn's. On other tasks (other than the one they demonstrated), it might be faster, so it is worth a try.
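To give a flavour, a minimal sketch using Shogun's older modular Python interface (modshogun) might look like the following; class names and constructor signatures vary between Shogun releases, so treat this as an assumption to check against the docs for your version:

>>> from modshogun import RealFeatures, BinaryLabels, GaussianKernel, LibSVM
>>> feats = RealFeatures(X.T)                    # Shogun expects one example per column
>>> labels = BinaryLabels(y)                     # binary labels in {-1, +1}
>>> kernel = GaussianKernel(feats, feats, 1.0)   # kernel width 1.0 is an example value
>>> svm = LibSVM(1.0, kernel, labels)            # C=1.0 is an example value
>>> svm.train()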

Lastly, you can try to perform dimensionality reduction, e.g. using PCA or randomized PCA, to reduce the dimension of your feature vectors. That would speed up the training process. The documentation for the respective classes can be found at these two links: PCA, Randomized PCA. You can find examples of how to use them in scikit-learn's examples section.
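For example, a minimal sketch of reducing X to 50 components before training (the component count is an arbitrary example; in recent scikit-learn versions RandomizedPCA has been folded into PCA via the svd_solver argument):

>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=50, svd_solver="randomized", random_state=0)
>>> X_reduced = pca.fit_transform(X)   # project the data onto the top 50 components
>>> clf.fit(X_reduced, y)              # train the SVM on the reduced features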