Fastest SVM implementation usable in Python [closed]

The most scalable kernel SVM implementation I know of is LaSVM. It's written in C and hence wrappable in Python if you know Cython, ctypes or cffi. Alternatively you can use it from the command line. You can use the utilities in sklearn.datasets to load and convert data from NumPy or CSR format into svmlight-formatted files that LaSVM can use as training / test sets.
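For instance, assuming X is your feature matrix (NumPy array or CSR matrix) and y your label vector, a minimal sketch of the conversion could look like this (the output filename is just an example):

>>> from sklearn.datasets import dump_svmlight_file
>>> dump_svmlight_file(X, y, "train.svmlight")  # writes X, y in svmlight / libsvm format

The resulting file can then be fed to LaSVM's command-line tools as a training set.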


Alternatively, you can run the grid search on 1000 random samples instead of the full dataset:

>>> from sklearn.model_selection import ShuffleSplit, GridSearchCV
>>> cv = ShuffleSplit(n_splits=3, test_size=0.2, train_size=0.2, random_state=0)
>>> gs = GridSearchCV(clf, param_grid, cv=cv, n_jobs=-1, verbose=2)
>>> gs.fit(X, y)

It's very likely that the optimal parameters for 5000 samples will be very close to the optimal parameters for 1000 samples. So that's a good way to start your coarse grid search.

n_jobs=-1 makes it possible to use all your CPUs to run the individual CV fits in parallel. It uses multiprocessing, so the Python GIL is not an issue.


Firstly, according to scikit-learn's benchmark (here), scikit-learn is already one of the fastest, if not the fastest, SVM packages around. Hence, you might want to consider other ways of speeding up the training.

As suggested by bavaza, you can try to multi-thread the training process. If you are using scikit-learn's GridSearchCV class, you can easily set the n_jobs argument to a value larger than the default of 1 to perform the training in parallel, at the expense of using more memory. You can find its documentation here, and an example of how to use the class here. A rough sketch follows below.
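As a self-contained sketch, assuming a plain SVC and an illustrative parameter grid (the grid values are arbitrary examples, and X, y stand for your training data):

>>> from sklearn.svm import SVC
>>> from sklearn.model_selection import GridSearchCV
>>> param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}  # example values only
>>> gs = GridSearchCV(SVC(), param_grid, n_jobs=-1, verbose=2)  # n_jobs=-1: use all CPUs
>>> gs.fit(X, y)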

Alternatively, you can take a look at the Shogun Machine Learning Library here.

Shogun is designed for large-scale machine learning, with wrappers around many common SVM packages, and it is implemented in C/C++ with bindings for Python. According to scikit-learn's benchmark above, its speed is comparable to scikit-learn's. On other tasks (other than the one they demonstrated), it might be faster, so it is worth a try.
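To give a flavour, a minimal sketch using Shogun's older modular Python interface (modshogun) might look like the following; class names and constructor signatures vary between Shogun releases, so treat this as an assumption to check against the docs for your version:

>>> from modshogun import RealFeatures, BinaryLabels, GaussianKernel, LibSVM
>>> feats = RealFeatures(X.T)                    # Shogun expects one example per column
>>> labels = BinaryLabels(y)                     # binary labels in {-1, +1}
>>> kernel = GaussianKernel(feats, feats, 1.0)   # kernel width 1.0 is an example value
>>> svm = LibSVM(1.0, kernel, labels)            # C=1.0 is an example value
>>> svm.train()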

Lastly, you can try to perform dimensionality reduction, e.g. using PCA or randomized PCA, to reduce the dimension of your feature vectors. That would speed up the training process. The documentation for the respective classes can be found at these two links: PCA, Randomized PCA. You can find examples of how to use them in scikit-learn's examples section.
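For example, a minimal sketch of reducing X to 50 components before training (the component count is an arbitrary example; in recent scikit-learn versions RandomizedPCA has been folded into PCA via the svd_solver argument):

>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=50, svd_solver="randomized", random_state=0)
>>> X_reduced = pca.fit_transform(X)   # project the data onto the top 50 components
>>> clf.fit(X_reduced, y)              # train the SVM on the reduced features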