How to Interpret Predict Result of SVM in R?
Since your outcome variable is numeric, svm() uses the regression formulation of SVM. I think you want the classification formulation. You can get it by either coercing your outcome into a factor or setting type="C-classification".
Regression:
> library(e1071)
> model <- svm(vs ~ hp+mpg+gear,data=mtcars)
> predict(model)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
       0.8529506670        0.8529506670        0.9558654451        0.8423224174
  Hornet Sportabout             Valiant          Duster 360           Merc 240D
       0.0747730699        0.6952501964        0.0123405904        0.9966162477
           Merc 230            Merc 280           Merc 280C          Merc 450SE
       0.9494836511        0.7297563543        0.6909235343       -0.0327165348
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
      -0.0092851098       -0.0504982402        0.0319974842        0.0504292348
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
      -0.0504750284        0.9769206963        0.9724676874        0.9494910097
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
       0.9496260289        0.1349744908        0.1251344111        0.0395243313
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
       0.0983094417        1.0041732099        0.4348209129        0.6349628695
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
       0.0009258333        0.0607896408        0.0507385269        0.8664157985
Classification:
> model <- svm(as.factor(vs) ~ hp+mpg+gear,data=mtcars)
> predict(model)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
                  1                   1                   1                   1
  Hornet Sportabout             Valiant          Duster 360           Merc 240D
                  0                   1                   0                   1
           Merc 230            Merc 280           Merc 280C          Merc 450SE
                  1                   1                   1                   0
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
                  0                   0                   0                   0
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
                  0                   1                   1                   1
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
                  1                   0                   0                   0
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
                  0                   1                   0                   1
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
                  0                   0                   0                   1
Levels: 0 1
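The other route mentioned above, setting type="C-classification" and leaving vs numeric, could look roughly like the sketch below. It assumes the e1071 package is installed; the names model_c and pred_c are just illustrative.

```r
library(e1071)

# vs is stored as numeric 0/1 in mtcars. type = "C-classification" asks svm()
# to treat it as a class label instead of a regression target.
model_c <- svm(vs ~ hp + mpg + gear, data = mtcars, type = "C-classification")

# With no new data, predict() returns the fitted classes for the training rows.
pred_c <- predict(model_c)

class(pred_c)                                   # a factor, not a numeric score
table(predicted = pred_c, actual = mtcars$vs)   # confusion table on the training data
```

This should give the same kind of 0/1 factor output as the as.factor(vs) fit; which spelling you use is mostly a matter of taste.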
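Also, if you want probabilities as your prediction rather than just the raw classification, fit the model with probability=TRUE and pass probability=TRUE to predict() as well. The class probabilities come back in the "probabilities" attribute of the result.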
With Probabilities:
> model <- svm(as.factor(vs) ~ hp+mpg+gear,data=mtcars,probability=TRUE)
> predict(model,mtcars,probability=TRUE)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
                  1                   1                   1                   1
  Hornet Sportabout             Valiant          Duster 360           Merc 240D
                  0                   1                   0                   1
           Merc 230            Merc 280           Merc 280C          Merc 450SE
                  1                   1                   1                   0
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
                  0                   0                   0                   0
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
                  0                   1                   1                   1
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
                  1                   0                   0                   0
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
                  0                   1                   0                   1
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
                  0                   0                   0                   1
attr(,"probabilities")
                            0          1
Mazda RX4           0.2393753 0.76062473
Mazda RX4 Wag       0.2393753 0.76062473
Datsun 710          0.1750089 0.82499108
Hornet 4 Drive      0.2370382 0.76296179
Hornet Sportabout   0.8519490 0.14805103
Valiant             0.3696019 0.63039810
Duster 360          0.9236825 0.07631748
Merc 240D           0.1564898 0.84351021
Merc 230            0.1780135 0.82198650
Merc 280            0.3402143 0.65978567
Merc 280C           0.3829336 0.61706640
Merc 450SE          0.9110862 0.08891378
Merc 450SL          0.8979497 0.10205025
Merc 450SLC         0.9223868 0.07761324
Cadillac Fleetwood  0.9187301 0.08126994
Lincoln Continental 0.9153549 0.08464509
Chrysler Imperial   0.9358186 0.06418140
Fiat 128            0.1627969 0.83720313
Honda Civic         0.1649799 0.83502008
Toyota Corolla      0.1781531 0.82184689
Toyota Corona       0.1780519 0.82194807
Dodge Challenger    0.8427087 0.15729129
AMC Javelin         0.8496198 0.15038021
Camaro Z28          0.9190294 0.08097056
Pontiac Firebird    0.8361349 0.16386511
Fiat X1-9           0.1490934 0.85090660
Porsche 914-2       0.5797194 0.42028060
Lotus Europa        0.4169587 0.58304133
Ford Pantera L      0.8731716 0.12682843
Ferrari Dino        0.8392372 0.16076281
Maserati Bora       0.8519422 0.14805785
Volvo 142E          0.2289231 0.77107694
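If you want to work with those probabilities rather than just look at them, a minimal sketch for pulling them out of the prediction is below. It assumes e1071 is installed; model_p, pred_p, probs and p1 are illustrative names. The exact numbers may differ slightly from the ones printed above, since the probability model is fitted with an internal cross-validation.

```r
library(e1071)

model_p <- svm(as.factor(vs) ~ hp + mpg + gear, data = mtcars, probability = TRUE)
pred_p  <- predict(model_p, mtcars, probability = TRUE)

# The class probabilities are attached to the prediction as a matrix,
# one row per observation and one column per class level.
probs <- attr(pred_p, "probabilities")

# Select the column by name instead of by position, so the code does not
# depend on the order in which the columns happen to be stored.
p1 <- probs[, "1"]

head(round(p1, 3))
```

Each row of probs sums to 1, so for a two-class problem the column for class 1 carries all the information.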
Very broadly speaking, with classifiers like this the predicted value for a binary response variable can be read as a score for how likely that observation is to belong to class 1. (In this case your classes are actually labeled 0/1; in other cases you'd need to know which class the function treats as 1 or 0. R sorts factor levels alphabetically by default, so the last level would typically be class 1.) Keep in mind that the output of the regression fit is not a true probability: a few of the regression predictions shown above fall slightly below 0 or above 1.
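To check how R orders the levels of a factor, and so which label ends up last, something like this works in base R. The yes/no labels are made up for illustration and are not from your data.

```r
# Hypothetical labels, only to illustrate level ordering.
y <- factor(c("yes", "no", "no", "yes"))

levels(y)       # "no" "yes": sorted alphabetically, so "yes" is the last level
as.integer(y)   # 2 1 1 2: the underlying integer codes follow the level order

# The order can be set explicitly if the default is not what you want.
y2 <- factor(c("yes", "no", "no", "yes"), levels = c("yes", "no"))
levels(y2)      # "yes" "no"
```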
The most common thing people do is use 0.5 as a cutoff: predict class 1 when the predicted value is above 0.5 and class 0 otherwise. But I should warn you that there is plenty of math behind that decision, and the particulars of your modeling circumstances can call for a different cutoff value. Using 0.5 as the cutoff is often the best thing to do, but SVMs are fairly complicated beasts; I would recommend that you do some reading on SVMs and classification theory in general before you start trying to apply them to real data.
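As a rough sketch of what applying a cutoff looks like in practice, here it is on the regression fit from the question, assuming e1071 is installed. The names model_r, score, cutoff and pred_class are illustrative, and 0.5 is only the conventional default, not a recommendation for your data.

```r
library(e1071)

# Regression formulation: vs is numeric, so predict() returns a continuous score.
model_r <- svm(vs ~ hp + mpg + gear, data = mtcars)
score   <- predict(model_r)

# Turn the score into a 0/1 class with a cutoff.
cutoff     <- 0.5
pred_class <- ifelse(score > cutoff, 1, 0)

table(predicted = pred_class, actual = mtcars$vs)   # confusion table on the training data
mean(pred_class == mtcars$vs)                       # training accuracy at this cutoff
```

Changing cutoff and rerunning the last three lines shows how the confusion table shifts. Keep in mind this is all measured on the training data, so it is optimistic.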
My favorite reference is The Elements of Statistical Learning, by Hastie, Tibshirani and Friedman.