How to Interpret Predict Result of SVM in R?


Since your outcome variable is numeric, svm() uses the regression formulation of SVM. I think you want the classification formulation. You can get it either by coercing your outcome to a factor or by setting type="C-classification".

Regression:

    > library(e1071)
    > model <- svm(vs ~ hp+mpg+gear,data=mtcars)
    > predict(model)
              Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
           0.8529506670        0.8529506670        0.9558654451        0.8423224174
      Hornet Sportabout             Valiant          Duster 360           Merc 240D
           0.0747730699        0.6952501964        0.0123405904        0.9966162477
               Merc 230            Merc 280           Merc 280C          Merc 450SE
           0.9494836511        0.7297563543        0.6909235343       -0.0327165348
             Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
          -0.0092851098       -0.0504982402        0.0319974842        0.0504292348
      Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
          -0.0504750284        0.9769206963        0.9724676874        0.9494910097
          Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
           0.9496260289        0.1349744908        0.1251344111        0.0395243313
       Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
           0.0983094417        1.0041732099        0.4348209129        0.6349628695
         Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
           0.0009258333        0.0607896408        0.0507385269        0.8664157985

Classification:

    > model <- svm(as.factor(vs) ~ hp+mpg+gear,data=mtcars)
    > predict(model)
              Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
                      1                   1                   1                   1
      Hornet Sportabout             Valiant          Duster 360           Merc 240D
                      0                   1                   0                   1
               Merc 230            Merc 280           Merc 280C          Merc 450SE
                      1                   1                   1                   0
             Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
                      0                   0                   0                   0
      Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
                      0                   1                   1                   1
          Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
                      1                   0                   0                   0
       Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
                      0                   1                   0                   1
         Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
                      0                   0                   0                   1
    Levels: 0 1
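The other route, setting type="C-classification" while leaving vs numeric, is not shown in the output above. A minimal sketch of it, assuming the e1071 package is installed (the object names model_typed, model_factor, pred_typed and pred_factor are made up for illustration), could look like this:

```r
library(e1071)

# Keep vs numeric but ask for the classification formulation explicitly.
model_typed <- svm(vs ~ hp + mpg + gear, data = mtcars,
                   type = "C-classification")
pred_typed <- predict(model_typed)

# For comparison: the factor-coercion route used in the example above.
model_factor <- svm(as.factor(vs) ~ hp + mpg + gear, data = mtcars)
pred_factor <- predict(model_factor)

# Both predictions should come back as class labels rather than
# continuous scores; cross-tabulate them to compare the two routes.
print(class(pred_typed))
print(table(typed = pred_typed, factor = pred_factor))
```

I would expect the two routes to agree, since both fit the same kind of model on the same data, but it is worth checking the cross-tabulation on your own data rather than taking that for granted.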

Also, if you want class probabilities rather than just the predicted class, fit the model with probability=TRUE and pass probability=TRUE to predict() as well.

With Probabilities:

    > model <- svm(as.factor(vs) ~ hp+mpg+gear,data=mtcars,probability=TRUE)
    > predict(model,mtcars,probability=TRUE)
              Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
                      1                   1                   1                   1
      Hornet Sportabout             Valiant          Duster 360           Merc 240D
                      0                   1                   0                   1
               Merc 230            Merc 280           Merc 280C          Merc 450SE
                      1                   1                   1                   0
             Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
                      0                   0                   0                   0
      Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
                      0                   1                   1                   1
          Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
                      1                   0                   0                   0
       Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
                      0                   1                   0                   1
         Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
                      0                   0                   0                   1
    attr(,"probabilities")
                                0          1
    Mazda RX4           0.2393753 0.76062473
    Mazda RX4 Wag       0.2393753 0.76062473
    Datsun 710          0.1750089 0.82499108
    Hornet 4 Drive      0.2370382 0.76296179
    Hornet Sportabout   0.8519490 0.14805103
    Valiant             0.3696019 0.63039810
    Duster 360          0.9236825 0.07631748
    Merc 240D           0.1564898 0.84351021
    Merc 230            0.1780135 0.82198650
    Merc 280            0.3402143 0.65978567
    Merc 280C           0.3829336 0.61706640
    Merc 450SE          0.9110862 0.08891378
    Merc 450SL          0.8979497 0.10205025
    Merc 450SLC         0.9223868 0.07761324
    Cadillac Fleetwood  0.9187301 0.08126994
    Lincoln Continental 0.9153549 0.08464509
    Chrysler Imperial   0.9358186 0.06418140
    Fiat 128            0.1627969 0.83720313
    Honda Civic         0.1649799 0.83502008
    Toyota Corolla      0.1781531 0.82184689
    Toyota Corona       0.1780519 0.82194807
    Dodge Challenger    0.8427087 0.15729129
    AMC Javelin         0.8496198 0.15038021
    Camaro Z28          0.9190294 0.08097056
    Pontiac Firebird    0.8361349 0.16386511
    Fiat X1-9           0.1490934 0.85090660
    Porsche 914-2       0.5797194 0.42028060
    Lotus Europa        0.4169587 0.58304133
    Ford Pantera L      0.8731716 0.12682843
    Ferrari Dino        0.8392372 0.16076281
    Maserati Bora       0.8519422 0.14805785
    Volvo 142E          0.2289231 0.77107694
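The probabilities come back as an attribute on the prediction, not as the return value itself, so you have to pull them out with attr(). Here is a rough sketch of that, assuming e1071 is installed; the names pred, probs and p_class1 are just illustrative. As far as I know the probability model is fitted with an internal cross-validation step, so I set a seed and would not expect the numbers to match the output above exactly.

```r
library(e1071)

set.seed(1)  # assumption: the probability fit can vary slightly between runs

model <- svm(as.factor(vs) ~ hp + mpg + gear, data = mtcars,
             probability = TRUE)
pred <- predict(model, mtcars, probability = TRUE)

# The probability matrix is stored as an attribute of the predicted factor.
probs <- attr(pred, "probabilities")

# Select the column by class label rather than by position, so the code
# does not depend on the order in which the columns happen to appear.
p_class1 <- probs[, "1"]
print(head(p_class1))
```

Each row of probs holds one probability per class for that car, so p_class1 is the estimated probability that vs is 1.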


Very broadly speaking, with classifiers like this the predicted value for a binary response variable can be thought of as the probability that the observation belongs to class 1. (In this case your classes are actually labeled 0/1; in other cases you'd need to know which class the function treats as 1 and which as 0. R sorts factor levels alphabetically by default, so the last level would usually be class 1.) This is only a loose reading, though: the output of a regression-type SVM is not constrained to lie between 0 and 1.

So the most common thing people do is use 0.5 as a cutoff: predict class 1 when the value is above 0.5 and class 0 otherwise. But I should warn you that there is plenty of math behind that decision, and the particulars of your modeling circumstances can call for a different cutoff value. Using 0.5 is often a sensible default, but SVMs are fairly complicated beasts; I would recommend that you do some reading on SVMs and classification theory in general before you start applying them to real data.
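To make the cutoff idea concrete, here is a small base R sketch. The scores are five of the regression-style predictions from the output shown earlier in this thread, typed in by hand; the variable names scores, cutoff and predicted_class are only for illustration, and 0.5 is the conventional default rather than a recommendation for your data.

```r
# Continuous predictions copied from the regression output above.
scores <- c("Mazda RX4"         =  0.8529506670,
            "Hornet Sportabout" =  0.0747730699,
            "Porsche 914-2"     =  0.4348209129,
            "Lotus Europa"      =  0.6349628695,
            "Merc 450SE"        = -0.0327165348)

cutoff <- 0.5  # assumption: the usual default; tune it for your problem

# Assign class 1 when the score exceeds the cutoff, class 0 otherwise.
predicted_class <- ifelse(scores > cutoff, 1, 0)
print(predicted_class)
```

With this cutoff Mazda RX4 and Lotus Europa land in class 1 and the other three in class 0, which matches the labels the classification fit gave for those five cars.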

My favorite reference is The Elements of Statistical Learning, by Hastie, Tibshirani and Friedman.