Is sklearn.metrics.mean_squared_error the larger the better (negated)?



The actual function mean_squared_error does nothing with the sign. But the scorer you get when you request 'neg_mean_squared_error' returns a negated version of that score.
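You can see the sign flip directly (a minimal sketch; the toy data and DummyRegressor here are just for illustration):

from sklearn.dummy import DummyRegressor
from sklearn.metrics import get_scorer, mean_squared_error

X, y = [[0.0], [1.0], [2.0]], [0.0, 1.0, 3.0]
model = DummyRegressor(strategy="mean").fit(X, y)

mse = mean_squared_error(y, model.predict(X))                # positive loss
neg_mse = get_scorer("neg_mean_squared_error")(model, X, y)  # same value, sign-flipped
print(mse, neg_mse)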

Please check the source code to see how it is defined:

neg_mean_squared_error_scorer = make_scorer(mean_squared_error,
                                            greater_is_better=False)

Observe how the param greater_is_better is set to False.

Now all these scores/losses are used in various other utilities such as cross_val_score, cross_val_predict, GridSearchCV, etc. For example, with 'accuracy_score' or 'f1_score' a higher score is better, but for losses (errors) a lower score is better. To handle both in the same way, the scorer returns the negative.
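For example, GridSearchCV always maximizes its scoring; with the negated MSE, maximizing the score is the same as minimizing the error (a small sketch on synthetic data, just for illustration):

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# GridSearchCV picks the candidate with the highest score; since the MSE is
# negated, that is exactly the candidate with the lowest error.
grid = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]},
                    scoring="neg_mean_squared_error", cv=5).fit(X, y)
print(grid.best_params_, grid.best_score_)  # best_score_ is negative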

So this utility exists to handle scores and losses in the same way, without changing the source code of the specific loss or score.
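The same pattern works for any loss you write yourself; my_loss below is a hypothetical hand-rolled mean absolute error, wrapped without touching its own code:

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer

def my_loss(y_true, y_pred):
    # hand-rolled mean absolute error (hypothetical example)
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

# Wrap it exactly like neg_mean_squared_error_scorer: the loss is untouched,
# the scorer just flips the sign so that greater is better.
my_scorer = make_scorer(my_loss, greater_is_better=False)

X, y = [[0.0], [1.0], [2.0]], [0.0, 1.0, 3.0]
model = DummyRegressor(strategy="mean").fit(X, y)
print(my_loss(y, model.predict(X)))  # positive
print(my_scorer(model, X, y))        # same magnitude, negative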

So, you did not miss anything. You just need to know which scenario you are in. If you only want to calculate the mean squared error, use mean_squared_error directly. But if you want to tune your models or cross-validate using the utilities in scikit-learn, use 'neg_mean_squared_error'.
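Putting both together (a sketch; the Ridge model and synthetic data are just for illustration):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# Plain metric: just call mean_squared_error.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pred = Ridge().fit(X_tr, y_tr).predict(X_te)
print(mean_squared_error(y_te, pred))

# Model-selection utilities: pass the negated name, then flip the sign back.
scores = cross_val_score(Ridge(), X, y, scoring="neg_mean_squared_error", cv=5)
print((-scores).mean())  # ordinary positive MSE per fold, averaged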

Maybe add some details about what you are trying to do, and I will explain more.


It's a convention for implementing your own scoring object [1]: the returned score must follow "greater is better", because you can also create a non-loss function that computes a custom positive score. That means that when you use a loss function for a scorer object, you have to negate its value.

The range of a loss function is [0, +∞): 0 is the optimum, and the value grows with the disagreement between y and ŷ. For instance, check the formula of the mean squared error; it is always positive:

\text{MSE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} (y_i - \hat{y}_i)^2

Source: http://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-error


This is exactly what I was looking for. I am trying to decipher my code and clarify the RMSE reports to make sense of my data.

In my case, I am using this approach to calculate the RMSE. How should I read the reports? Is higher better, or is it the opposite?

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.metrics import mean_squared_error

def rmsle_cv(model):
    # get_n_splits() returns just the fold count; cross_val_score accepts that int as cv
    kf = KFold(n_folds, shuffle=True, random_state=42).get_n_splits(train)
    # the scorer is negated, so flip the sign back before taking the square root
    rmse = np.sqrt(-cross_val_score(model, train, y_train,
                                    scoring="neg_mean_squared_error", cv=kf))
    return rmse

def rmsle(y, y_pred):
    return np.sqrt(mean_squared_error(y, y_pred))

In my case, I am getting these results

Lasso score(cv): 0.1176 (0.0068)
ElasticNet score(cv): 0.1177 (0.0068)
Ridge(01): 0.1270 (0.0097)
Gradient Boosting score(cv): 0.1443 (0.0109)
BayRidge(01): 0.1239 (0.0079)
Kernel Ridge score(cv): 0.1210 (0.0068)
Xgboost score(cv): 0.1177 (0.0060)
LGBM score(cv): 0.1158 (0.0064)