calculate distance between regression line and datapoint


You are basically asking for the residuals, i.e. the vertical distances between each data point and the fitted line.

R> residuals(res)
      1       2       3       4       5       6 
 192.61   12.57 -185.48 -205.52  -26.57  212.39 
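As a quick check, a residual is just the observed value minus the fitted value. A minimal sketch, reusing the concentration and signal data from the question; the name manual is only for illustration:

```r
concentration <- c(1, 10, 20, 30, 40, 50)
signal <- c(4, 22, 44, 244, 643, 1102)
res <- lm(signal ~ concentration)

# Vertical distance: observed value minus fitted value
manual <- signal - fitted(res)

# Should agree with residuals(res) up to floating-point error
all.equal(manual, residuals(res))
```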

As an aside, when you fit a linear regression with an intercept, the residuals sum to zero (up to floating-point error):

R> sum(residuals(res))
[1] 8.882e-15

If the model is correct, the residuals should also follow a Normal distribution, which you can check with qqnorm(residuals(res)).

I find working with the standardised residuals easier.

R> rstandard(res)
       1        2        3        4        5        6 
 1.37707  0.07527 -1.02653 -1.13610 -0.15845  1.54918 

These residuals have been scaled to have mean zero and variance (approximately) equal to one and, if the model is correct, follow a Normal distribution. As a rule of thumb, outlying standardised residuals are those larger than 2 in absolute value.
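A minimal sketch of flagging such points, assuming the same data as in the question (the names std and outliers are only for illustration). For this data set no standardised residual exceeds 2 in absolute value, so nothing is flagged:

```r
concentration <- c(1, 10, 20, 30, 40, 50)
signal <- c(4, 22, 44, 244, 643, 1102)
res <- lm(signal ~ concentration)

# Standardised residuals for every observation
std <- rstandard(res)

# Indices of observations whose standardised residual exceeds 2 in absolute value
outliers <- which(abs(std) > 2)
outliers
```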


You can use the function below, which computes the perpendicular distance from a point to a line:

http://paulbourke.net/geometry/pointlineplane/pointline.r

Then just extract the slope and intercept:

> coef(res)
  (Intercept) concentration 
   -210.61098      22.00441 

So your final answer would be:

concentration <- c(1,10,20,30,40,50)
signal <- c(4, 22, 44, 244, 643, 1102)
plot(concentration, signal)
res <- lm(signal ~ concentration)
abline(res)

[Plot: signal against concentration, with the fitted regression line]

cfs <- coef(res)
distancePointLine(y=signal[5], x=concentration[5], slope=cfs[2], intercept=cfs[1])

If you want a more general way of picking out a particular point, concentration == 40 returns a logical vector of the same length as concentration. You can use that vector to select points.

> pt.sel <- ( concentration == 40 )
> pt.sel
[1] FALSE FALSE FALSE FALSE  TRUE FALSE
> distancePointLine(y=signal[pt.sel], x=concentration[pt.sel], slope=cfs["concentration"], intercept=cfs["(Intercept)"])
     1.206032

Unfortunately distancePointLine doesn't appear to be vectorized (or it does, but it returns a warning when you pass it a vector). Otherwise you could get answers for all points just by leaving the [] selector off the x and y arguments.
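One way around this is to write the perpendicular distance formula directly, since for the line y = slope * x + intercept it is |slope * x - y + intercept| / sqrt(slope^2 + 1), which works element-wise on vectors. A minimal sketch, where perpDistance is a hypothetical helper and not part of the linked script:

```r
concentration <- c(1, 10, 20, 30, 40, 50)
signal <- c(4, 22, 44, 244, 643, 1102)
res <- lm(signal ~ concentration)
cfs <- coef(res)

# Hypothetical helper (not from the linked script): perpendicular distance
# from the points (x, y) to the line y = slope * x + intercept
perpDistance <- function(x, y, slope, intercept) {
  abs(slope * x - y + intercept) / sqrt(slope^2 + 1)
}

# Distances for all points at once; [[ ]] drops the coefficient names
d <- perpDistance(concentration, signal,
                  slope = cfs[["concentration"]],
                  intercept = cfs[["(Intercept)"]])
d
```

The fifth element should match the single-point result above, and each value equals the absolute residual divided by sqrt(slope^2 + 1).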