Elementary Statistics – Regression Diagnostics

The R² value and the residual plot tell us different things about a regression line. The R² value measures the proportion of the variation in the response variable that is explained by the regression line. If R² is low, much of the variability is left unexplained, and predictions made from the line will not be very reliable. The residual plot tells us whether a line is actually the best way to describe the relationship in the data: a random scatter of residuals means the line is a good description, while a pattern in the residuals means that a curve would describe the relationship better or that a transformation should be used.

Example 1 – High R² value and no pattern in the residuals

A line is the best way to describe the relationship in the data, and nearly all of the variation in the response variable is explained.


Example 2 – High R² value, yet a pattern in the residuals

Most of the variation in the response variable is explained, but a line is not the best way to characterize the relationship: a curve would fit the data better (and the R² value would then increase, too).


Example 3 – No pattern in the residuals, but a low R² value

A line is the best way to describe the relationship, but much of the variability in the response variable is left unexplained (perhaps another explanatory variable could be added to the regression).