Section 2.4 Solutions

 

2.62

  1. The plot is shown below.

 

 

  1. For any least-squares regression, the residuals sum to zero. The sum of these residuals is -0.01 rather than exactly 0 only because of round-off error.

 

  1. The residual plot is shown below. Obviously, there is a strong pattern in the plot, indicating a curve (not a line) should be fit to these data.
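The two claims above (residuals sum to zero, and a curved relationship leaves a pattern in the residuals) can be checked numerically. The following is a minimal sketch, not the exercise's data: the values of `xs` and `ys` are invented, and `fit_line` is a hypothetical helper that computes the usual least-squares slope and intercept.

```python
# Invented data (NOT the data from exercise 2.62): y = x**2, a curved relationship.
xs = [1, 2, 3, 4, 5, 6]
ys = [x**2 for x in xs]

def fit_line(xs, ys):
    """Hypothetical helper: least-squares (slope, intercept) for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Unrounded residuals cancel (up to floating-point error).
exact_sum = sum(residuals)
# Rounding each residual to two decimals first, as in a printed table,
# breaks the exact cancellation.
rounded_sum = sum(round(r, 2) for r in residuals)

print(f"sum of unrounded residuals: {exact_sum:.10f}")
print(f"sum of residuals rounded to 2 places: {rounded_sum:.2f}")
# Positive at both ends, negative in the middle: a curved pattern.
print("residual signs as x increases:", ["+" if r > 0 else "-" for r in residuals])
```

In this sketch the line sits below the data at both ends and above it in the middle, which is the kind of systematic pattern that suggests fitting a curve.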

 

 


 

 


2.63

  1. The scatterplot (with regression line drawn in) is shown below.

 

 

  1. The regression line is drawn on the plot above. From this plot, it is obvious that a straight line does not capture the overall relationship in the data—a cubic curve would be better.

 

  1. The sum of the residuals is 0.01; it is not exactly 0 only because of round-off error. The residual plot below shows a definite pattern of cubic curvature, indicating that a cubic curve would better describe the overall relationship between the variables.
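To see what "cubic curvature" looks like in a residual plot, here is a rough text-only sketch. It assumes invented data (`y = x**3` on a symmetric grid, not the exercise's data) and a hypothetical `fit_line` helper. The count of sign changes is only a crude diagnostic, not a formal test.

```python
# Invented data (NOT the data from exercise 2.63): an exactly cubic relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x**3 for x in xs]

def fit_line(xs, ys):
    """Hypothetical helper: least-squares (slope, intercept) for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Crude text version of a residual plot: one row per x, bar length = |residual|.
for x, r in zip(xs, residuals):
    side = "above" if r > 0 else "below" if r < 0 else "on line"
    print(f"x={x:>2}  residual={r:+6.2f}  {side:8} {'#' * round(abs(r))}")

# Count how often the nonzero residuals switch sign as x increases.
signs = [1 if r > 1e-9 else -1 for r in residuals if abs(r) > 1e-9]
sign_changes = sum(a != b for a, b in zip(signs, signs[1:]))
print("sign changes:", sign_changes)
```

For these invented values the residuals run below, above, below, above the line as x increases. That S-shaped pattern is what a straight line leaves behind when the underlying relationship is cubic.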

 

 


2.67

If grade inflation has occurred, then a student who gets an A today may have gotten a B in years past. If that student’s ability is the same (and only grade standards have changed), then the student’s SAT score would stay about the same. Hence, there are now students receiving As who have only “B-level” SAT scores, so the SAT scores of A students will now decrease, on average. This could happen at every grade level even though the overall average SAT of all students has increased (because the number of students at each grade level is also changing).
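A small numerical sketch may make the last sentence easier to accept. All counts and SAT means below are invented for illustration and are not taken from the exercise. They are chosen so that the mean SAT falls within every grade while the overall mean rises, because more students now receive the higher grades.

```python
# Invented numbers for illustration only: grade -> (number of students, mean SAT).
past = {"A": (20, 1300), "B": (50, 1100), "C": (30, 900)}
now = {"A": (50, 1250), "B": (40, 1050), "C": (10, 850)}

def overall_mean(groups):
    """Weighted mean SAT across all grade levels."""
    total_students = sum(n for n, _ in groups.values())
    return sum(n * mean for n, mean in groups.values()) / total_students

for grade in past:
    print(f"grade {grade}: mean SAT {past[grade][1]} -> {now[grade][1]}")
print(f"all students: mean SAT {overall_mean(past):.0f} -> {overall_mean(now):.0f}")
```

The overall mean is a weighted average of the grade-level means, so shifting weight toward the higher-scoring grades can raise it even when every grade-level mean drops.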

 

 

2.71

  1. The plot is shown below, with the outlier indicated (with an open circle).

 

 

  1. The scatterplot, including both regression lines, is shown below. Because of its position, the influential point will flatten the slope of the regression line. Hence, the dotted line represents the regression done while omitting the influential value.
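The flattening effect can be reproduced with a few made-up points. This is only a sketch under stated assumptions: the data and the position of the influential point are invented (they are not the exercise's values), and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Hypothetical helper: least-squares (slope, intercept) for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Invented data (NOT the data from exercise 2.71): five points near y = 2x ...
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
# ... plus one point far to the right and well below that trend.
influential = (12, 6.0)

slope_without, _ = fit_line(xs, ys)
slope_with, _ = fit_line(xs + [influential[0]], ys + [influential[1]])

print(f"slope omitting the influential point:  {slope_without:.3f}")
print(f"slope including the influential point: {slope_with:.3f}")
```

With these invented values the slope including the point is much closer to zero, matching the description of the flatter line in the plot.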

 

 


2.78

  1. Since the added point falls exactly on the regression line for the other data, the regression line doesn’t change.

 

  1. The point is so influential because it is an outlier in the x-direction: all the information we have for that x-value is based on a single point.
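Both parts can be illustrated with a short sketch. The data are invented (not the exercise's), and `fit_line`, `x_far`, `x_mid`, and the vertical shift of 10 units are all assumptions made for the example. An added point exactly on the line leaves the fit unchanged. Moving a point vertically changes the slope a great deal when the point is far out in the x-direction, and not at all when it sits at the mean of the x-values.

```python
def fit_line(xs, ys):
    """Hypothetical helper: least-squares (slope, intercept) for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Invented data (NOT the data from exercise 2.78).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = fit_line(xs, ys)

x_far, x_mid = 12, 3   # far from the other x-values vs. exactly at their mean
shift = 10             # assumed vertical displacement off the line

def refit(x_new, y_new):
    """Refit the line after adding one extra point to the invented data."""
    return fit_line(xs + [x_new], ys + [y_new])

on_line = refit(x_far, intercept + slope * x_far)               # point exactly on the line
far_shifted = refit(x_far, intercept + slope * x_far - shift)   # x-outlier moved off the line
mid_shifted = refit(x_mid, intercept + slope * x_mid - shift)   # same shift at the mean of x

print(f"original slope:                {slope:.3f}")
print(f"extra point on the line:       {on_line[0]:.3f}")
print(f"x-outlier shifted down by {shift}:  {far_shifted[0]:.3f}")
print(f"mid-x point shifted down by {shift}: {mid_shifted[0]:.3f}")
```

The mid-x point changes only the intercept, while the far point drags the slope with it. That is the sense in which an outlier in the x-direction is influential.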

 

 

2.79

An example is shown in the graph below. For these data, the correlation among business economists is 0.970 and among academic economists is 0.976, yet among all economists combined it is -0.412.
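The same reversal can be produced with a handful of invented points. The values below are not the data behind the graph and will not reproduce the correlations quoted above. They only sketch how two groups with strong positive correlations can combine into a negative one. The `correlation` function is a hypothetical helper computing the ordinary Pearson correlation.

```python
from math import sqrt

def correlation(xs, ys):
    """Hypothetical helper: Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data (NOT the data from exercise 2.79).
# Each group trends upward, but one group sits low-x/high-y and the other high-x/low-y.
business_x, business_y = [1, 2, 3, 4], [7.0, 8.0, 9.0, 10.5]
academic_x, academic_y = [6, 7, 8, 9], [1.0, 2.2, 3.0, 4.0]

r_business = correlation(business_x, business_y)
r_academic = correlation(academic_x, academic_y)
r_all = correlation(business_x + academic_x, business_y + academic_y)

print(f"business only: r = {r_business:.3f}")
print(f"academic only: r = {r_academic:.3f}")
print(f"all combined:  r = {r_all:.3f}")
```

The combined correlation is driven by the gap between the two clusters rather than by the trend inside either one.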

 

 

 

2.83

  1. This residual plot shows a “fanning out” pattern, indicating that the regression line makes better predictions for lower salaries than for higher salaries. To eliminate the changing variation (the “fanning out”), the salary variable could be transformed and the regression rerun (the next part of the exercise discusses a logarithmic transformation).

 

  1. This residual plot shows a pattern of curvature. The model overestimates the salaries of new players, underestimates the salaries of mid-career players, and again overestimates the salaries of late-career players. Because this is a multiple regression, the model includes several explanatory variables. This particular residual plot indicates that the “number of years” variable should be included in the model as a quadratic term.
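For the "fanning out" pattern in the first part, the effect of a logarithmic transformation can be sketched as follows. Everything here is an assumption made for illustration: the data are invented (a straight-line trend with errors that are a fixed percentage of the response, not the salary data), and `fit_line` and `spread_ratio` are hypothetical helpers. The spread ratio is only a rough summary of how residual size changes from the lower half of the x-values to the upper half.

```python
from math import log

def fit_line(xs, ys):
    """Hypothetical helper: least-squares (slope, intercept) for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def spread_ratio(xs, ys):
    """Mean |residual| in the upper half of x divided by mean |residual| in the lower half.

    Assumes xs is sorted in increasing order. A value near 1 means roughly
    constant spread; a value well above 1 means the residuals fan out.
    """
    slope, intercept = fit_line(xs, ys)
    abs_res = [abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)]
    half = len(abs_res) // 2
    lower, upper = abs_res[:half], abs_res[half:]
    return (sum(upper) / len(upper)) / (sum(lower) / len(lower))

# Invented data (NOT the salary data): trend 10 + 5x, with each response
# alternately 25% above or 20% below the trend, so errors grow with the response.
xs = list(range(1, 11))
ys = [(10 + 5 * x) * (1.25 if x % 2 else 0.8) for x in xs]

raw_ratio = spread_ratio(xs, ys)
log_ratio = spread_ratio(xs, [log(y) for y in ys])

print(f"spread ratio, original scale: {raw_ratio:.2f}")
print(f"spread ratio, log scale:      {log_ratio:.2f}")
```

On the original scale the residuals in the upper half are roughly twice as large as in the lower half. After taking logarithms the two halves are close to equal, which is why a log transformation is a common remedy for this pattern.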