Section 2.4 Solutions
2.62
- The plot is shown below.

- For any regression, the residuals sum to zero. The computed sum of these
residuals is -0.01 rather than exactly 0 only because of round-off error.
- The residual plot is shown below. There is a strong pattern in the plot,
indicating that a curve (not a line) should be fit to these data.
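
Both points can be checked numerically. The sketch below uses a small hypothetical data set (y = x^2, not the data from the exercise) and a hand-written helper, fit_line, that is an assumption of this illustration. It shows that the exact residuals sum to zero, that rounding each residual first can leave a small nonzero sum, and that a line fit to curved data leaves a systematic sign pattern in the residuals.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical curved data (y = x^2); NOT the textbook's data set.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Exact residuals sum to zero (up to floating-point error).
residual_sum = sum(residuals)

# Rounding each residual to one decimal place first leaves a small nonzero sum.
rounded_sum = sum(round(r, 1) for r in residuals)

# Sign pattern of the residuals: positive at the ends, negative in the middle.
signs = ["+" if r > 0 else "-" for r in residuals]

print(residual_sum, rounded_sum, signs)
```

The run of same-signed residuals in the middle is the numerical counterpart of the curved pattern seen in a residual plot.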

2.63
- The scatterplot (with regression line drawn in) is shown below.

- The regression line is drawn on the plot above. From this plot, it is
clear that a straight line does not capture the overall relationship in
the data; a cubic curve would be better.
- The sum of the residuals is 0.01; the only reason the sum isn’t exactly 0
is round-off error. The residual plot below shows a definite pattern of
cubic curvature, indicating that a cubic curve would better describe the
overall relationship between the variables.

2.67
If grade inflation
has occurred, then a student who gets an A today may have gotten a B in years
past. If that student’s ability is the same (and only grade standards have
changed), then the student’s SAT score would stay about the same. Hence, there
are now students receiving As who have only “B-level” SAT scores, so the SAT
scores of A students will now decrease, on average. This could happen at every
grade level even though the overall average SAT of all students has increased
(because the number of students at each grade level is also changing).
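
A small arithmetic sketch may make this concrete. All counts and mean scores below are invented for illustration (they are not from the exercise), and overall_mean is a hypothetical helper. The mean SAT score falls at each grade level, yet the overall mean rises because more students now sit in the higher-scoring grade group.

```python
# Hypothetical data, invented for illustration only.
# Each entry maps grade -> (number of students, mean SAT score).
past = {"A": (10, 1300), "B": (30, 1100)}
now = {"A": (30, 1250), "B": (10, 1050)}

def overall_mean(groups):
    """Mean SAT over all students, weighting each grade by its head count."""
    total_students = sum(n for n, _ in groups.values())
    total_points = sum(n * mean for n, mean in groups.values())
    return total_points / total_students

past_overall = overall_mean(past)
now_overall = overall_mean(now)

# Within each grade the mean dropped by 50 points...
for grade in past:
    print(grade, past[grade][1], "->", now[grade][1])

# ...but the overall mean went up.
print(past_overall, "->", now_overall)
```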
2.71
- The plot is shown below, with the outlier indicated (with an open circle).

- The scatterplot, including both regression lines, is shown below. Because
of its position, the influential point will flatten the slope of the
regression line. Hence, the dotted line represents the regression done
while omitting the influential value.
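
The flattening effect can be sketched with made-up numbers. The data below are hypothetical (five points on a line of slope 2 plus one point far to the right and well below the trend), and fit_slope is a helper written for this illustration, not part of the exercise.

```python
def fit_slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical data; NOT the data from the exercise.
main_points = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
influential = (15, 5)  # far out in x and below the trend of the others

slope_without = fit_slope(main_points)
slope_with = fit_slope(main_points + [influential])

print(slope_without, slope_with)
```

With these numbers the single extreme point pulls the slope from 2 down to below 0.1, which is the kind of flattening described above.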

2.78
- Since the added point falls exactly on the regression line for the other
data, the regression line doesn’t change.
- The point is so influential because it is an outlier in the x-direction:
all the information we have for that x-value is based on a single point.
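
Both parts can be illustrated with a short sketch. The four starting points and the helper fit_line are assumptions made for this example, not values from the exercise. A point added exactly on the fitted line leaves the line unchanged, while a point at the same extreme x-value but off the line moves the slope noticeably.

```python
def fit_line(points):
    """Return (intercept, slope) of the least-squares line for (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data; NOT the data from the exercise.
base = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
a, b = fit_line(base)

# Add a point at x = 10 that lies exactly on the fitted line.
x_new = 10
on_line = (x_new, a + b * x_new)
a_on, b_on = fit_line(base + [on_line])

# Add a point at the same x but 5 units below the line.
off_line = (x_new, a + b * x_new - 5)
a_off, b_off = fit_line(base + [off_line])

print((a, b), (a_on, b_on), (a_off, b_off))
```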
2.79
An example is shown in the graph below. For these data, the
correlation among business economists is 0.970 and among academic economists is
0.976, yet among all economists combined it is -0.412.
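
The same effect can be reproduced with a small sketch. The two groups below are hypothetical values chosen for illustration; they are not the data behind the graph and do not reproduce the correlations quoted above. The helper correlation is written out by hand so the example is self-contained.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for two groups; NOT the data from the exercise.
group1_x, group1_y = [1, 2, 3, 4], [8, 9.5, 10, 11.5]
group2_x, group2_y = [6, 7, 8, 9], [1, 2.5, 3, 4.5]

r_group1 = correlation(group1_x, group1_y)
r_group2 = correlation(group2_x, group2_y)
r_all = correlation(group1_x + group2_x, group1_y + group2_y)

print(r_group1, r_group2, r_all)
```

Each group has a strong positive correlation, but because the second group sits lower and farther to the right, the combined correlation is negative.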

2.83
- This residual plot shows a “fanning out” pattern, indicating that the
regression line makes better predictions for lower salaries than for higher
salaries. To eliminate the changing variation (the “fanning out”), the
salary variable could be transformed and the regression rerun (the next
part of the exercise discusses a logarithm transformation).
- This residual plot shows a pattern of curvature. The model overestimates
the salaries of new players, underestimates the salaries of mid-career
players, and again overestimates the salaries of late-career players.
Because this is a multiple regression, there are many explanatory
variables included in the regression. This particular residual plot
indicates that the “number of years” variable should be included in the
model as a quadratic term.
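
As a rough illustration of why a logarithm can remove the “fanning out” described in the first part, the sketch below assumes the spread of salaries grows in proportion to the salary level. The numbers are hypothetical (in thousands of dollars) and are not taken from the exercise. On the raw scale the spread doubles from group to group; on the log scale it is constant.

```python
import math

# Hypothetical salaries (in $1000s), grouped by predicted salary level.
# Invented for illustration; each group spans plus or minus 10% of its center.
groups = [[18, 22], [36, 44], [72, 88]]

# Spread within each group on the original scale: grows with salary level.
raw_spreads = [max(g) - min(g) for g in groups]

# Spread within each group after taking logarithms: the same for every group.
log_spreads = [math.log(max(g)) - math.log(min(g)) for g in groups]

print(raw_spreads)
print(log_spreads)
```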