Math 217 Homework 2 Solutions
11.7
We aren’t sure of the degrees of freedom (because the number of predictor
variables isn’t given), but it’s probably very large (since n = 1810); that is,
it’s well over 1000. From Table D (using df = 1000), we can say the P-value is
less than 2(0.0005) = 0.001 (and it’s probably much less than this). Hence,
country does have a statistically significant impact on the willingness to pay
more. (But is it practically significant? We can’t be sure without knowing the
units on the response variable.)
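For df this large the t distribution is essentially standard normal, so the
Table D bound can be cross-checked with a normal approximation. This is a
minimal sketch using only Python's standard library (`statistics.NormalDist`),
swapping in the normal distribution for the Table D lookup. The actual t
statistic isn't restated above, so `t_example = 3.3` is a hypothetical value,
roughly the df = 1000 table cutoff for a one-sided tail area of 0.0005.

```python
from statistics import NormalDist


def two_sided_p_normal(t_stat: float) -> float:
    """Approximate two-sided P-value for a t statistic with very large df.

    With df well over 1000, the t distribution is essentially N(0, 1),
    so P is approximately 2 * P(Z > |t|).
    """
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# Hypothetical t statistic: the actual value is not restated in the solution.
# 3.3 is about the df = 1000 cutoff for a one-sided tail area of 0.0005.
t_example = 3.3
p = two_sided_p_normal(t_example)
print(f"t = {t_example}: approximate two-sided P-value = {p:.5f}")
```

Any t statistic larger in absolute value than this gives an even smaller
P-value, which is the sense in which the P-value is "probably much less" than
0.001.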
The exclusion of people who answered “don’t know” is another potential source
of bias. If a large percentage answered “don’t know,” then maybe the question
was worded poorly, or maybe the “don’t know” responses themselves carry
important information about the question. I would like to know what percentage
of the survey subjects answered “don’t know” (and if it’s high, then I’d wonder
about potential bias in the results).
11.16

Regression Analysis: Total Assets (bi versus Internet Account

The regression equation is
Total Assets (billions of $) = - 17.1 + 0.0832 Internet Accounts (in 1000s)

Predictor                         Coef   SE Coef      T      P
Constant                       -17.121     8.778  -1.95  0.087
Internet Accounts (in 1000s)  0.083205  0.007592  10.96  0.000

S = 20.1877   R-Sq = 93.8%   R-Sq(adj) = 93.0%

The residual plot for the quadratic model no longer shows a pattern of
curvature (although the two outliers are clearly present), and the condition of
normality seems plausible, based on the histogram and normality plot. Hence, it
seems like the conditions of our regression model are met. Plus, the R-squared
value is now 97.9%, indicating that our model explains 97.9% of the variation
in total assets, which is very high.
Regression Analysis: Total Assets versus Internet Acc, Internet Acc

The regression equation is
Total Assets (billions of $) = 7.61 - 0.0046 Internet Accounts (in 1000s)
                               + 0.000034 Internet Accounts Squared

Predictor                           Coef     SE Coef      T      P
Constant                           7.608       8.503   0.89  0.401
Internet Accounts (in 1000s)    -0.00457     0.02378  -0.19  0.853
Internet Accounts Squared     0.00003361  0.00000893   3.76  0.007

S = 12.4117   R-Sq = 97.9%   R-Sq(adj) = 97.3%
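To see what the squared term does, the fitted quadratic can be evaluated
directly. A minimal sketch: the coefficients are copied from the output above,
and the account counts in the loop are hypothetical values chosen only to show
the shape of the curve (they are not observations from the data set). The
slope of the fitted curve, b1 + 2*b2*x, grows with x because b2 > 0, which is
the upward bend a straight line can't follow.

```python
# Coefficients copied from the 11.16 quadratic-model output.
B0 = 7.608
B1 = -0.00457
B2 = 0.00003361


def predicted_assets(accounts_thousands: float) -> float:
    """Predicted total assets (billions of $) from the fitted quadratic model."""
    return B0 + B1 * accounts_thousands + B2 * accounts_thousands ** 2


def marginal_effect(accounts_thousands: float) -> float:
    """Slope of the fitted curve at x: d(assets)/d(accounts) = B1 + 2*B2*x.

    Because B2 > 0, this slope increases with x (upward curvature).
    """
    return B1 + 2 * B2 * accounts_thousands


# Hypothetical account counts (in 1000s), for illustration only.
for x in (500, 1000, 2000):
    print(x, round(predicted_assets(x), 3), round(marginal_effect(x), 5))
```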



11.18
As I mentioned in the previous problem, the two outlying companies are Charles
Schwab and Fidelity. After removing these two companies, the linear regression
output is shown below. The residuals indicate that the model conditions are
probably not met: there is slight fanning out in the residual plot, and the
histogram and normality plot indicate deviations from normality. Hence, we
should be leery of doing inference with this model. (If we do feel comfortable
doing inference, then we see that the Internet accounts variable is just barely
statistically significant at the 0.05 level, with P-value = 0.050.) The slope
coefficient changed quite a bit without the outliers: now, for each additional
1000 Internet accounts a company has, it is predicted to have an additional
$29,700,000 in total assets. Note that the R-squared value is now only 50%,
which is much lower than when the outliers were included.
Regression Analysis: Total Assets (bi versus Internet Account

The regression equation is
Total Assets (billions of $) = 2.13 + 0.0297 Internet Accounts (in 1000s)

Predictor                        Coef  SE Coef     T      P
Constant                        2.131    5.787  0.37  0.725
Internet Accounts (in 1000s)  0.02967  0.01211  2.45  0.050

S = 9.36903   R-Sq = 50.0%   R-Sq(adj) = 41.7%
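Both the dollar interpretation and the borderline P-value come straight from
the output. A sketch assuming the t-table value t* = 2.447 for a two-sided 0.05
test with 6 degrees of freedom (n = 8 companies, which matches Total DF = 7 in
the quadratic ANOVA table for this problem); the t statistic only just clears
that cutoff, which is why the P-value prints as 0.050.

```python
# Values from the 11.18 linear-regression output (outliers removed).
slope = 0.02967     # billions of $ per 1000 Internet accounts
se_slope = 0.01211  # standard error of the slope
n = 8               # companies left after dropping the two outliers
df = n - 2          # residual degrees of freedom for simple linear regression

# Dollar interpretation: convert billions of $ to $.
dollars_per_1000_accounts = slope * 1_000_000_000

# t statistic for H0: slope = 0.
t_stat = slope / se_slope

# Two-sided 5% critical value for t with 6 df, as read from a t table.
T_CRIT = 2.447

decision = "reject" if abs(t_stat) > T_CRIT else "fail to reject"
print(f"${dollars_per_1000_accounts:,.0f} more predicted assets per 1000 accounts")
print(f"t = {t_stat:.4f} on {df} df vs t* = {T_CRIT}: {decision} H0 at 0.05")
```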




Note that since the residual plot does not show any curvature, it doesn’t
really make sense to fit a quadratic model. The problem asks for this, but it
wouldn’t be a natural step in the data analysis.

Below is the output from the quadratic regression. (A look at the residuals
indicates that normality seems plausible, as does constant variance, although
there is a slight fanning out in the residual plot.) Notice that the F-test for
the overall model is not significant (P-value = 0.093). Hence, we cannot reject
the hypothesis that all slope coefficients are zero, and this is the end of our
analysis. (This isn’t surprising, based on the comment I made above about the
lack of curvature.)
Regression Analysis: Total Assets versus Internet Acc, Internet Acc

The regression equation is
Total Assets (billions of $)_1 = - 6.74 + 0.0887 Internet Accounts (in 1000s)_1
                                 - 0.000063 Internet Accounts Squared_1

Predictor                              Coef     SE Coef      T      P
Constant                             -6.737       9.204  -0.73  0.497
Internet Accounts (in 1000s)_1      0.08874     0.05016   1.77  0.137
Internet Accounts Squared_1     -0.00006251  0.00005163  -1.21  0.280

S = 9.02527   R-Sq = 61.4%   R-Sq(adj) = 45.9%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       2   646.80  323.40  3.97  0.093
Residual Error   5   407.28   81.46
Total            7  1054.08
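The overall F statistic is the ratio of the two mean squares in the ANOVA
table. A minimal check, assuming the F-table value F* = 5.79 for the upper 5%
point of F(2, 5); since 3.97 falls short of it, this agrees with the reported
P-value of 0.093.

```python
# ANOVA entries from the 11.18 quadratic-model output.
ss_regression, df_regression = 646.80, 2
ss_residual, df_residual = 407.28, 5

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)

# Upper 5% point of the F(2, 5) distribution, taken from an F table.
F_CRIT = 5.79

print(f"F = {f_stat:.2f}, R-sq = {r_squared:.1%}")
if f_stat > F_CRIT:
    print("overall model significant at the 0.05 level")
else:
    print("overall model not significant at the 0.05 level")
```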
11.24
Correlations: Grade Point Average, IQ Test Score, Self-Concept Score

                    Grade Point Avg  IQ Test Score
IQ Test Score                 0.634
Self-Concept Score            0.542          0.493
The straight-line relationship with IQ test score explains
r^2 = 0.634^2 ≈ 0.402, or 40.2%, of the variation in GPA. The straight-line
relationship with Self-Concept score explains r^2 = 0.542^2 ≈ 0.294, or 29.4%,
of the variation in GPA.
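The two percentages come from squaring the correlations, since with a single
predictor R-squared equals r squared. A tiny sketch with the correlations
copied from the output:

```python
# Correlations with GPA, taken from the 11.24 correlation output.
correlations_with_gpa = {"IQ Test Score": 0.634, "Self-Concept Score": 0.542}

# In simple (one-predictor) linear regression, R-squared equals r squared.
r_squared = {name: r ** 2 for name, r in correlations_with_gpa.items()}
for name, value in r_squared.items():
    print(f"{name}: r^2 = {value:.3f} ({value:.1%} of the variation in GPA)")
```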
The regression model with both predictors explains 47.1% of the variation in
GPA.
Regression Analysis: Grade Point versus IQ Test Scor, Self-Concept

The regression equation is
Grade Point Average = - 3.88 + 0.0772 IQ Test Score + 0.0513 Self-Concept Score

Predictor              Coef  SE Coef      T      P
Constant             -3.882    1.472  -2.64  0.010
IQ Test Score       0.07720  0.01539   5.02  0.000
Self-Concept Score  0.05125  0.01633   3.14  0.002

S = 1.54715   R-Sq = 47.1%   R-Sq(adj) = 45.7%



Interpretation of the gender coefficient: holding IQ score and Self-Concept
score constant, females are predicted to have a GPA 0.9685 points higher than
males.
Regression Analysis: Grade Point versus IQ Test Scor, Self-Concept, ...

The regression equation is
Grade Point Average = - 5.02 + 0.0841 IQ Test Score + 0.0513 Self-Concept Score
                      + 0.969 Gender_F

Predictor              Coef  SE Coef      T      P
Constant             -5.022    1.470  -3.42  0.001
IQ Test Score       0.08412  0.01495   5.62  0.000
Self-Concept Score  0.05129  0.01565   3.28  0.002
Gender_F             0.9685   0.3495   2.77  0.007

S = 1.48253   R-Sq = 52.1%   R-Sq(adj) = 50.1%
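One way to see the "holding IQ and Self-Concept constant" interpretation is to
compare predictions for two students who differ only in gender. A sketch
assuming Gender_F is an indicator equal to 1 for females and 0 for males (which
is what the interpretation above implies); the IQ of 110 and Self-Concept score
of 60 are hypothetical values, and any other pair gives the same difference.

```python
# Coefficients from the 11.24 model with IQ, Self-Concept, and gender.
INTERCEPT = -5.022
B_IQ = 0.08412
B_SELF_CONCEPT = 0.05129
B_FEMALE = 0.9685  # assumed: Gender_F = 1 for female, 0 for male


def predicted_gpa(iq: float, self_concept: float, female: bool) -> float:
    """Predicted GPA from the fitted three-predictor model."""
    return (INTERCEPT + B_IQ * iq + B_SELF_CONCEPT * self_concept
            + B_FEMALE * (1 if female else 0))


# Hypothetical student profile: same IQ and Self-Concept, different gender.
iq, sc = 110, 60
gap = predicted_gpa(iq, sc, female=True) - predicted_gpa(iq, sc, female=False)
print(f"female minus male predicted GPA: {gap:.4f}")
```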



11.26


Descriptive Statistics: VOPlus (Bone Formation Measure)

Variable   N  Mean  StDev  Minimum   Q1  Median    Q3  Maximum  IQR
VOPlus    31   986    580      285  513     870  1251     2545  738
For VO-, graphics and numerical summaries are shown below. The distribution is
somewhat hard to characterize. Most of the women have VO- values in the
500-1000 range, and then there’s a small right tail. The median is 903 (no
units given). The boxplot indicates an outlying VO- value on the high end.


Descriptive Statistics: VOMinus (Bone Resorption Meas.)

Variable   N   Mean  StDev  Minimum     Q1  Median      Q3  Maximum    IQR
VOMinus   31  889.2  427.6    254.0  536.0   903.0  1028.0   2236.0  492.0
For Osteocalcin, graphics and numerical summaries are shown below. The
distribution of values is skewed to the right, with a median of 30.20 mg/ml.
The boxplot indicates no outlying points.


Descriptive Statistics: Osteocalcin (mg/ml) - Biomarker

Variable      N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum    IQR
Osteocalcin  31  33.42  19.61     8.10  17.90   30.20  47.70    77.90  29.80
For Tartrate Resistant Acid Phosphatase (TRAP), graphics and numerical
summaries are shown below. The distribution of values is skewed to the right,
with a median of 10.30 units per liter. The boxplot indicates no outlying
points.


Descriptive Statistics: TRAP (U/l) - Biomarker

Variable   N   Mean  StDev  Minimum    Q1  Median     Q3  Maximum    IQR
TRAP      31  13.25   6.53     3.30  8.80   10.30  19.00    28.80  10.20
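The boxplot outlier flags for VO-, Osteocalcin, and TRAP can be reproduced from
the quartiles. A sketch assuming the boxplots use the common 1.5 x IQR fences
(a point is flagged if it lies below Q1 - 1.5*IQR or above Q3 + 1.5*IQR); only
the minimum and maximum need checking, since if neither is outside a fence, no
observation is.

```python
# Summary values from the 11.26 descriptive statistics.
summaries = {
    "VOMinus": {"min": 254.0, "q1": 536.0, "q3": 1028.0, "max": 2236.0},
    "Osteocalcin": {"min": 8.10, "q1": 17.90, "q3": 47.70, "max": 77.90},
    "TRAP": {"min": 3.30, "q1": 8.80, "q3": 19.00, "max": 28.80},
}


def has_outliers(s: dict) -> bool:
    """Apply the 1.5 x IQR rule to the extremes of a five-number summary."""
    iqr = s["q3"] - s["q1"]
    lower_fence = s["q1"] - 1.5 * iqr
    upper_fence = s["q3"] + 1.5 * iqr
    return s["min"] < lower_fence or s["max"] > upper_fence


flags = {name: has_outliers(s) for name, s in summaries.items()}
print(flags)
```

For VO-, the upper fence is 1028 + 1.5(492) = 1766, and the maximum of 2236 is
well beyond it, matching the high outlier in the boxplot.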

Correlations: VOPlus, VOMinus, Osteocalcin, and TRAP

             VOPlus  VOMinus  Osteocalcin
VOMinus       0.898
Osteocalcin   0.647    0.455
TRAP          0.754    0.678        0.730
11.27
Regression Analysis: VOPlus (Bone For versus Osteocalcin (mg/

The regression equation is
VOPlus (Bone Formation Measure) = 346 + 19.1 Osteocalcin (mg/ml) - Biomarker

Predictor                          Coef  SE Coef     T      P
Constant                          346.2    161.5  2.14  0.041
Osteocalcin (mg/ml) - Biomarker  19.142    4.185  4.57  0.000

S = 449.527   R-Sq = 41.9%   R-Sq(adj) = 39.9%



The overall F test (P-value = 0.000) indicates that at least one of the
population regression coefficients is not zero. Looking at the individual
coefficients, the one associated with OC does not have a significant
(P-value = 0.250) impact on VO+ in the presence of TRAP; the one associated
with TRAP does have a significant (P-value = 0.002) impact on VO+, even in the
presence of OC. This is not surprising, given that OC and TRAP are correlated
(r = 0.730), yet TRAP is more strongly correlated with VO+ (r = 0.754) than OC
is (r = 0.647). [Again, it’s important to realize that this inference might not
be accurate, since the normality condition of our model is not met.]

Descriptively, we see that 58.8% of the variation in VO+ is explained by the
regression model. Also, for each additional mg/ml of OC, the predicted VO+
increases by 6.157 units (assuming TRAP is held constant); for each additional
unit/liter of TRAP, the predicted VO+ increases by 53.44 units (assuming OC is
held constant).
Regression Analysis: VOPlus versus Osteocalcin, TRAP

The regression equation is
VOPlus (Bone Formation Measure) = 72 + 6.16 Osteocalcin (mg/ml) - Biomarker
                                  + 53.4 TRAP (U/l) - Biomarker

Predictor                         Coef  SE Coef     T      P
Constant                          72.1    160.2  0.45  0.656
Osteocalcin (mg/ml) - Biomarker  6.157    5.246  1.17  0.250
TRAP (U/l) - Biomarker           53.44    15.76  3.39  0.002

S = 385.162   R-Sq = 58.8%   R-Sq(adj) = 55.9%

Analysis of Variance

Source          DF        SS       MS      F      P
Regression       2   5933269  2966635  20.00  0.000
Residual Error  28   4153791   148350
Total           30  10087061
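Every summary line in the output (F, R-Sq, R-Sq(adj), S) can be recomputed from
the ANOVA table. A minimal sketch with a hypothetical helper `anova_summary`
(my own name, not a Minitab function), checked against the OC + TRAP model
above; small differences in the last digit come from the rounded sums of
squares.

```python
import math


def anova_summary(ss_reg: float, df_reg: int, ss_res: float, df_res: int) -> dict:
    """Recompute regression summary statistics from an ANOVA table."""
    ss_total = ss_reg + ss_res
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return {
        "F": ms_reg / ms_res,
        "R-sq": ss_reg / ss_total,
        # Adjusted R-sq compares residual variance with total variance,
        # each sum of squares divided by its degrees of freedom.
        "R-sq(adj)": 1 - ms_res / (ss_total / (df_reg + df_res)),
        "S": math.sqrt(ms_res),  # estimated standard deviation of the residuals
    }


# Model with Osteocalcin and TRAP as predictors (11.27).
summary = anova_summary(5933269, 2, 4153791, 28)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```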



11.28
The overall F test (P-value = 0.000) indicates that at least one of the
population regression coefficients is not zero. Looking at the individual
coefficients, the one associated with OC does have a significant
(P-value = 0.010) impact on VO+, even in the presence of TRAP and VO-; the one
associated with TRAP does not have a significant (P-value = 0.637) impact on
VO+ in the presence of OC and VO-; and the one associated with VO- does have a
significant (P-value = 0.000) impact on VO+, even in the presence of OC and
TRAP.

Descriptively, we see that 87.9% (which is quite high) of the variation in VO+
is explained by the regression model. Also, for each additional mg/ml of OC,
the predicted VO+ increases by 8.021 units (assuming TRAP and VO- are held
constant); for each additional unit/liter of TRAP, the predicted VO+ increases
by 5.04 units (assuming OC and VO- are held constant); and for each additional
unit of VO-, the predicted VO+ increases by 0.9979 units (assuming OC and TRAP
are held constant).
Regression Analysis: VOPlus (Bone versus Osteocalcin, TRAP (U/l) -, ...

The regression equation is
VOPlus (Bone Formation Measure) = - 236 + 8.02 Osteocalcin (mg/ml)
                                  + 5.0 TRAP (U/l) + 0.998 VOMinus (Bone Resorption Meas.)

Predictor                           Coef  SE Coef      T      P
Constant                         -236.36    96.37  -2.45  0.021
Osteocalcin (mg/ml) - Biomarker    8.021    2.904   2.76  0.010
TRAP (U/l) - Biomarker              5.04    10.57   0.48  0.637
VOMinus (Bone Resorption Meas.)   0.9979   0.1239   8.06  0.000

S = 212.580   R-Sq = 87.9%   R-Sq(adj) = 86.6%

Analysis of Variance

Source          DF        SS       MS      F      P
Regression       3   8866925  2955642  65.40  0.000
Residual Error  27   1220136    45190
Total           30  10087061
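Each individual test is a t ratio, Coef / SE Coef, compared with a t critical
value on n - k - 1 = 31 - 3 - 1 = 27 degrees of freedom. A sketch assuming the
t-table value t* = 2.052 for a two-sided 0.05 test with 27 df; it reproduces
the conclusions above (OC and VO- significant, TRAP not).

```python
# Coefficients and standard errors from the 11.28 output (n = 31, k = 3).
coefficients = {
    "Osteocalcin": (8.021, 2.904),
    "TRAP": (5.04, 10.57),
    "VOMinus": (0.9979, 0.1239),
}
n, k = 31, 3
df_residual = n - k - 1

# Two-sided 5% critical value for t with 27 df, from a t table.
T_CRIT = 2.052

t_ratios = {name: coef / se for name, (coef, se) in coefficients.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_ratios.items()}

print(f"residual df = {df_residual}")
for name in coefficients:
    print(f"{name}: t = {t_ratios[name]:.2f}, significant at 0.05: {significant[name]}")
```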



Model 1 (OC as the only predictor):

Predictor                          Coef  SE Coef     T      P
Constant                          346.2    161.5  2.14  0.041
Osteocalcin (mg/ml) - Biomarker  19.142    4.185  4.57  0.000

Model 2 (both OC and TRAP as predictors):

Predictor                         Coef  SE Coef     T      P
Constant                          72.1    160.2  0.45  0.656
Osteocalcin (mg/ml) - Biomarker  6.157    5.246  1.17  0.250
TRAP (U/l) - Biomarker           53.44    15.76  3.39  0.002

Model 3 (OC, TRAP, and VO- as predictors):

Predictor                           Coef  SE Coef      T      P
Constant                         -236.36    96.37  -2.45  0.021
Osteocalcin (mg/ml) - Biomarker    8.021    2.904   2.76  0.010
TRAP (U/l) - Biomarker              5.04    10.57   0.48  0.637
VOMinus (Bone Resorption Meas.)   0.9979   0.1239   8.06  0.000
The estimated coefficient on OC changes greatly from the first model (where it
is the only predictor, 19.142) to the second and third models (6.157 and
8.021). Furthermore, the significance of the OC coefficient changes between
models: significant in the first, non-significant in the second, and
significant again in the third. The estimated coefficient on TRAP also changes
greatly between models 2 and 3 (from 53.44 to 5.04), and so does its
significance: it is significant in model 2 (P-value = 0.002) but not in
model 3 (P-value = 0.637).
Model 1 (OC as the only predictor): S = 449.527   R-Sq = 41.9%
Model 2 (both OC and TRAP as predictors): S = 385.162   R-Sq = 58.8%
Model 3 (OC, TRAP, and VO- as predictors): S = 212.580   R-Sq = 87.9%
As the number of predictor variables increases, the percentage of variation
explained (R-squared) increases, and the standard deviation of the residuals,
s, decreases. (These are both good things from a model standpoint, although
R-squared never decreases when a predictor variable is added, even if the
variable doesn’t have a significant impact on the response.)
The overall F test (P-value = 0.000) indicates that at least one of the
population regression coefficients is not zero. Looking at the individual
coefficients, the one associated with OC does have a significant
(P-value = 0.000) impact on VO+, even in the presence of VO-; the one
associated with VO- does have a significant (P-value = 0.000) impact on VO+,
even in the presence of OC.

Descriptively, we see that 87.8% (which is quite high) of the variation in VO+
is explained by the regression model. Also, for each additional mg/ml of OC,
the predicted VO+ increases by 8.912 units (assuming VO- stays constant); for
each additional unit of VO-, the predicted VO+ increases by 1.0315 units
(assuming OC stays constant).
This is the best model of the four we ran. It has the highest adjusted
R-squared value (86.9%, just above the 86.6% for the model that also includes
TRAP; adjusted R-squared adjusts for the number of predictor variables), and
all the coefficients are statistically significant (which we can trust, because
the model conditions seem to be met).
Regression Analysis: VOPlus versus Osteocalcin, VOMinus

The regression equation is
VOPlus (Bone Formation Measure) = - 229 + 8.91 Osteocalcin (mg/ml) + 1.03 VOMinus

Predictor                           Coef  SE Coef      T      P
Constant                         -229.23    93.88  -2.44  0.021
Osteocalcin (mg/ml) - Biomarker    8.912    2.191   4.07  0.000
VOMinus (Bone Resorption Meas.)   1.0315   0.1005  10.26  0.000

S = 209.626   R-Sq = 87.8%   R-Sq(adj) = 86.9%

Analysis of Variance

Source          DF        SS       MS       F      P
Regression       2   8856651  4428325  100.77  0.000
Residual Error  28   1230410    43943
Total           30  10087061
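The adjusted R-squared comparison across all four models can be checked with
the usual formula, R-sq(adj) = 1 - (1 - R-sq)(n - 1)/(n - k - 1), where k is
the number of predictors. A sketch using the rounded R-Sq values from the
outputs, so results may differ from the printed R-Sq(adj) in the last digit;
n = 31 follows from Total DF = 30.

```python
# Rounded R-Sq values and predictor counts for the four VO+ models.
n = 31
models = {
    "OC only": (0.419, 1),
    "OC + TRAP": (0.588, 2),
    "OC + TRAP + VO-": (0.879, 3),
    "OC + VO-": (0.878, 2),
}


def adjusted_r_squared(r_sq: float, n: int, k: int) -> float:
    """Adjusted R-sq = 1 - (1 - R-sq) * (n - 1) / (n - k - 1) for k predictors."""
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)


adjusted = {name: adjusted_r_squared(r, n, k) for name, (r, k) in models.items()}
for name, value in adjusted.items():
    print(f"{name}: adjusted R-sq = {value:.1%}")
print("highest adjusted R-sq:", max(adjusted, key=adjusted.get))
```

The OC + VO- model has slightly lower R-Sq than the three-predictor model but a
higher adjusted R-Sq, because dropping TRAP costs almost no explained variation
while freeing up a degree of freedom.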


