Math 217: Simple Linear Regression Example
Data were collected on
115 homes sold in

Because a linear
relationship between these variables seems reasonable, we can proceed with the
regression analysis. The regression output from Minitab is shown below.
Regression Analysis

The regression equation is
selling price (in $100) = - 61.8 + 0.682 square feet of living space

Predictor                        Coef  SE Coef      T      P
Constant                       -61.81    53.22  -1.16  0.248
square feet of living space   0.68246  0.03126  21.83  0.000

S = 162.902   R-Sq = 80.8%   R-Sq(adj) = 80.7%

Analysis of Variance

Source           DF        SS        MS       F      P
Regression        1  12646380  12646380  476.56  0.000
Residual Error  113   2998686     26537
Total           114  15645066

Predicted Values for New Observations

New Obs     Fit  SE Fit            95% CI           95% PI
   2000  1303.1    19.1  (1265.3, 1340.9)  (978.2, 1628.0)
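To see where the "Fit" value in the output comes from, the regression equation can be evaluated by hand. The short Python sketch below plugs 2000 square feet into the fitted line using the coefficients from the Minitab output. The function and variable names are illustrative only and are not part of Minitab.

```python
# Coefficients copied from the "Coef" column of the Minitab output.
INTERCEPT = -61.81   # in units of $100
SLOPE = 0.68246      # $100 per square foot of living space


def predicted_price_hundreds(square_feet):
    """Fitted selling price, in units of $100, for a given living area."""
    return INTERCEPT + SLOPE * square_feet


fit = predicted_price_hundreds(2000)
fit_dollars = fit * 100  # convert from $100 units to dollars

print(f"Fit at 2000 sq ft: {fit:.1f} (in $100)")
print(f"That is about ${fit_dollars:,.0f}")
```

The result should agree with the "Fit" column in the last block of output. Remember that the response is recorded in hundreds of dollars.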
The value of R-squared indicates that 80.8% of the
variation in selling price is explained by its linear relationship with square
footage. This is a fairly high R-squared value. Hence, we can feel pretty good
about making predictions based on this model.
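The summary statistics S, R-Sq, and R-Sq(adj) can all be recovered from the Analysis of Variance table. The following Python sketch does that arithmetic with the sums of squares and degrees of freedom from the output. The variable names are my own, and it is meant only as a check on the reported values.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression = 12646380
ss_error = 2998686
ss_total = 15645066
df_error = 113
df_total = 114

# R-squared: share of total variation explained by the regression.
r_sq = ss_regression / ss_total

# Mean squares are sums of squares divided by their degrees of freedom.
ms_error = ss_error / df_error
ms_total = ss_total / df_total

# Adjusted R-squared compares mean squares instead of sums of squares.
r_sq_adj = 1 - ms_error / ms_total

# S is the estimated standard deviation of the residuals.
s = math.sqrt(ms_error)

print(f"R-Sq      = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"S         = {s:.3f}")
```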
Before we do
any inference based on the model, we must
check the normality and constant-variance conditions by looking at appropriate graphs of the
residuals:

The normality condition clearly appears to be met.
What about the constant-variance condition? Note the variability in the
residuals seems to “fan out” a bit (more variability in the residuals for
higher-priced houses). This violation isn’t awful, but perhaps a transformation
of the data should at least be considered (we’ll consider this in lab).
Now consider the slope of the regression line. A significance test on the true slope ($H_0\colon \beta_1 = 0$ versus $H_a\colon \beta_1 \neq 0$) clearly shows evidence (p-value = 0.000) that the population slope is different from 0. Assuming the population slope is 0 (that is, that square footage has no linear impact on the selling price of a house), there is essentially no chance of getting our sample slope value or a more extreme slope value (note: that was just the “definition of the p-value in the context of the problem” as I mentioned in class). This gives us strong evidence that the square footage of a house has a statistically significant linear impact on the selling price of a house (our p-value is smaller than any typically-used significance level).
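The T value that Minitab reports for the slope is simply the estimated slope divided by its standard error. The Python sketch below reproduces it from the output and compares it with the two-sided 5% critical value of 1.981 (113 df, as reported by Minitab). It also checks the usual simple-regression fact that the square of the slope's t statistic matches the ANOVA F statistic, up to rounding. The variable names are illustrative.

```python
# Slope estimate and its standard error from the Minitab output.
slope = 0.68246
se_slope = 0.03126

# Test statistic for the null hypothesis that the population slope is 0.
t_stat = (slope - 0) / se_slope

# Two-sided 5% critical value for 113 df, as reported by Minitab.
t_critical = 1.981

# F statistic reported in the ANOVA table.
f_reported = 476.56

print(f"t = {t_stat:.2f}")
print(f"|t| exceeds the critical value: {abs(t_stat) > t_critical}")
print(f"t squared = {t_stat ** 2:.2f}, reported F = {f_reported}")
```

A t statistic this far beyond the critical value is why the p-value prints as 0.000.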
But is this result practically significant? To answer this
question, we can create a 95% confidence interval for the population slope:
(0.621, 0.743). (I got the t-value, 1.981, from Minitab—you
can get an estimate of it, based on 100 df, from Table D: 1.984.) Hence, for
each additional 100 square feet of living area, we are 95% confident that the
selling price increases by between $6,210 and $7,430 (remember, interpretation
of the slope in the context of the problem is part of the “explanation” piece
of regression). As always, our confidence is in the method we use (i.e., our method gives correct results 95% of the time). Do you think this is of practical importance?
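The confidence interval for the slope is the estimate plus or minus t* times its standard error. Here is a small Python sketch of that calculation, using the unrounded coefficient and standard error from the output. The endpoints agree with the interval quoted above to within rounding. The variable names are my own.

```python
# Values from the Minitab output.
slope = 0.68246      # $100 per square foot
se_slope = 0.03126
t_star = 1.981       # 95% critical value, 113 df (from Minitab)

margin = t_star * se_slope
ci_lower = slope - margin
ci_upper = slope + margin

# Rescale to dollars per additional 100 square feet:
# multiply by 100 sq ft and by $100 per unit of the response.
dollars_lower = ci_lower * 100 * 100
dollars_upper = ci_upper * 100 * 100

print(f"95% CI for slope: ({ci_lower:.4f}, {ci_upper:.4f})")
print(f"Per 100 sq ft: about ${dollars_lower:,.0f} to ${dollars_upper:,.0f}")
```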
The last bit of
output shows both a confidence interval for the average selling price and a
prediction interval for the selling price of a new house with 2000 square feet
(Minitab can easily create these intervals for any x-value of interest). Note
the prediction interval is substantially wider (which isn’t surprising). Be sure to use the interval (confidence
interval for a mean response or prediction interval for a new value) that best
answers your particular research question.
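Both intervals in the last block of output can be rebuilt from Fit, SE Fit, and S. The sketch below assumes the standard simple-regression formulas: the confidence interval for the mean response uses SE Fit alone, while the prediction interval adds the residual variance S squared under the square root. It uses the same t* = 1.981 as before. The variable names are illustrative.

```python
import math

# Values from the Minitab output for a house with 2000 square feet.
fit = 1303.1      # predicted selling price, in $100
se_fit = 19.1     # standard error of the fitted mean
s = 162.902       # residual standard deviation
t_star = 1.981    # 95% critical value, 113 df (from Minitab)

# Confidence interval for the mean selling price of all such houses.
ci_margin = t_star * se_fit
ci = (fit - ci_margin, fit + ci_margin)

# Prediction interval for one new house: the extra s**2 term accounts
# for the variation of an individual price around the mean.
pi_margin = t_star * math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - pi_margin, fit + pi_margin)

print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"95% PI: ({pi[0]:.1f}, {pi[1]:.1f})")
```

The extra S squared term dominates here, which is why the prediction interval is so much wider than the confidence interval.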