Math 217—Multiple Regression Example
Data were collected on 522
homes sold in a
In class, I’ll show you a number of graphs of the individual variables and of each variable against house price. (Remember, it’s good practice to start your analysis at a basic graphical and numerical level, before jumping into multiple regression.)

Using both Minitab’s stepwise and best-subsets procedures, the following predictors seem most important: finished square feet, number of bedrooms, garage size, quality index, and lot size. The regression output for this particular model is shown below. How do you interpret all these results? (Remember, this is the potentially “dangerous” territory of using our data to both create and check our model. If we use this model as an explanation of sales price, we’ll want to check it on a new data set. Alternatively, I could have separated my data into a model-selection set and a model-testing set.)
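The idea of separating the data into a model-selection set and a model-testing set can be sketched in a few lines. This is only an illustration, not what was done for this handout: it assumes Python with NumPy rather than Minitab, and the 70/30 split and the random seed are arbitrary choices. It splits row numbers only, since the data set itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the split is reproducible (arbitrary choice)
n = 522                          # number of homes in the data set
indices = rng.permutation(n)     # row numbers 0..521 in random order

n_select = int(0.7 * n)          # 70% for model selection (assumed proportion)
select_idx = indices[:n_select]  # rows used by stepwise / best-subsets to choose a model
test_idx = indices[n_select:]    # rows held back to check the chosen model

print(len(select_idx), len(test_idx))
```

The model would be chosen and fitted using only the rows in `select_idx`, and its predictions would then be compared with the actual sale prices for the rows in `test_idx`.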
The regression equation is
Sales Price (in dollars) = 142644 + 108 Finished Square Feet - 9129 Number of Bedrooms
+ 23130 Garage Size (no. of cars) - 69804 Quality Index
+ 1.07 Lot Size (in square feet)
Predictor                      Coef  SE Coef       T      P
Constant                     142644    28970    4.92  0.000
Finished Square Feet        108.304    6.670   16.24  0.000
Number of Bedrooms            -9129     3543   -2.58  0.010
Garage Size (no. of cars)     23130     5647    4.10  0.000
Quality Index                -69804     6753  -10.34  0.000
Lot Size (in square feet)    1.0664   0.2592    4.11  0.000
S = 67963.2 R-Sq = 76.0% R-Sq(adj) = 75.7%
Analysis of Variance
Source            DF           SS           MS       F      P
Regression         5  7.52751E+12  1.50550E+12  325.94  0.000
Residual Error   516  2.38340E+12   4618994024
Total            521  9.91091E+12
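One way to get comfortable with the output is to rebuild the summary numbers from the sums of squares. The sketch below (Python, used here only as a calculator) copies the values from the output above and recomputes MS, F, S, R-Sq, R-Sq(adj), and one T ratio. Because it starts from the rounded values printed by Minitab, the results should agree with the output only up to rounding.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_regression = 7.52751e12
ss_error = 2.38340e12
ss_total = 9.91091e12
df_regression, df_error = 5, 516
df_total = df_regression + df_error           # 521

ms_regression = ss_regression / df_regression  # mean square = SS / DF
ms_error = ss_error / df_error

f_stat = ms_regression / ms_error              # overall F statistic
s = math.sqrt(ms_error)                        # S = square root of MS error
r_sq = ss_regression / ss_total                # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # R-Sq(adj)

# Each T ratio is Coef / SE Coef; number of bedrooms shown as an example.
t_bedrooms = -9129 / 3543

print(f"F = {f_stat:.2f}, S = {s:.1f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"T (bedrooms) = {t_bedrooms:.2f}")
```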
Below are several plots of the residuals. What do these indicate about whether or not our model conditions are met?
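Each point in a residual plot is one home's observed price minus its fitted price. As a reminder of where those numbers come from, here is a small sketch for a single home. The coefficients are copied from the first regression output; the home's characteristics and its sale price are invented for illustration and are not from the class data set (the quality index value in particular is a guess at a plausible code).

```python
# Coefficients copied from the first regression output.
coef = {
    "constant": 142644,
    "finished_sqft": 108.304,
    "bedrooms": -9129,
    "garage_cars": 23130,
    "quality_index": -69804,
    "lot_sqft": 1.0664,
}

# Hypothetical home: every value here is made up for illustration.
house = {
    "finished_sqft": 2000,
    "bedrooms": 3,
    "garage_cars": 2,
    "quality_index": 2,
    "lot_sqft": 20000,
}
observed_price = 250000  # hypothetical sale price, in dollars

# Fitted value = constant + sum of (coefficient * predictor value).
fitted_price = coef["constant"] + sum(coef[k] * v for k, v in house.items())

# Residual = observed - fitted.
residual = observed_price - fitted_price

print(round(fitted_price), round(residual))
```

For this made-up home the model predicts about $259,845, so the residual is about -$9,845: the home sold for less than the model predicted.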



After doing a log (base e) transformation on Sales Price, stepwise and best-subsets choose a slightly different model (which includes number of bathrooms rather than number of bedrooms). The regression results and residual plots for this new model are shown on the back of this page. What do you think of the new model?
The regression equation is
Log Sales Price = 11.9 + 0.000279 Finished Square Feet + 0.0444 Number of Bathrooms
+ 0.0694 Garage Size (no. of cars) - 0.218 Quality Index
+ 0.000004 Lot Size (in square feet)
Predictor                         Coef     SE Coef       T      P
Constant                       11.9254      0.0824  144.79  0.000
Finished Square Feet        0.00027926  0.00001958   14.26  0.000
Number of Bathrooms            0.04444     0.01266    3.51  0.000
Garage Size (no. of cars)      0.06940     0.01576    4.40  0.000
Quality Index                 -0.21766     0.01976  -11.02  0.000
Lot Size (in square feet)   0.00000370  0.00000072    5.12  0.000
S = 0.189423 R-Sq = 80.9% R-Sq(adj) = 80.7%
Analysis of Variance
Source            DF      SS      MS       F      P
Regression         5  78.568  15.714  437.94  0.000
Residual Error   516  18.515   0.036
Total            521  97.083
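Because the response is now the natural log of sales price, each coefficient describes a multiplicative (percentage) change in price rather than a dollar change. Holding the other predictors fixed, a one-unit increase in a predictor multiplies the predicted price by e^(coefficient). The sketch below (Python, used only as a calculator) converts some of the coefficients from the output above into approximate percentage changes. Scaling finished square feet to a 100-square-foot increase is a choice made here for readability.

```python
import math

# Coefficients copied from the log-price regression output above.
coef = {
    "Finished Square Feet (per 100 sq ft)": 0.00027926 * 100,
    "Number of Bathrooms": 0.04444,
    "Garage Size (no. of cars)": 0.06940,
    "Quality Index": -0.21766,
}

# Percentage change in predicted price for a one-unit increase in the
# predictor, with the other predictors held fixed: 100 * (e^b - 1).
pct_change = {name: 100 * (math.exp(b) - 1) for name, b in coef.items()}

for name, pct in pct_change.items():
    print(f"{name}: {pct:+.1f}%")
```

For example, one more garage space corresponds to a predicted price about 7.2% higher, and a one-unit increase in the quality index corresponds to a predicted price about 19.6% lower.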



General Note on Indicator (or “Dummy”) Variables
None of the categorical variables was deemed (by our computer procedures) to have a significant impact on sales price, in the presence of the other variables. Still, I want you to understand the idea of indicator variables and how to interpret the coefficients on indicator variables. Included below is the regression output from a simple regression of sales price on pool status. How do you interpret the slope coefficient? How would you interpret the coefficients if there were multiple indicator variables?
The regression equation is
Sales Price (in dollars) = 272396 + 79724 Pool?

Predictor    Coef  SE Coef      T      P
Constant   272396     6195  43.97  0.000
Pool?       79724    23589   3.38  0.001

S = 136564   R-Sq = 2.1%   R-Sq(adj) = 2.0%
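In a simple regression on a single 0/1 indicator, the constant is the mean response for the group coded 0 and the slope is the difference between the two group means. The sketch below illustrates this with a handful of invented prices (not the class data), assuming Python with NumPy, and then applies the same reading to the coefficients reported above, taking Pool? = 1 to mean the home has a pool.

```python
import numpy as np

# Hypothetical prices (in dollars), invented for illustration only.
price = np.array([200000.0, 250000.0, 300000.0, 350000.0, 400000.0])
pool = np.array([0.0, 0.0, 0.0, 1.0, 1.0])   # 1 = has a pool, 0 = no pool

# Least-squares fit of price = intercept + slope * pool.
X = np.column_stack([np.ones_like(pool), pool])
(intercept, slope), *_ = np.linalg.lstsq(X, price, rcond=None)

mean_no_pool = price[pool == 0].mean()
mean_pool = price[pool == 1].mean()

print(intercept, mean_no_pool)           # intercept equals the no-pool mean
print(slope, mean_pool - mean_no_pool)   # slope equals the difference in means

# The same reading applied to the coefficients reported in the output above.
reported_no_pool = 272396
reported_pool = 272396 + 79724
print(reported_no_pool, reported_pool)
```

Read this way, the output says that homes without a pool sold for $272,396 on average, and homes with a pool sold for $79,724 more, or $352,120 on average.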