Double-click on the My Computer icon on the
desktop. Then double-click on the campus_share on 'curtis' (U:) drive and then the Class_Share folder. Finally,
double-click on the Math folder and then the math_117 folder. In this folder are (among other files)
the PASW files we will use in today’s lab: BrainSize.sav, BodyMeasurements.sav,
HousePrices.sav, and AlcoholMetabolism.sav.
As a class, we
cannot all open these shared files directly (only one person can access them at a time).
Thus, you each need to copy the four files to your personal account. You can do
this by simply highlighting all the files (click on the first one, then ctrl-click
on the others—this should highlight them all). Then press Ctrl-C to copy the
files. (We used BrainSize.sav last week, so if it’s still in your personal
account, you don’t need to recopy.) Now open the My Documents folder on the desktop (this is the My Documents folder of your personal
account). Once you are in the My
Documents folder, hit Ctrl-V to paste the four files into your account.
Now open the statistical
software (from the Start menu
select Programs>Class Programs
and then PASW Statistics 18.0).
From the File menu select Open>Data, then change the folder
to My Documents and open BrainSize.sav.
(Alternatively, you can simply
double-click on the BrainSize file in your account
and PASW Statistics will automatically open.)
Recall last
week we considered the distribution of student heights, separately for males
and females (via side-by-side boxplots, histograms,
and numerical summaries).
Now consider the relationship between
the quantitative variables in this data set. To analyze one of these
relationships, the first step is to create a scatterplot.
Let’s first consider
the relationship between the brain size (MRI count) and the weight of these
students—weight is the explanatory variable and brain size is the response.
Again, use the Graphs>Chart Builder option. (Quick notes: Select basic scatterplot from the gallery; choose Brain size for the
y-axis; choose Weight for the x-axis; title your graph appropriately.) How
would you describe the relationship between these variables? Estimate the
correlation for these data. To find the actual correlation coefficient, use the Analyze>Correlate>Bivariate option. In the dialog box, select weight and
brain size as the variables (note the default correlation is “Pearson,” which
is the correlation we discussed in class). How does the correlation compare
with your estimate?
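For those curious about what the Bivariate dialog computes, here is a minimal Python sketch of the Pearson correlation formula. The function name pearson_r and the eight data values are made up for illustration; they are not taken from BrainSize.sav, so the printed value will not match the lab output.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only; NOT the BrainSize.sav data.
weight = [118, 132, 143, 151, 166, 172, 180, 191]       # explanatory variable
mri_count = [816, 951, 928, 856, 949, 905, 1001, 955]   # response variable

r = pearson_r(weight, mri_count)
print(f"r = {r:.3f}")
```

Note that swapping the two variables gives the same r: correlation does not distinguish between explanatory and response variables.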
When
considering weight and brain size, should the sex variable be taken into
account? To create separate scatterplots for males and females, use the
Graphs>Chart Builder option. There are two ways to create these scatterplots: 1) side-by-side (use the Groups>Column
panel variable option) or 2) on the same graph (choose the grouped scatter
option from the gallery). We’ll discuss both of these options in lab. Does it
seem the relationship between brain size and body weight is different for males
and females? How so?
We’ll
investigate this difference more, but first let’s consider a different research
question: can brain size predict IQ? Create a scatterplot
to address this question (you can leave separate colors for males and females).
Is there a relationship between brain size and IQ (either overall or separately
for males and females)? What is an odd feature of this particular scatterplot?
Back to
body weight as a predictor of brain size (which seems much more fruitful than
brain size predicting IQ). We know this relationship is different for males and
females. If we want to do detailed analyses (rather than simply scatterplots with different symbols), we can analyze the data separately by sex. To do this, in the data window,
select Data>Split File. Click on the “Organize
output by groups” circle and move the sex variable to the box under “Groups
based on:”. Now any analyses you choose will be done separately by sex (note that, in order to analyze the data again
as a whole, you need to go back to the Split
File option and select “Analyze all cases, do not create groups”).
We noticed the
linear relationship (between body weight and brain size) is stronger for
females than males. Since the data is now split by sex, we can determine
separate correlations (we couldn’t do this with our original data set-up). Go
back to Analyze>Correlate>Bivariate, and
select body weight and brain size as your variables. In the output window, note
you get separate correlations for males and females. Do these agree with our
previous scatterplot? Does the difference between
males and females surprise you? Or do you have an explanation?
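The idea behind Split File (run the same analysis once per group) can be sketched in Python as follows. This is only an illustration of the concept, not what SPSS does internally; the rows are hypothetical and are not the lab data.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical rows (sex, weight, MRI count); NOT the BrainSize.sav data.
rows = [
    ("Female", 118, 816), ("Female", 125, 850), ("Female", 136, 870),
    ("Female", 140, 905), ("Female", 155, 930),
    ("Male", 150, 950), ("Male", 166, 930), ("Male", 172, 1000),
    ("Male", 180, 945), ("Male", 191, 990),
]

# "Split the file": collect the weights and MRI counts separately for each sex.
groups = defaultdict(lambda: ([], []))
for sex, weight, mri in rows:
    groups[sex][0].append(weight)
    groups[sex][1].append(mri)

# Run the same correlation analysis once per group.
r_by_sex = {sex: pearson_r(w, m) for sex, (w, m) in groups.items()}
for sex, r in r_by_sex.items():
    print(f"{sex}: r = {r:.3f}")
```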
Description of HousePrices.sav
This data file
contains information (selling price—in hundreds of dollars—and square feet of
living space) on 102 houses sold in Albuquerque, New Mexico in 1993.
Analysis
What is the
relationship between selling price (in hundreds of dollars) and square feet of living space?
For example, can we use square footage to predict selling price? If so, this
would be helpful information for realtors and home-owners in Albuquerque.
Create a scatterplot of these variables. The
relationship is positive and quite strong. Hence, it seems reasonable to use
regression to explain and predict selling price based on square footage.
There are two ways to do simple regression
in SPSS (one method simply shows a scatterplot with
regression line, which is a nice visual; the other method provides more
detailed—and necessary—information about the regression model). Double-click on
the scatterplot you just created (to invoke the Chart
Editor). From the editor choose Elements>Fit Line At Total. SPSS then
creates a scatterplot with the regression line drawn
in (and gives the
r² value; it used to provide the equation of the
regression line, but this new version of SPSS does not). Note we have the visual of a regression line, which is nice, but we don’t
have the equation of the line (and we aren’t able to analyze the residuals).
Hence, from this scatterplot-with-regression-line, we
aren’t able to explain or predict selling price and we aren’t able to assess
our model via a residual plot.
For a more detailed regression analysis, go to the Analyze menu and select Regression>Linear. In the dialog
box, select selling price as
the dependent variable (i.e., response variable) and square feet as the
independent variable (i.e., explanatory variable). Then click on the Save button and in the new dialog box
simply select Residuals: Unstandardized (these are the residuals we discussed in
class). Recall it is important to look at a residual plot for each regression (if the residual plot shows any
type of pattern, this indicates the regression model is somehow inadequate). Saving
the residuals allows us to create this residual plot.
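As a reminder of what an unstandardized residual is (observed value minus predicted value), here is a small Python sketch that fits a least-squares line and computes the residuals by hand. The function name fit_line and the six houses are hypothetical, chosen only for illustration; they are not from HousePrices.sav.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical houses for illustration only; NOT the HousePrices.sav data.
sqft = [1000, 1200, 1500, 1800, 2100, 2500]   # explanatory variable
price = [680, 790, 930, 1120, 1250, 1490]     # response, in hundreds of dollars

intercept, slope = fit_line(sqft, price)

# Unstandardized residual = observed price - predicted price
residuals = [y - (intercept + slope * x) for x, y in zip(sqft, price)]
for x, e in zip(sqft, residuals):
    print(f"{x} sq ft: residual = {e:+.1f}")
```

Residuals from a least-squares line always sum to zero (up to rounding), which is why a residual plot is centered on the horizontal line at 0.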
The regression
output is shown in the output window (we’ll discuss this in detail in lab).
There is a lot of output, but one important take-away is the equation of the regression line. In the “Coefficients” output
table, there is a column labeled “B.”
The first value in that column is the numerical value of the y-intercept
of the regression line (in this case, the value is 129.915); the second value
in that column is the numerical value of the slope of the regression line (in
this case, the value is 0.541). Hence,
the equation of the regression line is
predicted selling price = 129.915 + 0.541 × (square feet),
and we can interpret the slope value in the context of the problem: As the size
of a house increases by 1 square foot, the predicted selling price increases by
0.541 hundred dollars (i.e., $54.10). Or, more informatively, as the size of a
house increases by 100 square feet, the predicted selling price increases by
54.1 hundred dollars (i.e., $5,410).
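To see the regression line used for prediction, here is a short Python sketch built on the intercept (129.915) and slope (0.541) reported in the “Coefficients” table. The function name predicted_price and the 2,000-square-foot house are assumptions made for illustration; the house size is assumed to fall within the range of the data.

```python
INTERCEPT = 129.915  # first value in the "B" column, in hundreds of dollars
SLOPE = 0.541        # second value in the "B" column, hundreds of dollars per square foot

def predicted_price(square_feet):
    """Predicted selling price, in hundreds of dollars."""
    return INTERCEPT + SLOPE * square_feet

# Hypothetical house size, assumed to lie within the range of the data.
example = predicted_price(2000)
print(f"Predicted price: {example:.3f} hundred dollars (${example * 100:,.2f})")

# Slope interpretation: change in the prediction for 100 extra square feet.
change = predicted_price(2100) - predicted_price(2000)
print(f"100 extra square feet adds {change:.1f} hundred dollars")
```

By hand, the prediction is 129.915 + 0.541 × 2000 = 1211.915 hundred dollars, or about $121,192, and the 100-square-foot change is 0.541 × 100 = 54.1 hundred dollars, matching the interpretation above.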
We have two diagnostics to assess our model: r² and the residual plot. Recall our
interpretation of r² (in the context of this problem): 73.5% of the
variation in selling prices is explained by our regression line. This is fairly
high (especially based on a single predictor variable). Now consider the
residuals, which are saved in column 3. To create the basic residual plot (residuals versus
explanatory variable), go to the Graphs>Chart Builder and make a scatterplot with Unstandardized
Residuals on the y-axis and square feet on the x-axis.
It’s nice to have a horizontal line at 0 within the plot, as this is a nice
visual reference of a perfect fit by the regression line. To add a reference line to a plot, first
double-click on the graph to invoke the graph editor. Then select Options>Add
Y-Axis Reference Line. In the new dialog box, change the “Position” to 0, and
then hit the Apply button. Does the residual plot show a random scatter of
points? Or a pattern? (Remember, don’t read too much into these plots—think big
picture.)
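The two diagnostics are connected: the proportion of variation explained can be computed directly from the residuals as 1 − (sum of squared residuals)/(total sum of squares), and for a regression with one predictor it equals the square of the correlation. A minimal Python sketch, using hypothetical data (not HousePrices.sav) and made-up function names:

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(xs, ys):
    """Proportion of the variation in y explained by the least-squares line."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # residual variation
    sst = sum((y - mean_y) ** 2 for y in ys)                               # total variation
    return 1 - sse / sst

# Hypothetical houses for illustration only; NOT the HousePrices.sav data.
sqft = [1000, 1200, 1500, 1800, 2100, 2500]
price = [650, 820, 900, 1150, 1210, 1500]   # hundreds of dollars

r2 = r_squared(sqft, price)

# Pearson correlation, computed separately to compare r squared with r2.
mean_x, mean_y = sum(sqft) / len(sqft), sum(price) / len(price)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price))
sxx = sum((x - mean_x) ** 2 for x in sqft)
syy = sum((y - mean_y) ** 2 for y in price)
r = sxy / math.sqrt(sxx * syy)

print(f"r^2 from residuals: {r2:.4f}")
print(f"correlation squared: {r ** 2:.4f}")
```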
Based on our diagnostics, a line is the
best summary of the relationship in the data (based on the residual plot), and
our line explains 73.5% of the variation in selling price. So a realtor in Albuquerque should feel
good using this line to predict selling price (within the range of our data)
and to explain the impact of each additional square foot of housing space. If
the realtor wants an even better model, he can include additional variables
(multiple regression). Thoughts on other variables that might be good
predictors of selling price?
Description of AlcoholMetabolism.sav
Case study from The Statistical Sleuth,
Second Edition, by Ramsey and Schafer, Duxbury Publishers:
“Women exhibit a lower tolerance for alcohol and develop alcohol-related
liver disease more readily than men. When men and women of the same size and
drinking history consume equal amounts of alcohol, the women on average carry a
higher concentration of alcohol in their bloodstream. According to a team of
Italian researchers, this occurs because alcohol-degrading enzymes in the
stomach (where alcohol is partially metabolized before it enters the
bloodstream and is eventually metabolized by the liver) are more active in men
than in women. The researchers studied the extent to which the activity of the
enzyme explained the first-pass alcohol metabolism and the extent to which it
explained the differences in first-pass metabolism between women and men. This
data file includes their data (M. Frezza, et al.
(1990), “High Blood Alcohol Levels in Women,” New England Journal of Medicine, 322, pp. 95-99).”
“The subjects were 18 women and 14 men, all volunteers living in
Analysis
You need to think about how to analyze these data appropriately—graphically
and numerically—using SPSS (and your mind). Consider the following questions of
interest. To answer each question, perform an appropriate and complete analysis
in SPSS. Be prepared to share your results with the whole class. And feel free
to ask questions while you work.