Math 217 – Assignment 3
Due Friday, January 28 (beginning of class)
The data files for these problems are in the math_217 share folder (under
Homework Files), so you can copy them to your account (be sure to work from the
files on your account, not the share folder).
Textbook Problems
Chapter 3
3.16 (Use Calc>Calculator to create squared variables and
interaction variables. You will Assess in 3.17.)
3.17
3.20 (Prediction intervals for multiple regression: Through the
Options button—from the Stat>Regression>Regression dialog box—you can get
“prediction intervals for new observations.” You must enter specific values for
each predictor variable—in the order you entered them in your regression
model—with spaces between the numbers.)
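For reference, the column arithmetic behind the 3.16 hint (squared variables and interaction variables) can be sketched outside Minitab. This is only an illustration in Python: the column names x1 and x2 and their values are made up, not taken from the textbook data file.

```python
# Hypothetical predictor columns; the real values come from the textbook data file.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.5, 1.5, 2.5, 3.5]

# Squared variable: each value multiplied by itself.
x1_sq = [a * a for a in x1]

# Interaction variable: row-by-row product of the two predictors.
x1_x2 = [a * b for a, b in zip(x1, x2)]

print(x1_sq)
print(x1_x2)
```

Each new column has one value per row, just like a column created with Calc>Calculator, and can be entered as an additional predictor in the regression.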
Chapter 4
4.5
· Create new worksheets of your training sample and holdout sample. Use
Data>Copy Columns to Columns: name the new worksheet appropriately and “Subset
the Data” including row numbers 1:150 or 151:219, depending on the worksheet
you’re creating. Minitab will work with the highlighted (front-most) worksheet,
so be careful with your analysis.
· Use the Calc>Calculator to create fitted values for the holdout worksheet
(and to create residuals). Use the Calc>Column Statistics to find the mean and
standard deviation of the residuals.
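The logic of the two steps above can be sketched in Python as a rough illustration. Everything here is an assumption made for the sketch: the function names are hypothetical, the data are made up, and the model has a single predictor to keep the arithmetic short (the model for 4.5 may use more). The idea is the same either way: fit on the training rows only, apply that fitted equation to the holdout rows, and summarize the holdout residuals.

```python
from statistics import mean, stdev

def split_rows(rows, n_train):
    """First n_train rows form the training sample; the rest form the holdout sample."""
    return rows[:n_train], rows[n_train:]

def fit_simple(xs, ys):
    """Least-squares intercept and slope for a one-predictor model."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def holdout_residuals(b0, b1, xs, ys):
    """Residual = observed y minus the value fitted from the training equation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Made-up (x, y) rows; the assignment uses rows 1:150 and 151:219 instead.
data = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1),
        (6, 13.0), (7, 14.8), (8, 17.1)]
train, hold = split_rows(data, 5)

# Fit on the training sample only.
b0, b1 = fit_simple([x for x, _ in train], [y for _, y in train])

# Apply the training equation to the holdout sample.
res = holdout_residuals(b0, b1, [x for x, _ in hold], [y for _, y in hold])

# Mean and sample standard deviation of the holdout residuals.
print(mean(res), stdev(res))
```

A mean residual near zero and a standard deviation close to the regression standard error from the training fit suggest the model predicts the holdout cases about as well as it fit the training cases.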
In-class Presentation for Friday, January 28
The file MLB2007Standings contains winning percentages for all Major League
Baseball teams in 2007. It also contains 11 potential predictor variables, all
team statistics (i.e., statistics determined for the whole team, not individual
players). There are 10 quantitative variables and 1 categorical variable
(League: 0=National League, 1=American League). I pared down the dataset to
include team statistics that are understandable to all (but please ask if you
have questions!).
Your goal is to predict and explain
Winning Percentage. Use our model-building outline as a guide for your
investigation. I want you to choose one model and defend it (e.g., why exactly
do you think this is the best model? What nice properties does it have? How can
it be used to both explain and predict winning percentage? And so on).
Tips for your presentation:
· Do not talk about all the models that did not work; focus solely on the model
you chose. At the same time, you should be able to address questions about other
possible models. For example, if I ask why you didn’t include a certain
variable, you should be organized enough that via a brief look at your notes
you’ll know if you tried (or didn’t try) a certain model and why you didn’t
select it.
· Include appropriate graphs, numerical tables, and written descriptions in your
presentation (you can create it in PowerPoint). Act as if you’re presenting
these results to a big-wig MLB official. What would this person want to know?
How would the information best be presented?
o This includes all parts of the process (Choose, Fit, Assess, and Use), but in
layperson’s terms (read: in the context of the problem). The presentation
should focus on the most important information. Then organize your notes so you
can easily answer questions from the audience. That is, do not put every single
detail in the presentation, but be prepared to answer questions about the
details.
· You might both choose the same model. That’s fine. There will still be
differences in what you decide to present, etc. Plus, since you will both be
engaged in the investigation, you can ask each other pertinent questions. On the
other hand, you might arrive at different models. Then we can have some
interesting conversations.
· Our class is generally informal (which is good), but I want this presentation
to be more formal. You need not dress up and you certainly don’t need to pretend
that we don’t know each other well. So I want you to be at ease, but I want this
to be more than simply sitting at a table sharing your results. (We can talk
about these fine nuances when we’re together, in person.)