Math 217 – Assignment 3

Due Friday, January 28 (beginning of class)

 

The data files for these problems are in the math_217 share folder (under Homework Files), so you can copy them to your account (be sure to work from the files on your account, not the share folder).

 

Textbook Problems

Chapter 3

3.16 (Use Calc>Calculator to create squared variables and interaction variables. You will Assess in 3.17.)

3.17

3.20 (Prediction intervals for multiple regression: Through the Options button in the Stat>Regression>Regression dialog box, you can get “prediction intervals for new observations.” You must enter a specific value for each predictor variable, in the same order as the predictors appear in your regression model, with spaces between the numbers.)
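The hints for 3.16 and 3.20 fit together: if your model contains squared or interaction terms that you built by hand, the values you enter for a new observation presumably need those derived terms as well, in model order. Here is a minimal Python sketch of that bookkeeping, offered only as an illustration. The variable names, the model order (x1, x2, x1 squared, x1*x2), and all numbers are made up and do not come from the textbook data.

```python
# Made-up illustrative values; not taken from the textbook data files.
x1 = [2.0, 3.0, 5.0]
x2 = [1.0, 4.0, 2.0]

# Squared and interaction columns, analogous to what Calc>Calculator builds.
x1_sq = [a ** 2 for a in x1]
x1_x2 = [a * b for a, b in zip(x1, x2)]


def new_observation(x1_new, x2_new):
    """Return predictor values for one new case in the assumed model order:
    x1, x2, x1 squared, x1*x2."""
    return [x1_new, x2_new, x1_new ** 2, x1_new * x2_new]


row = new_observation(4.0, 3.0)

# Space-separated, the way the values are typed into the dialog box.
entry = " ".join(f"{v:g}" for v in row)
print(entry)
```

If your model lists its predictors in a different order, the entries must be rearranged to match.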

 

Chapter 4

4.5

·         Create separate worksheets for your training sample and your holdout sample. Use Data>Copy Columns to Columns, name the new worksheet appropriately, and use “Subset the Data” to include row numbers 1:150 or 151:219, depending on the worksheet you’re creating. Minitab works on the active (front-most) worksheet, so check which worksheet is active before you run each analysis.

 

·         Use Calc>Calculator to create fitted values for the holdout worksheet by applying the regression equation from your training sample, then create the residuals (observed minus fitted). Use Calc>Column Statistics to find the mean and standard deviation of the residuals.
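The two steps above can be sketched in Python to show the logic outside Minitab. This is only an illustration under stated assumptions: it uses a single predictor for brevity (the textbook problem may use several), the six data points are made up, and the split index stands in for the 1:150 and 151:219 row ranges.

```python
from statistics import mean, stdev

# Made-up illustrative data; not the textbook data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 7.0, 9.0, 12.0, 12.0]

# Hypothetical split point, standing in for rows 1:150 and 151:219.
split = 4
x_train, y_train = x[:split], y[:split]
x_hold, y_hold = x[split:], y[split:]

# Least-squares slope and intercept from the training sample only.
x_bar, y_bar = mean(x_train), mean(y_train)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x_train, y_train))
sxx = sum((a - x_bar) ** 2 for a in x_train)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Apply the training equation to the holdout sample.
fitted = [intercept + slope * a for a in x_hold]
residuals = [obs - fit for obs, fit in zip(y_hold, fitted)]

# Summaries that Calc>Column Statistics would report.
resid_mean = mean(residuals)
resid_sd = stdev(residuals)
print(resid_mean, resid_sd)
```

The key design point is that the slope and intercept are never re-estimated on the holdout rows; the holdout data are used only to check how well the training equation predicts.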

 

 

 

In-class Presentation for Friday, January 28

The file MLB2007Standings contains winning percentages for all Major League Baseball teams in 2007. It also contains 11 potential predictor variables, all of them team statistics (i.e., statistics determined for the whole team, not individual players). There are 10 quantitative variables and 1 categorical variable (League: 0=National League, 1=American League). I pared down the dataset to include team statistics that are understandable to all (but please ask if you have questions!).

 

Your goal is to predict and explain Winning Percentage. Use our model-building outline as a guide for your investigation. I want you to choose one model and defend it (e.g., why exactly do you think this is the best model? What nice properties does it have? How can it be used to both explain and predict winning percentage? And so on).

 

Tips for your presentation:

·         Do not talk about all the models that did not work—focus solely on the model you chose. At the same time, you should be able to address questions about other possible models. For example, if I ask why you didn’t include a certain variable, you should be organized enough that via a brief look at your notes you’ll know if you tried (or didn’t try) a certain model and why you didn’t select it. 

 

·         Include appropriate graphs, numerical tables, and written descriptions in your presentation (you can create it in PowerPoint). Act as if you’re presenting these results to a big-wig MLB official. What would this person want to know? How would the information best be presented?

o   This includes all parts of the process (Choose, Fit, Assess, and Use), but in layperson’s terms (read: in the context of the problem). The presentation should focus on the most important information. Organize your notes so you can easily answer questions from the audience. That is, do not put every single detail in the presentation, but be prepared to answer questions about the details.

 

·         You might both choose the same model. That’s fine. There will still be differences in what you decide to present, etc. Plus, since you will both be engaged in the investigation, you can ask each other pertinent questions. On the other hand, you might arrive at different models. Then we can have some interesting conversations.

 

·         Our class is generally informal (which is good), but I want this presentation to be more formal. You need not dress up and you certainly don’t need to pretend that we don’t know each other well. So I want you to be at ease, but I want this to be more than simply sitting at a table sharing your results. (We can talk about these fine nuances when we’re together, in person.)