Math 217 Computer Lab – Logistic Regression

 

Getting the Needed Files

Double click on the My Computer icon on the desktop. Then double click on the campus_share on 'curtis' (U:) drive and then the Class_Share folder. Finally, double click on the Math folder and then the Math217 folder. Make a copy of the Senility.MPJ file and put it in your account. Then open the file from your account.

 

Description of Senility.MPJ

A sample of elderly people was given a psychiatric examination to determine whether symptoms of senility were present (senility is a decline or deterioration of physical strength or mental functioning, especially as a result of old age or disease). One possible predictor of senility is the score on a subtest of the Wechsler Adult Intelligence Scale (WAIS). This data file includes the senility value (1 = symptoms present, 0 = no symptoms present) and the WAIS subscale score for the 54 people in the study.

 

Analysis

First look descriptively at the data. Create comparative boxplots and numerical summaries of WAIS score for the two senility groups. What do you notice?

We want to predict senility based on the Wechsler Adult Intelligence Scale score. In this case, the response variable is binary, so we should use logistic regression. From the Stat menu select Regression>Binary Logistic Regression. The response variable is senility and the model simply includes WAIS score (note that you can include multiple predictor variables in the model; that is, you can do multiple logistic regression). From the Graphs button, select “Delta chi-square versus probability.” You can look at the Options and Results buttons, but we’ll leave all the default selections.

 

Now consider the output in the session window. First look at the “Test that all slopes are zero” output. This G statistic follows a chi-square distribution with degrees of freedom equal to the number of predictors (here, 1). The small p-value (0.001) shows that the slope is significantly different from zero. The logistic regression table gives estimated coefficients, standard errors, and significance tests (based on the z-distribution). The WAIS score is clearly a significant predictor (p-value = 0.005). Note that the table also gives the odds ratio, which is simply exp(b1) = 0.72, where b1 is the estimated coefficient for WAIS score. We can interpret this value as follows: for each 1 unit increase in WAIS score, the odds of senility are multiplied by 0.72 (that is, they decrease by about 28%). Also, writing b0 for the estimated constant, the predicted probability of senility for a score of x on the WAIS test is exp(b0 + b1*x) / (1 + exp(b0 + b1*x)). For a score of 10, this gives a predicted probability of 0.30; for a score of 8, it gives 0.45. Of course, our trust in these predictions depends on how good our model is.
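As a quick check on how the odds ratio and the predicted probabilities fit together, the short Python sketch below uses only the numbers quoted above (odds ratio 0.72, predicted probability 0.30 at a score of 10). The function and variable names are illustrative and not part of the Minitab output. Moving from a score of 10 down to a score of 8 is two 1-unit decreases, so the odds are divided by 0.72 twice.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def probability(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)


odds_ratio = 0.72  # odds ratio for WAIS score, as quoted in the handout
p_at_10 = 0.30     # predicted probability at WAIS = 10, as quoted in the handout

# Each 1-unit increase multiplies the odds by 0.72, so each 1-unit
# decrease divides the odds by 0.72. Going from 10 to 8 is two decreases.
odds_at_8 = odds(p_at_10) / odds_ratio ** 2
p_at_8 = probability(odds_at_8)

print(round(p_at_8, 2))  # prints 0.45
```

The result agrees with the predicted probability of 0.45 reported for a score of 8, up to rounding of the quoted values.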

 

In the output, notice the “Goodness of Fit” tests. We’ll focus on the Pearson Chi-Square value. In goodness-of-fit tests, the null hypothesis is that the data fit the hypothesized model well (this is one of the few significance tests where the null hypothesis is actually the “research” hypothesis, so not rejecting the null hypothesis is actually a good thing). In the logistic regression case, the null hypothesis is that the predicted values and actual values agree well (fit well). The Pearson Chi-Square test statistic takes the difference between the observed and predicted value for each observation, squares it, divides each squared difference by an estimate of its variance, and adds up the results. Big values of this test statistic indicate that the predicted values don’t fit the actual data well. Small values of this test statistic give no indication that the predicted values don’t fit well. Hence, large p-values give us no indication that our model doesn’t fit well. (This doesn’t necessarily say our model does fit well, but at least we have no evidence against this hypothesis.)
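The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's exact procedure: it assumes one term per observation and uses p(1 - p) as the variance estimate for a 0/1 response (a package may first group observations that share the same predictor values). The function name and the tiny data sets are hypothetical, made up only to show the arithmetic.

```python
def pearson_chi_square(observed, predicted):
    """Sum of squared (observed - predicted) differences, each divided
    by the estimated variance p * (1 - p) of a 0/1 response."""
    total = 0.0
    for y, p in zip(observed, predicted):
        variance = p * (1.0 - p)
        total += (y - p) ** 2 / variance
    return total


# Hypothetical 0/1 outcomes, not taken from Senility.MPJ.
observed = [1, 0]

# Predictions that agree with the outcomes give a small statistic ...
good_fit = pearson_chi_square(observed, [0.9, 0.1])  # roughly 0.22

# ... while predictions that disagree give a large one.
poor_fit = pearson_chi_square(observed, [0.1, 0.9])  # roughly 18

print(good_fit, poor_fit)
```

The contrast between the two results mirrors the interpretation in the text: a large statistic signals that the predicted values do not match the data.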

 

It’s also interesting to see how particular values impact the chi-square statistic. For the plot we chose, each observation is removed one at a time from the data set and the summary goodness-of-fit chi-square statistic is recalculated. The change (delta) in chi-square provides an idea of how each particular observation affects the chi-square. You can see from the plot that two of the values stand out.
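To make the idea concrete, here is a rough Python sketch that follows the description literally: fit the model, drop one observation, refit, and record how much the Pearson chi-square changes. Everything in it is an assumption for illustration: the data are a small made-up set (not the senility data), the fit uses a basic Newton-Raphson routine for a single predictor, and statistical packages may compute the plotted values with a closed-form approximation rather than by refitting.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2 x 2 information matrix.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def pearson_chi_square(x, y, b0, b1):
    """Pearson chi-square with one term per observation."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += (yi - p) ** 2 / (p * (1.0 - p))
    return total


def delta_chi_square(x, y):
    """Change in Pearson chi-square when each observation is left out."""
    full = pearson_chi_square(x, y, *fit_logistic(x, y))
    deltas = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        reduced = pearson_chi_square(x_rest, y_rest,
                                     *fit_logistic(x_rest, y_rest))
        deltas.append(full - reduced)
    return deltas


# Hypothetical scores and 0/1 outcomes, made up for this sketch.
scores = [4, 6, 8, 10, 12, 14, 16, 18]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]

deltas = delta_chi_square(scores, outcomes)
for score, outcome, delta in zip(scores, outcomes, deltas):
    print(score, outcome, round(delta, 2))
```

Observations with the largest changes are the ones that would stand out on a delta chi-square plot.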