Quiz Two

The scatterplot below shows the average amount of money spent per household each week on alcohol (x) and on tobacco (y) in 11 regions of Great Britain. The eleventh region is the data point at (4.02, 4.56).

Considering only the first 10 data points, the correlation is r = 0.784, and the equation of the regression line through those 10 points is

1. Roughly what value does that regression line predict for tobacco consumption in region 11?

Since the value of x for region 11 is 4.02, the regression line predicts a value of

which is far below the observed value of y = 4.56.
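The arithmetic behind this answer can be written out generically. Because the fitted equation is not reproduced in this text, the sketch below uses a and b only as placeholders for the intercept and slope of the 10-point line; e_11 is the residual for region 11.

```latex
\hat{y}_{11} = a + b \cdot 4.02, \qquad e_{11} = y_{11} - \hat{y}_{11} = 4.56 - \hat{y}_{11}
```

A large positive residual e_11 is what it means for the observed value to lie far above the line's prediction.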

2. What do you think would happen to the value of r if we included the data point for region 11 in the correlation calculation? Why?

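Region 11 is a strong outlier in the x direction that lies far above the line through the other ten points, so including it would pull the regression line up toward it and flatten its slope. It would also lower the correlation, because the point departs sharply from the linear pattern of the rest. In particular, r would end up much closer to 0.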
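To see this effect numerically, here is a minimal Python sketch on made-up numbers (not the British survey data): ten hypothetical points lying close to a straight line, plus one hypothetical point placed at the low-x end and well above the line, standing in for region 11. That placement is an assumption chosen because it is the situation in which pulling the line up toward the point flattens the slope. The helper names `pearson_r` and `ls_line` are illustrative, not from any library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ls_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical toy data (NOT the real regional figures): ten points that
# follow y = 0.5x + 1 with a small alternating wobble of +/- 0.05.
xs = [5.0, 5.2, 5.4, 5.6, 5.8, 6.0, 6.2, 6.4, 6.6, 6.8]
ys = [0.5 * x + 1 + (0.05 if i % 2 else -0.05) for i, x in enumerate(xs)]

# Hypothetical outlier playing the role of region 11: low x, high y.
xo, yo = 4.0, 4.6

r10 = pearson_r(xs, ys)
a10, b10 = ls_line(xs, ys)
r11 = pearson_r(xs + [xo], ys + [yo])
a11, b11 = ls_line(xs + [xo], ys + [yo])

print(f"10 points: r = {r10:.3f}, slope = {b10:.3f}")
print(f"11 points: r = {r11:.3f}, slope = {b11:.3f}")
```

On these toy numbers r falls from about 0.99 to about 0.20, and the slope from about 0.52 to about 0.09, once the single extra point is included.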

3. Ignoring the eleventh region, there appears to be a strong correlation between spending on alcohol and spending on tobacco. Is this a causal relation? How do you think this strong correlation can be explained?

Alcohol spending probably does not cause tobacco spending directly, but it is reasonable to expect the two to be closely related. One plausible explanation is that many people drink and smoke together in social settings (pubs in particular). Participation in such social gatherings may therefore act as a lurking variable that influences both alcohol consumption and smoking.
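A small simulation can make the lurking-variable argument concrete. The sketch below assumes a hypothetical common cause z (think of it as how often people go out to pubs) that raises both drinking and smoking, with no direct link between the two; all numbers are synthetic. Subtracting z works here only because we generated it ourselves. With real survey data the lurking variable is usually unmeasured, which is why correlation alone cannot establish causation.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(1)  # fixed seed so the run is reproducible
n = 1000

# Hypothetical lurking variable z (e.g. frequency of pub visits).
z = [random.gauss(0, 1) for _ in range(n)]

# Drinking and smoking each depend on z plus their own independent noise.
# Neither is computed from the other, so there is no causal link between them.
drink = [zi + random.gauss(0, 0.5) for zi in z]
smoke = [zi + random.gauss(0, 0.5) for zi in z]

r_raw = pearson_r(drink, smoke)

# Remove the common influence by subtracting z from each variable.
r_adj = pearson_r([d - zi for d, zi in zip(drink, z)],
                  [s - zi for s, zi in zip(smoke, z)])

print(f"r(drink, smoke)    = {r_raw:.2f}")
print(f"r after removing z = {r_adj:.2f}")
```

Expect a strong correlation (around 0.8) between the two simulated variables, and one near 0 once the shared influence of z is removed.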