Math 117 Computer Lab

Relationships Between Variables: Two-way Tables, Side-by-Side Graphs, and Scatterplots

 

Getting the Needed Files

Double click on the My Computer icon on the desktop. Then double click on the campus_share on 'curtis' (U:)  drive and then the Class_Share folder. Finally, double click on the Math folder and then the math_117 folder. In this folder are (among other files) the PASW files we will use in today’s lab: BrainSize.sav, DrinkingStudyUpdated.sav, and StateData.sav.

 

The whole class cannot work from these shared files at once (only one person can access them at a time). Thus, each of you needs to copy the three files to your personal account. To do this, highlight all three files (click on the first one, then Ctrl-click on the others). Then press Ctrl-C to copy the files. (We used DrinkingStudy.sav last week in lab, but I added new variables, so be sure to copy the updated file: DrinkingStudyUpdated.sav.) Now open the My Documents folder on the desktop (this is the My Documents folder of your personal account). Once you are in the My Documents folder, press Ctrl-V to paste the three files into your account.

 

Now open the statistical software (from the Start menu select Programs>Class Programs and then PASW Statistics 18.0). From the File menu select Open>Data, then change the folder to My Documents and open DrinkingStudyUpdated.sav. (Alternatively, you can simply double-click on the DrinkingStudyUpdated file in your account and PASW Statistics will automatically open.)

 

Description of DrinkingStudyUpdated.sav

We considered this data file during the last lab session. Recall, in 2001 (these are actually more recent data than I realized!) the Harvard School of Public Health published a college alcohol study. Samples of students from 140 four-year colleges were asked questions about their alcohol consumption (demographic information was also collected). The 10,904 responses are included in this data set. The students answered questions based on their behavior in the last 30 days (e.g., they recorded how often they drove after drinking within the last 30 days). I’ve added variables to the dataset in hopes of a more thorough analysis. In lab, we’ll discuss all the new variables (remember you can use the “Variable View” to see the detailed descriptions of variables in the worksheet).

 

Analysis

Now we can consider the relationship between categorical variables in this data set. For example, how do the responses about driving after drinking change between men and women? To create a two-way summary table, go to the Analyze menu and select Tables>Custom Tables. (A pleasant reminder dialog box will initially appear—you can select “don’t show this dialog again.”) Drag Sex to the “Row” and Driven After Drinking Alcohol to the “Columns.”  Then click on the Sex rectangle within the table you created, and click on the Summary Statistics button. By default, the count is included as a cell statistic. Also add row % to be displayed (so we can compare percentages for males and females). Note you can select other cell statistics, depending on the questions you want to answer. To exit the dialog box, click on the “Apply to Selection” button. Notice that the “Position” of the summary statistics is by columns. Switch the position to rows and see how it changes the format of the table (you can choose the one that is most pleasing to your eye). Finally, use the Titles button to give the table an appropriate title. The corresponding two-way table is shown in the output window. What do you notice about the relationship between these variables?
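
For anyone curious about what the Custom Tables dialog is computing, here is a minimal sketch in Python (not PASW) of a two-way table with counts and row percentages. The ten responses are made up for illustration and are not taken from DrinkingStudyUpdated.sav; the function name two_way_table is also just an assumption for this sketch.

```python
from collections import Counter

# Hypothetical (sex, driven-after-drinking) responses; NOT from the lab data.
responses = [
    ("Male", "Yes"), ("Male", "No"), ("Male", "No"), ("Male", "Yes"),
    ("Female", "No"), ("Female", "No"), ("Female", "Yes"), ("Female", "No"),
    ("Female", "No"), ("Male", "No"),
]

def two_way_table(pairs):
    """Return {row: {col: (count, row_percent)}} for (row, col) pairs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {}
    for r in rows:
        # The row total is the denominator for every cell in that row.
        row_total = sum(counts[(r, c)] for c in cols)
        table[r] = {
            c: (counts[(r, c)], 100.0 * counts[(r, c)] / row_total)
            for c in cols
        }
    return table

table = two_way_table(responses)
for row, cells in table.items():
    for col, (n, pct) in cells.items():
        print(f"{row:<6} {col:<3} count={n} row%={pct:.1f}")
```

Because each percentage is divided by its own row total, the percentages in each row add to 100, which is what lets us compare males and females even when the group sizes differ.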

 

To create side-by-side pie charts, from the Graphs menu select Chart Builder. (A pleasant reminder dialog box will initially appear; you can select “don’t show this dialog again.”) First select Pie/Polar and drag it into the “Chart Preview” box. Then drag Rode with a Designated Driver into the “Slice by” box. Then select the Groups/Point ID folder tab and check the “Columns panel variable” box. This allows for separate (side-by-side) pie charts based on a second categorical variable: drag Sex into the “Panel?” box on the Chart Preview. Notice the “Element Properties” box to the right of your graph (if it’s not there, click on the “Element Properties” button and a new window will appear). We will change the statistic from count to percent (remember there are more males than females, so raw counts are not directly comparable). When using percent, PASW needs you to define the parameters, that is, the denominator of your percentage. Since we want percentages calculated separately for males and females, choose as the “Denominator for Computing Percentage” the total for each panel, and then hit the Continue button. To make this selection final, you must now click on the Apply button; notice how your graph changes in the Chart Preview. Finally, title your graph. Notice the graphs do not include numerical summaries. If you want the actual percentages along with the pie charts, double click on the graph to invoke the Chart Editor, and from the Elements menu select “Show data labels.”  What do you notice about the relationship between the Rode with a Designated Driver and Sex variables?
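
Why does the choice of denominator matter? The following Python sketch (again, not PASW) compares percentages computed from the grand total with percentages computed from the total for each panel. The counts are invented so that there are more males than females; they are not from the drinking study, and the function name percents and its "grand"/"panel" options are assumptions made for this illustration.

```python
from collections import Counter

# Hypothetical counts keyed by (panel, slice); NOT from the lab data.
counts = {
    ("Male", "Yes"): 30, ("Male", "No"): 30,
    ("Female", "Yes"): 30, ("Female", "No"): 10,
}

def percents(counts, denominator):
    """Percent for each (panel, slice) cell.

    denominator="grand" divides by the overall total;
    denominator="panel" divides by the total of the cell's own panel.
    """
    if denominator not in ("grand", "panel"):
        raise ValueError("denominator must be 'grand' or 'panel'")
    grand_total = sum(counts.values())
    panel_totals = Counter()
    for (panel, _), n in counts.items():
        panel_totals[panel] += n
    result = {}
    for (panel, category), n in counts.items():
        base = grand_total if denominator == "grand" else panel_totals[panel]
        result[(panel, category)] = 100.0 * n / base
    return result

by_grand = percents(counts, "grand")
by_panel = percents(counts, "panel")
for key in counts:
    print(key, f"grand={by_grand[key]:.1f}%", f"panel={by_panel[key]:.1f}%")
```

With the grand total as the denominator, the "Yes" slices for males and females look identical in this made-up example. With the panel total, the two groups clearly differ, which is the comparison we actually want.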

__________________________________________________________________________________________

Important Note: The Chart Builder is generally user-friendly, but it’s very time-consuming to write down every little command. The description above is therefore the only place in this handout where you get detailed, step-by-step instructions for the Chart Builder. For the rest of the handout, you’ll get brief notes. The best way to practice with the Chart Builder (say, for homework) is to try something (something reasonable based on the analysis you want), see if it works (and makes sense for the question you’re trying to answer), and make a change if necessary. Clearly, you must keep your brain turned on while using this statistical software: consider the data set-up, variable types, variable contexts, and the research question.

To create side-by-side bar charts, select Graphs>Chart Builder. PASW remembers your past work. Sometimes this is helpful (if you want to make a small change) and sometimes it’s not (if you want to start anew). In this case, we want to start anew, so click the Reset button at the bottom of the dialog box. Let’s investigate, separately for males and females, the responses about serving as a designated driver. There are three different ways to use the Chart Builder to create comparative bar charts—in each case, we must be careful about how the percentage (y-axis) is determined. We’ll work through all three of these methods in lab. Then you can decide which of the graphs you think best portrays the relationship (actually, they all portray the same story, but perhaps one of the graphical displays is most persuasive to you). As with the pie charts, if you want numerical values displayed with your graph, you must choose “Show data labels” from the Chart Editor>Elements menu (after the graph is created).

 

On your own, investigate the relationships between other variables in the data set (either numerically, via a two-way table, or graphically, via bar or pie charts). There are many potential research questions. Remember that you don’t turn off your brain when using statistical software. Think carefully about what questions you’d like to answer, and then perform the appropriate analyses.

 

Description of BrainSize.sav

This data file contains information on IQ scores, brain size (based on total pixel count of an MRI), sex, weight, and height for 40 college students (the students are all Caucasian and right-handed and they attend a large southwestern university). Look at the “Variable View” to see the detailed variable descriptions.

 

Analysis

This data set contains 6 quantitative variables and 1 categorical variable (sex). Suppose we’re interested in the distributions (graphically and numerically) of any of the quantitative variables, but separately for males and females. We discussed this briefly last week, but it’s good to review the process. Recall that Analyze>Descriptive Statistics>Explore is a nice place to start investigating the relationship between a quantitative and a categorical variable. Choose Height for the “Dependent List” (quantitative variable) and Sex for the “Factor List” (categorical variable). Then click on the Statistics and Plots buttons to select appropriate numerical summaries and graphics. Now interpret the output: How would you describe the distribution of heights, separately for males and females? How exactly do they differ?
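
The idea behind Explore (one quantitative variable summarized separately for each level of a categorical variable) can be sketched in a few lines of Python. This is only an illustration of the grouping logic, not a replica of the PASW output; the heights below are made up and are not the values in BrainSize.sav.

```python
import statistics
from collections import defaultdict

# Hypothetical (sex, height in inches) records; NOT from BrainSize.sav.
records = [
    ("Male", 66), ("Male", 68), ("Male", 70), ("Male", 72), ("Male", 74),
    ("Female", 62), ("Female", 64), ("Female", 65), ("Female", 66), ("Female", 68),
]

def summarize_by_group(records):
    """Return basic numerical summaries of the values within each group."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    summaries = {}
    for group, values in groups.items():
        summaries[group] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "sd": statistics.stdev(values),  # sample standard deviation
            "min": min(values),
            "max": max(values),
        }
    return summaries

summaries = summarize_by_group(records)
for group, s in sorted(summaries.items()):
    print(f"{group:<6} n={s['n']} mean={s['mean']:.1f} median={s['median']:.1f} "
          f"sd={s['sd']:.2f} min={s['min']} max={s['max']}")
```

The factor variable plays the role of the dictionary key here: every summary is computed within a group, never across the whole sample.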

 

Notice the boxplots are on the same scale, but the histograms are not. We can use the Graphs>Chart Builder to create separate histograms that are on the same scale. We’ll discuss this in detail in lab. (Quick notes: Select basic histogram from the gallery; choose Height for the x-axis; click on the Groups/Point ID folder tab and check the box for “Rows panel variable” (this places the histograms vertically, rather than side-by-side, but I think they’re easier to read this way); choose Sex as your panel variable; switch the Statistic to “Histogram Percent,” and be sure the percentage is calculated appropriately; title your graph appropriately.) How does this particular graphical display show the differences in the height distributions?
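
To see what “on the same scale” and “calculated appropriately” mean here, consider this rough Python sketch. Both groups are binned with the same bin edges, and each bar is a percentage of that group’s own total. The heights and the bin edges are made up for illustration (they are not from BrainSize.sav), and the text bars are only a stand-in for the PASW histogram.

```python
# Hypothetical heights in inches; NOT from BrainSize.sav.
heights = {
    "Male": [66, 68, 70, 72, 74],
    "Female": [62, 64, 65, 66, 68],
}

# Shared bin edges, so both histograms use the same horizontal scale.
# Each bin is half-open: edge[i] <= value < edge[i + 1].
edges = [60, 64, 68, 72, 76]

def percent_histogram(values, edges):
    """Percent of the group's values that fall in each bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    # The denominator is the size of this group, not the whole sample.
    return [100.0 * c / len(values) for c in counts]

histograms = {group: percent_histogram(vals, edges) for group, vals in heights.items()}

for group, percents in histograms.items():
    print(group)
    for i, pct in enumerate(percents):
        bar = "#" * int(pct // 10)  # one '#' per 10 percentage points
        print(f"  {edges[i]}-{edges[i + 1]}: {bar} {pct:.0f}%")
```

Stacking the two printouts vertically, with identical bins, makes the shift between the groups easy to see, which is the same reason for choosing the rows panel layout in the Chart Builder.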

 

We can also use the Graphs>Chart Builder to create separate boxplots. Again, we’ll discuss this in detail in lab. (Quick notes: Select basic boxplot from the gallery; choose Height for the y-axis; choose Sex for the x-axis; for the Statistics choose “value”; to create horizontal boxplots, from the Basic Elements folder tab hit the Transpose button; title your graph appropriately.) Notice the odd scaling of these boxplots. It’s much better if the boxplots take up the whole graph window. Double-click on the graph to invoke the Chart Editor. Then double-click on the x-axis. Go to the Scale folder tab and set the minimum to 60 (we’ll discuss why in lab), and then hit the Apply button. Much better! Note that some selections are made within the Chart Builder dialog box, while other selections and changes are made from the Chart Editor.

 

Now consider the relationship between the quantitative variables in this data set. To analyze one of these relationships, the first step is to create a scatterplot. Let’s first consider the relationship between the brain size (MRI count) and the weight of these students—weight is the explanatory variable and brain size is the response. Again, use the Graphs>Chart Builder option. (Quick notes: Select basic scatterplot from the gallery; choose Brain size for the y-axis; choose Weight for the x-axis; title your graph appropriately.) How would you describe the relationship between these variables?

 

Should the sex variable be taken into account? To create separate scatterplots for males and females, again use the Graphs>Chart Builder option. There are two ways to create these scatterplots: 1) side-by-side (use the Groups>Column panel variable option) or 2) on the same graph (choose the grouped scatter option from the gallery). We’ll discuss both of these options in lab. Does it seem the relationship between brain size and body weight is different for males and females? How so?
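
The “same graph” option amounts to splitting the points by sex and drawing each group with its own marker color. Here is a minimal Python sketch of that idea, assuming made-up (sex, weight, MRI count) rows that are not from BrainSize.sav. The plotting function relies on the third-party matplotlib package and is only defined here, not called, so the grouping step runs even if matplotlib is not installed.

```python
from collections import defaultdict

# Hypothetical (sex, weight in pounds, MRI count) rows; NOT from BrainSize.sav.
rows = [
    ("Male", 170, 950000), ("Male", 185, 1000000), ("Male", 160, 930000),
    ("Female", 120, 850000), ("Female", 135, 870000), ("Female", 140, 890000),
]

def split_by_group(rows):
    """Return {group: (x_values, y_values)} for (group, x, y) rows."""
    groups = defaultdict(lambda: ([], []))
    for group, x, y in rows:
        xs, ys = groups[group]
        xs.append(x)
        ys.append(y)
    return dict(groups)

def plot_grouped(groups):
    """Draw one scatterplot with a separate color and legend entry per group."""
    import matplotlib.pyplot as plt  # third-party; imported only when plotting

    fig, ax = plt.subplots()
    for name, (xs, ys) in sorted(groups.items()):
        ax.scatter(xs, ys, label=name)
    ax.set_xlabel("Weight (explanatory)")
    ax.set_ylabel("Brain size, MRI count (response)")
    ax.set_title("Brain size vs. weight by sex (hypothetical data)")
    ax.legend(title="Sex")
    plt.show()

groups = split_by_group(rows)
for name, (xs, ys) in sorted(groups.items()):
    print(name, len(xs), "points")
```

Calling plot_grouped(groups) would display the grouped scatterplot. Passing each group to its own set of axes instead would correspond to the side-by-side (panel) option.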

 

On your own, investigate the relationships between other variables in the data set (remember to consider the impact of the sex variable). There are many potential research questions (e.g., can brain size predict IQ?). Remember that you don’t turn off your brain when using statistical software. Think carefully about what questions you’d like to answer, and then perform the appropriate analyses.