Allan Rossman – Nuts and Bolts of Classroom Assessment

 

1.      What is your process of creating an exam?  That is, how do you decide on the topics to cover, types of questions, number of questions, use of technology, use of external aids, re-use of questions, etc.?

 

Let me try to be provocative right off the bat by addressing the “external aids” question, because I have a strong minority opinion there.  I strongly advocate giving all exams on an open-book, open-notes basis.  Some common arguments against this position are:

 

I find the arguments in favor of open-book, open-notes exams to be more compelling.  These include:

 

In my experience, most students do “get it.”  They generally understand my reasons for giving open-book, open-notes exams, they recognize that the exams will still be challenging, and they accept the responsibility of preparing well for the exams.  Most students find that they do not rely on their book or notes much at all during the exam (as I tell them, to their considerable doubt, ahead of time).  Students also tend to agree that the time that they invest in organizing their notes is an effective studying method.  Some students get the hang of this only as the term proceeds, which is one reason that I allow the lowest exam score to count for less weight toward the course grade than the others.

 

Of course, some students do not rise to the occasion, as two anecdotes reveal.  One student wrote to me on the weekend before her Monday final exam, telling me that she had already sold back her book to a used book buyer, and asking whether I thought she would be at a disadvantage on the exam.  Another student, who never bought the text for the course, approached me two minutes before an exam to express concern that the normal probability table that he had downloaded from the web seemed to be set up differently from the one in the text that I had demonstrated in class.  But these discouraging anecdotes notwithstanding, I have been very pleased by the way that most students have responded to my open-book, open-notes policy.

 

With regard to use of technology during exams, I would like to be consistent and allow students full use of computer software during exams.  But I teach sections that have twice as many students as computers, and some sections that do not meet in a computer lab at all.  So while I consider an open-technology policy to be ideal, I have not found it feasible to implement.  Some might ask why I do not simply give students take-home exams on which they can use technology to their heart’s content.  But that presses the limit of my naiveté about cheating; I do worry that students would engage in excessive unauthorized collaboration if I were to give take-home exams.

 

Instead I include computer output on exams and ask students to interpret it and explain what conclusions they would draw from it.  I sometimes include irrelevant output as well, asking them to distinguish what’s relevant from what’s not.  I also edit the output on occasion and ask students to fill in missing pieces.

 

Now that I have begun with my most provocative position, let me take a step backwards and describe my overall process of creating an exam.  The most valuable advice that I have received, from Joan Garfield and Beth Chance, is: assess what you value!  Students are very quick to tailor their studying to what the instructor reveals to be important by asking for it on exams.  So, given my earlier proclamations that reasoning and explaining and interpreting are what I value most, I try to write questions that ask students to demonstrate their abilities in those areas.  One more concrete example is that in many of my courses, one learning goal is to help students to become more critical consumers of statistical information found in news accounts.  I try to assess students’ progress toward this goal by asking exam questions that present them with genuine news accounts and ask them to describe and explain what conclusions they would draw.  I always strive for the majority of points on my exams to come from conceptual/interpretive questions rather than computational/mechanical ones.

 

One very specific piece of advice about this process that I offer is: always have a colleague review drafts of your exams!  I find this to be extremely valuable for many reasons.  My colleagues can help me to judge whether my scope of questions is adequate, whether my questions are really getting at my learning goals, and whether my exam is really do-able in the time allotted.  I often find myself changing questions substantially, and especially reducing the number of questions on the exam, in response to feedback from colleagues.

 

As for where my exam questions come from, I do often start by revising questions that I have used in previous years.  I also strongly advocate not waiting until the last minute to write an exam; I try to remain constantly on the lookout for good exam questions.  I frequently write notes to myself about potentially good exam questions throughout the term.  Prime times for writing myself these notes are immediately after class while discussion of a particular topic is fresh in my mind, while I am writing homework problems, and while I am grading homework, when common student errors are on my mind.  I do try as much as possible to use genuine data from real studies as the sources of my exam problems.

 

I try not to surprise students with my exam problems.  By this I mean that students have seen similar questions on in-class activities, on quizzes, and on assignments.  I also try to tie some exam questions closely to class activities and homework problems.

 

I have recently started to use the ARTIST (Assessment Resource Tools for Improving Statistical Thinking) database (https://ore.gen.umn.edu/artist//) and the CAOS exam developed by the ARTIST project as sources for final exam questions.  My departmental colleagues and I have begun to use a common final exam question, partly for purposes of evaluating Cal Poly’s general education program.

 

Some of my exam questions are short-answer in format, but most are fairly open-ended and call for explanations and interpretations.  I rarely use multiple choice exams, partly because I find them hard to write well, and partly because open-ended questions allow me to see students’ reasoning process and communication skills.

 

With regard to the number of questions on an exam, I try hard to make time a non-factor.  I do this primarily by providing lots of partial information and reminding students to take advantage of it.  My exams typically have 5-8 multi-part questions.

 

Some examples of exam questions, with brief commentary, follow:

 

1. Researchers conducted a “randomized, double-blind trial” to determine whether taking large amounts of vitamin E protects against prostate cancer (Journal of the National Cancer Institute, 1998). To study this question, they enrolled 29,133 Finnish men, all smokers between the ages of 50 and 69. The men were randomly divided into two groups: One group took vitamin E and a second group took a placebo.  The researchers followed all the men for eight years and then determined how many had developed prostate cancer.  They found that participants taking vitamin E were “significantly” less likely to develop prostate cancer.

a) Explain what “randomized” means in this study and its purpose.

b) Explain what “double-blind” means in the context of this study and its purpose.

c) Explain what “significantly less likely” means in a statistical sense and why it is an important consideration.

d) Based on this report, would you consider it reasonable to conclude that taking vitamin E causes a reduction in the probability of developing prostate cancer?  Explain your reasoning.

e) Based on this report, what population would you be willing to generalize these results to?  Explain your reasoning.

 

The above question is the common final exam question that I just mentioned.  It asks students to describe the importance of some common statistical terms that they are likely to see in news reports, and it also assesses their understanding about the scope of conclusions that one can draw, depending on the design of the study.

 

2. Your text states that “confidence intervals seek to estimate a population parameter with an interval of values calculated from an observed sample statistic.”  Convince me that you understand this statement by describing a situation in which one could use a sample proportion to produce a confidence interval as an estimate of a population proportion.  Clearly identify the population, sample, parameter, and statistic involved in your example.  Do not use any example that appears in your book.

 

The above question is very explicit in testing whether students understand the “big picture” of one of the two main types of statistical inference.  The question is extremely open-ended, leaving students with a lot of latitude for how they can answer, but many students struggle to answer this question thoroughly.

 

3. Statistical evidence played an important role in the murder trial involving Kristen Gilbert, a nurse who was accused of murdering hospital patients by giving them fatal doses of heart stimulant.  Hospital records for an eighteen-month period indicated that of 257 eight-hour shifts that Gilbert worked on, a patient died in 40 of those shifts (15.6%).  But of 1384 eight-hour shifts that Gilbert did not work on, a patient died in only 34 of those shifts (2.5%). 

a) Identify the observational units in this study.

b) Identify the explanatory variable and the response variable in this study.

c) Organize the given information into a two-way table, putting the explanatory variable in columns and the response variable in rows.

d) Calculate the odds ratio of a death occurring on a shift, comparing shifts on which Gilbert worked to shifts on which she did not work.

e) Treat these data as a random sample from a population, and produce a 95% confidence interval for the population odds ratio.

f) Interpret what this confidence interval reveals about the question of whether a significantly higher proportion of deaths occurred on Gilbert’s shifts as compared to other shifts.

g)  Put yourself in the role of the defense attorney who needs to argue that Gilbert was not responsible for any deaths.  Suggest a potential confounding variable that you might use to explain why there was a higher percentage of deaths on Gilbert’s shifts.  Explain how this confounding variable provides an alternative explanation to the prosecution’s contention that Gilbert was responsible.

 

The above question is based on a real study, described in the fourth edition of Statistics: A Guide to the Unknown.  The question begins with very fundamental ideas of observational units and variables, but most students fail to recognize that the observational units in this study are 8-hour shifts, rather than patients.  This question does ask for some calculations, but it also asks for interpretations and concludes with the tricky topic of confounding variables.
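For readers who want to check the arithmetic behind parts (d) and (e), here is a minimal sketch in Python.  The counts come directly from the question.  The interval method is an assumption: the sketch uses the common Wald interval on the log odds ratio, and the source does not say which method students are expected to use.

```python
import math
from statistics import NormalDist

# Counts taken from the question: shifts with a death, and total shifts.
gilbert_death, gilbert_total = 40, 257
other_death, other_total = 34, 1384

gilbert_no_death = gilbert_total - gilbert_death  # 217 shifts
other_no_death = other_total - other_death        # 1350 shifts

# Part (d): odds of a death on Gilbert's shifts divided by odds on other shifts.
odds_ratio = (gilbert_death / gilbert_no_death) / (other_death / other_no_death)

# Part (e): Wald interval on the log scale (assumed method, not from the source).
se_log_or = math.sqrt(
    1 / gilbert_death + 1 / gilbert_no_death + 1 / other_death + 1 / other_no_death
)
z = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence
ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"odds ratio = {odds_ratio:.2f}")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Under these assumptions the odds ratio is a little above 7 and the entire interval lies above 1, which is the feature that part (f) asks students to interpret.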

 

4. It can be shown that the sum of the residuals from a least squares regression line must always equal zero.

a) Does it follow that the mean of the residuals must always equal zero?  Explain briefly.

b) Does it follow that the median of the residuals must always equal zero?  Explain briefly.

 

This question is easy for most students, but some are confused by the regression setting and so do not realize that the sum equaling zero of course forces the mean to equal zero.
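The distinction between parts (a) and (b) can be checked numerically.  The sketch below fits a least squares line to a small made-up data set (the five points are hypothetical, chosen only for illustration) and compares the sum, mean, and median of the residuals.

```python
from statistics import mean, median

# Hypothetical data, chosen only to illustrate the point.
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 4, 15]

x_bar, y_bar = mean(x), mean(y)

# Least squares slope and intercept from the usual formulas.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed value minus fitted value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("sum of residuals:   ", round(sum(residuals), 10))
print("mean of residuals:  ", round(mean(residuals), 10))
print("median of residuals:", round(median(residuals), 10))
```

The sum and the mean are both zero up to floating-point rounding, since the mean is just the sum divided by the number of points.  The median is not zero for these data, because one large positive residual is balanced by several smaller negative ones.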

 

5. Suppose that every student in this class scores 10 points lower on the final exam than on the first midterm exam.  What would be the value of the correlation coefficient between midterm exam score and final exam score?  Explain briefly.

 

This question really stumped many of my students last term, for the most common answer was “the same.”  Students who responded like this did not even realize that a correlation coefficient is a number.
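A quick numerical check makes the intended answer concrete.  In the sketch below the midterm scores are hypothetical; the only feature taken from the question is that each final exam score is exactly 10 points lower than the corresponding midterm score.

```python
import math

# Hypothetical midterm scores; each final score is exactly 10 points lower.
midterm = [62, 75, 81, 90, 97]
final = [score - 10 for score in midterm]

def correlation(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(midterm, final)
print(f"r = {r:.4f}")
```

Subtracting a constant shifts every point by the same amount, so the points fall exactly on a line with positive slope and the correlation coefficient equals 1.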

 

 

2.      How do you grade exams?  How much feedback do you give?  How do you decide on partial credit?  How much time do you spend grading?  Any tips to reduce grading time?

 

My attitude toward grading exams can be summarized in one word: procrastinate!  I’m just kidding, of course, but I do confess to succumbing to this urge from time to time.

 

I don’t think I have much wisdom to share on this issue.  I grade one problem at a time, in an effort to maintain consistency.  I used to ask students to write their answers in blue books or on their own paper, but I have recently begun to leave space for students to write on the exam itself.  Students seem to like this better, and I think it speeds up my grading a bit.

 

I award partial credit based on partial progress toward solving a problem, or demonstrating partial knowledge of relevant concepts.  All of my exams are worth a total of 100 points, and I try to let students know what the distribution of points is on various parts of a problem.

 

I try to give individual feedback as I grade exams, but I often lapse into writing “see solutions” on students’ papers.  I do post complete solutions on the web, and I devote some class time to going over the solutions.

 

Grading exams is very time-consuming; it is not uncommon for me to spend 8-12 hours grading an exam.

 

 

3.      Besides exams/quizzes, what types of assessment do you use in your courses?  What is your method of creating and grading these assessments (e.g., learning goals, expectations)?

 

I have begun to make routine homework problems from the text optional for students.  I make solutions available and encourage them to ask questions in class and in office hours, but I no longer collect and grade these problems. 

 

I do assign, collect, and grade what I call “investigation assignments.”  These are typically more open-ended and exploratory than textbook exercises, and they often involve use of technology.  One example is asking students to investigate the effect of sample size on statistical significance, and another example is leading students to explore influential observations in a regression context.  These assignments involve a fair amount of writing, and I encourage students to work on them in pairs so that they can learn from each other.
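The first example, the effect of sample size on statistical significance, can be illustrated with a short computation.  This is only a sketch of the idea, not the actual assignment: the observed proportion (0.55), the null value (0.50), the sample sizes, and the function name `two_sided_p` are all assumptions chosen for illustration, and the test used is a one-proportion z-test.

```python
import math
from statistics import NormalDist

def two_sided_p(p_hat, n, p_null=0.5):
    """Two-sided p-value for a one-proportion z-test (hypothetical helper)."""
    se = math.sqrt(p_null * (1 - p_null) / n)  # standard error under the null
    z = (p_hat - p_null) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same observed proportion, increasing sample sizes.
sample_sizes = (100, 400, 1600)
p_values = [two_sided_p(0.55, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:5d}   p-value = {p:.5f}")
```

With the observed proportion held fixed, the p-value shrinks as the sample size grows: the same five-point difference is not significant at the 5% level with 100 observations but is with 400 or more.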

 

In some classes I also assign mini-projects that involve data collection and analysis, with a substantial write-up.  In some other classes I give essay writing assignments that concern contemporary applications and current events uses of statistics.  I have also begun assigning what I call “practice problems” in some classes.  These are short self-tests of whether a student understands the current concepts well enough to be ready to proceed to study new concepts.

 

 

4.      How much weight (toward the course grade) is each assessment piece worth in your class?  Do you use any classroom assessment techniques that are not graded?

 

I play around with different weighting systems every term.  My most recent experience is: 10% quizzes (after dropping 5 of 21), 15% investigation assignments (after dropping 3 of 19), 50% midterms (20% for the highest of three, 15% for each of the two others), and 25% cumulative final exam. 
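The weighting scheme just described can be written out as a short calculation.  The sketch below rests on assumptions not stated in the source: all scores are on a 0-100 scale, the dropped scores are the lowest ones, and each category is averaged with a simple mean.  The function names are hypothetical.

```python
def mean_after_drops(scores, n_drop):
    """Average the scores that remain after dropping the n_drop lowest (assumed rule)."""
    kept = sorted(scores)[n_drop:]
    return sum(kept) / len(kept)

def course_grade(quizzes, investigations, midterms, final):
    """Combine category scores using the 10/15/50/25 weighting."""
    quiz_avg = mean_after_drops(quizzes, 5)          # drop 5 of 21 quizzes
    inv_avg = mean_after_drops(investigations, 3)    # drop 3 of 19 investigations
    best, *rest = sorted(midterms, reverse=True)     # highest midterm counts 20%
    midterm_part = 0.20 * best + 0.15 * sum(rest)    # the other two count 15% each
    return 0.10 * quiz_avg + 0.15 * inv_avg + midterm_part + 0.25 * final

# Hypothetical student: five missed quizzes and three missed investigations.
grade = course_grade(
    quizzes=[100] * 16 + [0] * 5,
    investigations=[90] * 16 + [0] * 3,
    midterms=[70, 80, 90],
    final=80,
)
print(f"course grade = {grade:.1f}")
```

In this example the missed quizzes and investigations are absorbed entirely by the drops, and the strongest midterm carries the extra weight.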

 

My rationale for the drops is that this policy recognizes that students sometimes have other obligations that prevent them from devoting adequate attention to my class.  With this drop policy I do not accept late work except in extreme cases of illness or personal need.  My reason for assigning lower weight to some midterms is that some students take a while to get used to my exams.  Putting more weight on the final allows students the opportunity to benefit from pulling their knowledge together at the end of the course.  Some students complain about counting quiz and investigation scores toward the grade at all.  With the quizzes my primary purposes are to motivate students to keep up with the material and reward them for doing so.  With the investigation assignments I believe that I am assessing skills, such as use of technology and writing of reports, that are hard to assess in an exam setting.

 

 

5.      How do you support the students in preparation for assessment (e.g., review session, drafts, practice problems)?

 

I provide students with extensive review/preparation advice on the web.  This includes an outline of topics and lots of advice for how they can effectively prepare before the exam and take the exam.  I consider this advice to be quite common-sensical, but students seem to appreciate it.  Examples include encouraging them to organize their notes and show up on time, and to take advantage of partial information provided on the exam.

 

I also typically devote one class period to reviewing and helping students prepare for the exam.  This is one of the few class periods when I lecture for most of the time, rather than having students work on activities.  I spend class time going over the outline that I have posted on the web, summarizing the most important points that I hope they will have learned from investigation assignments, and answering the questions of those who are organized enough to have begun their studying already.  Occasionally I also hold an optional evening Q&A session prior to exams.