Rob Gould – Nuts and Bolts of Classroom Assessment

 

1.      What is your process of creating an exam?  That is, how do you decide on the topics to cover, types of questions, number of questions, use of technology, use of external aids, re-use of questions, etc.?

 

• In large classes, multiple choice exams have many advantages, but require great effort to write.  I write with a colleague (Mahtash Esfandiari) and we try to create questions at various levels of difficulty that include "distractors" that will help us understand where the students are going astray, and we then feed this information back to the students.

 

• Open-ended questions are essential, but in large classes need to be used with care.  Restricting responses to 3-5 sentences helps, as does creating fairly limited contexts.

 

• A good question is difficult, but "fair", although defining fairness is far from easy.

 

Details

I have two different answers to this question.  The first concerns my typical method for creating tests for some of our large introductory classes.  The second answer concerns the method I have begun to use for our new "blended instruction" course. Our department has received support from our college to develop and implement a new format that integrates technological and traditional methods with the goal of improving instruction for large classes.   My colleague Mahtash Esfandiari has been heading this project, and I've worked with her to develop assessment items.  In the process, I have learned that developing assessment items intended to be implemented in a computer program and used by many different faculty is a very different experience from what I have been used to.

 

First let me talk about my usual practice.  I have several goals for an exam.  I want to test whether a student understands a particular concept.  I want to understand how well the student understands.  And I want to see how well they can apply the concept in a novel setting. 

 

In addition to these primary goals, there are many "mini" goals.  I want the test to be sufficiently challenging that I can distinguish between different levels of understanding, but I want it to be simple enough that students who have studied the material can have a sense of progress and accomplishment.

 

There are also some other factors that have to be dealt with when developing exams.  As Jim Stigler pointed out in his analysis of the Trends in International  Mathematics and Science Study (TIMSS),   teaching is a cultural activity, and while he was writing about primary school education, certainly the same is true of college.  One cultural factor that must be taken into account is the students' sense of "fairness".  This is a vague and slippery concept, but students are quick to point out "unfair" questions when they think they've seen them.  I translate this to mean that exam questions need to be similar to the homework and quizzes in terms of difficulty and language used.

 

It is difficult to judge the difficulty or "fairness" of an exam without knowing the context of the classroom that produced it.  I can create a "hard" question just by taking a homework question from another book, even a book that to our minds might seem equivalent in every way to the class text.  The slight changes in wording and tone will often strike some students as "unfairly" difficult. 

 

I haven't had great success dealing with "fairness".  I routinely deal with complaints about difficult questions, and some classes complain more than others.  But I confess I don't lose much sleep over it.  I'm content to include some questions that are very much like the homework (maybe unassigned problems or assigned problems with slight changes) and add one or two of my own that might be more difficult or address topics I'm interested in exploring. 

 

These issues are magnified when trying to construct an all-purpose multiple choice item.  I think both my colleague and I have found that what one of us considers an easy problem the other considers difficult, because, for example, I don't cover that particular topic in my class, or I cover it in less detail than is needed to answer the question.  For example, my colleague created several questions for our test bank that test whether students can choose the best graphic for displaying categorical data to answer a particular question.  For her students this is a moderately difficult question.  I don't really cover categorical data much at all, and for my students this is impossibly difficult.

 

Writing multiple choice questions has its own challenges.  I was once quite dismissive of multiple choice exams, but now see them as a useful but limited tool.   (I should explain that our "blended instruction" class hopes to use automated, multiple choice exams as a method for teaching more students at once and also for providing students with instant feedback so that they can learn from their mistakes.)  On the one hand, they prevent the students from providing nuanced answers and prevent us from testing their ability to express their thoughts in writing.  But their greatest advantage is that, if you write good "distractors", then you also gain a good understanding of the reasons that students are missing questions.

 

Our approach in writing multiple choice questions has been to use our own experience to judge what difficulties students are likely to have with a particular idea or concept, and then include these as distractors.  Sometimes this is easy: when asking about interpreting a confidence interval, for example, the only hard part is limiting yourself to just 3 common misinterpretations.  But other times, for example when trying to test whether students can correctly describe a scatterplot, this is quite difficult.  We have also borrowed some questions from the ARTIST web site.
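The diagnostic value of distractors comes from tallying which wrong options students actually choose. The following is a minimal sketch of such a tally, not the blended course's actual software; the responses and the keyed answer are made up for illustration.

```python
from collections import Counter

# Hypothetical responses to one multiple choice item (not real class data).
# "b" is assumed to be the keyed (correct) answer.
responses = ["b", "a", "b", "c", "b", "a", "a", "b", "d", "a"]
correct_option = "b"

counts = Counter(responses)

# Share of all students choosing each wrong option, most popular first.
distractor_shares = {
    option: count / len(responses)
    for option, count in counts.most_common()
    if option != correct_option
}

for option, share in distractor_shares.items():
    print(f"distractor {option}: chosen by {share:.0%} of students")
```

A distractor that attracts a large share of the class points to the specific misconception it was written to capture, which is the information that can then be fed back to students.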

 

Even in our blended instruction course we have one "regular" exam, and on this exam we feel it necessary to test the students' ability to write.  A common question describes a data set (how it was collected and why it was collected) and displays (for example) a scatterplot and asks the students to describe the plot.   Students often find these questions to be very "unfair".  First, although "describe" sounds like a simple enough request, we are looking for particular things and students often complain that there are unwritten rules.  I try to make these rules as explicit as possible, for example telling them in advance that I'm looking for them to address the central trend, the shape, the strength, and describe these in context and with respect to the research question asked.  But students still find this hard in part, I think, because few textbooks provide questions like this and in part because students find writing quite difficult.

 

Sample Questions

This is intended to test understanding of the correct interpretation of the correlation coefficient and does so by providing several popular misinterpretations.

 

1) In a famous study, Francis Galton collected the heights of over 1000 fathers and their sons.  He plotted the father's height against the son's height, and found the correlation was about 0.50.  From this we can conclude (circle all that are true)

 

a)               if he had plotted son's height against father's height, the correlation would be  -0.50.

b)               the positive correlation indicates that about 50% of the  fathers tended to be taller than the sons

c)               the fathers were about half an inch taller than their sons

d)               taller than average fathers tended to have  taller than average sons

e)               taller than average fathers tended to have shorter than average sons
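The misinterpretation behind option (a) can be checked numerically: the correlation coefficient does not depend on which variable is plotted on which axis. The sketch below uses a hand-written helper, `pearson_r`, and made-up heights; neither comes from Galton's study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up heights in inches, for illustration only (not Galton's data).
fathers = [65.0, 67.5, 68.0, 70.0, 71.5, 73.0]
sons = [67.0, 66.5, 69.5, 68.0, 71.0, 70.5]

r_father_son = pearson_r(fathers, sons)
r_son_father = pearson_r(sons, fathers)

# Swapping the roles of the two variables leaves r unchanged.
print(r_father_son, r_son_father)
```

Because the formula treats the two variables symmetrically, swapping the axes cannot change the sign or the size of the correlation.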

 

 

2)  (4 points) Suppose that you were the teaching assistant in Statistics 10 and a student came to you with the following plot and wanted you to describe it to him. What would you tell him? Write the major points of your argument in five lines. Do not repeat yourself.

 

The Y axis shows the score of 138 eighth grade boys and girls on diagram problems.

 


3)  The scatterplot shows the number of fatal car accidents in each of the 50 states in 1993 plotted against the number of registered vehicles (in the thousands).

 Which of the following is a valid interpretation of these data?  Choose the best option.  All statements refer to the year 1993.

 

a)         States would have fewer accidents if they had fewer cars on the road.

 

b)         States with a larger number of registered cars tend to have a greater number of fatal accidents.

 

c)         If you live in a state with a greater number of registered vehicles, you are more likely to be involved in a fatal car accident.

 I gave this question on a midterm, and the average score, out of 4 points, was 3.6

 

 

4)  A research assistant is working on a project that examines the attitude of the undergraduate students toward downloading music from the Internet. He reports the following data on the number of music files that a randomly selected group of 130 UCLA undergraduates have on their computer:

 

Mean = 792

Standard deviation = 1353

Median = 500

 

Given the above data what can you conclude?

 

a)      There is a calculation error; the mean is smaller than the standard deviation.

 

b)      A few of the students have got an extremely large number of music files.

 

c)      A few of the students have got an extremely small number of music files.

 

d)      These findings cannot be trusted; data need to be collected on a larger sample.

 

Avg Score 2.2
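The summary statistics in question 4 are typical of right-skewed counts: nothing prevents a standard deviation from exceeding the mean, which is presumably why option (a) works as a distractor. The sketch below shows the same pattern with made-up file counts; these are not the survey data.

```python
from statistics import mean, median, stdev

# Made-up file counts, for illustration only (not the survey data):
# most students have a few hundred files, a few have thousands.
file_counts = [100, 200, 300, 400, 500, 500, 600, 700, 5000, 8000]

avg = mean(file_counts)
mid = median(file_counts)
sd = stdev(file_counts)  # sample standard deviation

# A few very large values pull the mean above the median and
# inflate the standard deviation past the mean.
print(f"mean={avg}, median={mid}, sd={sd:.0f}")
```

Since a count of files cannot be negative, a standard deviation larger than the mean, together with a mean well above the median, suggests a long right tail rather than a calculation error.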

 

 

2.      How do you grade exams?  How much feedback do you give?  How do you decide on partial credit?  How much time do you spend grading?  Any tips to reduce grading time?

 

• Plenty of low-stakes exams that are graded simply (right/wrong) and provide quick feedback and assistance towards improving conceptual understanding.

 

• Two to three high-stakes exams (midterm + final) that include partial-credit questions that test the ability to solve problems and explain solutions.

 

Details

Exams serve two purposes (at least!):  summative and formative. The summative purpose is to give me an idea of how well the student understands the material so that I can assign a grade.  The formative purpose is to help students understand their own level of knowledge, confront misconceptions, and adjust their understanding.  Of course, accomplishing the latter requires a fairly quick turn-around for grading; otherwise students don't remember the question and don't remember what caused them difficulty.   In our classes of 180 students, it was difficult to provide a sufficiently quick turn-around time.  In our blended course, students take weekly multiple-choice exams and get immediate feedback as to whether they got each question right or wrong (but not what the correct answer is). They later meet in small groups to discuss the questions they got wrong and then get an opportunity to re-take the exam.  The jury is still out as to the effectiveness of this method, but early results show that the students believe it helps their understanding.

 

For midterms and finals, grading is done with a team of teaching assistants, usually one or two people. I meet with them and go over the exam question by question, and answer their questions.  Each of us is then assigned a question and we grade that question on all of the exams.  I do spot checks to make sure TAs are consistent in their scores, and occasionally adjust their trend either higher or lower. 

 

Influenced by AP grading, I grade most questions on a 0-4 scale, in which 4 shows mastery and 0 does not.  I write down a list of characteristics that a masterful answer should have, and, after reading the first 10 or so exams, try to rank the multitude of possible sins from slight transgressions to grave misunderstandings. 

 

After doing this for a while, I have gradually adjusted my questions to fit this style of grading.  One side effect is that I'm a little less likely, now, to give very open-ended questions (for intro classes) on the order of:  researchers believe this to be true, here is a summary of their data, what do you think?  

 

3.      Besides exams/quizzes, what types of assessment do you use in your courses?  What is your method of creating and grading these assessments (e.g., learning goals, expectations)?

 

• I do very little in large classes, but give projects in smaller courses or in honors sections.

 

• One project is to have students find news reports of studies that use statistics, and compare the media coverage of these studies with the actual studies.

 

Details

In my large courses, the only other material we grade is HW, and it is not graded very carefully.  In smaller courses, I'm a big fan of small projects, but have learned that in a 10-week quarter this can require intensive amounts of effort on my part. 

 

One project I have tried in the past is to have students find an example of a statistics story in the news, and then track down the initial research article that prompted the story, and compare and contrast the two.   This is nice, because some students really shine when they get involved in the story.  But it takes quite a bit of supervision.   The formal grading is a rather traditional A, B, C, or D, but weekly meetings provide for continuous feedback and try to nip misconceptions in the bud.

 

 

4.      How much weight (toward the course grade) is each assessment piece worth in your class?  Do you use any classroom assessment techniques that are not graded?

 

• Typically, HW is 10%, quizzes 10-15%, the final 40%, and midterms 20% each if there are two.

 

Details

The exact blend depends on the course.   I like to have lots of low-stakes assessments so that students can make mistakes without being punished and I can grade them harshly without fear of long lines of complaining students outside my office.   One disadvantage of this, I feel, is that students then develop certain expectations about the high-stakes exams based on the quizzes and homework, and complain loudly if they feel the quizzes did not prepare them.  This, in turn, encourages me to write harder quizzes, but then students perform so poorly on them that they don't really learn where their misunderstandings lie.
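The typical weighting can be turned into a course grade with a weighted average. The sketch below assumes quizzes count 10% (the low end of the stated range) and two midterms at 20% each, so that the weights sum to 100%; the function name `course_grade` and the student scores are hypothetical.

```python
# Weights follow the "typical" split, taking quizzes at 10% so that
# the weights sum to 100%. This choice is an assumption.
weights = {
    "homework": 0.10,
    "quizzes": 0.10,
    "midterm_1": 0.20,
    "midterm_2": 0.20,
    "final": 0.40,
}


def course_grade(scores, weights):
    """Weighted average of percentage scores; weights must sum to 1."""
    total_weight = sum(weights.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return sum(weights[part] * scores[part] for part in weights)


# Hypothetical percentage scores for one student.
scores = {
    "homework": 90,
    "quizzes": 70,
    "midterm_1": 80,
    "midterm_2": 85,
    "final": 75,
}

grade = course_grade(scores, weights)
print(f"course grade: {grade:.1f}%")
```

With these weights, homework and quizzes together move the course grade by at most 20 points, which is what makes them low-stakes relative to the midterms and final.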

 

 

5.      How do you support the students in preparation for assessment (e.g., review session, drafts, practice problems)?

 

• Almost none, although much time is provided for answering students' questions

 

Details

I'm pretty limited in the support I provide.  My approach is that the homework, quizzes, and class discussions are good preparation for the exams.  I understand that (a) exams create lots of anxiety and (b) a fair amount of thinking happens at the last minute when an exam looms on the horizon.  For this reason, I make myself very available for questions (both in person and on-line) and ask my TAs to schedule an extra session.  I also sometimes reserve the class before the midterm just for questions, and tell students about a week in advance to bring questions to class.  However, since in a 10-week quarter we only get about 28 class meetings, and already lose two to midterms, I hate to lose two more, and so will sometimes schedule this session for off-hours, if a room is available.