REAP07: Assessment design for learner responsibility
Online Conference (29-31 May, 2007)

Re-Engineering Assessment Practices -- Universities of Strathclyde, Glasgow and Glasgow Caledonian


Proceedings for Session: Raising students' meta-cognition (self-assessment) abilities
Case Study: Certainty-Based Marking (CBM)

Paper: Certainty-Based Marking (CBM) for reflective learning and proper knowledge assessment (Click here to see the Paper)
                A.R. Gardner-Medwin (University College London) & N.A. Curtin (Imperial College London), UK
Review: by Dr. Nigel Watson, University of Strathclyde (Click here to see the Review)
Chat Session (30/5/07, 9-10 a.m.): Click here to see the Transcript

FORUM POSTINGS ON CBM CASE STUDY:
NB you can contribute to further discussion (anonymously if you wish) by using the COMMENT site
(All links from here will open in a separate window).

Response to Review

Tony Gardner-Medwin
Posts: 14


30/05/2007 07:39

Nigel, thanks for the stimulating comments in your review. It's interesting that the concerns you raise about CBM are all about use in summative assessment. Our main objective has been to improve students' habits of thought and study while learning (formative assessment), and it was surprising to me, when CBM was used in UCL exams, that the concerns you express about summative assessment simply didn't materialise.

You ask about standard setting. As we show in the paper, CBM can generate a score (CBS) that is on average equivalent (for both T/F and best-of-5 MCQs) to the % correct above chance. I think you may have misunderstood this "% correct above chance" concept - known perhaps more simply at Imperial as "% knowledge". This has nothing to do with confidence ratings; it is simply a way of expressing the conventional score ("number correct"), scaled so that complete guesses would give 0% and perfect scores 100%. For example, on a T/F exam guesses would yield on average 50% correct, so a pass mark of 75% correct answers corresponds to "50% knowledge" (halfway between guessing and perfection). Similarly, with best-of-5 exam Qs guesses would on average give 20% correct, so it is 60% correct that corresponds to "50% knowledge". Whatever pass mark you set using criterion-referenced standard-setting procedures, if you scale it to "% knowledge" then the equivalent CBS pass mark will be the same, as shown by the empirical relationship in Fig. 2.
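A minimal sketch of this rescaling (my illustration in Python, not part of LAPT; "chance" is the expected proportion correct from pure guessing):

# Rescale raw % correct so that pure guessing scores 0% and a perfect paper 100%.
def percent_knowledge(percent_correct, chance):
    return 100 * (percent_correct / 100 - chance) / (1 - chance)

print(percent_knowledge(75, 0.5))  # T/F: 75% correct -> 50.0 ("50% knowledge")
print(percent_knowledge(60, 0.2))  # best-of-5: 60% correct -> 50.0 ("50% knowledge")
print(percent_knowledge(50, 0.5))  # T/F: pure guessing -> 0.0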

Yes, a weak but confident student may occasionally do well through luck, as with any assessment scheme. The nature of CBM means however that both in theory and in practice a weak student will lose by claiming confidence where it is not justified. A strong student will also lose through inappropriate diffidence (too often choosing low certainty levels). But experience shows that practised students become well calibrated to use certainty levels appropriately (Gardner-Medwin & Gahan, 2003: http://www.ucl.ac.uk/~ucgbarg/tea/caa03.doc ). The students who do best under CBM, with the same number of correct answers, are those with better than average ability to identify which of their answers are reliable and which are based on uncertain knowledge or reasoning. Knowledge, after all, is not just a matter of being right but of knowing that you are right.

The query about whether CBM enhances stress is interesting, and one I hadn't encountered before. It doesn't seem likely to be a major student concern: in a 2005 survey, fewer than 1/3 of UCL students with experience of CBM in first- and second-year exams voted to drop CBM from exams.
Imperial College cautiously allowed extra time when it first introduced CBM into formative exams, but it became clear that extra time was not needed. It is an interesting fact about the brain that when it comes up with an answer, this seems to come packaged with a certainty judgement. The judgement may be wrong, which can in many walks of life be a disaster. But it can be refined or corrected by reflection. CBM helps to train students to make such judgements carefully, correctly and realistically and rewards them accordingly.

Does CBM assess reasoning? Certainly CBM can be used with questions that require reasoning, and it puts a premium on care in such reasoning. Many of our incoming students - I see this especially in maths for medical students - have got into the habit of giving answers with very little thought. Since they are highly selected students, these answers are often right. But at university this simply isn't good enough. If it were possible, we would of course always like to assess and comment on the quality of the steps in a student's reasoning, and not just base a mark on the final answer and the confidence expressed in it. But this requires hand marking, and is not always possible. CBM motivates extra care in reasoning, realistic self-appraisal and checking. Reflection about the quality of one's reasoning is surely no less valuable than the reasoning itself.

Thanks again - Tony G-M

Try it

Tony Gardner-Medwin
Posts: 14

25/05/2007 18:44

If you want to know what Certainty-Based Marking is all about, go to http://www.ucl.ac.uk/lapt and click on "Exercises".



CBM: student involvement?

Jane MacKenzie
Posts: 10

30/05/2007 10:24

I think David Nicol raised the issue of getting students involved in setting CBM questions. Is anyone doing this, either in the context of CBM or of other objective questions?

Tony Gardner-Medwin
Posts: 14


30/05/2007 11:42

In the chat session, Nancy and I described how, both at UCL and Imperial, we have got students to write Qs and explanations, paid them, and had staff vet their work. Interestingly, a few students at UCL (mainly graduate medical students) have told me that they have written Qs for LAPT (our CBM tool) as a stimulus to their own study and revision. Some of these we have used.

There is no real difference between writing objective Qs for CBM and writing them for conventional right/wrong marking. When writing Qs for students to use for self-assessment, to aid learning, CBM does however offer one advantage: it is less of a problem to mix questions of different levels of difficulty, suitable for a wide range of abilities - conspicuously so when writing maths exercises for medical students. Good students get a kick out of being confident about the easy answers, while it can be very revealing to see how often students are unsure about even the real basics.

Interesting concept but .....

Madan Gupta
Posts: 1

30/05/2007 12:02

Hi Tony and Nancy,

CBM seems to be a very interesting and innovative self-assessment concept. But I am not clear how students are informed of the correct answers and, in the case of mathematical questions, how they learn where they went wrong in solving a given question.

Tony Gardner-Medwin
Posts: 14

30/05/2007 12:58

Hi Madan, the choices about how to give feedback and diagnose where students have gone wrong are really more to do with how sophisticated your software is, and how much effort you want to put into devising good answer checking, than with CBM. With CBM it is, however, particularly valuable to give immediate feedback while the student is still thinking about why they may feel sure or uncertain that they have the right answer. The LAPT software for CBM is actually quite sophisticated, and can give appropriate explanations for specific correct answers ( A() below ) or incorrect answers ( I() below ). It can also base mathematical questions on randomised input values, if you wish - though I don't do this in the following example, which defines ranges of allowable values and variations of unit spelling in the answers, and chastises the student (though marking them correct) for inappropriate precision or lack of units. Insufficient precision, wrong units, and a common error in working are each picked up and marked wrong. I seldom go to this kind of trouble in writing Qs, since a student trying to learn can usually identify their error by reading a simple explanation, as at the end here.

Q("Convert 5.23 miles to km.");
A("8.42 km"); // model answer
A("{8.4168 8.4169} *","Yes, but since the distance was only given to 3 significant figures, you have given an inappropriately precise answer");
A("{8.41 8.43} km kilom* ","Good answer!"); //accepts km as unit, or anything starting kilom
A("{8.41 8.43} ","Yes, but you really should have included the units.");
I("{8.41 8.43} * ","The units should be km.");
I("{8.4 8.5} *","You haven't given the answer accurately enough");
I("{3.2 3.3} *","No. You have used the conversion factor (0.6214 km/mi) the wrong way round.");
E("The conversion factor is 0.62137 mi/km. So 5.23 mi = (5.23 / 0.62137) km = 8.42 km, to 3 s.f.");
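For readers without LAPT to hand, here is a minimal Python sketch of the idea behind this kind of range-and-unit matching (a hypothetical illustration only; the real LAPT matcher is more capable):

import re

# Each rule: (low, high, unit_test, counts_as_correct, feedback).
# Rules are tried in order; the first whose numeric range and unit test
# both match determines the feedback - mirroring the A()/I() lines above.
def check(answer, rules):
    m = re.match(r"\s*([0-9.]+)\s*(.*)", answer)
    if not m:
        return (False, "Please give a numerical answer.")
    value, unit = float(m.group(1)), m.group(2).strip().lower()
    for low, high, unit_ok, correct, feedback in rules:
        if low <= value <= high and unit_ok(unit):
            return (correct, feedback)
    return (False, "The conversion factor is 0.62137 mi/km, so 5.23 mi = 8.42 km (3 s.f.).")

rules = [
    (8.41, 8.43, lambda u: u == "km" or u.startswith("kilom"), True, "Good answer!"),
    (8.41, 8.43, lambda u: u == "", True, "Yes, but you really should have included the units."),
    (8.41, 8.43, lambda u: True, False, "The units should be km."),
    (8.4, 8.5, lambda u: True, False, "You haven't given the answer accurately enough."),
    (3.2, 3.3, lambda u: True, False, "No. You have used the conversion factor the wrong way round."),
]

print(check("8.42 km", rules))  # (True, 'Good answer!')
print(check("3.25", rules))     # (False, 'No. You have used the conversion factor the wrong way round.')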

CBM and peers

Jane MacKenzie
Posts: 10

30/05/2007 10:32

Nancy and Tony, I wondered if you'd care to elaborate on the use of CBM in groups.

Jane

Nancy Curtin
Posts: 1

30/05/2007 14:05

I've had students do exercises that included CBM, with each pair of students working at one computer. These were first-year medical students. I had about 5 or 6 pairs of students at once (but I'm sure that I could have dealt with more pairs). After a brief introduction to the system, they got on with it themselves. If a pair wanted to discuss anything with me, they did (about the science, technical problems, or understanding the summary of their performance that they get at the end). It worked well in that they all finished the exercise, discussed within their own pair, and did not disintegrate into a general free-for-all discussion. They were free to do the exercise again and always did better the second time through. I cannot remember doing this, but it obviously would be good to have a discussion at the end of the session about which were the hardest questions, suggestions for other questions, or improvements to explanations.

Can CBM be used in subjective and objective questions?

Jane MacKenzie
Posts: 10

30/05/2007 10:27

Tony, I think you said that the questions had to be objective. But I was wondering (perhaps out of ignorance) about clinical judgement. Surely there's room for assessments with a confidence rating in clinics, e.g. a scenario with patient X presenting with Y symptoms: what should the clinician's first action be - X-ray, another form of scan, observation, etc.? See, now I sound quite dense. What do you think?

Jane

Tony Gardner-Medwin
Posts: 14

30/05/2007 11:24

Don't apologise! This type of Q is quite common in clinical exams, and I think whether the answers are objective or subjective can be quite a big issue. Neither Nancy Curtin nor I is a clinician, so I hope there may be some around who might comment. Usually these Qs are presented as multiple-choice, or as extended matching Q sets (where several Qs may share the same set of response options).

Clearly sometimes there may be options that are categorically wrong, while a choice between other options may be a matter of fine clinical judgement. As far as I know, however, these Qs are usually marked with just one option correct and all the others wrong. You would need to provide a specific example to set up a debate about whether such categorical right/wrong marking is what should happen.

I suppose in a lot of cases there is an implicit rephrasing of such Qs along the lines: "In this scenario, which option would generally nowadays be regarded as clinical best practice?" - which makes the Q a bit more objective, albeit also a bit different. If two of the options seem almost equally acceptable, then nobody - experienced clinician or fresh student - should be expressing high confidence that the choice they eventually plump for will be the one that is marked correct. But this is really a criticism of a (possibly only hypothetical) system that treats such Qs as suitable for plain right/wrong marking. There could of course be a different rubric saying that though the student must choose one option, two or more of the options may be marked correct. If there are issues of fine judgement between some of the options, this would seem much more appropriate to me, and would let the student once again be confident (or not) that their answer was consistent with best practice.

A recent clinical teacher at Imperial (Dr. Sara Marshall) interestingly mapped the C=1,2,3 decision as follows, for junior doctors when discussing a scenario:
1: I am guessing, but I think this is the correct answer
2: I am pretty sure I am correct but need advice before proceeding
3: I am happy to proceed
The real penalties for (3) in combination with a wrong judgement can be a lot worse than -6. But apparently this approach struck a useful chord with the junior doctors.

Jane MacKenzie
Posts: 10

30/05/2007 15:52

Hi Tony, yes it would be useful to have an example. It just seems to me, when talking to clinicians, that clinical 'judgement' is fundamental to the learning process. However, at upper levels (i.e. the year or two before a clinician qualifies) a lot of the learning and assessment happens in clinics, and the assessment might be highly subjective (don't tell any clinicians I said that). I think the concept of confidence rating might be very useful to clinical supervisors, even if used quite informally. Maybe they do this already?

Jane

Tony Gardner-Medwin
Posts: 14

30/05/2007 17:38

Thanks Jane. You wonder if clinicians already use a form of CBM thinking informally when assessing students in the clinic. I think yes, that's bound to be true. Part of the idea of CBM in objective testing is to regain some of what is normally lost without direct interaction between student and assessor. In a face-to-face viva or conversation we are always picking up subtle cues about whether the other person is hesitant or confident of their position. When we aren't sure, we challenge: 'You seem a bit unsure' or 'Really?', etc., to see what happens. A student who has not prepared their ground can easily be forced to admit it, or else feel the noose tightening as they are forced into a confident-sounding error.

Doctors know that tricky clinical judgements require a sound knowledge base and an acute awareness of where one's knowledge reaches its limits. Partly, I think, because of this CBM element, vivas (oral assessments) are about the most effective assessments for getting students to do their homework beforehand. They are good for probing whether a student really knows the topic under discussion, and very good at establishing whether a student's written work has been plagiarised. But of course they are expensive and necessarily short, so they tend to be unreliable at assessing breadth of knowledge.

Individualistic or peer?

Steve Draper
Posts: 25

31/05/2007 12:02

Tony,
I'm going to try to express my reactions to your CBM work and case-study paper. My real question about these reactions is: do you have anything to say about them, and/or are they silly?

1) First, I'm impressed: there is too little work that can show positive effects on learning, with evidence of this and some theory underpinning it too. That's why I think we are lucky to have you contribute to this conference, and why I recommended your work last week to the relevant sub-dean of our medical school.

2) My next feeling is that it's clever to use the indirect manipulation of a marks scheme to get students to learn better. Most of us first think of more direct manipulation such as study-skills training, exhortation, etc. But when indirect manipulation works, it leads to a more integrated and less invasive approach. Other designs I admire often have a similar quality, though more often the other way round: manipulating something at the front end in learning activities and then getting better outputs that were only indirectly asked for, e.g. Baxter gets them to do regular exercises and only enforces participation, and they later discover how much better prepared for the exam they are.

3) I also agree there is an independent benefit here of getting medical students to have a better grasp of their confidence in each item for practical professional reasons: such thinking about confidence should continue to be part of their professional reasoning throughout their careers. I.e. being self-critical is a practical professional skill.

4) But it is all unrelentingly individualistic. I wonder what your thoughts were about the (in the literature) more common route of using peer discussion to get students used to the idea that not everyone thinks the same, and that they need to be able to give reasons for (and against) their view and to weigh alternative views. Classically (and in Piaget) it is peer discussion that is used as the place and stimulus for that.

What about Abercrombie's wonderful book, again with medical students?

Abercrombie, M.L.J. (1960) The anatomy of judgement: An investigation into the processes of perception and reasoning (London: Free Association Books)

5) And again this can be justified in professional-skills terms. Whereas 50 years ago, in the bad old days, a doctor might expect his opinions to be taken as authority, that no longer works so well with patients. Since a lot of "cures" depend on patients following instructions ("complying"), being persuasive has a direct effect on health. CBM is part of the old world of never justifying yourself to others, just honing a private skill.

SteveD

Tony Gardner-Medwin
Posts: 14

31/05/2007 14:26

Thanks for your comments, Steve. Only one point (your last sentence) do I radically disagree with ("CBM is part of the old world of never justifying yourself to others ..."). Rather, CBM is part of a new world in which you are instinctively always ready to justify your beliefs, or else acknowledge them as uncertain. A bit insulting, perhaps, to call this a new world - it is part of what has always been valued in every time and every clime.

When you have a hobby horse - as I certainly do over CBM (amongst many other things, I might add!) - there is a hazard that everybody thinks you want your idea to take over the world. Absolutely not. CBM is in no way meant to supplant peer or teacher based learning, or assessment methods that require hand-marking. But it is meant to supplant (in many, not all, situations*) the use of objective Qs without CBM. What I argue is that if you are going to use objective right/wrong questions (which many of us are forced to think about for all too familiar reasons), then you should combine them with CBM.

I am a great proponent of tutorials, interactive practical and CAL sessions with teachers, and of peer learning - despite the fact that most of the learning I did as a student came from challenging myself, and from the satisfaction of achieving deep understanding by that route. I do argue that objective testing (and of course CBM) are often undervalued, and that a major benefit of using them to stimulate and test breadth of learning is that they free staff time to indulge in the most satisfying aspects of interactive teaching - the development of the highest-level skills.

Thanks for the Abercrombie ref. I have already got it from the library (a great plus of an online conference!) and am pleased to see it was conceived at UCL. I think I shall be in strong agreement with its thesis. But the ability to toss around different sides of an argument is not restricted to group interaction - it can happen (albeit sometimes not so well) in a single brain. We need to encourage this in students in every way we can.

Tony GM

* For example, a colleague showed me a nice biochemistry exercise in which performance requires a multitude of small decisions (about genetic translation). To express confidence for each action would have involved twice as many clicks and little added value.

Tony Gardner-Medwin
Posts: 14

31/05/2007 14:56

I should have picked up on the merits of peer discussion. Dr. Curtin and I both encourage students (as discussed in the chat and elsewhere) to work in pairs on formative and revision exercises, and on things like practical follow-up exercises. When you watch students working together, it is often reaching agreement on C=1, 2 or 3 that stimulates the valuable interaction. Nancy pointed out that part of what is happening here is that the students are choosing whether this is something they wish to be assessed on: C=3 is tantamount to saying "we (or I) think we know this topic", and it is weighted more heavily. Each student is motivated to check up on the other's justification for thinking they know it or don't, and on what arguments can be put forward to justify certainty or uncertainty. This idea of justifying uncertainty sometimes stalls people, but it's a very real part of peer-peer exchange (or of good one-brain thinking) - where you eventually argue your way to the conclusion that you must look something up in the book.

We think that two is the best number for peer discussion of CBM. Students, it seems, tend to interact more honestly in pairs. With three or more there is often a tendency to show off or bluff confidence, and less willingness in the group to be the one who challenges such behaviour. Working on your own with CBM (with all its feedback diagnosing such behaviour as overconfidence) is probably the best way to persuade such a student to get real.

Tony GM & Nancy Curtin

Steve Draper
Posts: 25

31/05/2007 15:50

Tony

Thanks for those replies.

I should have anticipated them from what else you've written, really.

I suppose (this is a comment about style) that, in the best tradition of science papers, you write about CBM in a single-minded way because that makes the arguments clearest (for the reader as well as the author), but I fell into the trap of assuming that this was all you thought.

Another kind of paper would have you discuss the role of CBM in a full programme (along with the other methods you say you use as well), and how they fit together.

But thanks for replying to my somewhat lazy reading!

steveD

Provocative Thoughts

Tony Gardner-Medwin
Posts: 14

30/05/2007 18:09

Is anyone interested in challenging (or even supporting) some of the deliberately provocative ideas we raised in our paper? If you want to, it might be good to make them into individual new topics in this forum.

IDEAS WE AGREE WITH:
1. Objective testing need NOT simply test factual knowledge and encourage rote learning.
2. Objective testing is for some (not all) purposes BETTER assessment than essays or problems.
3. The notion that you should use 'modern' question formats like single-best-answer or extended matching questions rather than 'outdated' True/False questions is often generalised far beyond any valid supporting evidence we know of. T/F questions are often BEST PRACTICE.
4. It is (common) BAD PRACTICE to include a 'Don't Know' option with T/F or Best-Option Qs.

IDEAS WE DON'T AGREE WITH:
5. All forms of negative marking are de-motivating to students. You must use carrots, not sticks.
6. Objective testing has no place in subjects like social science or psychology.
7. True/False questions are harder to write than multiple-choice questions.
8. A uniform question type should be used in exams, to avoid confusing students.

Jane MacKenzie
Posts: 10

31/05/2007 14:38

Hi Tony, I'll pick up one of your provocative thoughts. You say: "4. It is (common) BAD PRACTICE to include a 'Don't Know' option with T/F or Best-Option Qs." Could you elaborate cos I just don't see why?

Jane

Tony Gardner-Medwin
Posts: 14

31/05/2007 17:35

Fair enough! Some of these points are actually covered in slides from a recent Physiological Soc presentation* that you can see at http://www.ucl.ac.uk/~ucgbarg/tea/UCL06_tw_ppt.pdf. I'll answer your "don't know" point in a separate topic.

Thanks incidentally for your many stimulating contributions. It's great to get these kinds of questions, even though I'm not always sure I can handle them very well. Too many staff just decline to enter into debate about assessment or teaching.
Tony

* Gardner-Medwin, A.R. & Curtin, N.A. (2006). Certainty-based marking at UCL and Imperial College. Physiological Society Teaching Workshop, Proc Physiol Soc 3, WA4.

BAD PRACTICE to include a "don't know" option

Tony Gardner-Medwin
Posts: 14

31/05/2007 17:48

From Jane MacKenzie: Hi Tony, I'll pick up one of your provocative thoughts. You say: "4. It is (common) BAD PRACTICE to include a 'Don't Know' option with T/F or Best-Option Qs." Could you elaborate cos I just don't see why?
................

The implication of providing a "don't know" option is that students should use it if their uncertainty is above some level. This is practically never sensible! It can only be rational for the student to omit a guess if the average penalty for a guess would be worse than that for a blank reply. That degree of negative marking is seldom deemed acceptable, and would put stress on candidates: am I really so uncertain that I should tick "don't know"?

To include "don't know" can actually selectively disadvantage the more able students - because they are the ones with more insight into which of their answers are uncertain, and they can therefore be persuaded to omit uncertain answers, either because teachers exhort them to do so, or just by the implication of the "don't know" box's presence - that ticking it must sometimes be sensible. The same effect may also disadvantage the gullible or more deferential student, who may be more inclined to follow teachers' advice even when they have a hunch it's a bit dubious. Unless there is a very severe level of negative marking (for example, +1/-1 on best-of-5 MCQs, where a guess would give on average 0.2 x (+1) + 0.8 x (-1) = -0.6 marks), students should always expect to gain by answering. It is never rational to believe that your answer is less likely to be right than a complete guess, so unless a guess would penalise you more than a "don't know", it is never rational to use the "don't know".
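The arithmetic behind that parenthetical example, as a quick illustrative sketch:

# Expected mark for a pure guess: P(correct)*reward + P(wrong)*penalty.
def expected_guess(p_correct, reward, penalty):
    return p_correct * reward + (1 - p_correct) * penalty

print(expected_guess(0.2, 1, -1))     # best-of-5 at +1/-1: -0.6 (worse than a blank, 0)
print(expected_guess(0.2, 1, -0.25))  # best-of-5 at +1/-0.25: 0.0 (guessing never loses)
print(expected_guess(0.5, 1, -1))     # T/F at +1/-1: 0.0 (guessing never loses)

Only under the first, unusually severe, scheme could omitting an answer ever be rational.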

CBM does of course include strong (double) negative marking if you choose C=3. If a student is uncertain about an answer, they have the option to downgrade or avoid this negative marking by acknowledging uncertainty. Saying C=1 is equivalent to saying "I'm pretty unsure" (I think there is less than a 2/3 probability that I'm right). It is worth answering at C=1 even if you think the answer is a near or total guess - though you're not going to pass an exam if all your answers are in this category.
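For concreteness, here is a sketch assuming the UCL/LAPT mark scheme (1, 2 or 3 marks if correct at C=1, 2, 3; 0, -2 or -6 if wrong - the -6 being the penalty mentioned earlier in this forum). It shows why C=1 is the rational choice below a probability of 2/3 of being right, and C=3 only above 0.8:

# Expected CBM mark at each certainty level, and the rational choice of level.
MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}  # C: (mark if right, mark if wrong)

def expected_cbm(p, c):
    right, wrong = MARKS[c]
    return p * right + (1 - p) * wrong

def best_level(p):
    return max(MARKS, key=lambda c: expected_cbm(p, c))

for p in (0.5, 0.7, 0.75, 0.85, 0.95):
    print(p, best_level(p))  # C=1 for p < 2/3, C=2 for 2/3 < p < 0.8, C=3 for p > 0.8

So a well-calibrated student maximises their expected mark simply by reporting their certainty honestly - which is the motivational point of the scheme.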

Hope this makes sense, and thanks again.
Tony