FAQ - I have heard that negative marking of multiple-choice questions (MCQs) can have undesirable results. What is the research on this?

Answer

Negative marking, or formula scoring, is a method of assessment whereby marks are deducted from the overall test score for each wrong answer given.  This contrasts with positive marking, or rights scoring, in which nothing is deducted for giving a wrong answer.
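To make the two schemes concrete: the classic formula score deducts 1/(k-1) of a mark for each wrong answer on a k-option item, so that a purely random guess neither gains nor loses marks on average (the exact penalty varies between tests).  The Python sketch below is purely illustrative; the function names and the example figures are not taken from any of the studies discussed here.

    def rights_score(num_right):
        """Rights (positive) scoring: one mark per correct answer, no deductions."""
        return num_right

    def formula_score(num_right, num_wrong, num_options):
        """Formula (negative marking) score: R - W/(k-1).

        Omitted items neither gain nor lose marks.  With k options, a random
        guess gains 1 mark with probability 1/k and loses 1/(k-1) otherwise,
        so its expected contribution is zero.
        """
        return num_right - num_wrong / (num_options - 1)

    # Example: 60 right, 20 wrong, 20 omitted on a 100-item, 5-option test.
    print(rights_score(60))          # 60
    print(formula_score(60, 20, 5))  # 55.0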

Theoretically, negative marking is intended to discourage guessing.  The largest systematic comparison of positive and negative marking was conducted for the Educational Testing Service by Angoff and Schrader (1981).  Their first study looked at 6260 final-year high school students (approximately 17 years old) taking the Scholastic Aptitude Test and 2306 students taking the Chemistry Advanced Placement Test.  They administered two versions of the tests, one with rights scoring directions and one with formula scoring (negative marking) directions, and then scored all the tests using both methods.  This allowed them to estimate whether the directions changed examinees' behaviour, and whether the scoring algorithm significantly changed the results.

Overall, the SAT results showed that rights scoring produced slightly higher scores than formula scoring, but only one of the eight sub-tests showed a statistically significant difference.  The SAT is a timed test, so many students did not finish; this, together with the deduction for wrong answers, would of course predict lower scores under negative marking (though not necessarily a different rank order of examinees).

The Chemistry test showed no difference when scored using negative marking versus rights scoring.  This suggests that subject-matter tests may be less sensitive to directions or scoring method than aptitude tests, possibly because examinees are less apt to omit items, or because when they do guess they are more likely to do so on the basis of partial information.

Timed tests magnify scoring differences because test takers tend to work more slowly under negative marking directions and therefore complete fewer items than on the same test taken under rights scoring.  In the SAT, examinees under rights scoring instructions guess more than under negative marking directions, and higher-ability students guess more than lower-ability ones.  This could suggest that the higher-ability students are either more confident or better at following directions (since most rights scoring directions note that there will be no penalty for guessing).  Under negative marking, low-ability students actually omit fewer items than high-ability students, again suggesting that they do not follow the directions as well.

A second ETS study looked at 55,780 students taking the Graduate Management Admission Test under the two sets of directions and scoring methods.  Unlike in the SAT study, higher-ability students tended to omit fewer items no matter what the directions were.  This could be a matter of age or experience, as these students already had undergraduate degrees and were around 21 years of age.  This study also found that tests with longer time limits produced more reliable results.

A UK study of anaesthesiology students (Hammond et al., 1998) found that all candidates benefited from backing their educated guesses, and almost all benefited even from backing their wild guesses.  The same result has been found in other empirical studies (see Muijtjens et al., 1999, for a brief review).  This suggests that students base their answers on more than random guessing, even when they do not think they are.
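The arithmetic behind this finding is straightforward: under a 1/(k-1) penalty a wild guess breaks even on average, and eliminating even one option tips the expected value positive.  The sketch below assumes a generic five-option single-best-answer item (not necessarily the exam format used in these studies), with one correct option among those the candidate cannot eliminate.

    def expected_guess_value(num_options, num_remaining):
        """Expected marks from guessing among num_remaining options, one of
        which is correct, with a wrong-answer penalty of 1/(num_options - 1)."""
        p_right = 1 / num_remaining
        penalty = 1 / (num_options - 1)
        return p_right * 1 - (1 - p_right) * penalty

    # Five-option item: a wild guess breaks even; each elimination helps.
    for remaining in (5, 4, 3, 2):
        print(remaining, round(expected_guess_value(5, remaining), 3))
    # Prints: 5 0.0, 4 0.062, 3 0.167, 2 0.375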

In the UK, half of the medical schools (11 of 21) that responded to a 1998 survey (Fowell et al., 2000) used negative marking of MCQs, though one was intending to stop.  It would be interesting for schools to compare experiences.

Conclusion: One thing that is clear is that students should be told explicitly about the scoring method being used.  If negative marking is used, it is important to indicate that answering on the basis of partial knowledge (i.e. being able to eliminate some options) is generally advisable, but random guessing is not.  This may cut down on wild guesses, though there is some suggestion that wild guessing is not a major problem for higher-ability students anyway, a group that will generally include most dental, medical and veterinary students.

The disadvantages of negative marking are that it takes examinees longer, the directions are more complicated, and it may discourage answering based on good partial knowledge.  Also, if the more able students are more likely to comply with the instructions, negative marking may actually penalise the better students, who may be less willing to offer educated guesses.

Of course, another consideration is whether there are desirable alternatives to MCQs, particularly true-false ones.  The National Board of Medical Examiners guide to writing test items (Case and Swanson, 2001) notes that "several studies to investigate the optimal number of options for multiple-choice items have consistently found that, other things being equal, more options are better than fewer options.  Based on items used in NBME exams, extended-matching items are more discriminating than any other format".  (Stay tuned for a FAQ on EMIs.)


References

Angoff, W.H. and Schrader, W.B. (1981). A study of alternative methods for equating rights scores to formula scores. Research Report from the Educational Testing Service.

Case, S. M. and Swanson, D. B. (2001). Constructing written test questions for the basic and clinical sciences: Third Edition. National Board of Medical Examiners. Available at: http://www.nbme.org/nbme/itemwriting.htm.

Fowell, S.L., Maudsley, G., Maguire, P., Leinster, S.J. and Bligh, J. (2000). Student assessment in undergraduate medical education in the United Kingdom 1998. Medical Education, 34 (Suppl. 1), pp. 1-78.

Hammond, E.J., McIndoe, A.K., Sansome, A.J. and Spargo, P.M. (1998). Multiple-choice examinations: adopting an evidence-based approach to exam technique. Anaesthesia, 53, pp. 1105-1108.

Muijtjens, A.M.M., van Mameren, H., Hoogenboom, R.J.I., Evers, J.L.H. and van der Vleuten, C.P.M. (1999). The effect of a 'don't know' option on test scores: number-right and formula scoring compared. Medical Education, 33, pp. 267-275.


Disclaimer: This FAQ was written by Dr Jean McKendree and does not reflect an official endorsement by the HEA or any other organisation.  Any questions or queries should be sent to: enquiries@medev.ac.uk

Last updated: 04 July 2011
