Theme: Professionalism including diversity and performance-based assessments

Setting the standard for a new assessment: comparing outcomes using the Angoff, borderline regression and Cohen methods

  • Alexandra Kirby, National School of Healthcare Science
  • Corresponding author *Kirby A, Chamberlain S, Gay S

One of the most challenging and technical aspects of assessment design and delivery is defining the pass standard so that individuals get the outcomes they truly deserve. The consequences of selecting an inappropriate standard setting method, and of setting a pass mark that is either too high or too low, can be significant for test-takers and, in professional examinations, for all stakeholders in the field in which the test-taker has, or has not, qualified. The assessment literature provides guidance on the various standard setting methods available and on the practical and theoretical considerations in selecting and applying a method appropriately aligned to the assessment context. However, this guidance can be difficult to apply in practice for new assessments. This is particularly the case when the assessment consists of newly created tasks whose difficulty is untested, there are no prior measures of cohort ability, and the whole assessment infrastructure lacks maturity.

This paper describes these standard setting challenges in the context of a new, national assessment for clinical scientists. Three standard setting methods are used to model the outcomes of this new assessment to explore the impact on pass and fail rates. The methods used are modified Angoff, borderline regression and Cohen. The findings suggest that, in the context of a new assessment, the borderline regression method may lead to artificially inflated pass marks. Cohen and Angoff, although very different approaches to setting standards, seemed to produce similar pass marks. It is concluded that, for new assessments where there is uncertainty about the likely performance of each test facet, a blend of methods or a compromise method may be necessary.
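For readers unfamiliar with the two cohort-based methods compared above, the sketch below shows common formulations of the Cohen and borderline regression calculations (the modified Angoff method is omitted because it requires judges' item-level estimates). The 60% reference point, 95th percentile and borderline grade used here are illustrative assumptions, not values taken from the study.

```python
# Illustrative sketch only: common formulations of the Cohen and borderline
# regression (BLR) standard setting methods. Parameters are assumptions.

def cohen_pass_mark(scores, reference_fraction=0.60, percentile=95):
    """Pass mark = a fraction of the score at a high cohort percentile."""
    ranked = sorted(scores)
    # nearest-rank percentile
    idx = max(0, min(len(ranked) - 1, round(percentile / 100 * len(ranked)) - 1))
    return reference_fraction * ranked[idx]

def blr_pass_mark(checklist_scores, global_ratings, borderline_grade=2):
    """Regress checklist scores on examiners' global ratings; the pass mark
    is the predicted checklist score at the 'borderline' grade."""
    n = len(global_ratings)
    mx = sum(global_ratings) / n
    my = sum(checklist_scores) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(global_ratings, checklist_scores))
    sxx = sum((x - mx) ** 2 for x in global_ratings)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept + slope * borderline_grade

# Toy cohort: global ratings 1=fail, 2=borderline, 3=pass, 4=good
ratings = [1, 2, 2, 3, 3, 3, 4, 4]
checklists = [8, 11, 12, 15, 16, 17, 19, 20]
print(round(blr_pass_mark(checklists, ratings), 2))  # → 11.8
print(round(cohen_pass_mark(checklists), 2))         # → 12.0
```

Note how the two methods use entirely different information: Cohen anchors the standard to the best-performing candidates, while BLR anchors it to examiners' holistic judgements of borderline performance.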

Examining Gender Bias within the Multiple Mini Interview Using Multifaceted Rasch Modelling

  • Corresponding author, Adrian Husbands, University of Buckingham


The Multiple Mini Interview (MMI) is the primary admissions tool used to assess non-cognitive skills at an increasing number of medical schools. While statistically significant gender differences in performance have been observed in a number of studies, none has compared gender differences between candidates at the same ability level. This study examines gender bias across MMI stations using the Multi-faceted Rasch Model (MFRM).

Summary of work

A total of 563 candidates attempted the Dundee MMIs during the 2014-2015 admissions cycle. MFRM was used to adjust MMI scores for candidate ability, examiner stringency or leniency and station difficulty. Differential Item Functioning (DIF) analysis determined whether male or female candidates at the same level of ability were more likely to achieve higher station scores. Interpretation of results was conducted from a

Summary of results

Separation-index reliability for the MMI was acceptable (.91), separating candidates into three distinct ability groups. All 22 MMI stations showed a good fit to the Rasch model. DIF parameter magnitudes ranged from 0.01 to 0.28 logits, with measurement errors of between 0.06 and 0.12 logits. While three stations showed statistically significant DIF (p
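As a rough illustration of how such DIF contrasts are typically screened, the sketch below applies two common checks: statistical significance (|DIF/SE| > 1.96) and substantive importance (a widely used Rasch benchmark of 0.5 logits). The thresholds and example values are assumptions for illustration only; the abstract reports magnitudes of 0.01-0.28 logits with standard errors of 0.06-0.12.

```python
# Sketch of screening a reported DIF contrast: a DIF estimate is flagged as
# statistically significant if |DIF/SE| exceeds ~1.96, and as substantively
# meaningful (under a common Rasch convention) only if |DIF| >= 0.5 logits.
# The station values below are invented for illustration.

def screen_dif(dif, se, z_crit=1.96, substantive=0.5):
    z = dif / se
    return {
        "z": round(z, 2),
        "significant": abs(z) > z_crit,
        "substantive": abs(dif) >= substantive,
    }

print(screen_dif(0.28, 0.10))  # significant but not substantive
print(screen_dif(0.05, 0.08))  # neither
```

Under these conventions, a station can show statistically significant DIF while its magnitude remains too small to matter in practice, which is consistent with the magnitudes reported above.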

Re-Thinking Remediation: Using behavioural change theories to inform the development of remediation plans for doctors with performance concerns

  • Dr Linda Prescott-Clements, National Clinical Assessment Service (NCAS), NHSLA
  • Corresponding author, *Prescott-Clements L, Voller V, Bell M, Nestors N and van der Vleuten C

For practicing doctors, assessment is often experienced in the context of revalidation / recertification, whereby the outcomes are used as evidence to demonstrate that the practitioner is competent, safe and remains fit for practice. However, such assessment may also highlight areas of performance giving rise to concern. In this context, a comprehensive assessment of performance in the workplace, encompassing an assessment of the practitioner’s health, behaviour, and working environment, might have a ‘diagnostic’ role in determining the extent of performance concerns which can then be used to inform the practitioner’s requirements in terms of remediation.

Evidence suggests that performance concerns are often complex, involving multifactorial issues encompassing knowledge, skills and professional behaviours. It has also been established that practitioners may perform poorly despite having the necessary knowledge and skills, and competence does not always lead to consistently good performance. In such instances, it is important that, where possible and appropriate, practitioners are supported through effective remediation in order to return them to safe clinical practice.

A review of the literature on remediation demonstrated that research in this area is in its infancy, and little is currently known about the effectiveness of remediation programmes or the design features and implementation strategies associated with success. Current strategies for the development of remediation programmes are mostly ‘intuitive’, with few being based upon established cognitive or adult learning theories.

In recognition that performance concerns in practicing doctors often include behavioural issues, we have used behavioural change theories to explore known barriers to successful remediation such as insight, motivation, attitude, self-awareness and the working environment and have developed an approach to the creation of bespoke remediation programmes which target these issues in addition to knowledge and skills development. This novel approach will be described, and the evaluation of initial pilot testing will be presented.

Lessons from assessing Professionalism through monitoring Professional attitudes and behaviours

  • Dr David Kennedy, Newcastle University
  • Corresponding author, *Kennedy D, Lunn B


Demonstration of acceptable professional attitudes and behaviours is an expectation of graduates and a complex area of assessment in medical school. Building on the conscientiousness index described by McLachlan et al. (2009), indicators of professionalism are monitored, reviewed and contribute to the assessment of professional attitudes and behaviours. Adherence to procedures (including carrying identification and evaluation), adverse outcomes from disciplinary procedures (including assessment irregularities), attendance at compulsory teaching, and reports of unacceptable attitudes and behaviours (towards patients, peers and staff) all contribute to the monitoring record.


A Professionalism Issue Notice (PIN) form was devised and made available to staff. PIN forms enable staff to report professionalism issues ranging from punctuality issues through to inappropriate attitudes and behaviours.

PINs are scored between 1 and 10, depending on severity, by a professionalism review panel that meets three times a year. Where the acceptable threshold in the monitoring record is breached, students meet a curriculum officer and agree an action plan for improvement.
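The threshold logic described above might be sketched as follows. The cut-off value and the idea of summing severity scores are assumptions for illustration only; the abstract does not specify how the monitoring-record threshold is calculated.

```python
# Hypothetical sketch of the PIN threshold check: severity scores (1-10)
# accumulate in a student's monitoring record, and breaching a cut-off
# triggers a meeting with a curriculum officer. The cut-off and the use of
# a simple sum are assumptions, not details from the abstract.

ASSUMED_THRESHOLD = 10

def needs_action_plan(pin_scores, threshold=ASSUMED_THRESHOLD):
    """Return True if the accumulated PIN severity breaches the threshold."""
    return sum(pin_scores) >= threshold

print(needs_action_plan([2, 3]))     # → False (below threshold)
print(needs_action_plan([4, 4, 3]))  # → True (threshold breached)
```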

Lessons learnt

There has been a significant resource implication in recording, collating and reviewing all monitored data.

Initially, some staff were reluctant to complete PIN forms. A perception that completing a PIN for something trivial could cause a student to fail was addressed by reassurance that no student could fail on the basis of a single PIN form.

The student body generally accepts this form of assessment and views it as a fairer and more valid method than the reflective essay used previously. Student feedback centres on ensuring students can access their own PIN forms and monitoring record; we enabled this in the 2015-16 academic year.

The process helps early identification of students in need of support and referral to wellbeing services.

Candidate use of a feedback site and how that relates to examination performance

  • Professor Brian Lunn, Newcastle University
  • Corresponding author, Woodhouse L, Kennedy D, Moss J, *Lunn B

Student surveys such as the UK National Student Survey consistently show a significant disparity between satisfaction with teaching and satisfaction with feedback and assessment. OSCEs do not lend themselves to providing personalised feedback in summative examinations. We developed a system to allow students to visualise and understand their OSCE performance. Beyond the initial investment of time in developing this, the year-on-year academic time needed to use the system is minimal (less than two hours for a 20-station OSCE).

The site is well used by students with 65.94% of students visiting the site within 4 hours of result release and a mean of 2.95 visits per student over the following 4 months (range 0-15). Students valued both the amount of feedback available and the nature of it with significant improvement in satisfaction ratings (from 48% to 86% satisfied).

Students sit a 10-station formative OSCE after 4 months of Stage 3, and a further sequential OSCE at the end of the Stage. Feedback by station and skill domain was made available to all students. We analysed the correlation between students' use of the site, how they used it, and their performance in the end-of-year summative OSCE. We will discuss student behaviour in relation to their initial exam performance and how that correlated with their end-of-stage performance.
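An analysis of the kind described, correlating site usage with summative performance, might be sketched as follows. This is not the authors' analysis code, and the data below are invented for illustration.

```python
# Illustrative sketch: Pearson correlation between each student's number of
# feedback-site visits and their summative OSCE score. Data are invented.
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

visits = [0, 1, 2, 3, 5, 8, 10, 15]        # hypothetical visit counts
osce_scores = [55, 58, 60, 62, 64, 66, 70, 72]  # hypothetical scores (%)
print(round(pearson_r(visits, osce_scores), 3))
```

In practice, such an analysis would also need to control for initial (formative) performance, since stronger students may both visit the site more and score higher at the end of the stage.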