Marking the Markers: A Comparative Study of AI and Human Assessment
Strand 2
Time: 2:00pm to 2:30pm
Theme: Assessment
Location: Richmond LT2
Presenter: Simon Brookes
Abstract:
This session shares findings from an empirical study comparing the marking and feedback generated by GPT-5 with those of three human markers across 10 student essays in a creative writing module. The study assessed both grade accuracy and feedback quality across multiple aspects, including actionability, specificity, rubric alignment, tone, and recognitive cues. The results are striking. Few-shot calibrated AI demonstrated 5.6 times greater consistency than human markers and produced feedback rated 61% more actionable. Zero-shot AI, given only a rubric and no examples, matched the overall quality of human feedback without any calibration. All AI setups achieved perfect rubric alignment, compared with a human average of 3.47 out of 4. Where humans still held advantages, notably in tone and relational recognition, these appear to be prompt-steerable design choices rather than inherent limitations of AI. The session does not argue that AI should replace human markers. Instead, it presents these findings as a provocation: to explore what they reveal about the consistency and quality of human marking practices, what students are currently receiving, and what a more deliberate hybrid model could look like. Attendees will be encouraged to reflect on their own marking and feedback methods, consider the implications for assessment design in their disciplines, and engage with the ethical and pedagogical questions raised by AI's growing competence in this domain.