
09 October 2024

Can AI be used to mark accurately, fairly and consistently? - CAS AI TC meeting

Written by Marta Bronowicka | Community Specialist

Can AI be Used to Mark Accurately, Fairly, and Consistently?

If you were unable to join us for the CAS Thematic Community meeting on "Can AI be used to mark accurately, fairly and consistently?", don't worry! You can catch up on all the content and a recording of the session below.

Key Takeaways:

  • AI marking systems, like large language models (LLMs), are being explored for marking free-text answers.
  • AI can mark short, one or two-mark questions with a high level of accuracy, but struggles with more complex ones.
  • Human markers themselves disagree on marks to some degree, which makes a perfectly consistent benchmark for AI hard to define.
  • The AI’s predictions provide useful formative feedback for students but need transparency around accuracy.
  • Feedback literacy—how students interpret and use feedback—was emphasised as crucial in AI-assisted learning environments.

At the recent CAS AI Thematic Community meeting, we explored the potential of AI to tackle one of the most complex tasks in education—marking free-text responses. The session was led by Harriet and Diane from the Raspberry Pi Foundation, who shared their experiences and findings from their research on the use of Large Language Models (LLMs) to mark free-text answers on the Ada Computer Science platform. Ada, a free online platform developed by the Raspberry Pi Foundation and the University of Cambridge, supports students and teachers with various resources, including the ability to practise questions and receive feedback, specifically focusing on GCSE and A-Level Computer Science.

The focus of this session was how AI, specifically GPT-3 and GPT-4, could be used to assist in marking student answers, particularly for short, exam-style questions. As Harriet and Diane pointed out, marking such questions has always posed challenges due to the wide variety of acceptable answers that students may provide. Traditional auto-marking systems can only handle structured responses, but free-text answers require a more nuanced approach—enter AI.

The pilot study discussed involved six teachers and their students answering 16 short-answer questions taken from a real OCR exam paper. These responses were marked by three teachers as well as by GPT-3 and GPT-4. The results were telling: while the AI did align with teachers in many cases, particularly for one or two-mark questions, discrepancies arose in more complex scenarios. Interestingly, the models tended to be stricter in some instances, and in others, they awarded marks that teachers might not have given.
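One simple way to quantify how closely AI marks track teacher marks in a study like this is an exact-agreement rate over the per-answer marks. A minimal sketch (the function name and the example marks are illustrative, not data from the study):

```python
def exact_agreement(marker_a, marker_b):
    """Fraction of answers where two markers awarded the identical mark."""
    if len(marker_a) != len(marker_b):
        raise ValueError("mark lists must be the same length")
    matches = sum(a == b for a, b in zip(marker_a, marker_b))
    return matches / len(marker_a)

# Illustrative marks for ten one-mark answers (not real study data)
teacher = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
model   = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

print(f"Exact agreement: {exact_agreement(teacher, model):.0%}")  # → 80%
```

Exact agreement is the bluntest measure; studies of inter-rater reliability often also report chance-corrected statistics such as Cohen's kappa, which matter more as the number of possible marks grows.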

Harriet and Diane explained that while the AI’s agreement with teachers hovered around 66-72% for simple questions, it still struggled with higher-mark questions. The AI was trained using detailed mark schemes, as well as correct, partially correct, and incorrect sample answers, which improved its accuracy. However, they stressed that AI marking remains a predictive tool and should be used with transparency. Ada’s platform now includes clear messaging to students and teachers, stating that AI-marked responses are predictions, with feedback provided to help students learn from the marking.
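The approach described—giving the model the mark scheme along with correct, partially correct, and incorrect sample answers before the student's response—is essentially few-shot prompting. Below is a minimal sketch of how such a marking prompt might be assembled; the function, field layout, and example question are my own assumptions for illustration, not Ada's actual implementation:

```python
def build_marking_prompt(question, mark_scheme, exemplars, student_answer):
    """Assemble a few-shot marking prompt: the mark scheme, then graded
    exemplar answers, then the student's answer to be marked."""
    lines = [
        f"Question: {question}",
        f"Mark scheme: {mark_scheme}",
        "Example answers and the marks awarded:",
    ]
    for answer, marks in exemplars:
        lines.append(f'- "{answer}" -> {marks} mark(s)')
    lines.append(f'Student answer: "{student_answer}"')
    lines.append("Reply with the predicted mark only.")
    return "\n".join(lines)

# Hypothetical one-mark question with one correct and one incorrect exemplar
prompt = build_marking_prompt(
    question="State one benefit of compiling a program.",
    mark_scheme="1 mark: produces an executable that runs without the source code.",
    exemplars=[
        ("It creates an executable file.", 1),       # correct exemplar
        ("It finds errors one line at a time.", 0),  # incorrect exemplar
    ],
    student_answer="The program can run without needing the source code.",
)
print(prompt)
```

The assembled prompt would then be sent to the LLM, and its reply treated as a *predicted* mark rather than a definitive one—which is exactly why the transparency messaging Harriet and Diane described matters.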

The key takeaway? AI has great potential to support marking but is not yet perfect. Teachers still play an essential role in the process, particularly with more subjective answers and higher-mark questions.

Next Steps:

The session encouraged educators to reflect on how AI could be integrated into marking practices. Here are some questions to consider:

  • How would AI marking fit into your current assessment framework?
  • Would you feel comfortable using AI as a tool for formative feedback?
  • Could AI help reduce your marking workload, allowing you to focus on more complex aspects of teaching?

To experiment with these ideas in your classroom, you might want to:

  • Use the AI-marked questions on the Ada platform to support student revision.
  • Challenge your students to review AI feedback and cross-check it with human feedback to foster critical thinking.
  • Explore how AI tools can help students develop feedback literacy—encouraging them to reflect on their performance and improve.


Further Resources:

Watch the recording

Review the slides from the meeting

Ada Computer Science


Join AI Community

Explore CAS Thematic Communities