‘Inconsistent’ AI detection ‘should prompt assessment rethink’ – Times Higher Education

The minor use of large language models (LLMs) by students in their work may be overstated by artificial intelligence (AI) detection tools, according to a paper.
At the same time, the research suggests, the tools may be undercounting a heavier reliance on programs such as ChatGPT.
For the study, published in Education and Information Technologies, researcher Lucky E. Atamhenwan fed 81 sample essays into Turnitin. The scripts ranged from those that were 100 per cent LLM-generated, by ChatGPT, Copilot or Gemini, to those written solely by people.
Turnitin did not flag any of the essays that were 100 per cent human written as being generated by AI.
And in every instance in which the detector flagged AI-generated words, it was indeed due to the presence of LLM-generated work in those samples.
But the software struggled with the scripts that were partially AI-written, consistently failing to identify the correct percentage of LLM-generated work included.
For essays with a low percentage of LLM-generated words – between 15 per cent and 40 per cent – Turnitin’s AI score, which declares how much of a submission it considers to have been produced by the technology rather than by a human, was often higher than the actual amount.
But for scripts that had a high percentage of LLM-generated words – between 70 per cent and 100 per cent – the score was consistently lower.
Atamhenwan, the founder of AI company Genducate Learning and an academic at Central Queensland University, said the results should prompt universities to design assessments that do not require the use of detectors.
“In most student cohorts, the majority are ethical learners who avoid academic misconduct. Consequently, these findings suggest that students who use generative AI transparently and in line with institutional policies have nothing to fear,” he told Times Higher Education.
“Most institutional guidelines specify that an AI detector score alone does not prove misconduct. The findings confirm that relying solely on these scores would be erroneous. Instead, an AI score, especially above 60 per cent, should be treated as one key indicator alongside institutional generative AI usage and academic misconduct policies, and student transparency to evaluate potential academic misconduct.”
Sam Illingworth, a professor researching AI literacy at Edinburgh Napier University, said that the study raised serious questions about the use of AI detection tools.
Describing the use of AI by students whose first language is not English as a legitimate application of the technology, he noted that such use could end up being flagged unjustly by some detection tools. Students who need AI’s help to “structure slightly” their essays could similarly fall foul of the tools.
“Why are we policing our students?” Illingworth said. “That’s not why I became an educator. Students should be co-curators of knowledge with us; we should be operating from a position of trust.”
In a statement, Josh Johnston, vice-president of AI at Turnitin, said that detecting AI-generated writing “should serve as a kick-off to a conversation” between teachers and their students.
“We developed the tool to minimise unfounded accusations, which is why we do not report AI writing less than 20 per cent, and we test to keep false positives under 1 per cent. Core to our design principle is the trade-off of missing some AI-written text in order to build a better student experience.
“That said, no detection tool is perfect. The study’s results show that Turnitin’s AI writing scores move in the right direction: papers with more AI writing receive higher AI scores. At the same time, given the study is looking at a small set of artificially mixed human- and AI-written documents, score similarities or differences could play out differently in real student submissions.”
georgia.luckhurst@timeshighereducation.com