Artificial intelligence can be used for grading law school exams, but should it be? – ABA Journal


By Julianne Hill

Artificial intelligence is being put to the test, literally, as it is being used to grade law students’ exams. But should it be?
Jack Graves, a professor at the Syracuse University College of Law in New York, says yes.
“AI doesn’t get tired; it doesn’t get distracted; it doesn’t get frustrated; it doesn’t say, ‘Why didn’t you understand this?’” Graves says. “If I use it correctly, I am absolutely certain it is more consistent in scoring essays than I am, and when you’re scoring on a curve, consistency is the No. 1 concern.”
Graves, who used ChatGPT to grade student work last semester but is considering shifting to Claude, says he finds no statistically meaningful difference between the scores given by AI and the grades that he has given himself.
A recent paper published in the Journal of Law and Empirical Analysis backs that up.
The paper’s six authors, all law professors at different schools, compared how OpenAI’s GPT-5 graded written final exams in four subjects at top-30 law schools with how human professors graded the same exams.
“The professors took old law school exams they’d actually given to law students and had graded themselves and fed them to AI along with the actual rubrics. Then they tried various prompting methods to grade the same exams to see how they compared the actual grades given by professors,” says Daniel Schwarcz, a study co-author and a professor at the University of Minnesota Law School.
“When provided with a detailed rubric, the [large language model] grades correlate with the human grader at Pearson correlation coefficients of up to 0.93,” according to the paper, meaning that in the best cases the AI’s grades tracked the human grader’s very closely. A coefficient of 1 would indicate a perfect positive linear relationship, and 0 would indicate no linear relationship.
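For readers unfamiliar with the statistic, the Pearson correlation coefficient mentioned in the paper can be computed from two lists of scores for the same exams. The sketch below is only an illustration of the formula. The function name `pearson` and the two score lists are hypothetical, and it does not reproduce the paper's data or method.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean (the covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each list.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five exams; NOT taken from the paper.
human_scores = [72, 85, 90, 64, 78]
ai_scores = [70, 88, 91, 60, 80]

r = pearson(human_scores, ai_scores)
print(f"Pearson r = {r:.2f}")
```

A value close to 1 means the two graders rank and space the exams similarly. It does not by itself show that they assign the same absolute scores, which is one reason the coefficient matters most when grades are set on a curve.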
For Schwarcz, the question is not whether professors can use AI but how.
“You should first grade on your own without an AI,” Schwarcz says. “And then you may want to use an AI to sort of see if there are any discrepancies and then double-check.”
Graves disagrees and lets AI take the lead in grading.
“Early on, I pretty much redid its grading. Now, I’m very comfortable spot-checking,” he says. “I don’t vet every single interaction.”
It’s all in how you train the AI, which involves a front-end investment of time and money, Graves says. He has been using ChatGPT but is considering shifting to Claude Cowork.
His method involves inputting assignments and texts, collaborating with the tool to create rubrics, and providing it with model answers and sample assignments that he’s already graded. He writes his prompts carefully to steer the tool away from sites such as Reddit and works to keep the likelihood of hallucinations low.
“So far, in one semester of evidence and one semester of contracts and over 1,000 interactions, I have evidence of one outright hallucination,” Graves says. “It gets to know you and your system. It applies your input on a very nuanced and a very consistent basis. That’s much the way we historically train people to score the essays on a bar exam.”
But Schwarcz says he refuses to use AI as the first method of grading for “a procedural fairness reason.”
“Your students have an expectation that their exams will be graded by a human. It’s not fair to deprive them of that opportunity without really getting their buy-in,” Schwarcz adds.
Just as many universities require students to disclose whether they’re using AI to write a paper, Schwarcz adds, professors have to disclose their own use of AI.
“The power dynamic is such that, what is a student supposed to do, right?” Schwarcz says.
Daniel W. Linna Jr., a senior lecturer and the director of law and technology initiatives at the Northwestern University Pritzker School of Law in Illinois, disagrees.
“We need to move forward and get past this idea of, ‘We’re going to prohibit; we’re going to disclose,’” says Linna, a 2018 ABA Journal Legal Rebel, noting that Northwestern University provides Microsoft Copilot and Microsoft 365 Copilot (formerly Copilot Enterprise). “The tools are here. They’re integrated in more and more places.”
While Linna does not use AI to grade, he does use Perusall, a collaborative reading and annotation platform, to measure engagement with readings.
While grading with AI, Graves makes sure to protect students’ identities. He assigns each student’s work a number to address privacy concerns and creates a project that only he can access, so that the test is not available to others.
Grading with AI also lets the professor interrogate the tool for insights into his own grading and teaching, Graves adds.
“I can ask it, ‘What are the 10 issues my students struggled with the most on this exam?’” he says. “It’s much better at remembering everything it did in the process of scoring than I am.”
That information can change how he handles the topic in the next semester’s class, Graves says.

Graves adds that AI bots could also administer oral exams, including tests during the semester for students to demonstrate progress and understanding of the material.
“I can see saying to a student, ‘We’ve reached week three, and we’ve covered this set of materials. You now have to go on this bot, take the test and return it to me a passing score,’” he says, adding that he would allow students to take the test as many times as they need and to file protests if they think that the AI has treated them unfairly. “If you bomb it the first time, I don’t see it. Do it again and again. But to continue with the course, you have to submit to me a link that shows me that the bot said you pass.”
The paper’s authors agree.
“Even if they do not fully replace humans in the near future,” the paper concluded, “LLMs could soon be put to valuable tasks by law school professors, such as reviewing and validating professor grading, providing substantive feedback on ungraded midterms, and providing students feedback on self-administered practice exams.”
