The assignments looked brilliant. The understanding didn’t.
That’s when an NYU business school professor decided to fight AI-assisted coursework with AI-powered oral exams.
Panos Ipeirotis, a professor at NYU’s Stern School of Business who teaches data science, wrote in a blog post published last week that he became concerned about student assignments that read like “a McKinsey memo” but lacked genuine understanding.
When he called on students in class and asked them to defend their submissions, many struggled to do so.
“If you cannot defend your own work live, then the written artifact is not measuring what you think it is measuring,” Ipeirotis wrote.
‘Fight fire with fire’
To counter that, he revived oral exams and enlisted an AI agent to administer them at scale, in an attempt to “fight fire with fire.”
“We need assessments that evolve toward formats that reward understanding, decision-making, and real-time reasoning,” Ipeirotis said.
“Oral exams used to be standard until they could not scale,” he added. “Now, AI is making them scalable again.”
In the blog post detailing the experiment, Ipeirotis said he and his colleague built the AI examiner using ElevenLabs’ conversational speech technology.
“Just write a prompt describing what the agent should ask the student, and you are done,” he said, adding that it took minutes to set up.
The oral exam had two parts. First, the AI agent questioned students about their capstone projects, probing their decisions and reasoning. Then it selected one of the cases discussed in class and pushed students to think through it live.
Over the course of nine days, the system assessed 36 students in sessions of about 25 minutes each, at a total compute cost of roughly $15. A comparable human-run oral exam could cost hundreds of dollars at teaching-assistant rates, Ipeirotis wrote.
Ipeirotis also used AI to grade the exams. Three AI models — Claude, Gemini, and ChatGPT — independently assessed each transcript. They then reviewed one another’s evaluations, revised their scores, and produced a final grade, with Claude acting as the “chair” to synthesize the decision.
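The grading workflow described above can be sketched in code. Only the structure comes from the post: three models score independently, review one another's evaluations, revise, and a chair synthesizes the result. Everything else here is a hypothetical stand-in: the scoring function is a deterministic stub where a real implementation would call each model's API with the transcript and a rubric, and the revision weights and chair's averaging are invented for illustration.

```python
# Sketch of a "council of LLMs" grading pipeline: three models grade a
# transcript independently, revise after seeing peers' scores, and a
# chair synthesizes a final grade. The numbers and weights are illustrative.
from statistics import mean

MODELS = ["claude", "gemini", "chatgpt"]

def initial_score(model: str, transcript: str) -> float:
    """Placeholder for an independent LLM grading call.
    In practice this would send the transcript and a rubric to the model."""
    # Deterministic stub so the sketch runs end to end.
    return 70.0 + (len(transcript) * (MODELS.index(model) + 1)) % 10

def council_grade(transcript: str) -> float:
    # Round 1: each model scores the transcript on its own.
    scores = {m: initial_score(m, transcript) for m in MODELS}
    # Round 2: each model revises after reviewing its peers' scores
    # (here, a weighted pull toward the peer average).
    revised = {}
    for m in MODELS:
        peers = [scores[p] for p in MODELS if p != m]
        revised[m] = 0.7 * scores[m] + 0.3 * mean(peers)
    # Round 3: the chair (Claude, per the post) synthesizes the final
    # grade; a plain average stands in for its judgment here.
    return mean(revised.values())
```

In a real pipeline, each step would be an LLM call with its own prompt, and the chair would reconcile disagreements in prose rather than by averaging.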
Ipeirotis said the “council of LLMs” graded more consistently than humans, and “more strictly, but more fairly.”
“The feedback was better than any human would produce,” he wrote, adding that the AI analysis also surfaced gaps in how the material had been taught.
However, students were divided. Only a small minority preferred the AI oral exams, and many found them more stressful than written tests — even while acknowledging they were a better measure of real understanding.
Still, Ipeirotis said the oral exams demonstrated “how learning is supposed to work.”
“The more you practice, the better you get,” Ipeirotis wrote.
Using AI in exams
Ipeirotis’ blog post comes as universities grapple with how to test students in the AI era.
A paper published in September in the academic journal Assessment & Evaluation in Higher Education said AI has turned student assessment into a "wicked problem."
The study’s authors interviewed 20 unit chairs at a large Australian university in late 2024. In hourlong Zoom interviews, they found instructors overwhelmed by heavier workloads, confused about acceptable AI use, and unable to agree on what an AI-proof assessment should look like.
Some faculty members told researchers that AI should be treated as a tool for students to master. Others viewed it as academic dishonesty that erodes learning. Many admitted they were unsure how to proceed.
In May, LinkedIn cofounder Reid Hoffman said on an episode of his podcast, “Possible,” that AI can make it easier for students to exploit traditional assessment formats, such as essays. Universities should rethink how they evaluate learning, he said, adding that students could soon expect an “AI examiner.”
Hoffman said that oral exams leave less room for shortcuts, requiring students to demonstrate real understanding.