Generative AI technology has advanced so quickly over the past few years that some experts are already worried about whether we've hit "peak AI."
On a Harvard Business Review podcast last week, Socher said we can level up large language models by forcing them to respond to certain prompts in code.
Right now, large language models simply "predict the next token, given the previous set of tokens," Socher said — tokens being the smallest units of data that carry meaning in AI systems. So even though LLMs exhibit impressive reading comprehension and coding skills and can ace difficult exams, AI models still tend to hallucinate — a phenomenon in which they convincingly present factual errors as truth.

And that's especially problematic when they're posed complex mathematical questions, Socher said.
He offered an example a large language model might fumble: "If I gave a child $5,000 at birth to invest in some no-fee stock index fund, and I assume some percentage of average annual returns, how much will they have by age two to five?"
A large language model, he said, would simply start generating text based on similar questions it had been exposed to in the past. "It doesn't actually say, 'well, this requires me to think super carefully, do some real math and then give the answer,'" he explained.
But if you can "force" the model to translate that question into computer code and generate an answer based on the output of that code, you're more likely to get an accurate answer, he said.
Socher didn't offer specifics on the technique, but did say that at You.com, they've been able to translate questions into Python. Broadly speaking, programming will "give them so much more fuel for the next few years in terms of what they can do," he added.
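To make the idea concrete, here is a minimal sketch of what that generated Python might look like for Socher's investment question. This is purely illustrative — the article gives no details of You.com's actual method, and the 7% average annual return and 25-year horizon are assumed example values, not figures from the podcast. The point is that once the question is expressed as code, the arithmetic is exact rather than pattern-matched from similar text.

```python
# Illustrative sketch only: what an LLM might emit after translating the
# investment question into code. The 7% return and 25-year horizon are
# assumptions for the example, not values from the article.

def future_value(principal: float, annual_return: float, years: int) -> float:
    """Compound a one-time investment annually over `years` years."""
    return principal * (1 + annual_return) ** years

# $5,000 invested at birth, assuming a 7% average annual return
value = future_value(5_000, 0.07, 25)
print(f"${value:,.2f}")
```

Executing the code then yields a deterministic answer, rather than the model guessing a number from text it has seen before.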
Socher's comments come as a growing roster of large language models struggles to outsmart OpenAI's GPT-4. Gemini, "Google's most capable AI model yet," barely surpasses GPT-4 on important benchmarks like MMLU, one of the most popular ways to gauge AI models' knowledge and problem-solving skills. And while the go-to approach has simply been to scale these models in terms of the data and computing power they're given, Socher suggests that approach might lead to a dead end.
"There's only so much more data that is very useful for the model to train on," he said.