Two pioneers of reinforcement learning, a scientific technique that has been fundamental to the artificial intelligence boom, have warned against the unsafe deployment of AI models after winning this year’s Turing Award.
Andrew Barto, a professor emeritus at the University of Massachusetts, and Richard Sutton, a professor at the University of Alberta and former research scientist at DeepMind, have won the $1mn prize from the Association for Computing Machinery for developing the groundbreaking method.
Barto and Sutton developed reinforcement learning in the 1980s after they were inspired by psychology and how people learn. The machine learning technique, which rewards AI systems for behaving in a desired way, has helped power the success of some of the world’s top AI groups, such as OpenAI and Google.
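The idea can be seen in a toy sketch of tabular Q-learning, one of the best-known reinforcement learning methods associated with Barto and Sutton's work. The environment, reward scheme and hyperparameters below are invented purely for illustration: an agent on a short line of states learns, by trial, error and reward, to walk right towards a goal.

```python
import random

N_STATES = 5            # states 0..4 laid out in a line; state 4 is the goal
ACTIONS = [-1, +1]      # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Deterministic transition; reward 1.0 only on reaching the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def train(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy: mostly exploit the current estimates,
            # occasionally explore a random action
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            nxt, r = step(s, a)
            best_next = max(q[(nxt, act)] for act in ACTIONS)
            # the core update: nudge the value estimate towards the
            # reward plus the discounted value of the next state
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = nxt
    return q

q = train()
# greedy policy: the best action in each non-goal state
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # after training, every state should prefer moving right (+1)
```

No behaviour is programmed in directly: the agent only ever sees a reward signal, yet the repeated updates propagate that signal backwards until the desired behaviour emerges, which is the mechanism that, at vastly larger scale, underpins systems such as AlphaGo.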
The winners of the award, which is often dubbed the Nobel Prize of computing, said they were concerned about AI companies rushing to launch products before thoroughly testing them.
“Releasing software to millions of people without safeguards is not good engineering practice,” said Barto, likening it to building a bridge and testing it by having people use it.
“Engineering practice has evolved to try to mitigate the negative consequences of technology, and I don’t see that being practised by the companies that are developing [AI],” he added.
The award, which is named after British mathematician Alan Turing, comes after AI breakthroughs were also recognised in both the chemistry and physics Nobel Prizes in October, highlighting the importance of computing tools and data science in cracking complex scientific problems on far shorter timescales.
“The tools [Barto and Sutton] developed remain a central pillar of the AI boom and have rendered major advances, attracted legions of young researchers, and driven billions of dollars in investments. [Reinforcement learning’s] impact will continue well into the future,” said Jeff Dean, senior vice-president at Google, which sponsored the prize.
Google DeepMind used the technique to develop AlphaGo, an AI system that beat human players in the game Go, a major milestone in AI research. OpenAI also used a type of reinforcement learning that relies on human feedback to control ChatGPT’s output.
But both Barto and Sutton warned against the current pace of AI development, in which firms race to launch models that are powerful but prone to making errors, while raising unprecedented amounts of funding and investing billions in infrastructure, such as data centres, to train and run AI.
Big Tech groups have said AI spending could exceed $320bn this year, while OpenAI, which launched ChatGPT in 2022, is currently raising $40bn in new funding at a $260bn valuation.
Barto criticised the AI sector for being motivated by business incentives rather than by furthering AI research. “The idea of having huge data centres and then charging a certain amount to use the software is motivating things, and that is not the motive that I would subscribe to,” he added.
OpenAI has argued it needs to unlock further investment through a more traditional corporate structure in order to achieve the company’s founding “mission” of ensuring that artificial general intelligence (AGI) — a scenario where computer systems achieve similar or superior levels of intelligence to humans — benefits humanity.
But Sutton dismissed tech companies’ narrative around AGI as “hype”. “AGI is a weird term because there’s always been AI and people trying to understand intelligence.” He added that “systems that are more intelligent than people” will happen eventually through a better understanding of the human mind.
Barto and Sutton also criticised US President Donald Trump’s attempt to slash federal spending on scientific research and lay off staff at US science agencies.
This could have devastating consequences for US dominance in science, said Barto, who called it “wrong and a tragedy not only to this country but to the world”.
He added that the opportunities to do the kind of research that enabled their work in reinforcement learning would “disappear” without the freedom to explore abstract, unproven concepts.
Despite their concerns, both scientists are optimistic about the potential for reinforcement learning, and AI more broadly, to bring positive outcomes to the world.
“We have the potential to become less greedy and selfish and more aware of what’s going on in others . . . there are many things wrong in the world, but too much intelligence is not one of them,” said Sutton.