Rivals of Nvidia, which dominates the market for AI chips, have long hoped that an inflection point would help them make up lost ground.
That point may be at hand. So far, however, there is little sign of Nvidia ceding its lead — though it remains an open question whether the AI market will develop in ways that eventually erode its dominance.
The key issue is when the main focus in AI moves from training the large “foundation” models that underpin modern AI systems, to putting those models into widespread use in the applications used by large numbers of consumers and businesses.
With their ability to handle multiple computations in parallel, Nvidia’s powerful graphics processing units, or GPUs, have maintained their dominance of data-intensive AI training. By contrast, running queries against these AI models — known as inference — is a less demanding activity that could provide an opening for makers of less powerful — and cheaper — chips.
Anyone expecting a quick shift will have been disappointed. Nvidia’s lead in this newer market already looks formidable. Announcing its latest earnings on Thursday, it said more than 40 per cent of its data centre sales over the past 12 months were already tied to inference, accounting for more than $33bn in revenue. That is more than two and a half times the entire sales of Intel’s data centre division over the same period.
But how the inference market will develop from here is uncertain. Two questions will determine the outcome: whether the AI business continues to be dominated by a race to build ever larger AI models, and where most of the inference will take place.
Nvidia’s fortunes have been heavily tied to the race for scale. Chief executive Jensen Huang said this week that it takes “10, 20, 40 times the compute” to train each new generation of large AI models, guaranteeing huge demand for Nvidia’s forthcoming Blackwell chips. These new processors will also provide the most efficient way to run inference against these “multitrillion parameter models”, he added.
Yet it is not clear whether ever-larger models will continue to dominate the market, or whether they will eventually hit a point of diminishing returns. At the same time, smaller models that promise many of the same benefits, as well as less capable models designed for narrower tasks, are already coming into vogue. Meta, for instance, recently claimed that its new Llama 3.1 could match the performance of advanced models such as OpenAI’s GPT-4, despite being far smaller.
Improved training techniques, often relying on larger amounts of high-quality data, have helped. Once trained, the biggest models can also be “distilled” into smaller versions. Such developments promise to bring more of the work of AI inference to smaller, or “edge”, data centres, and on to smartphones and PCs. “AI workloads will go closer to where the data is, or where the users are,” says Arun Chandrasekaran, an analyst at Gartner.
The range of competitors with an eye on this nascent market has been growing rapidly. Mobile chip company Qualcomm, for instance, has been the first to produce chips capable of powering a new class of AI-capable PCs, matching a design laid out by Microsoft — a development that throws down a direct challenge to longtime PC chip leader Intel.
The data centre market, meanwhile, has attracted a wide array of would-be competitors, from start-ups like Cerebras and Groq to tech giants like Meta and Amazon, which have developed their own inference chips.
It is inevitable that Nvidia will lose market share as AI inference moves to devices where it does not yet have a presence, and to the data centres of cloud companies that favour in-house chip designs. But to defend its turf, it is leaning heavily on the software strategy that has long acted as a moat around its hardware, with tools that make it easier for developers to put its chips to use.
This time, it is working on a wider range of enterprise software to help companies build applications that make best use of AI — something that would also guarantee demand for its chips. Nvidia disclosed this week that it expects its revenue from this software to reach an annual run-rate of $2bn by the end of this year. The figure is small for a company expected to produce total revenue of more than $100bn, but it points to the increasing take-up of technologies that should add to the “stickiness” of its products. The AI chip market may be entering a new phase, but Nvidia’s grip shows no sign of loosening.