- Cloud and inference providers see rising demand for Nvidia H200 chips due to DeepSeek’s AI models.
- DeepSeek’s open-source models require powerful hardware to run the full model for inference.
- The trend runs counter to the Nvidia sell-off that followed growing awareness of DeepSeek.
Some cloud providers are experiencing a notable uptick in demand for Nvidia’s H200 chips after Chinese AI company DeepSeek burst into the foundation-model race this month.
Though the stock market caught wind of the powerful yet efficient large language model on Monday, sending Nvidia’s stock down 16%, DeepSeek has been on the radar of AI researchers and developers since it released its V2 model in May 2024.
But the performance of V3, released in December, is what made AI developers sit up and take notice. When the company released R1, its reasoning model that competes with OpenAI’s o1, in early January, demand for Nvidia’s H200s started climbing.
“The launch of DeepSeek R1 has significantly accelerated demand for H200. We’ve seen such strong interest that enterprises are pre-purchasing large blocks of Lambda’s H200 capacity, even before public availability,” said Robert Brooks, founding team member and vice president of revenue at cloud provider Lambda.
DeepSeek’s models are open source, which means users pay very little to use them. However, they still need hardware or a cloud computing service to run them at scale.
Business Insider spoke with 10 cloud service and AI inference providers. Five reported a rapid increase in demand for Nvidia’s H200 graphics processing units this month.
Amazon Web Services and CoreWeave declined to comment. Oracle, Google, and Microsoft did not respond to requests for comment.
This week, AWS, Microsoft, Google, and Nvidia have made DeepSeek models available on their various cloud and AI-developer platforms, or provided instructions for users to do so themselves.
Nvidia declined to comment, citing a quiet period before its February 26 earnings release.
AI cloud offerings have exploded in the last two years, creating a slew of options beyond the mainstays of cloud computing like Microsoft Azure and Amazon Web Services.
The demand has come from a range of customers, from startups and individual researchers to massive multinational firms.
“We’ve heard from half a dozen of the 50 largest companies in the world. I’m really not exaggerating,” Tuhin Srivastava, cofounder of inference provider Baseten, told BI.
On Friday, semiconductor industry analysts at SemiAnalysis reported “tangible effects” on pricing for H100 and H200 capacity in the market stemming from DeepSeek.
Total sales of Nvidia H200 GPUs have reached the “double digits billions,” CFO Colette Kress said on the company’s November earnings call.
‘Exponential demand’ for Nvidia H200s
Karl Mozurkewich and his team at cloud provider Valdi saw H200 demand ramp up throughout January, and at first they didn’t know why.
The Valdi team doesn’t own chips; it acquires capacity from existing data centers and sells that capacity to customers. The company doesn’t know the use case for every chip it makes accessible, but it polled several H200 customers, and all of them wanted the chips to run DeepSeek.
“Suddenly, R1 got everybody’s attention — it caught fire — and then it kind of went exponential,” Mozurkewich said.
American companies are eager to take advantage of DeepSeek’s model performance and reasoning innovations, but most are not keen to share their data with a Chinese firm. That means they can either use an API offered by a US firm or run the model on their own hardware.
Since the model is open source, it can be downloaded and run locally without sharing data with DeepSeek.
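What that looks like in practice: the short sketch below uses the Hugging Face transformers library and one of DeepSeek’s smaller distilled checkpoints, both illustrative choices rather than tools named by the companies in this story. The full-size model requires a multi-GPU server, not a workstation.

```python
# Minimal sketch: running one of DeepSeek's smaller open-weight checkpoints
# locally with the Hugging Face transformers library. The model name and
# settings are illustrative choices, not from the article; the full
# V3/R1 model is far too large for a single consumer GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # a distilled 7B variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # spread layers across available GPU(s) and CPU
)

# Prompts and outputs stay on local hardware; nothing is sent to DeepSeek.
inputs = tokenizer("Why does GPU memory limit model size?", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```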
For Valdi, the majority of its H200 demand is coming from startups, Mozurkewich said.
“It appears the market is reacting to DeepSeek by grabbing the best GPUs available for testing as quickly as possible,” Mozurkewich said. “This makes sense, as most companies’ current GPUs are likely to continue to work on ongoing tasks they’ve been allocated to.”
Though many companies are still testing and experimenting, the Valdi team is seeing longer-term requests for additional hardware, suggesting an uptick in demand that could last beyond DeepSeek’s initial hype cycle.
Chip-light, compute-heavy
DeepSeek’s models were trained with less powerful hardware than US models, according to the company’s research paper. This efficiency has spooked the stock market.
Players like Meta, OpenAI, and Microsoft have invested billions in AI infrastructure, with billions more on the way. Investors are concerned about whether all that capacity will be needed. DeepSeek was created with fewer, less powerful chips (though the exact number is hotly debated).
Training chips aside, using the models for inference is a compute-intensive task, cloud providers say.
“It is not light and easy to run,” Srivastava said.
The size of a model is measured in “parameters,” and more parameters require more compute. The most powerful versions of DeepSeek’s models have 671 billion parameters. That’s less than OpenAI’s GPT-4, which is reported to have 1.76 trillion, but more than Meta’s largest Llama model, which has 405 billion.
Srivastava said most firms were avoiding the 405-billion-parameter Llama model if they could help it, since the smaller versions were much easier to run. DeepSeek offers smaller versions too, and even its most powerful version is cheaper to run, which has stoked excitement among firms that want to use the full model, the cloud providers said.
H200 chips are the only widely available Nvidia chips that can run DeepSeek’s V3 model in its full form on a single node (eight chips designed to work together).
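A back-of-envelope calculation, using assumed figures of 671 billion parameters stored in 8-bit precision and 141 GB of memory per H200, shows why the full model fits on one node:

```python
# Back-of-envelope check on why the full model fits on one 8-GPU H200 node.
# Assumptions for illustration: 671B parameters stored as FP8 (1 byte each)
# and 141 GB of HBM3e per H200. Real servers also need room for the KV cache
# and activations, so this is a floor, not a complete sizing exercise.
params = 671e9             # total parameter count of DeepSeek V3/R1
bytes_per_param = 1        # FP8: one byte per weight
weights_gb = params * bytes_per_param / 1e9

h200_hbm_gb = 141          # memory per H200 GPU
node_gb = 8 * h200_hbm_gb  # one node = eight GPUs working together

print(f"weights: ~{weights_gb:.0f} GB vs. node capacity: {node_gb} GB")
# weights: ~671 GB vs. node capacity: 1128 GB -> fits, with headroom
```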
The model can also be spread across a larger number of less powerful GPUs, but that requires more expertise and leaves room for error. Adding that complexity almost inevitably slows down performance, Srivastava said.
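As an illustration of what spreading a model across GPUs involves, the sketch below uses the open-source vLLM serving library and a distilled checkpoint, again our own illustrative choices rather than a setup described by the article’s sources:

```python
# Illustrative sketch of sharded serving with the vLLM library. Tensor
# parallelism splits each layer's weights across GPUs, so every forward pass
# adds inter-GPU communication; that overhead is one reason multi-GPU setups
# run slower and are more delicate than a model that fits on a single node.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",  # illustrative model
    tensor_parallel_size=8,  # shard weights across eight GPUs
)

outputs = llm.generate(
    ["Explain why sharding a model across GPUs adds latency."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

With smaller-memory cards, more of them are needed to hold the same weights, and each extra split point adds communication overhead and another place for the setup to go wrong.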
Nvidia’s Blackwell chips will also be able to handle the full V3 model in one node, but these chips have just begun shipping this year.
With demand spiking, finding enough chips to run V3 or R1 at high speed is tough if that capacity hasn’t already been allocated.
Baseten doesn’t own GPUs; it buys capacity from data centers that do and then tinkers with all the software connections to make models run smoothly. Some of its customers have their own hardware in their own data centers but still hire Baseten to optimize model performance.
Its customers especially value inference speed: the kind of speed that enables an AI-generated voice to converse in real time, for example. DeepSeek-level capability at an open-source price is a game-changer for those customers, according to Srivastava.
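For a rough sense of what “fast enough” means for that voice example, under assumed numbers (conversational speech near 150 words per minute, about 1.3 tokens per word in common tokenizers):

```python
# Rough illustration with assumed numbers: how fast must a model generate
# tokens to keep up with a real-time voice? Conversational speech runs near
# 150 words per minute, and a word averages about 1.3 tokens.
words_per_minute = 150
tokens_per_word = 1.3

tokens_per_second = words_per_minute * tokens_per_word / 60
print(f"~{tokens_per_second:.1f} tokens/s just to keep pace with speech")
# ~3.3 tokens/s is only the floor; time-to-first-token also has to be short,
# or the voice pauses awkwardly before every reply.
```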
“It does feel like this is an inflection point,” he said.
Have a tip or an insight to share? Contact Emma at [email protected] or use the secure messaging app Signal: 443-333-9088