The AI boom isn’t slowing down; it’s accelerating.
Lin Qiao, a former Meta engineer who helped build PyTorch, now runs Fireworks AI, a $4 billion startup processing 15 trillion AI tokens a day, and she says demand is only just getting started.
“This is the year token consumption is going to grow exponentially,” Qiao told me in a recent interview.
Fireworks AI’s inference cloud platform is now processing roughly 15 trillion AI tokens per day, up from 13 trillion just a few months ago and 10 trillion in late 2025. (Models break words and other inputs down into numerical tokens to make them easier to process and understand; one token is about ¾ of a word. Tokens are also the industry-standard unit for pricing AI model use, billed as a cost per million tokens.)
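For a sense of scale, here is a back-of-the-envelope sketch of that token math. The 15 trillion/day figure comes from the article and the ¾-word ratio is the common rule of thumb; the per-million-token price below is a hypothetical placeholder, not Fireworks' actual rate:

```python
# Rough token arithmetic for the figures in the article.
TOKENS_PER_DAY = 15_000_000_000_000  # ~15 trillion tokens/day (article figure)
WORDS_PER_TOKEN = 0.75               # rule of thumb: 1 token ~ 3/4 of a word
PRICE_PER_MILLION = 0.50             # HYPOTHETICAL rate, USD per 1M tokens

# Equivalent volume of text processed per day, in words.
words_per_day = TOKENS_PER_DAY * WORDS_PER_TOKEN

# Implied daily spend at the hypothetical rate: tokens / 1M * price.
daily_spend = (TOKENS_PER_DAY / 1_000_000) * PRICE_PER_MILLION

print(f"{words_per_day:,.0f} words/day")   # 11,250,000,000,000 words/day
print(f"${daily_spend:,.0f}/day at the hypothetical rate")  # $7,500,000/day
```

At that (made-up) price, 15 trillion tokens a day would translate to about $7.5 million in daily usage, which illustrates why per-million-token pricing is the lever the whole industry watches.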
Qiao has been here before. Long before the current generative AI boom, she was inside Meta helping build PyTorch, the open-source framework that powered the first wave of modern AI adoption. Back then, there were no GPUs optimized for AI, no mature tooling, and no clear roadmap.
“We had to build everything from the ground up,” Qiao said.
The scale of that token growth, Qiao said, reflects how quickly AI is embedding itself into everyday workflows across industries.
Token usage isn’t confined to tech teams. Qiao described finance departments using AI to automate forecasting, her own legal team building internal AI tools, and even gig workers creating music on demand with generative AI models. Her college-age daughter uses multiple AI systems simultaneously — one to generate answers and others to verify them.
“That’s the world we’re living in,” Qiao said. “Literally every single person is using these tools.”
That surge is rippling down the entire technology stack. GPU supply is tight, prices are rising, and even power infrastructure is under strain as companies race to deploy more AI capacity.
“The whole system is saturated,” Qiao said, describing bottlenecks stretching from semiconductor components to energy grids.
Her credibility on these trends stems from her role in building PyTorch, which helped democratize AI development across companies ranging from Tesla to Walmart. That early exposure showed her how quickly AI could spread beyond Silicon Valley into industries like agriculture and manufacturing.
Now, she sees a similar, but far faster, wave unfolding.
Why exist?
Still, a core question hangs over companies like Fireworks AI: why do they exist at all? If hyperscalers Amazon, Google, Microsoft, and Oracle already rent out GPUs, why not go directly to them?
Qiao’s answer is complexity and speed. Enterprises, she said, struggle to keep up with rapidly changing models and hardware, from new Nvidia chips arriving every few months to new AI models every few weeks. Fireworks handles that churn — optimizing performance, managing infrastructure, and helping customers migrate quickly — so they don’t have to.
For Qiao, the lesson from both PyTorch and Fireworks is consistent: once AI becomes usable, adoption accelerates dramatically. And based on current token volumes, that acceleration is just getting started.
Sign up for BI’s Tech Memo newsletter here. Reach out to me via email at [email protected].

