At a press briefing at AMD’s 2025 Advancing AI event in San Jose, California, the technology giant’s CVP and GM of the data centre GPU business, Andrew Dieckmann, unveiled the company’s “most advanced AI platform”, the MI350 series.
The new platform delivers a “4x generational increase in AI compute, with 20 petaflops of FP4 and FP6 performance and 288GB of HBM memory per module,” supporting workloads of up to “520 billion parameters on a single GPU.”
MI350 shipping in Q3
Dieckmann also highlighted that MI350 offers significantly higher memory capacity, competitive memory bandwidth and more compute across a wide range of data types, including more than double the throughput in key formats like FP6 and FP64.
This makes it “an ideal solution for both the latest generative AI models and large scientific workloads,” he noted.
The MI350 server solutions are now shipping, with “initial server and CSP deployments expected in Q3,” he revealed.
They support both “air- and liquid-cooled infrastructures,” with racks scaling “up to 96 or 128 GPUs per rack” and delivering “2.6 exaflops FP4 compute and 36 TB of HBM3e memory.”
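As a rough sanity check (assuming simple linear scaling of the per-module figures above, which the company did not spell out): 128 GPUs × 20 PF of FP4 ≈ 2.56 exaflops, and 128 × 288 GB ≈ 36 TB, in line with the quoted rack-level totals.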
Helios previewed
AMD also previewed its Helios rack, set to launch next year, describing it as “the engine powering the next generation of AI, purpose-built for leadership in large-scale training and distributed inference,” Dieckmann said.
Featuring “40 PFLOPS of FP4, 432 GB of HBM per GPU and nearly 20 TB/s of memory bandwidth,” Helios aims to deliver “up to 10x more AI performance on the most advanced frontier models.”
Noting Helios’ competitive advantages, he stated: “We have 50% more HBM memory capacity, 50% more HBM memory bandwidth and up to 50% more scale-out bandwidth.”
Meanwhile, on cooling trends, he said: “Most cutting-edge AI deployments are already going liquid-cooled,” and “long term, the economics of liquid cooling are better, and economics usually wins.”
Scaling AI networks
This comes as, according to Soni Jiandani, SVP of AMD’s networking technology and solutions group, “training datasets are doubling every eight months. That means demand is exceeding the pace of silicon advancements.”
Also speaking at the press briefing, Jiandani said this growth demands a new approach. “The only way is to build distributed system innovations that can scale from the node to the rack to being data centre wide,” she stated.
However, she said current options are limited. “The only two options customers have at their disposal today is either InfiniBand, which doesn’t scale, or Ethernet, which scales but was not designed to run AI networks.”
Her solution? Evolving Ethernet using open standards.
“You can only do it with open systems, and you have to evolve Ethernet as the foundational building blocks,” she added.
Jiandani also pointed to the need for more reliable networks for long AI training runs. “Network issues caused 10% of all the disruptions during a 54-day LLaMA-3 model run,” she said, citing a Meta study.
To address this, AMD added features like quick failure detection, selective retransmission and NIC-level failover.
“Network disruptions will cause checkpoints in your workload during a large training job,” she explained. “If the network card had a failure, we have built-in failover… we will quickly go to the next NIC without interrupting your application.”
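For readers unfamiliar with the idea, here is a minimal, purely illustrative sketch of what NIC-level failover means in practice (Python, with an invented stand-in Nic class; this is not AMD’s implementation): a transfer that fails on the active interface is retried on the next one, so the application never sees the fault.

```python
# Illustrative sketch only: NIC-level failover retries a transfer on the next
# available interface so the application (e.g. a training job) is not interrupted.
# The Nic class is a hypothetical stand-in, not a real driver or vendor API.

class Nic:
    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy

    def send(self, payload: bytes) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name}: link down")
        print(f"{self.name}: sent {len(payload)} bytes")


def send_with_failover(nics: list[Nic], payload: bytes) -> None:
    """Try each NIC in order; a single-NIC fault is hidden from the caller."""
    for nic in nics:
        try:
            nic.send(payload)
            return
        except ConnectionError as err:
            print(f"failover: {err}, moving to next NIC")
    raise RuntimeError("all NICs failed")


if __name__ == "__main__":
    # nic0 is down; traffic fails over to nic1 without surfacing an error.
    send_with_failover([Nic("nic0", healthy=False), Nic("nic1")], b"gradient shard")
```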
“We are doing it with a broad ecosystem across leading server companies… and collaborating with the networking partners to publish joint validated reference architectures.”
She concluded: “This is just chapter one. Lots of innovations are coming from AMD… to deliver more and more acceleration, more and more scale, more and more reliability through software.”