AI inference at scale? Do it with Supermicro’s AMD platforms
General-purpose servers aren’t up to running AI inferencing at scale. Supermicro’s servers, powered by AMD Instinct GPUs, are. They help organizations speed production at a lower cost.
As AI moves from the experiment stage to full-scale deployment, many organizations are discovering a new and unexpected bottleneck: their own infrastructure.
As they’re finding, running AI inferencing workloads on systems not designed specifically for AI can be both slow and costly.
It’s a big deal, because for many, inference—the process of running pre-trained AI models—is the next step in AI implementation. What’s needed is a scalable infrastructure that can adapt to new models, new deployment patterns and new accelerators.
Supermicro now has a solution: its accelerated AI Platforms featuring AMD Instinct MI350 Series GPUs. These systems provide rack-scale infrastructure built for enterprises moving AI from proof-of-concept to production and beyond.
Supermicro, working closely with AMD, has designed this platform for today’s AI challenges. The partners are also ensuring that these systems are flexible enough to support future AI use cases.
Supermicro says the benefits of using this new AI platform include:
- Faster time to production: With pre-validated modular systems, organizations can deploy in weeks rather than months.
- Lower cost per inference: The total cost of ownership (TCO) is lowered with a memory-dense architecture, liquid cooling and AMD’s power-efficient GPUs.
- Enterprise software: AMD’s AI software stack provides blueprint designs and inference microservices, enabling faster deployments.
- Vendor independence: AMD’s ROCm is an open software stack, so organizations aren’t tied to a closed, single-vendor AI stack.
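To make the cost-per-inference claim concrete, here is a minimal Python sketch of how such a figure could be estimated. It is not Supermicro’s or AMD’s TCO model. The class name, every field and all the numbers in the usage example are hypothetical placeholders, and the model deliberately ignores staffing, software, networking and facility costs.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


@dataclass
class DeploymentCost:
    """Hypothetical cost inputs for one inference deployment."""
    hardware_cost_usd: float        # up-front purchase price
    amortization_years: float       # period over which hardware cost is spread
    avg_power_kw: float             # average IT power draw
    pue: float                      # power usage effectiveness (cooling and facility overhead)
    electricity_usd_per_kwh: float  # energy price
    inferences_per_second: float    # sustained throughput

    def annual_cost_usd(self) -> float:
        # Amortized hardware cost plus yearly energy cost.
        hardware = self.hardware_cost_usd / self.amortization_years
        energy = (self.avg_power_kw * self.pue * HOURS_PER_YEAR
                  * self.electricity_usd_per_kwh)
        return hardware + energy

    def cost_per_million_inferences_usd(self) -> float:
        inferences_per_year = self.inferences_per_second * 3600 * HOURS_PER_YEAR
        return self.annual_cost_usd() / inferences_per_year * 1_000_000


if __name__ == "__main__":
    # Placeholder numbers only: two deployments that differ in cooling overhead.
    baseline = DeploymentCost(500_000, 4, 12.0, 1.6, 0.12, 2_000)
    efficient = DeploymentCost(500_000, 4, 12.0, 1.2, 0.12, 2_000)
    for name, d in (("baseline", baseline), ("efficient", efficient)):
        print(f"{name}: ${d.cost_per_million_inferences_usd():.4f} per million inferences")
```

In this simplified model, cutting power overhead or raising sustained throughput both lower the cost per inference, which is the lever the bullet above describes.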
What are the main challenges of moving from AI experiments to production?
Supermicro and AMD have designed these systems to overcome several common barriers to AI inferencing at scale. These barriers include:
- Inference demand: Running continuous, latency-sensitive workloads requires a purpose-built platform for sustained throughput.
- Scalability: An AI workload that runs fine in a low-volume pilot may fail to meet performance and cost requirements at production scale.
- Rigid architectures: Closed AI stacks are common, and they can make it difficult and costly for an organization to adapt to new AI technology and workloads.
- Power & density constraints: Most data centers were built before the advent of enterprise AI, meaning they can’t easily meet AI’s thermal and power demands.
What’s Supermicro’s AMD-powered solution?
Supermicro Accelerated Solutions featuring AMD Instinct MI350 Series GPUs provide a rack-scale AI infrastructure platform designed for production-scale AI workloads.
These workloads can range from high-performance inference to large-scale training. And they can span deployment environments ranging from enterprise data centers to regional AI factories.
Also, the Supermicro platform can scale flexibly to match organizational demand. That’s true, the company says, whether teams are standing up initial AI capacity or expanding to full factory-scale operations.
Supermicro also says the platform reduces storage and data-pipeline bottlenecks. That helps sustain throughput, increase GPU utilization, and improve operational efficiency at scale.
What configurations are available?
The Supermicro AI Platforms offer several cluster size options, each with a preconfigured L11 bill of materials (BOM). Full rack configurations include servers, switches, cables and cooling solutions.
GPU options include the AMD Instinct MI350X and AMD Instinct MI355X. Depending on the cluster size, a deployment can pack anywhere from 32 to 1,024 of these GPUs.
CPU options include the dual-socket AMD EPYC 9005 Series with up to 192 cores per processor.
For this solution, Supermicro currently offers three SKUs, one in each of three form factors: 4U (liquid cooled), 8U and 10U (both air cooled). Here are the Supermicro SKUs:
- AS -4126GS-NMR-LCC (4U, liquid-cooled)
- AS -8126GS-TNMR (8U)
- AS -A126GS-TNMR (10U)
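For teams that track hardware options in code, the three SKUs above can be captured in a small lookup table. The following Python sketch is illustrative only. The SKU strings, rack heights and cooling types come from the list above, while the ServerSku class and the skus_by_cooling helper are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSku:
    sku: str
    rack_units: int
    cooling: str  # "liquid" or "air"


# Values taken from the SKU list in this article.
SKUS = (
    ServerSku("AS -4126GS-NMR-LCC", 4, "liquid"),
    ServerSku("AS -8126GS-TNMR", 8, "air"),
    ServerSku("AS -A126GS-TNMR", 10, "air"),
)


def skus_by_cooling(cooling: str) -> list[ServerSku]:
    """Return the SKUs that use the given cooling type."""
    return [s for s in SKUS if s.cooling == cooling]


if __name__ == "__main__":
    # A data center without liquid-cooling infrastructure would filter on "air".
    for s in skus_by_cooling("air"):
        print(f"{s.sku} ({s.rack_units}U)")
    # prints:
    # AS -8126GS-TNMR (8U)
    # AS -A126GS-TNMR (10U)
```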
What about the software stack?
To support these and other systems, AMD offers two powerful components for the AI software stack:
- AMD ROCm: This open software stack includes drivers, development tools and APIs.
- AMD Enterprise AI Suite: Built on AMD ROCm, this suite helps organizations deploy AI quickly and with less complexity.
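As a quick illustration of what working on ROCm can look like, the Python sketch below checks which GPU backend a framework sees. It assumes PyTorch is the framework in use, which this article does not state, and the describe_accelerators function is a hypothetical helper. ROCm builds of PyTorch reuse the torch.cuda namespace and report a HIP version in torch.version.hip, so code written against that API can typically run on AMD Instinct GPUs with little or no change.

```python
def describe_accelerators() -> dict:
    """Report which GPU backend PyTorch sees, if PyTorch is installed."""
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        return {"torch_installed": False, "backend": None, "devices": []}

    # ROCm builds of PyTorch set torch.version.hip to a version string;
    # other builds leave it as None.
    hip_version = getattr(torch.version, "hip", None)

    if not torch.cuda.is_available():
        return {"torch_installed": True, "backend": None, "devices": []}

    # On ROCm builds, the torch.cuda API is backed by AMD GPUs.
    devices = [torch.cuda.get_device_name(i)
               for i in range(torch.cuda.device_count())]
    backend = "rocm" if hip_version else "cuda"
    return {"torch_installed": True, "backend": backend, "devices": devices}


if __name__ == "__main__":
    print(describe_accelerators())
```

On a machine without PyTorch or without a visible GPU, the function reports no backend and an empty device list rather than raising an error.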
Do More:
- Test-drive these and other Supermicro AMD-powered systems with the Supermicro Jumpstart program
- Learn more about AMD Instinct GPUs