AI inference at scale? Do it with Supermicro’s AMD platforms

General-purpose servers aren’t up to running AI inferencing at scale. Supermicro’s servers, powered by AMD Instinct GPUs, are. They help organizations speed production at a lower cost.

As AI moves from the experiment stage to full-scale deployment, many organizations are discovering a new and unexpected bottleneck: their own infrastructure.

As they’re finding, running AI inferencing workloads on systems not designed specifically for AI can be both slow and costly.

It’s a big deal, because for many, inference—the process of running pre-trained AI models—is the next step in AI implementation. What’s needed is a scalable infrastructure that can adapt to new models, new deployment patterns and new accelerators.

Supermicro has a solution now: Its accelerated AI Platforms featuring AMD Instinct MI350 Series GPUs. These systems provide rack-scale infrastructure built for enterprises moving AI from proof-of-concept to production—and beyond.

Supermicro, working closely with AMD, has designed this platform for today’s AI challenges. The partners are also ensuring that these systems are flexible enough to support future AI use cases.

Supermicro says the benefits of using this new AI platform include:

What are the main challenges of moving from AI experiments to production?

Supermicro and AMD have designed these systems to overcome several common barriers to AI inferencing at scale. These barriers include:

What’s Supermicro’s AMD-powered solution?

Supermicro Accelerated Solutions featuring AMD Instinct MI350 Series GPUs provide a rack-scale AI infrastructure platform designed for production-scale AI workloads.

These workloads can range from high-performance inference to large-scale training. And they can span deployment environments ranging from enterprise data centers to regional AI factories.

Also, the Supermicro platform can scale flexibly to match organizational demand. That’s true, the company says, whether teams are standing up initial AI capacity or expanding to full factory-scale operations.

Supermicro also says the platform reduces storage and data-pipeline bottlenecks. That helps sustain throughput, increase GPU utilization, and improve operational efficiency at scale.

What configurations are available?

The Supermicro AI Platforms offer several cluster size options, each with a preconfigured L11 bill of materials (BOM). Full rack configurations include servers, switches, cables and cooling solutions.

GPU options include the AMD Instinct MI350X and AMD Instinct MI355X. Depending on the cluster size, a Supermicro configuration can pack anywhere from 32 to 1,024 of these GPUs.

CPU options include the dual-socket AMD EPYC 9005 Series with up to 192 cores per processor.
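For a rough sense of what those figures mean in practice, here is a minimal sizing sketch. It reads the 32 and 1,024 GPU figures as cluster totals and assumes 8 GPUs per server, a number this article does not state. The function name `size_cluster` is hypothetical, and the core count is an upper bound based on the 192-core maximum per processor.

```python
import math

GPUS_PER_SERVER = 8      # assumption: not stated in the article
SOCKETS_PER_SERVER = 2   # dual-socket, per the article
MAX_CORES_PER_CPU = 192  # upper bound per processor, per the article


def size_cluster(total_gpus, gpus_per_server=GPUS_PER_SERVER):
    """Estimate the server count and maximum CPU cores for a GPU cluster."""
    # Round up so a partly filled server still counts as a whole server.
    servers = math.ceil(total_gpus / gpus_per_server)
    return {
        "gpus": total_gpus,
        "servers": servers,
        "max_cpu_cores": servers * SOCKETS_PER_SERVER * MAX_CORES_PER_CPU,
    }


if __name__ == "__main__":
    # Smallest and largest GPU counts mentioned in the article.
    for gpus in (32, 1024):
        print(size_cluster(gpus))
```

Under these assumptions, the smallest configuration works out to 4 servers and the largest to 128. Actual counts depend on the SKU and on the CPUs selected.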

For this solution, Supermicro currently offers three SKUs. They’re available in three form factors: 4U (liquid cooled), 8U and 10U (both air cooled). Here are the Supermicro SKUs:

What about the software stack?

To support these and other systems, AMD offers two powerful components for the AI software stack:
