Earlier this month, AMD took the wraps off its highly anticipated AMD Instinct MI300 Series of generative AI accelerators and data-center accelerated processing units (APUs). During the announcement event, AMD president Victor Peng said the new components had been “designed with our most advanced technologies.”
Advanced technologies indeed. With the AMD Instinct MI300 Series, AMD is writing a brand-new chapter in the story of AI-adjacent technology.
Early AI developments relied on the equivalent of a hastily thrown-together stock car constructed of whichever spare parts happened to be available at the time. But those days are over.
Now the future of computing has its very own Formula 1 race car. It’s extraordinarily powerful and fine-tuned to nanometer tolerances.
A new paradigm
At the heart of this new accelerator series is AMD’s CDNA 3 architecture. This third generation employs advanced packaging that tightly couples CPUs and GPUs to bring high-performance processing to AI workloads.
AMD’s new architecture also uses 3D packaging technologies that integrate up to eight vertically stacked accelerator complex dies (XCDs) and four I/O dies (IODs) that contain system infrastructure. The dies are linked via AMD Infinity Fabric technology and are connected to eight stacks of high-bandwidth memory (HBM).
High-bandwidth memory can provide far more bandwidth at much lower power consumption than the GDDR memory found in standard GPUs. Like many of AMD’s notable innovations, its HBM employs a 3D design.
In this case, the memory modules are stacked vertically to shorten the distance the data needs to travel. This also allows for smaller form factors.
AMD has implemented the HBM using a unified memory architecture. This is an increasingly popular design in which a single array of main-memory modules supports both the CPU and GPU simultaneously, speeding tasks and applications.
Unified memory is more efficient than a traditional architecture in which the CPU and GPU each have their own memory. It offers faster speeds along with lower power consumption and lower ambient temperatures. Also, data need not be copied from one set of memory to another.
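To make the "no copy" point concrete, here is a toy sketch in Python. It is only an analogy: ordinary Python buffers stand in for CPU and GPU memory, the function names `discrete_transfer` and `unified_access` are made up for illustration, and nothing here calls a real AMD or GPU API.

```python
# Toy model only: plain Python buffers stand in for CPU and GPU memory.
# The function names are hypothetical and no real GPU API is involved.

def discrete_transfer(host_data):
    """Model separate memory pools: the accelerator works on its own copy."""
    device_data = bytearray(host_data)      # explicit copy into "device" memory
    return device_data, len(device_data)    # second value: bytes copied


def unified_access(shared_data):
    """Model a unified pool: both processors address the same bytes."""
    view = memoryview(shared_data)          # a view onto the same buffer, no copy
    return view, 0                          # zero bytes copied


host = bytearray(b"\x01" * 1024)

# Separate pools: the "GPU" changes its private copy only.
device_copy, bytes_copied_discrete = discrete_transfer(host)
device_copy[0] = 42
host_after_discrete = host[0]               # unchanged until results are copied back

# Unified pool: the "GPU" writes through the shared view.
shared_view, bytes_copied_unified = unified_access(host)
shared_view[0] = 42
host_after_unified = host[0]                # the "CPU" sees the change immediately
```

In the first case every byte is duplicated and the result would still have to be copied back. In the second, both sides work on the same bytes, which is the idea behind the unified design described above.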
Greater than the sum of its parts
What really makes AMD CDNA 3 unique is its chiplet-based architecture. The design employs a single logical processor that contains a dozen chiplets.
Each chiplet, in turn, is fabricated for either compute or memory. To communicate, all the chiplets are connected via the AMD Infinity Fabric network-on-chip.
The primary 5nm XCDs contain the computational elements of the processor along with the lowest levels of the cache hierarchy. Each XCD includes a shared set of global resources, including the scheduler, hardware queues and four asynchronous compute engines (ACEs).
The 6nm IODs are dedicated to the memory hierarchy. These chiplets carry a newly redesigned AMD Infinity Cache and an HBM3 interface to the on-package memory. The AMD Infinity Cache boosts generational performance and efficiency by increasing cache bandwidth and reducing the number of off-chip memory accesses.
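The chiplet arithmetic can be checked with a short sketch. It assumes the maximum configuration mentioned earlier (eight XCDs and four IODs) and uses the figure of four ACEs per XCD. The `Chiplet` class and `build_package` function are illustrative names, not part of any AMD software.

```python
from dataclasses import dataclass

# Illustrative model only: the class and function names are hypothetical.
# The counts and process nodes come from the figures quoted in this article.

ACES_PER_XCD = 4


@dataclass(frozen=True)
class Chiplet:
    kind: str         # "XCD" (compute) or "IOD" (memory and I/O)
    process_nm: int   # fabrication process node in nanometers


def build_package(xcd_count=8, iod_count=4):
    """Return the list of chiplets in one logical processor."""
    xcds = [Chiplet("XCD", 5) for _ in range(xcd_count)]
    iods = [Chiplet("IOD", 6) for _ in range(iod_count)]
    return xcds + iods


package = build_package()
total_chiplets = len(package)
xcd_total = sum(1 for chiplet in package if chiplet.kind == "XCD")
total_aces = ACES_PER_XCD * xcd_total

print(f"{total_chiplets} chiplets, {xcd_total} XCDs, {total_aces} ACEs")
```

With the default arguments this gives the dozen chiplets described above. A package with fewer XCDs can be modeled by passing a smaller `xcd_count`.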
Scaling ever upward
System architects are constantly designing and building the world’s largest exascale-class supercomputers and AI systems. As such, they are forever reaching for more powerful processors.
The AMD CDNA 3 architecture is an obvious step in the right direction. The new platform takes communication and scaling to the next level.
In particular, the advent of AMD’s 4th Gen Infinity Architecture Fabric offers architects a new level of connectivity that could help produce a supercomputer far more powerful than anything we have access to today.
It’s reasonable to expect that AMD will continue to iterate its new line of accelerators as time passes. AI research is moving at a breakneck pace, and enterprises are hungry for more processing power to fuel their R&D.
What will researchers think of next? We won’t have to wait long to find out.
- Download data sheets:
- Read a white paper: AMD CDNA 3 architecture: The all-new AMD GPU architecture for the modern era of HPC and AI
- Watch an on-demand webinar: Supermicro GPU servers with AMD MI300 Series
- Read a product brief: Supermicro and AMD deliver rackscale AI and HPC solutions with the new AMD Instinct MI300 Series accelerators