
AMD presents its vision for the AI future: open, collaborative, for everyone


Check out the highlights of AMD’s Advancing AI event—including new GPUs, software and developer resources.


AMD advanced its AI vision at the “Advancing AI” event on June 12. The event, held live in the Silicon Valley city of San Jose, Calif., as well as online, featured presentations by top AMD executives and partners.

As many of the speakers made clear, AMD’s vision for AI is that it be open, developer-friendly, collaborative and useful to all.

AMD certainly believes the market opportunity is huge. During the day’s keynote, CEO Lisa Su said AMD now believes the total addressable market (TAM) for data-center AI will exceed $500 billion by as soon as 2028.

And that’s not all. Su also said she expects AI to move beyond the data center, finding new uses in edge computers, PCs, smartphones and other devices.

To deliver on this vision, Su explained, AMD is taking a three-pronged approach to AI:

  • Offer a broad portfolio of compute solutions.
  • Invest in an open development ecosystem.
  • Deliver full-stack solutions via investments and acquisitions.

The event, lasting over two hours, was also filled with announcements. Here are the highlights.

New: AMD Instinct MI350 Series

At the Advancing AI event, CEO Su formally announced the company’s AMD Instinct MI350 Series GPUs.

There are two models, the MI350X and MI355X. Though both are based on the same silicon, the MI355X supports higher thermals.

These GPUs, Su explained, are based on AMD’s 4th gen Instinct architecture, and each GPU comprises 10 chiplets containing a total of 185 billion transistors. The new Instinct solutions can be used for both AI training and AI inference, and they can also be configured in either liquid- or air-cooled systems.

Su said the MI355X delivers a massive 35x generational increase in AI performance over the previous-generation Instinct MI300. For AI training, the Instinct MI355X offers up to 3x more throughput than the Instinct MI300. And in comparison with a leading competitive GPU, the new AMD GPU can generate up to 40% more tokens per dollar.

AMD’s event also featured several representatives of companies already using AMD Instinct MI300 GPUs. They included Microsoft, Meta and Oracle.

Introducing ROCm 7 and AMD Developer Cloud

Vamsi Boppana, AMD’s senior VP of AI, announced ROCm 7, the latest version of AMD’s open-source AI software stack. ROCm 7 features improved support for industry-standard frameworks; expanded hardware compatibility; and new development tools, drivers, APIs and libraries to accelerate AI development and deployment.

Earlier in the day, CEO Su said AMD’s software efforts “are all about the developer experience.” To that end, Boppana introduced the AMD Developer Cloud, a new service designed for rapid, high-performance AI development.

He also said AMD is giving developers a 25-hour credit on the Developer Cloud with “no strings.” The new AMD Developer Cloud is generally available now.

Road Map: Instinct MI400, Helios rack, Venice CPU, Vulcano NIC

During the last segment of the AMD event, Su gave attendees a sneak peek at several forthcoming products:

  • Instinct MI400 Series: This GPU is being designed for both large-scale AI inference and training. It will be the heart of the Helios rack solution (see below) and provide what Su described as “the engine for the next generation of AI.” Expect performance of up to 40 petaflops, 432GB of HBM4 memory, and bandwidth of 19.6TB/sec.
  • Helios: The code name for a unified AI rack solution coming in 2026. As Su explained it, Helios will be a rack configuration that functions like a single AI engine, incorporating AMD’s EPYC CPU, Instinct GPU, Pensando Pollara network interface card (NIC) and ROCm software. Specs include up to 72 GPUs in a rack and 31TB of HBM4 memory.
  • Venice: This is the code name for the next generation of AMD EPYC server CPUs, Su said. They’ll be built on a 2nm process node, feature up to 256 cores, and offer a 1.7x performance boost over the current generation.
  • Vulcano: A future NIC, it will be built on a 3nm process node and deliver speeds of up to 800Gb/sec.


Tech Explainer: What’s a NIC? And how can it empower AI?

With the acceleration of AI, the network interface card is playing a new, leading role.


The humble network interface card (NIC) is getting a status boost from AI.

At a fundamental level, the NIC enables one computing device to communicate with others across a network. That network could be a rendering farm run by a small multimedia production house, an enterprise-level data center, or a global network like the internet.

From smartphones to supercomputers, most modern devices use a NIC for this purpose. On laptops, phones and other mobile devices, the NIC typically connects via a wireless antenna. For servers in enterprise data centers, it’s more common to connect the hardware infrastructure with Ethernet cables.

Each NIC—or NIC port, in the case of an enterprise NIC—has its own media access control (MAC) address. This unique identifier enables the NIC to send and receive relevant packets. Each packet, in turn, is a small chunk of a much larger data set, enabling it to move at high speeds.
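For readers who want to see their own NIC’s identifier from code, here’s a minimal Python sketch using only the standard library. It’s illustrative: uuid.getnode() may return a random fallback value on systems where no hardware address can be read.

```python
import uuid

def local_mac_address() -> str:
    """Return this machine's primary MAC address as a colon-separated string."""
    node = uuid.getnode()                 # 48-bit hardware address as an integer
    octets = node.to_bytes(6, "big")
    return ":".join(f"{b:02x}" for b in octets)

if __name__ == "__main__":
    # Example output: "3c:7c:3f:1a:2b:4d"
    print("This host's NIC MAC address:", local_mac_address())
```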

Networking for the Enterprise

At the enterprise level, everything needs to be highly capable and powerful, and the NIC is no exception. Organizations operating full-scale data centers rely on NICs to do far more than just send emails and sniff packets (the term used to describe how a NIC “watches” a data stream, collecting only the data addressed to its MAC address).

Today’s NICs are also designed to handle complex networking tasks onboard, relieving the host CPU so it can work more efficiently. This process, known as smart offloading, relies on several functions:

  • TCP segmentation offloading: This breaks big data into small packets.
  • Checksum offloading: Here, the NIC independently checks for errors in the data.
  • Receive side scaling: This helps balance network traffic across multiple processor cores, preventing them from getting bogged down.
  • Remote Direct Memory Access (RDMA): This process bypasses the CPU and sends data directly to GPU memory.
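To make the first two items on that list concrete, here’s a short, purely illustrative Python sketch that does in software what TSO and checksum offloading do in NIC silicon: split a large buffer into MSS-sized segments and compute the standard Internet (ones’-complement) checksum for each. The segment size and payload are made up for demonstration.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum, the kind a checksum offload engine computes."""
    if len(data) % 2:
        data += b"\x00"                            # pad to an even number of bytes
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back into 16 bits
    return ~total & 0xFFFF

def segment(payload: bytes, mss: int = 1460):
    """TCP segmentation offload, in miniature: chop a large buffer into MSS-sized chunks."""
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]

if __name__ == "__main__":
    big_transfer = bytes(10_000)                   # stand-in for a large application buffer
    for seg in segment(big_transfer):
        print(len(seg), hex(internet_checksum(seg)))
```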

Important as these capabilities are, they become even more vital when dealing with AI and machine learning (ML) workloads. By taking pressure off the CPU, modern NICs enable the rest of the system to focus on running these advanced applications and processing their scads of data.

This symbiotic relationship also helps lower a server’s operating temperature and reduce its power usage. The NIC does this by increasing efficiency throughout the system, especially when it comes to the CPU.

Enter the AI NIC

Countless organizations both big and small are clamoring to stake their claims in the AI era. Some are creating entirely new AI and ML applications; others are using the latest AI tools to develop new products that better serve their customers.

Either way, these organizations must deal with the challenges now facing traditional Ethernet networks in AI clusters. Remember, Ethernet was invented over 50 years ago.

AMD has a solution: a revolutionary NIC created specifically for AI workloads, the AMD AI NIC. Recently released, this card is designed to provide the intense communication capabilities demanded by AI and ML models. That includes tightly coupled parallel processing, rapid data transfers and low-latency communications.

AMD says its AI NIC offers a significant advancement in addressing the issues IT managers face as they attempt to reconcile the broad compatibility of an aging network technology with modern AI workloads. It’s a specialized network accelerator explicitly designed to optimize data transfer within back-end AI networks for GPU-to-GPU communication.

To address the challenges of AI workloads, what’s needed is a network that can support distributed computing over multiple GPU nodes with low jitter and RDMA. The AMD AI NIC is designed to manage the unique communication patterns of AI workloads and offer high throughput across all available links. It also offers congestion avoidance, reduced tail latency, scalable performance, and fast job-completion times.

Validated NIC

Following rigorous validation by the engineers at Supermicro, the AMD AI NIC is now supported on the Supermicro 8U GPU Server (AS-8126GS-TNMR). This behemoth is designed specifically for AI, deep learning, high-performance computing (HPC), industrial automation, retail and climate modeling.

In this configuration, AMD’s smart AI-focused NIC can offload networking tasks. This lets the Supermicro SuperServer’s dual AMD EPYC 9000-series processors run at even higher efficiency.

In the Supermicro server, the new AMD AI NIC occupies one of the myriad PCI Express x16 slots. Other optional high-performance PCIe cards include a CPU-to-GPU interconnect and up to eight AMD Instinct GPU accelerators.

In the NIC of time

A chain is only as strong as its weakest link. The chain that connects our ever-expanding global network of AI operations is strengthened by the advent of NICs focused on AI.

As NICs grow more powerful, these advanced network interface cards will help fuel the expansion of the AI/ML applications that power our homes, offices, and everything in between. They’ll also help us bypass communication bottlenecks and speed time to market.

For SMBs and enterprises alike, that’s good news indeed.


Meet AMD’s new EPYC CPUs for SMBs—and Supermicro servers that support them

AMD introduced the AMD EPYC 4005 series processors for SMBs and cloud service providers. And Supermicro announced that the new AMD processors are now shipping in several of its servers.


AMD this week introduced the AMD EPYC 4005 series processors. These are purpose-built CPUs designed to bring enterprise-level features and performance to small and medium businesses.

And Supermicro, wasting no time, also announced that several of its servers are now shipping with the new AMD EPYC 4005 CPUs.

EPYC 4005

The new AMD EPYC 4005 series processors are intended for on-prem users and cloud service providers who need powerful but cost-effective solutions in a 3U height form factor.

Target customers include SMBs, departmental and branch-office server users, and hosted IT service providers. Typical workloads for servers powered by the new CPUs will include general-purpose computing, dedicated hosting, code development, retail edge deployments, and content creation, AMD says.

“We’re delivering the right balance of performance, simplicity, and affordability,” says Derek Dicker, AMD’s corporate VP of enterprise and HPC. “That gives our customers and system partners the ability to deploy enterprise-class solutions that solve everyday business challenges.”

The new processors feature AMD’s ‘Zen 5’ core architecture and come in a single-socket package. Depending on model, they offer anywhere from 6 to 16 cores; up to 192GB of dual-channel DDR5 memory; 28 lanes of PCIe Gen 5 connectivity; and boost clock speeds of up to 5.7 GHz. One model of the AMD EPYC 4005 line also includes integrated AMD 3D V-Cache technology for a larger 128MB L3 cache and lower latency.

In a standard 42U rack, servers powered by AMD EPYC 4005 can provide up to 2,080 cores (that’s 13 3U servers x 10 nodes/server x 16 cores/node), as the quick math below shows. That level of density can shrink a user’s physical footprint while also lowering their TCO.
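Here’s that calculation spelled out as a tiny Python sketch. The 13-server count is the article’s figure; it assumes roughly 3U of the 42U rack is left for networking and power.

```python
servers_per_rack = 13      # 3U MicroCloud systems per 42U rack (remaining space for switches/PDU)
nodes_per_server = 10      # 10-node MicroCloud configuration
cores_per_node = 16        # top-end EPYC 4005 SKU

total_cores = servers_per_rack * nodes_per_server * cores_per_node
print(total_cores)         # 2080
```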

The new AMD CPUs follow the AMD EPYC 4004 series, introduced this time last year. The EPYC 4004 processors, still available from AMD, use the same AM5 socket as the 4005s.

Supermicro Servers

Also this week, Supermicro announced that several of its servers are now shipping with the new AMD EPYC 4005 series processors. Supermicro also introduced a new MicroCloud 3U server that’s available in 10-node and 5-node versions, both powered by the AMD EPYC 4005 CPUs.

"Supermicro continues to deliver first-to-market innovative rack-scale solutions for a wide range of use cases,” says Mory Lin, Supermicro’s VP of IoT, embedded and edge computing.

Like the AMD EPYC 4005 CPUs, the Supermicro servers are intended for SMBs, departmental and branch offices, and hosted IT service providers.

The new Supermicro MicroCloud 10-node server features single-socket AMD processors (your choice of either 4004 or the new 4005) as well as support for one single-width GPU accelerator card.

Supermicro’s new 5-node MicroCloud server also offers a choice of AMD EPYC 4004 or 4005 series processor. In contrast to the 10-node server, the 5-node version supports one double-width GPU accelerator card.

Supermicro has also added support for the new AMD EPYC 4005 series processors to several of its existing server lines. These servers include 1U, 2U and tower servers.

Have SMB, branch or hosting customers looking for affordable compute power? Tell them about the new AMD EPYC 4005 series processors and the Supermicro servers that support them.

 


Oil & gas spotlight: Fueling up with AI

AI is helping industry players that include BP, Chevron and Shell automate a wide range of important use cases. To serve them, AMD and Supermicro offer powerful accelerators and servers.


What’s artificial intelligence good for? For managers in the oil and gas industry, quite a lot.

Industry players that include Shell, BP, ExxonMobil and Chevron are already using machine learning and AI. Use cases include predictive maintenance, seismic data analysis, reservoir management and safety monitoring, says a recent report by Chirag Bharadwaj of consultants Appinventiv.

AI’s potential benefits for oil and gas companies are substantial. Anurag Jain of AI consultants Oyelabs cites estimates of AI lowering oil production costs by up to $5 a barrel with a 25% productivity gain, and increasing oil reserves by as much as 20% with enhanced resource recovery.

Along the same lines is a recent report from market watcher Global Growth Insights. It says adoption of AI in North American oil shale drilling has increased production efficiency by an impressive 20%.

All this has led Jain of Oyelabs to expect a big increase in the oil and gas industry’s AI spend. He predicts the industry’s worldwide spending on AI will rise from $3 billion last year to nearly $5.3 billion in 2028.

Assuming Jain is right, that would put the oil and gas industry’s AI spend at about 15% of its total IT spend. Last year, the industry spent nearly $20 billion on all IT goods and services worldwide, says Global Growth Insights.

Powerful Solutions

All this AI activity in the oil and gas industry hasn’t escaped the notice of AMD and Supermicro. They’re on the case.

AMD is offering the industry its AMD Instinct MI300A, an accelerator that combines CPU cores and GPUs to fuel the convergence of high-performance computing (HPC) with AI. And Supermicro is offering rackmount servers driven by this AMD accelerator.

Here are some of the benefits the two companies are offering oil and gas companies:

  • An APU multi-chip architecture that delivers dense compute, high-bandwidth memory integration, and both CPU and GPU dies in a single package.
  • Up to 2.6x the HPC performance/watt vs. the older AMD Instinct MI250X.
  • Up to 5.1x the AI-training workload performance with INT8 vs. the AMD Instinct MI250X. (INT8 is a fixed-point representation using 8 bits; see the quantization sketch after this list.)
  • Up to 128GB of unified HBM3 memory dedicated to GPUs. (HBM3 is a high-bandwidth memory chip technology that offers increased bandwidth, memory capacity and power efficiency, all in a smaller form factor.)
  • Double-precision power up to 122.6 TFLOPS with FP64 matrix HPC performance. (FP64 is a double-precision floating point format using 64 bits in memory.)
  • Complete, pre-validated solutions that are ready for rack-scale deployment on day one. These offer the choice of either 2U (liquid cooled) or 4U (air cooled) form factors.
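To illustrate what the INT8 item above means in practice, here’s a minimal, hypothetical sketch of symmetric INT8 quantization: mapping 32-bit floating-point values onto 8-bit integers and back. The scale factor and tensor values are made up; production frameworks handle calibration and per-channel scales automatically.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric INT8 quantization: map floats onto the [-127, 127] integer range."""
    scale = np.abs(x).max() / 127.0          # one scale for the whole tensor (illustrative)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)   # made-up FP32 weights
q, scale = quantize_int8(weights)
print("max round-trip error:", np.abs(weights - dequantize(q, scale)).max())
```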
     

If you have customers in oil and gas looking to get into AI, tell them about these Supermicro and AMD solutions.


Healthcare in the spotlight: Big challenges, big tech

To meet some of their industry’s toughest challenges, healthcare providers are turning to advanced technology.


Healthcare providers face some tough challenges. Advanced technology can help.

As a recent report from consultants McKinsey & Co. points out, healthcare providers are dealing with some big challenges. These include rising costs, workforce shortages, an aging population, and increased competition from nontraditional parties.

Another challenge: Consumers expect their healthcare providers to offer new capabilities, such as digital scheduling and telemedicine, as well as better experiences.

One way healthcare providers hope to meet both sets of challenges is with advanced technology. Three-quarters of U.S. healthcare providers increased their IT spending in the last year, according to a survey conducted by consultants Bain & Co. The same survey found that 15% of healthcare providers already have an AI strategy in place, up from just 5% who had a strategy in 2023.

Generative AI is showing potential, too. Another survey, this one done by McKinsey, finds that over 70% of healthcare organizations are now either pursuing GenAI proofs-of-concept or are already implementing GenAI solutions.

Dynamic Duo

There’s a catch to all this: As healthcare providers adopt AI, they’re finding that the required datasets and advanced analytics don’t run well on their legacy IT systems.

To help, Supermicro and AMD are working together. They’re offering healthcare providers heavy-duty compute delivered at rack scale.

Supermicro servers powered by AMD Instinct MI300X GPUs are designed to accelerate AI and HPC workloads in healthcare. They offer the levels of performance, density and efficiency healthcare providers need to improve patient outcomes.

The AMD Instinct MI300X is designed to deliver high performance for GenAI workloads and HPC applications. It’s designed with no fewer than 304 high-throughput compute units. You also get AI-specific functions and 192GB of HBM3 memory, all of it based on AMD’s CDNA 3 architecture.

Healthcare providers can use Supermicro servers powered by AMD GPUs for next-generation research and treatments. These could include advanced drug discovery, enhanced diagnostics and imaging, risk assessments and personalized care, and increased patient support with self-service tools and real-time edge analytics.

Supermicro points out that its servers powered by AMD Instinct GPUs deliver massive compute with rack-scale flexibility, as well as high levels of power efficiency.

Performance:

  • The powerful combination of CPUs, GPUs and HBM3 memory accelerates HPC and AI workloads.
  • HBM3 memory offers capacities of up to 192GB dedicated to the GPUs.
  • Complete solutions ship pre-validated, ready for instant deployment.
  • Double-precision power can serve up to 163.4 TFLOPS.

Flexibility:

  • Proven AI building-block architecture streamlines deployment at scale for the largest AI models.
  • An open AI ecosystem with AMD ROCm open software.
  • A unified computing platform with AMD Instinct MI300X plus AMD Infinity Fabric and infrastructure.
  • A modular design and build that helps users get to the right configuration faster.

Efficiency:

  • Dual-zone cooling innovation, used by some of the most efficient supercomputers on the Green500 list.
  • Improved density with 3rd Gen AMD CDNA, delivering 19,456 stream cores.
  • Chip-level power intelligence enables the AMD Instinct MI300X to deliver strong performance per watt.
  • Purpose-built silicon design of the 3rd Gen AMD CDNA combines 5nm and 6nm fabrication processes.

Are your healthcare clients looking to unleash the potential of their data? Then tell them about Supermicro systems powered by the AMD MI300X GPUs.


AMD Instinct MI300A blends GPU, CPU for super-speedy AI/HPC

CPU or GPU for AI and HPC? You can get the best of both with the AMD Instinct MI300A.


The AMD Instinct MI300A is the world’s first data center accelerated processing unit (APU) for high-performance computing and AI, integrating both CPU and GPU cores on a single package.

That makes the AMD Instinct MI300A highly efficient at running both HPC and AI workloads. It also makes the MI300A powerful enough to accelerate training the latest AI models.

Introduced about a year ago, the AMD Instinct MI300A accelerator is shipping soon. So are two Supermicro servers—one a liquid-cooled 2U system, the other an air-cooled 4U—each powered by four MI300A units.

Under the Hood

The technology of the AMD Instinct MI300A is impressive. Each MI300A integrates 24 AMD ‘Zen 4’ x86 CPU cores with 228 AMD CDNA 3 high-throughput GPU compute units.

You also get 128GB of unified HBM3 memory. This presents a single shared address space to CPU and GPU, all of which are interconnected into the coherent 4th Gen AMD Infinity architecture.

Also, the AMD Instinct MI300A is designed to be used in a multi-unit configuration. This means you can connect up to four of them in a single server.

To make this work, each APU has 1 TB/sec. of bidirectional connectivity through eight 128 GB/sec. AMD Infinity Fabric interfaces. Four of the interfaces are dedicated Infinity Fabric links. The other four can be flexibly assigned to deliver either Infinity Fabric or PCIe Gen 5 connectivity.

In a typical four-APU configuration, six interfaces are dedicated to inter-GPU Infinity Fabric connectivity. That supplies a total of 384 GB/sec. of peer-to-peer connectivity per APU. One interface is assigned to support x16 PCIe Gen 5 connectivity to external I/O devices. In addition, each MI300A includes two x4 interfaces to storage, such as M.2 boot drives, plus two USB Gen 2 or 3 interfaces.
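Here’s a back-of-the-envelope check of those bandwidth figures in Python. The interface counts and the 128 GB/sec figure come straight from the text; reading the quoted 384 GB/sec peer-to-peer number as 64 GB/sec per link in each direction is our assumption.

```python
GB_PER_INTERFACE_BIDIR = 128      # GB/sec, bidirectional, per Infinity Fabric interface
interfaces_total = 8
interfaces_inter_gpu = 6          # typical four-APU configuration

total_bidir = interfaces_total * GB_PER_INTERFACE_BIDIR
print(total_bidir)                # 1024 GB/sec, i.e. the ~1 TB/sec quoted above

# The 384 GB/sec peer-to-peer figure works out to 64 GB/sec per link --
# half of the 128 GB/sec bidirectional number, presumably one direction (assumption).
per_link_one_way = GB_PER_INTERFACE_BIDIR / 2
print(interfaces_inter_gpu * per_link_one_way)   # 384.0 GB/sec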

Converged Computing

There’s more. The AMD Instinct MI300A was designed to handle today’s convergence of HPC and AI applications at scale.

To meet the increasing demands of AI applications, the APU is optimized for widely used data types. These include FP64, FP32, FP16, BF16, TF32, FP8 and INT8.
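For developers, those data types map directly onto framework-level dtypes. Here’s a hedged PyTorch sketch; the FP8 dtype name and the TF32 toggle assume a recent PyTorch build, and exact availability varies by version and backend.

```python
import torch

x = torch.randn(1024, 1024)               # FP32 by default

fp64 = x.to(torch.float64)                 # FP64: double precision for HPC work
fp16 = x.to(torch.float16)                 # FP16: half precision
bf16 = x.to(torch.bfloat16)                # BF16: wider exponent range than FP16
int8 = (x * 127).round().clamp(-127, 127).to(torch.int8)   # INT8: crude fixed-point scaling, for illustration

# FP8 dtype (available in recent PyTorch builds; the name is version-dependent)
if hasattr(torch, "float8_e4m3fn"):
    fp8 = x.to(torch.float8_e4m3fn)

# TF32 is enabled per backend rather than per tensor
torch.backends.cuda.matmul.allow_tf32 = True
```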

The MI300A also supports native hardware sparsity for efficiently gathering data from sparse matrices. This saves power and compute cycles, and it also lowers memory use.

Another element of the design aims at high efficiency by eliminating time-consuming data copy operations. The MI300A can easily offload tasks between the CPU and GPU. And it’s all supported by AMD’s ROCm 6 open software platform, built for HPC, AI and machine learning workloads.

Finally, virtualized environments are supported on the MI300A through SR-IOV to share resources with up to three partitions per APU. SR-IOV—short for single-root input/output virtualization—is an extension of the PCIe spec. It allows a device to separate access to its resources among various PCIe functions. The goal: improved manageability and performance.

Fun fact: The AMD Instinct MI300A is a key design component of the El Capitan supercomputer recently dedicated at Lawrence Livermore National Laboratory. This system can process over two quintillion (10^18) calculations per second.

Supermicro Servers

As mentioned above, Supermicro now offers two server systems based on the AMD Instinct MI300A APU. They’re 2U and 4U systems.

These servers both take advantage of AMD’s integration features by combining four MI300A units in a single system. That gives each system a total of 912 GPU compute units, 96 CPU cores and 512GB of HBM3 memory.

Supermicro says these systems can push HPC processing to Exascale levels, meaning they’re very, very fast. “Flops” is short for floating-point operations per second, and “exa” denotes a 1 with 18 zeros after it (10^18). That’s fast.

Supermicro’s 2U server (model number AS-2145GH-TNMR-LCC) is liquid-cooled and aimed at HPC workloads. Supermicro says its direct-to-chip liquid-cooling technology improves TCO, with over 51% data center energy cost savings. The company also cites a 70% reduction in fan power usage compared with air-cooled solutions.

If you’re looking for big HPC horsepower, Supermicro’s got your back with this 2U system. The company’s rack-scale integration is optimized with dual AIOM (advanced I/O modules) and 400G networking. This means you can create a high-density supercomputing cluster with as many as 21 of Supermicro’s 2U systems in a 48U rack. With each system combining four MI300A units, that would give you a total of 84 APUs.

The other Supermicro server (model number AS-4145GH-TNMR) is an air-cooled 4U system, also equipped with four AMD Instinct MI300A accelerators, and it’s intended for converged HPC-AI workloads. The system’s mechanical airflow design keeps thermal throttling at bay; if that’s not enough, the system also has 10 heavy-duty 80mm fans.


AMD’s new ROCm 6.3 makes GPU programming even better

AMD recently introduced version 6.3 of ROCm, its open software stack for GPU programming. New features included expanded OS support and other optimizations.


There’s a new version of AMD ROCm, the open software stack designed to enable GPU programming from low-level kernel all the way up to end-user applications.  

The latest version, ROCm 6.3, adds features that include expanded operating system support, an open-source toolkit and more.

Rock On

AMD ROCm provides the tools for HIP (the heterogeneous-computing interface for portability), OpenCL and OpenMP. These include compilers, APIs, libraries for high-level functions, debuggers, profilers and runtimes.

ROCm is optimized for Generative AI and HPC applications, and existing code is easy to migrate to it. Developers can use ROCm to fine-tune workloads, while partners and OEMs can integrate seamlessly with AMD to create innovative solutions.

The latest release builds on ROCm 6, which AMD introduced last year. Version 6 added expanded support for AMD Instinct MI300A and MI300X accelerators, key AI support features, optimized performance, and an expanded support ecosystem.

The senior VP of AMD’s AI group, Vamsi Boppana, wrote in a recent blog post: “Our vision is for AMD ROCm to be the industry’s premier open AI stack, enabling choice and rapid innovation.”

New Features

Here’s some of what’s new in AMD ROCm 6.3:

  • rocJPEG: A high-performance JPEG decode SDK for AMD GPUs.
  • ROCm compute profiler and system profiler: Previously known as Omniperf and Omnitrace, these have been renamed to reflect their new direction as part of the ROCm software stack.
  • Shark AI toolkit: This open-source toolkit is for high-performance serving of GenAI and LLMs. The initial release includes support for the AMD Instinct MI300.
  • PyTorch 2.4 support: PyTorch is a machine learning library used for applications such as computer vision and natural language processing. Originally developed by Meta AI, it’s now under the Linux Foundation umbrella. (See the quick device-check sketch after this list.)
  • Expanded OS support: This includes added support for Ubuntu 24.04.2 and 22.04.5; RHEL 9.5; and Oracle Linux 8.10. In addition, ROCm 6.3.1 includes support for both Debian 12 and the AMD Instinct MI325X accelerator.
  • Documentation updates: ROCm 6.3 offers clearer, more comprehensive guidance for a wider variety of use cases and user needs.
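Here’s the quick device-check sketch referenced above. On a ROCm build of PyTorch, the familiar torch.cuda APIs drive AMD GPUs and torch.version.hip is populated; the outputs shown in comments are illustrative.

```python
import torch

print(torch.__version__)              # e.g. "2.4.0+rocm..." on a ROCm wheel (illustrative)
print(torch.version.hip)              # set on ROCm builds, None on CUDA builds
print(torch.cuda.is_available())      # True when an AMD Instinct GPU is visible
print(torch.cuda.get_device_name(0))  # e.g. "AMD Instinct MI300X" (illustrative)

# A tiny sanity check on the accelerator
a = torch.randn(2048, 2048, device="cuda")
b = torch.randn(2048, 2048, device="cuda")
print((a @ b).mean().item())
```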

Super for Supermicro

Developers can use ROCm 6.3 to tune workloads and create solutions for Supermicro GPU systems based on AMD Instinct MI300 accelerators.

Supermicro offers three such systems:

Are your customers building AI and HPC systems? Then tell them about the new features offered by AMD ROCm 6.3.


The AMD Instinct MI300X Accelerator draws top marks from leading AI benchmark

In the latest MLPerf testing, the AMD Instinct MI300X Accelerator with ROCm software stack beat the competition with strong GenAI inference performance. 


New benchmarks using the AMD Instinct MI300X Accelerator show impressive performance that surpasses the competition.

This is great news for customers operating demanding AI workloads, especially those underpinned by large language models (LLMs) that require super-low latency.

Initial platform tests using MLPerf Inference v4.1 measured AMD’s flagship accelerator against the Llama 2 70B benchmark. This test serves as a good proxy for real-world applications, including natural language processing (NLP) and large-scale inferencing.

MLPerf is the industry’s leading benchmarking suite for measuring the performance of machine learning and AI workloads from domains that include vision, speech and NLP. It offers a set of open-source AI benchmarks, including rigorous tests focused on Generative AI and LLMs.

Gaining high marks from the MLPerf Inference benchmarking suite represents a significant milestone for AMD. It positions the AMD Instinct MI300X accelerator as a go-to solution for enterprise-level AI workloads.

Superior Instincts

The results of the LLaMA2-70B test are particularly significant. That’s due to the benchmark’s ability to produce an apples-to-apples comparison of competitive solutions.

In this benchmark, the AMD Instinct MI300X was compared with NVIDIA’s H100 Tensor Core GPU. The test concluded that AMD’s full-stack inference platform outperformed the H100 at serving high-performance LLMs, a workload that requires both robust parallel computing and a well-optimized software stack.

The testing also showed that because the AMD Instinct MI300X offers the largest GPU memory available—192GB of HBM3 memory—it was able to fit the entire LLaMA2-70B model into memory. Doing so helped avoid network overhead by eliminating the need to split the model across multiple GPUs or nodes. This, in turn, maximized inference throughput, producing superior results.
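A rough back-of-the-envelope check shows why that matters. Assuming 16-bit weights (the precision here is our assumption; MLPerf submissions often quantize further), the model’s weights alone need about 140GB, which fits in a single MI300X’s 192GB but not in an 80GB-class GPU.

```python
params = 70e9                # LLaMA2-70B parameter count
bytes_per_param = 2          # FP16/BF16 weights (assumption)

weights_gb = params * bytes_per_param / 1e9
print(weights_gb)            # 140.0 GB

print(weights_gb <= 192)     # True  -> fits in one MI300X (192GB HBM3)
print(weights_gb <= 80)      # False -> must be split across 80GB-class GPUs
```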

Software also played a big part in the success of the AMD Instinct series. The AMD ROCm software platform accompanies the AMD Instinct MI300X. This open software stack includes programming models, tools, compilers, libraries and runtimes for AI solution development on the AMD Instinct MI300 accelerator series and other AMD GPUs.

The testing showed that the scaling efficiency from a single AMD Instinct MI300X, combined with the ROCm software stack, to a complement of eight AMD Instinct accelerators was nearly linear. In other words, the system’s performance improved almost proportionally as more GPUs were added.
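“Nearly linear” scaling can be expressed as a simple ratio. Here’s a minimal sketch; the throughput numbers are made up purely to show the calculation.

```python
def scaling_efficiency(throughput_1gpu: float, throughput_ngpu: float, n: int) -> float:
    """Fraction of ideal linear scaling achieved when going from 1 GPU to n GPUs."""
    return (throughput_ngpu / throughput_1gpu) / n

# Hypothetical tokens/sec figures, for illustration only
print(scaling_efficiency(throughput_1gpu=1_000, throughput_ngpu=7_600, n=8))  # 0.95 -> 95% of linear
```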

That test demonstrated the AMD Instinct MI300X’s ability to handle the largest MLPerf inference models to date, containing over 70 billion parameters.

Thinking Inside the Box

Benchmarking the AMD Instinct MI300X required AMD to create a complete hardware platform capable of handling strenuous AI workloads. For this task, AMD engineers chose as their testbed the Supermicro AS-8125GS-TNMR2, a massive 8U complete system.

Supermicro’s GPU A+ Server systems are designed for both versatility and redundancy. Designers can outfit the system with an impressive array of hardware, starting with two AMD EPYC 9004-series processors and up to 6TB of ECC DDR5 main memory.

Because AI workloads consume massive amounts of storage, Supermicro has also outfitted this 8U server with 12 front hot-swap 2.5-inch NVMe drive bays. There’s also the option to add four more drives via an additional storage controller.

The Supermicro AS -8125GS-TNMR2 also includes room for two hot-swap 2.5-inch SATA bays and two M.2 drives, each with a capacity of up to 3.84TB.

Power for all those components is delivered courtesy of six 3,000-watt redundant titanium-level power supplies.

Coming Soon: Even More AI power

AMD engineers continually push the limits of silicon and human ingenuity to expand the capabilities of their hardware. So it should come as little surprise that new iterations of the AMD Instinct series are expected to be released in the coming months. This past May, AMD officials said they plan to introduce AMD Instinct MI325, MI350 and MI400 accelerators.

Forthcoming Instinct accelerators, AMD says, will deliver advances including additional memory, support for lower-precision data types, and increased compute power.

New features are also coming to the AMD ROCm software stack. Those changes should include kernel improvements and advanced quantization support.

Are your customers looking for a high-powered, low-latency system to run their most demanding HPC and AI workloads? Tell them about these benchmarks and the AMD Instinct MI300X accelerators.


Developing AI and HPC solutions? Check out the new AMD ROCm 6.2 release

The latest release of AMD’s free and open software stack for developing AI and HPC solutions delivers 5 important enhancements. 


If you develop AI and HPC solutions, you’ll want to know about the most recent release of AMD ROCm software, version 6.2.

ROCm, in case you’re unfamiliar with it, is AMD’s free and open software stack. It’s aimed at developers of artificial intelligence and high-performance computing (HPC) solutions on AMD Instinct accelerators. It's also great for developing AI and HPC solutions on AMD Instinct-powered servers from Supermicro. 

First introduced in 2016, ROCm open software now includes programming models, tools, compilers, libraries, runtimes and APIs for GPU programming.

ROCm version 6.2, announced recently by AMD, delivers 5 key enhancements:

  • Improved vLLM support 
  • Boosted memory efficiency & performance with Bitsandbytes
  • New Offline Installer Creator
  • New Omnitrace & Omniperf Profiler Tools (beta)
  • Broader FP8 support

Let’s look at each separately and in more detail.

vLLM Support

To enhance the efficiency and scalability of its Instinct accelerators, AMD is expanding vLLM support. vLLM is an easy-to-use library for the large language models (LLMs) that power Generative AI.

ROCm 6.2 lets AMD Instinct developers integrate vLLM into their AI pipelines. The benefits include improved performance and efficiency.
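For a sense of what that integration looks like, here’s a minimal vLLM sketch. The model name is illustrative, and running it on AMD Instinct hardware assumes a ROCm-enabled vLLM build.

```python
from vllm import LLM, SamplingParams

# Model name is illustrative; any supported Hugging Face model ID works
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain what ROCm is in one sentence."], params)

for out in outputs:
    print(out.outputs[0].text)
```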

Bitsandbytes

Developers can now integrate Bitsandbytes with ROCm for AI model training and inference, reducing their memory and hardware requirements on AMD Instinct accelerators. 

Bitsandbytes is an open-source Python library that provides lightweight, k-bit quantization for LLMs, boosting memory efficiency and performance. AMD says this will let AI developers work with larger models on limited hardware, broadening access, saving costs and expanding opportunities for innovation.
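Here’s a hedged sketch of what 8-bit loading looks like through the Hugging Face Transformers integration with bitsandbytes. The model name is illustrative, and using it on AMD Instinct accelerators assumes ROCm-enabled builds of both libraries.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"        # illustrative model ID

# Ask Transformers to load the weights in 8-bit via bitsandbytes
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",                        # place layers on the available accelerator(s)
)

inputs = tokenizer("Memory-efficient inference with 8-bit weights", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```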

Offline Installer Creator

The new ROCm Offline Installer Creator aims to simplify the installation process. This tool creates a single installer file that includes all necessary dependencies.

That makes deployment straightforward with a user-friendly GUI that allows easy selection of ROCm components and versions.

As the name implies, the Offline Installer Creator can be used on developer systems that lack internet access.

Omnitrace and Omniperf Profiler

The new Omnitrace and Omniperf Profiler Tools, both now in beta release, provide comprehensive performance analysis and a streamlined development workflow.

Omnitrace offers a holistic view of system performance across CPUs, GPUs, NICs and network fabrics. This helps developers identify and address bottlenecks.

Omniperf delivers detailed GPU kernel analysis for fine-tuning.

Together, these tools help to ensure efficient use of developer resources, leading to faster AI training, AI inference and HPC simulations.

FP8 Support

Broader FP8 support can improve the performance of AI inferencing.

FP8 is an 8-bit floating point format that provides a common, interchangeable format for both AI training and inference. It lets AI models operate and perform consistently across hardware platforms.

In ROCm, FP8 support improves the process of running AI models, particularly in inferencing. It does this by addressing key challenges such as the memory bottlenecks and high latency associated with higher-precision formats. In addition, FP8's reduced precision calculations can decrease the latency involved in data transfers and computations, losing little to no accuracy.  
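To show the trade-off concretely, here’s a small PyTorch sketch that round-trips a tensor through an 8-bit floating-point format and measures the error. The dtype name assumes a recent PyTorch build; whether FP8 kernels are accelerated on any given device is a separate question.

```python
import torch

x = torch.randn(4096) * 3.0                      # made-up FP32 activations

fp8 = x.to(torch.float8_e4m3fn)                  # 8-bit float: 4 exponent bits, 3 mantissa bits
roundtrip = fp8.to(torch.float32)

rel_err = ((x - roundtrip).abs() / x.abs().clamp(min=1e-6)).mean().item()
print(f"mean relative error after FP8 round-trip: {rel_err:.3%}")
```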

ROCm 6.2 expands FP8 support across its ecosystem, from frameworks to libraries and more, enhancing performance and efficiency.


Research Roundup, AI Edition: platform power, mixed signals on GenAI, smarter PCs

Catch the latest AI insights from leading researchers and market analysts.


Sales of artificial intelligence platform software show no sign of a slowdown. The road to true Generative AI disruption could be bumpy. And PCs with built-in AI capabilities are starting to sell.

That’s some of the latest AI insights from leading market researchers, analysts and pollsters. And here’s your research roundup.

AI Platforms Maintain Momentum

Is the excitement around AI overblown? Not at all, says market watcher IDC.

“The AI platforms market shows no sign of slowing down,” says IDC VP Ritu Jyoti.

IDC now believes that the market for AI platform software will maintain its momentum through at least 2028.

By that year, IDC expects, worldwide revenue for AI software will reach $153 billion. If so, that would mark a five-year compound annual growth rate (CAGR) of nearly 41%.

The market really got underway last year. That’s when worldwide AI platform software revenue hit $27.9 billion, an annual increase of 44%, IDC says.
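Those two figures line up: growing from $27.9 billion to $153 billion over the five-year span IDC describes works out to roughly 41% compound annual growth, as this quick check shows.

```python
start, end, years = 27.9, 153.0, 5            # $B revenue over IDC's five-year span
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")                          # ~40.6%, i.e. "nearly 41%"
```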

Since then, lots of progress has been made. Fully half the organizations now deploying GenAI in production have already selected an AI platform. And IDC says most of the rest will do so in the next six months.

All that has AI software suppliers looking pretty smart.

Mixed Signals on GenAI

There’s no question that GenAI is having a huge impact. The question is how difficult it will be for GenAI-using organizations to achieve their desired results.

GenAI use is already widespread. In a global survey conducted earlier this year by management consultants McKinsey & Co., 65% of respondents said they use GenAI on a regular basis.

That was nearly double the percentage from McKinsey’s previous survey, conducted just 10 months earlier.

Also, three quarters of McKinsey’s respondents said they expect GenAI will lead their industries to significant or disruptive changes.

However, the road to GenAI could be bumpy. Separately, researchers at Gartner are predicting that by the end of 2025, at least 30% of all GenAI projects will be abandoned after their proof-of-concept (PoC). 

The reason? Gartner points to several factors: poor data quality, inadequate risk controls, unclear business value, and escalating costs.

“Executives are impatient to see returns on GenAI investments,” says Gartner VP Rita Sallam. “Yet organizations are struggling to prove and realize value.”

One big challenge: Many organizations investing in GenAI want productivity enhancements. But as Gartner points out, those gains can be difficult to quantify.

Further, implementing GenAI is far from cheap. Gartner’s research finds that a typical GenAI deployment costs anywhere from $5 million to $20 million.

That wide range of costs is due to several factors. These include the use cases involved, the deployment approaches used, and whether an organization seeks to be a market disruptor.

Clearly, an intelligent approach to GenAI can be a money-saver.

PCs with AI? Yes, Please

Leading PC makers hope to boost their hardware sales by offering new, built-in AI capabilities. It seems to be working.

In the second quarter of this year, 8.8 million PCs—that’s 14% of all PCs shipped globally in the quarter—were AI-capable, says market analyst firm Canalys.

Canalys defines “AI-capable” pretty simply: It’s any desktop or notebook system that includes a chipset or block for one or more dedicated AI workloads.

By operating system, nearly 40% of the AI-capable PCs shipped in Q2 ran Windows, 60% ran Apple macOS, and just 1% ran ChromeOS, Canalys says.

For the full year 2024, Canalys expects some 44 million AI-capable PCs to be shipped worldwide. In 2025, the market watcher predicts, these shipments should more than double, rising to 103 million units worldwide. There's nothing artificial about that boost.
