Performance Intensive Computing

Capture the full potential of IT

Supermicro FlexTwin now supports 5th gen AMD EPYC CPUs

FlexTwin, part of Supermicro’s H14 server line, now supports the latest AMD EPYC processors — and keeps things chill with liquid cooling.

 


Wondering about the server of the future? It’s available for order now from Supermicro.

The company recently added support for the latest 5th Gen AMD EPYC 9005 Series processors on its 2U 4-node FlexTwin server with liquid cooling.

This server is part of Supermicro’s H14 line and bears the model number AS-2126FT-HE-LCC. It’s a high-performance, hot-swappable and high-density compute system.

Intended users include oil & gas companies, climate and weather modelers, manufacturers, scientific researchers and research labs. In short, anyone who requires high-performance computing (HPC).

Each 2U system comprises four nodes. And each node, in turn, is powered by a pair of 5th Gen AMD EPYC 9005 processors. (The previous-gen AMD EPYC 9004 processors are supported, too.)

Memory on this Supermicro FlexTwin maxes out at 9TB of DDR5, courtesy of up to 24 DIMM slots. Expansion cards connect via PCIe 5.0; one slot per node is standard, with more available as an option.

The 5th Gen AMD EPYC processors, introduced last month, are designed for data center, AI and cloud customers. The series launched with over 25 SKUs offering up to 192 cores and all using AMD’s new “Zen 5” or “Zen 5c” architectures.
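Put those figures together and the density story becomes concrete. A quick back-of-the-envelope calculation, assuming every socket carries the top-end 192-core SKU:

```python
# Core density of a fully loaded FlexTwin 2U, using the figures above.
# Assumes every socket carries the top-end 192-core 5th Gen EPYC SKU.
nodes_per_2u = 4          # 2U 4-node FlexTwin chassis
cpus_per_node = 2         # dual-socket nodes
max_cores_per_cpu = 192   # top-end AMD EPYC 9005 SKU

cores_per_2u = nodes_per_2u * cpus_per_node * max_cores_per_cpu
print(cores_per_2u)  # 1536 cores in a single 2U system
```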

Keeping Cool

To keep things chill, the Supermicro FlexTwin server is available with liquid cooling only. This allows the server to be used for HPC, electronic design automation (EDA) and other demanding workloads.

More specifically, the FlexTwin server uses a direct-to-chip (D2C) cold plate liquid cooling setup, and each system also runs 16 counter-rotating fans. Supermicro says this cooling arrangement can remove up to 90% of server-generated heat.

The server’s liquid cooling also covers the 5th gen AMD processors’ more demanding cooling requirements; they’re rated at up to 500W of thermal design power (TDP). By comparison, some members of the previous 4th gen AMD EPYC line have a default TDP as low as 200W.

Build & Recycle

The Supermicro FlexTwin server also adheres to the company’s “Building Block Solutions” approach. Essentially, this means end users purchase these servers by the rack.

Supermicro says its Building Blocks let users optimize for their exact workload. Users also gain efficient upgrading and scaling.

Looking further ahead, once these servers are due for an upgrade, they can be recycled through the Supermicro recycling program.

In Europe, Supermicro follows the EU’s Waste Electrical and Electronic Equipment (WEEE) Directive. In the U.S., recycling is free in California; users in other states may have to pay a shipping charge.

Put it all together, and you’ve got a server of the future, available to order today.

Tech Explainer: What is the AMD “Zen” core architecture?

Originally launched in 2017, this CPU architecture now delivers high performance and efficiency with ever-thinner processes.


The recent release of AMD’s 5th generation processors—formerly codenamed Turin—also heralded the introduction of the company’s “Zen 5” core architecture.

“Zen” is AMD’s name for a design ethos that prioritizes performance, scalability and efficiency. As any CTO will tell you, these 3 aspects are crucial for success in today’s AI era.

AMD originally introduced its “Zen” architecture in 2017 as part of a broader campaign to steal market share and establish dominance in the all-important enterprise IT space.

Subsequent generations of the “Zen” design have markedly increased performance and efficiency while delivering ever-thinner manufacturing processes.

Now and Zen

Since the “Zen” core’s original appearance in AMD Ryzen 1000-series processors, the architecture’s design philosophy has maintained its focus on a handful of vital aspects. They include:

  • A modular design. Known as Infinity Fabric, it facilitates efficient connectivity among multiple CPU cores and other components. This modular architecture enhances scalability and performance, both of which are vital for modern enterprise IT infrastructure.
  • High core counts and multithreading. Both are common to EPYC and Ryzen CPUs built using the AMD “Zen” core architecture. Simultaneous multithreading enables each core to process 2 threads. In the case of EPYC processors, this makes AMD’s CPUs ideal for multithreaded workloads that include Generative AI, machine learning, HPC and Big Data.
  • Advanced manufacturing processes. These allow faster, more efficient communication among individual CPU components, including multithreaded cores and multilevel caches. Back in 2017, the original “Zen” architecture was manufactured using a 14-nanometer (nm) process. Today’s new “Zen 5” and “Zen 5c” architectures (more on these below) reduce the lithography to just 4nm and 3nm, respectively.
  • Enhanced efficiency. This enables IT staff to better manage complex enterprise IT infrastructure. Reducing heat and power consumption is crucial, too, both in data centers and at the edge. The AMD “Zen” architecture makes this possible by offering enterprise-grade EPYC processors that offer up to 192 cores, yet require a maximum thermal design power (TDP) of only 500W.
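To make the multithreading point concrete: with 2-way SMT, the hardware thread count is simply double the core count. For the 192-core part mentioned above:

```python
# Hardware-thread math for a top-end EPYC with simultaneous multithreading.
cores = 192            # physical cores in the top-end EPYC part
threads_per_core = 2   # 2-way SMT: each core runs two hardware threads

logical_threads = cores * threads_per_core
print(logical_threads)  # 384 hardware threads per CPU
```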

The Two-Fold Path

The latest, fifth generation “Zen” architecture is divided into two segments: “Zen 5” and “Zen 5c.”

“Zen 5” employs a 4-nanometer (nm) manufacturing process to deliver up to 128 cores operating at up to 4.1GHz. It’s optimized for high per-core performance.

“Zen 5c,” by contrast, offers a 3nm lithography that’s reserved for AMD EPYC 96xx, 97xx, 98xx, and 99xx series processors. It’s optimized for high density and power efficiency.

The most powerful of these CPUs—the AMD EPYC 9965—includes an astonishing 192 cores, a maximum boost clock speed of 3.7GHz, and an L3 cache of 384MB.

Both “Zen 5” and “Zen 5c” are key components of the 5th gen AMD EPYC processors introduced earlier this month. Both have also been designed to achieve double-digit increases in instructions per clock cycle (IPC) and equip the core with the kinds of data handling and processing power required by new AI workloads.

Supermicro’s Satori

AMD isn’t the only brand offering bold, new tech to harried enterprise IT managers.

Supermicro recently introduced its new H14 servers, GPU-accelerated systems and storage servers powered by the AMD EPYC 9005 Series processors (the new “Turin” CPUs) and AMD Instinct MI325X Accelerators.

The new product line features updated versions of Supermicro’s vaunted Hyper system, Twin multinode servers, and AI-inferencing GPU systems. All are now available with the user’s choice of either air or liquid cooling.

Supermicro says its collection of purpose-built powerhouses represents one of the industry’s most extensive server families. That should be welcome news for organizations intent on building a fleet of machines to meet the highly resource-intensive demands of modern AI workloads.

By designing its next-generation infrastructure around AMD 5th Generation components, Supermicro says it can dramatically increase efficiency by reducing customers’ total data-center footprints by at least two-thirds.

Enlightened IT for the AI Era

While AMD and Supermicro’s advances represent today’s cutting-edge technology, tomorrow is another story entirely.

Keeping up with customer demand and the dizzying pace of AI-based innovation means these tech giants will soon return with more announcements, tools and design methodologies. AMD has already promised a new accelerator, the AMD Instinct MI350, will be formally announced in the second half of 2025.

As far as enterprise CTOs are concerned, the sooner, the better. To survive and thrive amid heavy competition, they’ll need an evolving array of next-generation technology. That will help them trim costs even as they expand their product offerings—a kind of technological nirvana.

AMD intros CPUs, accelerators, networking for end-to-end AI infrastructure -- and Supermicro supports

AMD expanded its end-to-end AI infrastructure products for data centers with new CPUs, accelerators and network controllers. And Supermicro is already offering supporting servers. 


AMD today held a roughly two-hour conference in San Francisco during which CEO Lisa Su and other executives introduced a new generation of server processors, the next model in the Instinct MI300 Accelerator family, and new data-center networking devices.

As Su told the live and online audience, AMD is committed to offering end-to-end AI infrastructure products and solutions in an open, partner-driven ecosystem.

Su further explained that AMD’s new AI strategy has 4 main goals:

  • Become the leader in end-to-end AI
  • Create an open AI software platform of libraries and models
  • Co-innovate with partners including cloud providers, OEMs and software creators
  • Offer all the pieces needed for a total AI solution, all the way from chips to racks to clusters and even entire data centers.

And here’s a look at the new data-center hardware AMD announced today.

5th Gen AMD EPYC CPUs

The EPYC line, originally launched in 2017, has become a big success for AMD. As Su told the event audience, there are now more than 950 EPYC instances at the largest cloud providers; also, AMD hardware partners now offer EPYC processors on more than 350 platforms. Market share is up, too: About one in three servers worldwide (34%) now runs on EPYC, Su said.

The new EPYC processors, formerly codenamed Turin and now known as the AMD EPYC 9005 Series, are now available for data center, AI and cloud customers.

The new CPUs also feature a new core architecture known as “Zen 5.” AMD says “Zen 5” outperforms the previous-generation “Zen 4” by 17% on enterprise instructions per clock and by up to 37% on AI and HPC workloads.

The new 5th Gen line has over 25 SKUs, and core count ranges widely, from as few as 8 to as many as 192. For example, the new AMD EPYC 9575F is a 64-core, 5GHz CPU designed specifically for GPU-powered AI solutions.

AMD Instinct MI325X Accelerator

About a year ago, AMD introduced the Instinct MI300 Accelerators, and since then the company has committed to introducing new models on a yearly cadence. Sure enough, today Lisa Su introduced the newest model, the AMD Instinct MI325X Accelerator.

Designed for Generative AI performance and built on the AMD CDNA3 architecture, the new accelerator offers up to 256GB of HBM3E memory, and bandwidth up to 6TB/sec.

Shipments of the MI325X are set to begin in this year’s fourth quarter. Partner systems with the new AMD accelerator are expected to start shipping in next year’s first quarter.

Su also mentioned the next model in the line, the AMD Instinct MI350, which will offer up to 288GB of HBM3E memory. It’s set to be formally announced in the second half of next year.

Networking Devices

Forrest Norrod, AMD’s head of data-center solutions, introduced two networking devices designed for data centers running AI workloads.

The AMD Pensando Salina DPU is designed for front-end connectivity. It supports throughput of up to 400 Gbps.

The AMD Pensando Pollara 400, designed for back-end networks connecting multiple GPUs, is the industry’s first Ultra-Ethernet Consortium-ready AI NIC.

Both parts are sampling with customers now, and AMD expects to start general shipments in next year’s first half.

Both devices are needed, Norrod said, because AI dramatically raises networking demands. He cited studies showing that connectivity currently accounts for 40% to 75% of the time needed to run certain AI training and inference models.

Supermicro Support

Supermicro is among the AMD partners already ready with systems based on the new AMD processors and accelerator.

Wasting no time, Supermicro today announced new H14 series servers, including both Hyper and FlexTwin systems, that support the 5th Gen AMD EPYC 9005 processors and AMD Instinct MI325X Accelerators.

The Supermicro H14 family includes three systems for AI training and inference workloads. Supermicro says the systems can also accommodate the higher thermal requirements of the new AMD EPYC processors, which are rated at up to 500W. Liquid cooling is an option, too.

The AMD Instinct MI300X Accelerator draws top marks from leading AI benchmark

In the latest MLPerf testing, the AMD Instinct MI300X Accelerator with ROCm software stack beat the competition with strong GenAI inference performance. 


New benchmarks using the AMD Instinct MI300X Accelerator show impressive performance that surpasses the competition.

This is great news for customers operating demanding AI workloads, especially those underpinned by large language models (LLMs) that require super-low latency.

Initial platform tests using MLPerf Inference v4.1 measured AMD’s flagship accelerator against the Llama 2 70B benchmark. This test serves as a proxy for real-world applications, including natural language processing (NLP) and large-scale inferencing.

MLPerf is the industry’s leading benchmarking suite for measuring the performance of machine learning and AI workloads from domains that include vision, speech and NLP. It offers a set of open-source AI benchmarks, including rigorous tests focused on Generative AI and LLMs.

Gaining high marks from the MLPerf Inference benchmarking suite represents a significant milestone for AMD. It positions the AMD Instinct MI300X accelerator as a go-to solution for enterprise-level AI workloads.

Superior Instincts

The results of the Llama 2 70B test are particularly significant, thanks to the benchmark’s ability to produce an apples-to-apples comparison of competing solutions.

In this benchmark, the AMD Instinct MI300X was compared with NVIDIA’s H100 Tensor Core GPU. The test concluded that AMD’s full-stack inference platform beat the H100 at delivering high-performance LLM inference, a workload that requires both robust parallel computing and a well-optimized software stack.

The testing also showed that because the AMD Instinct MI300X offers the largest GPU memory available—192GB of HBM3—it was able to fit the entire Llama 2 70B model into memory. That avoided the network overhead of splitting the model across accelerators, which in turn maximized inference throughput.

Software also played a big part in the success of the AMD Instinct series. The AMD ROCm software platform accompanies the AMD Instinct MI300X. This open software stack includes programming models, tools, compilers, libraries and runtimes for AI solution development on the AMD Instinct MI300 accelerator series and other AMD GPUs.

The testing showed that the scaling efficiency from a single AMD Instinct MI300X, combined with the ROCm software stack, to a complement of eight AMD Instinct accelerators was nearly linear. In other words, the system’s performance improved proportionally by adding more GPUs.

That test demonstrated the AMD Instinct MI300X’s ability to handle the largest MLPerf inference models to date, containing over 70 billion parameters.
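“Nearly linear” scaling can be quantified as scaling efficiency: measured speedup divided by ideal speedup. A minimal sketch of the calculation, using made-up throughput figures (not actual MLPerf results):

```python
# Scaling efficiency = actual speedup / ideal speedup.
# The throughput figures below are illustrative placeholders,
# NOT measured MLPerf results.
throughput_1_gpu = 1000.0   # e.g., tokens/sec on one MI300X (hypothetical)
throughput_8_gpu = 7800.0   # e.g., tokens/sec on eight MI300X (hypothetical)

speedup = throughput_8_gpu / throughput_1_gpu   # 7.8x actual
efficiency = speedup / 8                        # vs. the ideal 8x
print(f"{efficiency:.1%}")  # 97.5% -> "nearly linear"
```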

Thinking Inside the Box

Benchmarking the AMD Instinct MI300X required AMD to create a complete hardware platform capable of addressing strenuous AI workloads. For this task, AMD engineers chose as their testbed the Supermicro AS-8125GS-TNMR2, a massive 8U complete system.

Supermicro’s GPU A+ Client Systems are designed for both versatility and redundancy. Designers can outfit the system with an impressive array of hardware, starting with two AMD EPYC 9004-series processors and up to 6TB of ECC DDR5 main memory.

Because AI workloads consume massive amounts of storage, Supermicro has also outfitted this 8U server with 12 front hot-swap 2.5-inch NVMe drive bays. There’s also the option to add four more drives via an additional storage controller.

The Supermicro AS-8125GS-TNMR2 also includes room for two hot-swap 2.5-inch SATA bays and two M.2 drives, each with a capacity of up to 3.84TB.

Power for all those components is delivered courtesy of six 3,000-watt redundant titanium-level power supplies.

Coming Soon: Even More AI Power

AMD engineers continually push the limits of silicon and human ingenuity to expand the capabilities of their hardware. So it should come as little surprise that new iterations of the AMD Instinct series are expected to be released in the coming months. This past May, AMD officials said they plan to introduce AMD Instinct MI325, MI350 and MI400 accelerators.

Forthcoming Instinct accelerators, AMD says, will deliver advances including additional memory, support for lower-precision data types, and increased compute power.

New features are also coming to the AMD ROCm software stack. Those changes should include kernel improvements and advanced quantization support.

Are your customers looking for a high-powered, low-latency system to run their most demanding HPC and AI workloads? Tell them about these benchmarks and the AMD Instinct MI300X accelerators.

Developing AI and HPC solutions? Check out the new AMD ROCm 6.2 release

The latest release of AMD’s free and open software stack for developing AI and HPC solutions delivers 5 important enhancements. 


If you develop AI and HPC solutions, you’ll want to know about the most recent release of AMD ROCm software, version 6.2.

ROCm, in case you’re unfamiliar with it, is AMD’s free and open software stack. It’s aimed at developers of artificial intelligence and high-performance computing (HPC) solutions on AMD Instinct accelerators. It's also great for developing AI and HPC solutions on AMD Instinct-powered servers from Supermicro. 

First introduced in 2016, ROCm open software now includes programming models, tools, compilers, libraries, runtimes and APIs for GPU programming.

ROCm version 6.2, announced recently by AMD, delivers 5 key enhancements:

  • Improved vLLM support 
  • Boosted memory efficiency & performance with Bitsandbytes
  • New Offline Installer Creator
  • New Omnitrace & Omniperf Profiler Tools (beta)
  • Broader FP8 support

Let’s look at each separately and in more detail.

vLLM Support

To enhance the efficiency and scalability of its Instinct accelerators, AMD is expanding vLLM support. vLLM is an easy-to-use library for inference and serving of the large language models (LLMs) that power Generative AI.

ROCm 6.2 lets AMD Instinct developers integrate vLLM into their AI pipelines. The benefits include improved performance and efficiency.

Bitsandbytes

Developers can now integrate Bitsandbytes with ROCm for AI model training and inference, reducing their memory and hardware requirements on AMD Instinct accelerators. 

Bitsandbytes is an open source Python library that brings low-bit quantization to LLMs, boosting memory efficiency and performance. AMD says this will let AI developers work with larger models on limited hardware, broadening access, saving costs and expanding opportunities for innovation.

Offline Installer Creator

The new ROCm Offline Installer Creator aims to simplify the installation process. This tool creates a single installer file that includes all necessary dependencies.

That makes deployment straightforward with a user-friendly GUI that allows easy selection of ROCm components and versions.

As the name implies, the Offline Installer Creator can be used on developer systems that lack internet access.

Omnitrace and Omniperf Profiler

The new Omnitrace and Omniperf Profiler Tools, both now in beta release, provide comprehensive performance analysis and a streamlined development workflow.

Omnitrace offers a holistic view of system performance across CPUs, GPUs, NICs and network fabrics. This helps developers identify and address bottlenecks.

Omniperf delivers detailed GPU kernel analysis for fine-tuning.

Together, these tools help to ensure efficient use of developer resources, leading to faster AI training, AI inference and HPC simulations.

FP8 Support

Broader FP8 support can improve the performance of AI inferencing.

FP8 is an 8-bit floating point format that provides a common, interchangeable format for both AI training and inference. It lets AI models operate and perform consistently across hardware platforms.

In ROCm, FP8 support improves the process of running AI models, particularly in inferencing. It does this by addressing key challenges such as the memory bottlenecks and high latency associated with higher-precision formats. In addition, FP8's reduced precision calculations can decrease the latency involved in data transfers and computations, losing little to no accuracy.  

ROCm 6.2 expands FP8 support across its ecosystem, from frameworks to libraries and more, enhancing performance and efficiency.
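To see why FP8 trades so little accuracy for its memory savings, it helps to look at what the format can represent. The sketch below brute-forces the value set of the common E4M3 FP8 encoding (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits) and rounds a number to the nearest representable value. It illustrates the general idea, not ROCm’s specific FP8 implementation:

```python
def e4m3_values():
    """Enumerate every finite value of the FP8 E4M3 format
    (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits)."""
    vals = set()
    for code in range(256):
        sign = -1.0 if code & 0x80 else 1.0
        exp = (code >> 3) & 0xF
        man = code & 0x7
        if exp == 0xF and man == 0x7:
            continue  # reserved for NaN in the OCP E4M3 variant
        if exp == 0:
            v = (man / 8.0) * 2.0 ** -6        # subnormal numbers
        else:
            v = (1 + man / 8.0) * 2.0 ** (exp - 7)
        vals.add(sign * v)
    return sorted(vals)

def quantize_e4m3(x):
    """Round x to the nearest representable E4M3 value."""
    return min(e4m3_values(), key=lambda v: abs(v - x))

print(quantize_e4m3(3.3))   # 3.25 -- small, bounded rounding error
print(max(e4m3_values()))   # 448.0 -- the format's largest finite value
```

Halving storage relative to FP16 in exchange for this coarser value grid is the bandwidth-versus-accuracy trade-off described above.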

HBM: Your memory solution for AI & HPC

High-bandwidth memory shortens the information commute to keep pace with today’s powerful GPUs.


As AI powered by GPUs transforms computing, conventional DDR memory can’t keep up.

The solution? High-bandwidth memory (HBM).

HBM is a memory-chip technology that essentially shortens the information commute. It does this using ultra-wide communication lanes.

An HBM device contains vertically stacked memory chips. They’re interconnected by microscopic wires known as through-silicon vias, or TSVs for short.

HBM also provides more bandwidth per watt. And, with a smaller footprint, the technology can also save valuable data-center space.

Here’s how: A single HBM stack can contain up to eight DRAM modules, with each module connected by two channels. This makes an HBM implementation of just four chips roughly equivalent to 30 DDR modules, and in a fraction of the space.

All this makes HBM ideal for workloads that utilize AI and machine learning, HPC, advanced graphics and data analytics.

Latest & Greatest

The latest iteration, HBM3, was introduced in 2022, and it’s now finding wide application in market-ready systems.

Compared with the previous version, HBM3 adds several enhancements:

  • Higher bandwidth: Up to 819 GB/sec., up from HBM2’s max of 460 GB/sec.
  • More memory capacity: 24GB per stack, up from HBM2’s 8GB
  • Improved power efficiency: Delivering more data throughput per watt
  • Reduced form factor: Thanks to a more compact design

However, it’s not all sunshine and rainbows. For one, HBM-equipped systems are more expensive than those fitted out with traditional memory solutions.

Also, HBM stacks generate considerable heat. Advanced cooling systems are often needed, adding further complexity and cost.

Compatibility is yet another challenge. Systems must be designed or adapted to HBM3’s unique interface and form factor.

In the Market

As mentioned above, HBM3 is showing up in new products. That very definitely includes both the AMD Instinct MI300A and MI300X series accelerators.

The AMD Instinct MI300A accelerator combines a CPU and GPU for running HPC/AI workloads. It offers HBM3 as the dedicated memory with a unified capacity of up to 128GB.

Similarly, the AMD Instinct MI300X is a GPU-only accelerator designed for low-latency AI processing. It contains HBM3 as the dedicated memory, but with a higher capacity of up to 192GB.

For both of these AMD Instinct MI300 accelerators, the peak theoretical memory bandwidth is a speedy 5.3TB/sec.

The AMD Instinct MI300X is also the main processor in Supermicro’s AS-8125GS-TNMR2, an H13 8U 8-GPU system. This system offers a huge 1.5TB of HBM3 memory in a single server, and an even larger 6.144TB at rack scale.
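The server- and rack-level figures follow directly from the per-GPU capacity. A quick check, using the numbers above (and assuming the marketing convention of 1TB = 1,000GB):

```python
# HBM3 capacity roll-up for the 8-GPU Supermicro system described above.
gb_per_gpu = 192       # HBM3 per AMD Instinct MI300X
gpus_per_server = 8    # 8U 8-GPU system

server_gb = gb_per_gpu * gpus_per_server          # 1536 GB ~= 1.5 TB/server
rack_tb = 6.144                                   # figure quoted at rack scale
servers_per_rack = round(rack_tb * 1000 / server_gb)
print(server_gb, servers_per_rack)  # 1536 GB per server -> 4 servers per rack
```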

Are your customers running AI with fast GPUs, only to have their systems held back by conventional memory? Tell them to check out HBM.

Tech Explainer: What is CXL — and how can it help you lower data-center latency?

High latency is a data-center manager’s worst nightmare. Help is here from an open-source solution known as CXL. It works by maintaining “memory coherence” between the CPU’s memory and memory on attached devices.


Latency is a crucial measure for every data center. Because latency measures the time it takes for data to travel from one point in a system or network to another, lower is generally better. A network with high latency has slower response times—not good.

Fortunately, the industry has come up with an open-source solution that provides a low-latency link between processors, accelerators and memory devices such as RAM and SSD storage. It’s known as Compute Express Link, or CXL for short.

CXL is designed to solve a couple of common problems. Once a processor uses up the capacity of its direct-attached memory, it relies on an SSD. This introduces a three-order-of-magnitude latency gap that can hurt both performance and total cost of ownership (TCO).
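A rough sense of that gap, using typical order-of-magnitude access latencies (assumed figures for illustration, not numbers from the article):

```python
# Illustrative access latencies -- assumed, order-of-magnitude values.
dram_latency_ns = 100        # direct-attached DRAM: ~100 nanoseconds
nvme_latency_ns = 100_000    # NVMe SSD: ~100 microseconds

gap = nvme_latency_ns / dram_latency_ns
print(gap)  # 1000.0 -> the "three-order-of-magnitude" cliff CXL targets
```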

Another problem is that multicore processors are starving for memory bandwidth. This has become an issue because processors have been scaling in terms of cores and frequencies faster than their main memory channels. The resulting deficit leads to suboptimal use of the additional processor cores, as the cores have to wait for data.

CXL overcomes these issues by introducing a low-latency, memory cache coherent interconnect. CXL works for processors, memory expansion and AI accelerators such as the AMD Instinct MI300 series. The interconnect provides more bandwidth and capacity to processors, which increases efficiency and enables data-center operators to get more value from their existing infrastructure.

Cache-coherence refers to IT architecture in which multiple processor cores share the same memory hierarchy, yet retain individual L1 caches. The CXL interconnect reduces latency and increases performance throughout the data center.

The latest iteration of CXL, version 3.1, adds features to help data centers keep up with high-performance computational workloads. Notable upgrades include new peer-to-peer direct memory access, enhancements to memory pooling, and CXL Fabric improvements.

3 Ways to CXL

Today, there are three main types of CXL devices:

  • Type 1: Devices without integrated local memory. CXL protocols enable these devices to coherently access and use memory from the host processor.
  • Type 2: These devices include integrated memory, but also share CPU memory. They leverage CXL to enable coherent memory-sharing between the CPU and the CXL device.
  • Type 3: A class of devices designed to augment existing CPU memory. CXL enables the CPU to access external sources for increased bandwidth and reduced latency.

Hardware Support

As data-center architectures evolve, more hardware manufacturers are supporting CXL devices. One such example is Supermicro’s All-Flash EDSFF and NVMe servers.

Supermicro’s cutting-edge appliances are optimized for resource-intensive workloads, including data-center infrastructure, data warehousing, hyperscale/hyperconverged and software-defined storage. To facilitate these workloads, Supermicro has included support for up to eight CXL 2.0 devices for advanced memory-pool sharing.

Of course, CXL can be utilized only on server platforms designed to support communication between the CPU, memory and CXL devices. That’s why CXL is built into the 4th gen AMD EPYC server processors.

These AMD EPYC processors include up to 96 ‘Zen 4’ 5nm cores. Each CCD includes 32MB of L3 cache, and each processor supports up to 12 DDR5 channels and as much as 12TB of memory.

CXL memory expansion is built into the AMD EPYC platform. That makes these CPUs ideally suited for advanced AI and GenAI workloads.

Crucially, AMD also includes 256-bit AES-XTS and secure multikey encryption. This enables hypervisors to encrypt address space ranges on CXL-attached memory.

The Near Future of CXL

Like many add-on devices, CXL devices are often connected via the PCI Express (PCIe) bus. However, implementing CXL over PCIe 5.0 in large data centers has some drawbacks.

Chief among them is the way its memory pools remain isolated from each other. This adds latency and hampers significant resource-sharing.

The next generation of PCIe, version 6.0, is coming soon and will offer a solution. CXL over PCIe 6.0 will offer twice the throughput of PCIe 5.0.

The new PCIe standard will also add new memory-sharing functionality within the transaction layer. This will help reduce system latency and improve accelerator performance.

CXL is also enabling the beginnings of disaggregated computing, in which resources residing in different physical enclosures can be made available to several applications.

Are your customers suffering from too much latency? The solution could be CXL.

Meet AMD's new Alveo V80 Compute Accelerator Card

AMD’s new Alveo V80 Compute Accelerator Card has been designed to overcome performance bottlenecks in compute-intensive workloads that include HPC, data analytics and network security.


Are you or your customers looking for an accelerator for memory-bound applications with large data sets that require FPGA hardware adaptability? If so, then check out the new AMD Alveo V80 Compute Accelerator Card.

It was introduced by AMD at ISC High Performance 2024, an event held recently in Hamburg, Germany.

The thinking behind the new component is that for large-scale data processing, raw computational power is only half the equation. You also need lots of memory bandwidth.

Indeed, AMD’s new hardware adaptable accelerator is purpose-built to overcome performance bottlenecks for compute-intensive workloads with large data sets common to HPC, data analytics and network security applications. It’s powered by AMD’s 7nm Versal HBM Series adaptive system-on-chip (SoC).

Substantial gains

AMD says that compared with the previous-generation Alveo U55C, the new Alveo V80 offers up to 2x the memory bandwidth, 2x the PCIe bandwidth, 2x the logic density, and 4x the network bandwidth (820GB/sec.).

The card also features 4x200G networking, PCIe Gen4 and Gen5 interfaces, and DDR4 DIMM slots for memory expansion.

Appropriate workloads for the new AMD Alveo V80 include HPC, data analytics, FinTech/Blockchain, network security, computational storage, and AI compute.

In addition, the AMD Alveo V80 can scale to hundreds of nodes over Ethernet, creating compute clusters for HPC applications that include genomic sequencing, molecular dynamics and sensor processing.

Developers, too

Offered as a production board in a PCIe form factor, the AMD Alveo V80 is designed to provide a faster path to production than designing your own PCIe card.

Indeed, for FPGA developers, the V80 is fully enabled for traditional development via the Alveo Versal Example Design (AVED), which is available on GitHub.

This example design provides an efficient starting point using a pre-built subsystem implemented on the AMD Versal adaptive SoC. More specifically, it targets the new AMD Alveo V80 accelerator.

Supermicro offering

The new AMD accelerator is already shipping in volume, and you can get it from either AMD or an authorized distributor.

In addition, you can get the Alveo V80 already integrated into a partner-provided server.

Supermicro is integrating the new AMD Alveo V80 with its AMD EPYC processor-powered A+ servers. These include the Supermicro AS-4125GS-TNRT, a compact 4U server for deployments where compute density and memory bandwidth are critical.

Early user

AMD says one early customer for the new accelerator card is the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the national research agency of Australia.

CSIRO plans to upgrade an older setup with 420 previous-generation AMD Alveo U55C accelerator cards, replacing them with the new Alveo V80.

Because the new part is so much more powerful than its predecessor, the organization expects to cut the number of cards it needs by two-thirds, from 420 to roughly 140. That, in turn, should shrink the required data-center footprint and lower system costs.

If those sound like benefits you and your customers would find attractive, check out the AMD Alveo V80 links below.

Supermicro, Vast collaborate to deliver turnkey AI storage at rack scale

Supermicro and Vast Data are jointly offering an AMD-based turnkey solution that promises to simplify and accelerate AI and data pipelines.


Supermicro and Vast Data are collaborating to deliver a turnkey, full-stack solution for creating and expanding AI deployments.

This joint solution is aimed at hyperscalers, cloud service providers (CSPs) and large, data-centric enterprises in fintech, adtech, media and entertainment, chip design and high-performance computing (HPC).

Applications that can benefit from the new joint offering include enterprise NAS and object storage; high-performance data ingestion; supercomputer data access; scalable data analysis; and scalable data processing.

Vast, founded in 2016, offers a software data platform that enterprises and CSPs use for data-intensive computing. The platform is based on a distributed systems architecture, called DASE, that allows a system to run read and write operations at any scale. Vast’s customers include Pixar, Verizon and Zoom.

By collaborating with Supermicro, Vast hopes to extend its market. Currently, Vast sells to infrastructure providers at a variety of scales. Some of its largest customers have built 400 petabyte storage systems, and a few are even discussing systems that would store up to 2 exabytes, according to John Mao, Vast’s VP of technology alliances.

Supermicro and Vast have engaged with many of the same CSPs separately, supporting various parts of the solution. By formalizing this collaboration, they hope to extend their reach to new customers while increasing their sell-through to current customers.

Vast is also looking to the Supermicro alliance to expand its global reach. While most of Vast’s customers today are U.S.-based, Supermicro operates in over 100 countries worldwide. Supermicro also has the infrastructure to integrate, test and ship 5,000 fully populated racks per month from its manufacturing plants in California, the Netherlands, Malaysia and Taiwan.

There’s also a big difference in size. Where privately held Vast has about 800 employees, publicly traded Supermicro has more than 5,100.

Rack solution

Now Vast and Supermicro have developed a new converged system using Supermicro’s Hyper A+ servers with AMD EPYC 9004 Series processors. The solution converges Vast’s two separate server roles into a single system.

This converged system is well suited to large service providers, where the typical Supermicro-powered Vast rack configuration will start at about 2PB, Mao adds.

Rack-scale configurations can cut costs by eliminating the need for single-box redundancy. This converged design makes the system more scalable and more cost-efficient.

Under the hood

One highlight of the joint project: It puts Vast’s DASE architecture on Supermicro’s industry-standard servers. Each server will have both the compute and storage functions of a Vast cluster.

At the same time, the architecture is disaggregated via a high-speed Ethernet NVMe fabric. This allows each node to access all drives in the cluster.

The Vast platform architecture uses a series of what the company calls an EBox. Each EBox, in turn, contains 2 kinds of storage servers in a container environment: CNode (short for Compute Node) and DNode (short for Data Node). In a typical EBox, one CNode interfaces with client applications and writes directly to two DNode containers.
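The EBox layout described above can be sketched in a few lines of code. This is a purely illustrative model of the one-CNode-to-two-DNodes relationship; the class and method names are hypothetical, not Vast's actual API:

```python
# Hypothetical sketch of a Vast EBox as described above: one CNode
# (client-facing compute) writes each block to two DNodes (data).
# Names are illustrative only, not Vast's real software interfaces.
from dataclasses import dataclass, field


@dataclass
class DNode:
    name: str
    blocks: list = field(default_factory=list)

    def store(self, block: bytes) -> None:
        self.blocks.append(block)


@dataclass
class CNode:
    name: str
    dnodes: list  # a typical EBox pairs one CNode with two DNodes

    def write(self, block: bytes) -> None:
        # The CNode fronts client I/O and writes directly to both
        # DNodes, mirroring the layout described in the article.
        for d in self.dnodes:
            d.store(block)


ebox = CNode("cnode-0", [DNode("dnode-0"), DNode("dnode-1")])
ebox.write(b"client data")
print(len(ebox.dnodes[0].blocks), len(ebox.dnodes[1].blocks))  # 1 1
```

The point of the sketch: every client write lands on both data nodes, which is what lets the cluster tolerate the loss of a box without single-box redundancy hardware.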

In this configuration, Supermicro’s storage servers can act as a hardware building block to scale Vast to hundreds of petabytes. It supports Vast’s requirement for multiple tiers of solid-state storage media, an approach that’s unique in the industry.

CPU to GPU

At the NAB Show, held recently in Las Vegas, Supermicro’s demos included storage servers, each powered by a single-socket AMD EPYC 9004 Series processor.

With up to 128 PCIe 5.0 lanes, the AMD processor lets the server connect more NVMe SSDs to a single CPU. The Supermicro storage server also supports Nvidia’s GPUDirect Storage protocol, which uses RDMA to move data directly from storage to GPU memory, bypassing the CPU.
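Those 128 lanes translate into a lot of drives. Here's a rough sizing exercise; the assumptions (all lanes devoted to storage, x4 links per SSD, roughly 4 GB/s usable per Gen5 lane) are mine, not Supermicro's:

```python
# Rough sizing of NVMe storage hanging off 128 PCIe 5.0 lanes.
# Assumptions (mine, not Supermicro's): every lane goes to storage,
# each SSD uses a x4 link, ~4 GB/s usable per Gen5 lane per direction.

TOTAL_LANES = 128
LANES_PER_SSD = 4        # typical NVMe SSD link width
GBS_PER_LANE = 4.0       # approximate usable Gen5 bandwidth per lane

max_ssds = TOTAL_LANES // LANES_PER_SSD      # 32 drives
aggregate_gbs = TOTAL_LANES * GBS_PER_LANE   # 512 GB/s theoretical
print(max_ssds, aggregate_gbs)  # 32 512.0
```

Real systems reserve lanes for NICs and other devices, so the practical drive count is lower, but the arithmetic shows why a single-socket EPYC can anchor a dense storage node.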

If you or your customers are interested in the new Vast solution, get in touch with your local Supermicro sales rep or channel partner. Under the terms of the new partnership, Supermicro is acting as a Vast integrator and OEM. It’s also Vast’s only rack-scale partner.

AMD and Supermicro: Pioneering AI Solutions

In the constantly evolving landscape of AI and machine learning, the synergy between hardware and software is paramount. Enter AMD and Supermicro, two industry titans who have joined forces to empower organizations in the new world of AI with cutting-edge solutions.


Bringing AMD Instinct to the Forefront

AMD and Supermicro have joined forces with a shared vision: to enable organizations to unlock the full potential of AI workloads, from training massive language models to accelerating complex simulations.

The AMD Instinct MI300 Series: Changing The AI Acceleration Paradigm

At the heart of this collaboration lies the AMD Instinct MI300 Series, a family of accelerators designed to redefine performance boundaries. In Supermicro's systems, high-performance AMD EPYC™ 9004 Series CPUs are paired with powerful AMD Instinct™ MI300X GPU accelerators, each carrying 192GB of HBM3 memory, creating a formidable force for AI, HPC and technical computing.

Supermicro’s H13 Generation of GPU Servers

Supermicro’s H13 generation of GPU Servers serves as the canvas for this technological masterpiece. Optimized for leading-edge performance and efficiency, these servers integrate seamlessly with the AMD Instinct MI300 Series. Let’s explore the highlights:

8-GPU Systems for Large-Scale AI Training:

  • Supermicro’s 8-GPU servers, equipped with the AMD Instinct MI300X OAM accelerator, offer raw acceleration power. The AMD Infinity Fabric™ Links enable up to 896GB/s of peak theoretical P2P I/O bandwidth, while the 1.5TB HBM3 GPU memory fuels large-scale AI models.
  • These servers are ideal for LLM inference and for training language models with trillions of parameters, minimizing training time and inference latency, lowering TCO and maximizing throughput.
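The 1.5TB aggregate figure above follows directly from eight MI300X accelerators at 192GB of HBM3 each; a quick sanity check:

```python
# Sanity check of the 1.5TB aggregate HBM3 figure quoted above:
# 8 MI300X accelerators x 192GB of HBM3 each (binary TB assumed).
gpus = 8
hbm3_gb_per_gpu = 192

total_gb = gpus * hbm3_gb_per_gpu   # 1536 GB
total_tb = total_gb / 1024          # 1.5 TB
print(total_gb, total_tb)  # 1536 1.5
```

That pooled 1.5TB is what allows trillion-parameter-class models to fit across a single 8-GPU node with fewer partitioning gymnastics.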

Benchmarking Excellence

But what about real-world performance? Fear not! Supermicro’s ongoing testing and benchmarking efforts have yielded remarkable results. Continued engagement between the AMD and Supermicro performance teams enabled Supermicro to test pre-release ROCm versions with the latest performance optimizations, including publicly released ones such as FlashAttention-2 and vLLM. The Supermicro AMD-based system AS -8125GS-TNMR2 showcases AI inference prowess, especially on models like Llama-2 70B, Llama-2 13B and Bloom 176B. The performance? Equal to or better than AMD’s published results from its Dec. 6 Advancing AI event.


Charles Liang’s Vision

In the words of Charles Liang, President and CEO of Supermicro:

“We are very excited to expand our rack scale Total IT Solutions for AI training with the latest generation of AMD Instinct accelerators. Our proven architecture allows for fully integrated liquid cooling solutions, giving customers a competitive advantage.”

Conclusion

The AMD-Supermicro partnership isn’t just about hardware and software stacks; it’s about pushing boundaries, accelerating breakthroughs, and shaping the future of AI. So, as we raise our virtual glasses, let’s toast to innovation, collaboration, and the relentless pursuit of performance and excellence.
