Performance Intensive Computing

Capture the full potential of IT

The AMD Instinct MI300X Accelerator draws top marks from leading AI benchmark

In the latest MLPerf testing, the AMD Instinct MI300X Accelerator with ROCm software stack beat the competition with strong GenAI inference performance. 


New benchmarks using the AMD Instinct MI300X Accelerator show impressive performance that surpasses the competition.

This is great news for customers operating demanding AI workloads, especially those underpinned by large language models (LLMs) that require super-low latency.

Initial platform tests using MLPerf Inference v4.1 measured AMD’s flagship accelerator against the Llama 2 70B benchmark. This test is indicative of real-world applications, including natural language processing (NLP) and large-scale inferencing.

MLPerf is the industry’s leading benchmarking suite for measuring the performance of machine learning and AI workloads from domains that include vision, speech and NLP. It offers a set of open-source AI benchmarks, including rigorous tests focused on Generative AI and LLMs.

Gaining high marks from the MLPerf Inference benchmarking suite represents a significant milestone for AMD. It positions the AMD Instinct MI300X accelerator as a go-to solution for enterprise-level AI workloads.

Superior Instincts

The results of the Llama 2 70B test are particularly significant because the benchmark produces an apples-to-apples comparison of competing solutions.

In this benchmark, the AMD Instinct MI300X was compared with NVIDIA’s H100 Tensor Core GPU. The results showed that AMD’s full-stack inference platform outperformed the H100 on this LLM workload, which requires both robust parallel computing and a well-optimized software stack.

The testing also showed that because the AMD Instinct MI300X offers the largest GPU memory available—192GB of HBM3 memory—it was able to fit the entire Llama 2 70B model into memory. Doing so avoided the network overhead of splitting the model across multiple GPUs. This, in turn, maximized inference throughput, producing superior results.
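A quick back-of-envelope calculation shows why that matters. The sketch below assumes 16-bit (2-byte) weights; in practice the KV cache and activations need additional headroom, so treat it as an illustration rather than a sizing guide.

```python
# Back-of-envelope check: can a 70B-parameter model fit on one MI300X?
PARAMS = 70e9            # Llama 2 70B parameter count
BYTES_PER_PARAM = 2      # FP16/BF16 weights
HBM_CAPACITY_GB = 192    # MI300X HBM3 capacity

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Model weights: ~{weights_gb:.0f} GB vs. {HBM_CAPACITY_GB} GB of HBM3")
# ~140 GB of weights fits within 192 GB, leaving room for the KV cache and
# activations, so the model never has to be split across GPUs.
```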

Software also played a big part in the success of the AMD Instinct series. The AMD ROCm software platform accompanies the AMD Instinct MI300X. This open software stack includes programming models, tools, compilers, libraries and runtimes for AI solution development on the AMD Instinct MI300 accelerator series and other AMD GPUs.

The testing showed nearly linear scaling from a single AMD Instinct MI300X, combined with the ROCm software stack, to a full complement of eight AMD Instinct accelerators. In other words, the system’s performance improved almost proportionally as more GPUs were added.
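Scaling efficiency is simply the measured throughput on N GPUs divided by N times the single-GPU throughput. The sketch below uses placeholder numbers, not actual MLPerf results, purely to show the calculation.

```python
def scaling_efficiency(throughput_1gpu: float, throughput_ngpu: float, n: int) -> float:
    """Fraction of ideal linear scaling achieved when moving from 1 to n GPUs."""
    return throughput_ngpu / (n * throughput_1gpu)

# Hypothetical throughputs for illustration only -- not measured MLPerf results.
single_gpu = 2_500.0    # tokens/sec on one accelerator
eight_gpu = 19_000.0    # tokens/sec on eight accelerators
print(f"Scaling efficiency: {scaling_efficiency(single_gpu, eight_gpu, 8):.0%}")  # 95%
```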

That test demonstrated the AMD Instinct MI300X’s ability to handle the largest MLPerf inference models to date, containing over 70 billion parameters.

Thinking Inside the Box

Benchmarking the AMD Instinct MI300X required AMD to create a complete hardware platform capable of addressing strenuous AI workloads. For this task, AMD engineers chose as their testbed the Supermicro AS-8125GS-TNMR2, a massive 8U complete system.

Supermicro’s GPU A+ Server systems are designed for both versatility and redundancy. Designers can outfit the system with an impressive array of hardware, starting with two AMD EPYC 9004-series processors and up to 6TB of ECC DDR5 main memory.

Because AI workloads consume massive amounts of storage, Supermicro has also outfitted this 8U server with 12 front hot-swap 2.5-inch NVMe drive bays. There’s also the option to add four more drives via an additional storage controller.

The Supermicro AS-8125GS-TNMR2 also includes room for two hot-swap 2.5-inch SATA bays and two M.2 drives, each with a capacity of up to 3.84TB.

Power for all those components is delivered courtesy of six 3,000-watt redundant titanium-level power supplies.

Coming Soon: Even More AI Power

AMD engineers continually push the limits of silicon and human ingenuity to expand the capabilities of their hardware. So it should come as little surprise that new iterations of the AMD Instinct series are expected to be released in the coming months. This past May, AMD officials said they plan to introduce AMD Instinct MI325, MI350 and MI400 accelerators.

Forthcoming Instinct accelerators, AMD says, will deliver advances including additional memory, support for lower-precision data types, and increased compute power.

New features are also coming to the AMD ROCm software stack. Those changes should include kernel improvements and advanced quantization support.

Are your customers looking for a high-powered, low-latency system to run their most demanding HPC and AI workloads? Tell them about these benchmarks and the AMD Instinct MI300X accelerators.

Developing AI and HPC solutions? Check out the new AMD ROCm 6.2 release

The latest release of AMD’s free and open software stack for developing AI and HPC solutions delivers 5 important enhancements. 


If you develop AI and HPC solutions, you’ll want to know about the most recent release of AMD ROCm software, version 6.2.

ROCm, in case you’re unfamiliar with it, is AMD’s free and open software stack. It’s aimed at developers of artificial intelligence and high-performance computing (HPC) solutions on AMD Instinct accelerators. It's also great for developing AI and HPC solutions on AMD Instinct-powered servers from Supermicro. 

First introduced in 2016, ROCm open software now includes programming models, tools, compilers, libraries, runtimes and APIs for GPU programming.

ROCm version 6.2, announced recently by AMD, delivers 5 key enhancements:

  • Improved vLLM support 
  • Boosted memory efficiency & performance with Bitsandbytes
  • New Offline Installer Creator
  • New Omnitrace & Omniperf Profiler Tools (beta)
  • Broader FP8 support

Let’s look at each separately and in more detail.

vLLM Support

To enhance the efficiency and scalability of its Instinct accelerators, AMD is expanding vLLM support. vLLM is an easy-to-use, open-source library for fast inference and serving of the large language models (LLMs) that power Generative AI.

ROCm 6.2 lets AMD Instinct developers integrate vLLM into their AI pipelines. The benefits include improved performance and efficiency.
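As a rough illustration of what that integration looks like in practice, here is a minimal vLLM sketch. The model name, parallelism and sampling settings are illustrative, and it assumes a ROCm-enabled vLLM build running on AMD Instinct accelerators.

```python
# Minimal vLLM inference sketch (illustrative settings; assumes a ROCm-enabled
# vLLM build on an AMD Instinct system).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # example model
    tensor_parallel_size=8,                  # spread the model across 8 accelerators
)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize high-bandwidth memory in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```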

Bitsandbytes

Developers can now integrate Bitsandbytes with ROCm for AI model training and inference, reducing their memory and hardware requirements on AMD Instinct accelerators. 

Bitsandbytes is an open-source Python library that brings 8-bit optimizers and 8-bit/4-bit quantization to LLM training and inference, boosting memory efficiency and performance. AMD says this will let AI developers work with larger models on limited hardware, broadening access, saving costs and expanding opportunities for innovation.
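For a sense of how that looks in code, here is a minimal sketch that loads a model in 4-bit precision through the Hugging Face Transformers integration of bitsandbytes. The model name is illustrative, and it assumes a bitsandbytes build with ROCm support.

```python
# Load a causal LLM with bitsandbytes 4-bit quantization via Hugging Face
# Transformers (illustrative model; assumes bitsandbytes built with ROCm support).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"        # example model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,    # do the math in FP16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

prompt = tokenizer("Quantization lets us", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**prompt, max_new_tokens=32)[0]))
```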

Offline Installer Creator

The new ROCm Offline Installer Creator aims to simplify the installation process. This tool creates a single installer file that includes all necessary dependencies.

That makes deployment straightforward with a user-friendly GUI that allows easy selection of ROCm components and versions.

As the name implies, the Offline Installer Creator can be used on developer systems that lack internet access.

Omnitrace and Omniperf Profiler

The new Omnitrace and Omniperf Profiler Tools, both now in beta release, provide comprehensive performance analysis and a streamlined development workflow.

Omnitrace offers a holistic view of system performance across CPUs, GPUs, NICs and network fabrics. This helps developers identify and address bottlenecks.

Omniperf delivers detailed GPU kernel analysis for fine-tuning.

Together, these tools help to ensure efficient use of developer resources, leading to faster AI training, AI inference and HPC simulations.

FP8 Support

Broader FP8 support can improve the performance of AI inferencing.

FP8 is an 8-bit floating point format that provides a common, interchangeable format for both AI training and inference. It lets AI models operate and perform consistently across hardware platforms.

In ROCm, FP8 support improves the process of running AI models, particularly in inferencing. It does this by addressing key challenges such as the memory bottlenecks and high latency associated with higher-precision formats. In addition, FP8’s reduced-precision calculations can decrease the latency involved in data transfers and computations, with little to no loss of accuracy.
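To make the memory argument concrete, here is a small sketch using PyTorch’s FP8 dtypes (available in recent PyTorch releases). The tensor shape is arbitrary and the numbers are purely illustrative.

```python
# FP8 halves the storage of FP16 weights at the cost of reduced precision.
# Shapes and values are arbitrary.
import torch

w = torch.randn(4096, 4096, dtype=torch.float16)
w_fp8 = w.to(torch.float8_e4m3fn)            # 1 byte per element instead of 2

print(w.element_size(), "bytes/elem ->", w_fp8.element_size(), "byte/elem")
max_err = (w - w_fp8.to(torch.float16)).abs().max().item()
print(f"Max round-trip error: {max_err:.4f}")
```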

ROCm 6.2 expands FP8 support across its ecosystem, from frameworks to libraries and more, enhancing performance and efficiency.

Why CSPs Need Hyperscaling

Today’s cloud service providers need IT infrastructures that can scale like never before.


Hyperscaling IT infrastructure may be one of the toughest challenges facing cloud service providers (CSPs) today.

The term hyperscale refers to an IT architecture’s ability to scale in response to increased demand.

Hyperscaling is tricky, in large part because demand is a constantly moving target. Without much warning, a data center’s IT demand can increase exponentially due to a myriad of factors.

That could mean a public emergency, the failure of another CSP’s infrastructure, or simply the rampant proliferation of data—a common feature of today’s AI environment.

To meet this growing demand, CSPs have a lot to manage. That includes storage measured in exabytes, AI workloads of massive complexity, and whatever hardware is needed to keep system uptime as close to 100% as possible.

The hardware alone can be a real challenge. CSPs now oversee both air- and liquid-based cooling systems, redundant power sources, diverse networking gear, and miles of copper and fiber-optic cabling. It’s a real handful.

Design with CSPs in Mind

To help CSPs cope with this seemingly overwhelming complexity, Supermicro offers purpose-built hardware designed to tackle the world’s most demanding workloads.

Enterprise-class servers like Supermicro’s H13 and A+ server series offer CSPs powerful platforms built to handle the rigors of resource-intensive AI workloads. They’ve been designed to scale quickly and efficiently as demand and data inevitably increase.

Take the Supermicro GrandTwin. This innovative solution puts the power and flexibility of multiple independent servers in a single enclosure.

The design helps lower operating expenses by enabling shared resources, including a space-saving 2U enclosure, heavy-duty cooling system, backplane and N+1 power supplies.

To help CSPs tackle the world’s most demanding AI workloads, Supermicro offers GPU server systems. These include a massive—and massively powerful—8U eight-GPU server.

Supermicro H13 GPU servers are powered by 4th-generation AMD EPYC processors. These cutting-edge chips are engineered to help high-end applications perform better and return results faster.

To make good on those lofty promises, AMD included more and faster cores, higher bandwidth to GPUs and other devices, and the ability to address vast amounts of memory.

Theory Put to Practice

Capable and reliable hardware is a vital component for every modern CSP, but it’s not the only one. IT infrastructure architects must consider not just their present data center requirements but how to build a bridge to the requirements they’ll face tomorrow.

To help build that bridge, Supermicro offers an invaluable list: 10 essential steps for scaling the CSP data center.

A few highlights include:

  • Standardize and scale: Supermicro suggests CSPs standardize around a preferred configuration that offers the best compute, storage and networking capabilities.
  • Plan ahead for support: To operate a sophisticated data center 24/7 is to embrace the inevitability of technical issues. IT managers can minimize disruption and downtime when something goes wrong by choosing a support partner who can solve problems quickly and efficiently.
  • Simplify your supply chain: Hyperscaling means maintaining the ability to move new infrastructure into place fast and without disruption. CSPs can stack the odds in their favor by choosing a partner that is ever ready to deliver solutions that are integrated, validated, and ready to work on day one.

Do More:

Hyperscaling for CSPs will be the focus of a session at the upcoming Supermicro Open Storage Summit ‘24, which streams live Aug. 13 - Aug. 29.

The CSP session, set for Aug. 20, will cover the ways in which CSPs can seamlessly scale their AI operations across thousands of GPUs while ensuring industry-leading reliability, security and compliance capabilities. The speakers will feature representatives from Supermicro, AMD, Vast Data and Solidigm.

Learn more and register now to attend the 2024 Supermicro Open Storage Summit.

 

You’re invited to attend the Supermicro Open Storage Summit ‘24

Join this free online event being held August 13 – 29.


Into storage? Then learn about the latest storage innovations at the Supermicro Open Storage Summit ’24. It’s an online event happening over three weeks, August 13 – 29. And it’s free to attend.

The theme of this year’s summit is “enabling software-defined storage from enterprise to AI.” Sessions are aimed at anyone involved with data storage, whether you’re a CIO, IT support professional, or anything in between.

The Supermicro Open Storage Summit ’24 will bring together executives and technical experts from the entire software-defined storage ecosystem. They’ll talk about the latest developments enabling storage solutions.

Each session will feature Supermicro product experts along with leaders from both hardware and software suppliers. Together, these companies give a boost to the software-defined storage solution ecosystem.

Seven Sessions

This year’s Open Storage Summit will feature seven sessions. They’ll cover topics and use cases that include storage for AI, CXL, storage architectures and much more.

Hosting and moderating duties will be filled by Rob Strechay, managing director and principal analyst at theCUBE Research. His company provides IT leaders with competitive intelligence, market analysis and trend tracking.

All the Storage Summit sessions will start at 10 a.m. PDT / 1 p.m. EDT and run for 45 minutes. All sessions will also be available for on-demand viewing later. But by attending a live session, you’ll be able to participate in the X-powered Q&A with the speakers.

What’s On Tap

What can you expect? To give you an idea, here are a few of the scheduled sessions:

Aug. 14: AI and the Future of Media Storage Workflows: Innovations for the Entertainment Industry

Whether it’s movies, TV, or corporate videos, the post-production process, including editing, special effects, coloring and distribution, requires both high-performance and large-capacity storage solutions. In this session, Supermicro, Quantum, AMD and Western Digital will discuss how primary and secondary storage is optimized for post-production workflows.

Aug. 20: Hyperscale AI: Secure Data Services for CSPs

Cloud services providers must seamlessly scale their AI operations across thousands of GPUs, while ensuring industry-leading reliability, security, and compliance capabilities. Speakers from Supermicro, AMD, VAST Data, and Solidigm will explain how CSPs can deploy AI models at an unprecedented scale with confidence and security.

There’s a whole lot more, too. Learn more about the Supermicro Open Storage Summit ’24 and register to attend now.

 

Tech Explainer: What is multi-tenant storage?

Similar to the way an apartment building lets tenants share heat, hot water and other services, multitenancy lets users share storage resources for fast development and low costs.


Multi-tenant storage—also referred to as multitenancy—helps organizations develop applications faster and more efficiently.

It does this by enabling multiple users to both share the resources of a centralized storage architecture and customize their storage environments without affecting the others.

You can think of multi-tenant storage as being like an apartment building. The building’s tenants share a common infrastructure and related services, such as heat, hot water and electricity. Yet each tenant can also set up their individual apartment to suit their unique needs.

When it comes to data storage, leveraging a multi-tenant approach also helps lower each user’s overhead costs. It does this by distributing maintenance fees across all users. Also, tenants can share applications, security features and other infrastructure.
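As a conceptual illustration (not a production design), the toy sketch below shows the core idea: one shared store, with each tenant’s objects namespaced so tenants can use the same keys without seeing or affecting one another’s data.

```python
# Toy multitenancy sketch: one shared backing store, per-tenant namespaces.
class SharedStore:
    def __init__(self):
        self._objects = {}                        # shared "physical" storage

    def put(self, tenant: str, key: str, value: bytes) -> None:
        self._objects[f"{tenant}/{key}"] = value  # namespace every key by tenant

    def get(self, tenant: str, key: str) -> bytes:
        return self._objects[f"{tenant}/{key}"]   # a tenant can only reach its own prefix

    def list_keys(self, tenant: str) -> list[str]:
        prefix = f"{tenant}/"
        return [k[len(prefix):] for k in self._objects if k.startswith(prefix)]

store = SharedStore()
store.put("acme", "report.csv", b"...")
store.put("globex", "report.csv", b"...")         # same key name, separate namespace
print(store.list_keys("acme"))                    # ['report.csv'] -- Globex's data stays invisible
```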

Multitenancy for Cloud, SaaS, AI

Chances are, your customers are already using multi-tenant storage architecture to their advantage. Public cloud platforms such as Microsoft Azure, Amazon Web Services and Google Cloud all serve multiple tenants from a shared infrastructure.

Popular SaaS providers including Dropbox also employ multitenancy to offer millions of customers a unique experience based on a common user interface. Each user’s data is available only to that user, even though it’s kept in a common data warehouse.

AI-related workloads will become increasingly common in multi-tenant environments, too. That includes the use of large language models (LLMs) to enable Generative AI. Also, certain AI and ML workloads may be more effective in situations in which they feed—and are fed by—multiple tenants.

In addition, all users in a multitenancy environment can contribute data for AI training, which requires enormous quantities of data. And because each tenant creates a unique data set, this process may offer a wider array of training data more efficiently compared to a single source.

What’s more, data flowing in the other direction—from the AI model to each tenant—also increases efficiency. By sharing a common AI application, tenants gain access to a larger, more sophisticated resource than they would with single tenancy.

Choosing the Right Solution

Whether your customers opt for single tenant, multi-tenant or a combination of the two, they must deploy hardware that can withstand rigorous workloads.

Supermicro’s ASG-1115S-NE3X12R storage server is just such a storage solution. This system offers eight front hot-swap E3.S 1T PCIe 5.0 x4 NVMe drive bays; four front fixed E3.S 2T PCIe 5.0 x8 CXL Type 3 drive bays; and two M.2 NVMe slots.

Processing gets handled by a single AMD EPYC 9004-series CPU. It offers up to 128 cores and 6TB of ECC DDR5 main memory.

Considering the Supermicro storage server’s 12 drives, eight heavy-duty fans and 1600W redundant Titanium Level power supply, you might assume that it takes up a lot of rack space. But no. Astonishingly, the entire system is housed in a single 1U chassis.

Research Roundup: AI boosts project management & supply chains, HR woes, SMB supplier overload

Catch up on the latest IT market intelligence from leading researchers.


Artificial intelligence is boosting both project management and supply chains. Cybersecurity spending is on a tear. And small and midsize businesses are struggling with more suppliers than employees.

That’s some of the latest IT intelligence from leading industry watchers. And here’s your research roundup.

AI for PM 

What’s artificial intelligence good for? One area is project management.

In a new survey, nearly two-thirds of project managers (63%) reported improved productivity and efficiency with AI integration.

The survey was conducted by Capterra, an online marketplace for software and services. As part of a larger survey, the company polled 2,500 project managers in 12 countries.

Nearly half the respondents (46%) said they use AI in their project management tools. Capterra then dug in deeper with this second group—totaling 1,153 project managers—to learn what kinds of benefits they’re enjoying with AI.

Among the findings:

  • Over half the AI-using project managers (54%) said they use the technology for risk management. That’s the top use case reported.
  • Project managers plan to increase their AI spending by an average of 36%.
  • Nine in 10 project managers (90%) said their AI investments earned a positive return in the last 12 months.
  • Improved productivity as a result of using AI was reported by nearly two-thirds of the respondents (63%).
  • Looking ahead, respondents expect the areas of greatest impact from AI to be task automation, predictive analytics and project planning.

AI for Supply Chains, Too

A new report from consulting firm Accenture finds that the most mature supply chains are 23% more profitable than others. These supply-chain leaders are also six times more likely than others to use AI and Generative AI widely.

To figure this out, Accenture analyzed nearly 1,150 companies in 15 countries and 10 industries. Accenture then identified the 10% of companies that scored highest on its supply-chain maturity scale.

This scale was based on the degree to which an organization uses GenAI, advanced machine learning and other new technologies for autonomous decision-making, advanced simulations and continuous improvement. The more an organization does this, the higher was their score.

Accenture also found that supply-chain leaders achieved an average profit margin of 11.8%, compared with an average margin of 9.6% among the others. (That’s the 23% profit gain mentioned earlier.) The leaders also delivered 15% better returns to shareholders: 8.5% vs. 7.4% for others.
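If you’re wondering how an 11.8% margin versus a 9.6% margin becomes a “23% profit gain,” it’s the relative difference between the two margins, as this quick check shows.

```python
# The "23% more profitable" figure is the relative difference between margins.
leaders_margin = 11.8   # average profit margin of supply-chain leaders (%)
others_margin = 9.6     # average profit margin of the rest (%)

relative_gain = (leaders_margin - others_margin) / others_margin
print(f"{relative_gain:.0%}")   # ~23% -- a relative gain, not percentage points
```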

HR: Help Wanted 

If solving customer pain points is high on your agenda—and it should be—then here’s a new pain point to consider: Fewer than 1 in 4 human resources (HR) functions say they’re getting full business value from their HR technology.

In other words, something like 75% of HR executives could use some IT help. That’s a lot of business.

The assessment comes from research and analysis firm Gartner, based on its survey of 85 HR leaders conducted earlier this year. Among Gartner’s findings:

  • Only about 1 in 3 HR executives (35%) feel confident that their approach to HR technology helps to achieve their organization’s business objectives.
  • Two out of three HR executives believe their HR function’s effectiveness will be hurt if they don’t improve their technology.

Employees are unhappy with HR technology, too. Earlier this year, Gartner also surveyed more than 1,200 employees. Nearly 7 in 10 reported experiencing at least one barrier when interacting with HR technology over the previous 12 months.

Cybersecurity’s Big Spend

Looking for a growth market? Don’t overlook cybersecurity.

Last year, worldwide spending on cybersecurity products totaled $106.8 billion. That’s a lot of money. But even better, it marked a 15% increase over the previous year’s spending, according to market watcher IDC.

Looking ahead, IDC expects this double-digit growth rate to continue for at least the next five years. By 2028, IDC predicts, worldwide spending on cybersecurity products will reach $200 billion—nearly double what was spent in 2023.

By category, the biggest cybersecurity spending last year went to network security: $27.4 billion. After that came endpoint security ($21.6 billion last year) and security analytics ($20 billion), IDC says.

Why such strong spending? In part because cybersecurity is now a board-level topic.

“Cyber risk,” says Frank Dickson, head of IDC’s security and trust research, “is business risk.”

SMBs: Too Many Suppliers

It’s not easy standing out as a supplier to small and midsize business customers. A new survey finds the average SMB has nine times more suppliers than it does employees—and actually uses only about 1 in 4 of those suppliers.

The survey, conducted by spend-management system supplier Spendesk, focused on customers in Europe. (Which makes sense, as Spendesk is headquartered in Paris.) Spendesk examined 4.7 million suppliers used by a sample of its 5,000 customers in the UK, France, Germany and Spain.

Keeping many suppliers while using only a few of them? That’s not only inefficient, but also costly. Spendesk estimates that its SMB customers could be collectively losing some $1.24 billion in wasted time and management costs.

And there’s more at stake, too. A recent study by management consultants McKinsey & Co. finds that small and midsize organizations—those with anywhere from 1 to 200 employees—are actually big business.

By McKinsey’s reckoning, SMBs account for more than 90% of all businesses by number … roughly half the global GDP … and more than two-thirds of all business jobs.

Fun fact: Nearly 1 in 5 of the largest businesses originally started as small businesses.

HBM: Your memory solution for AI & HPC

High-bandwidth memory shortens the information commute to keep pace with today’s powerful GPUs.


As AI powered by GPUs transforms computing, conventional DDR memory can’t keep up.

The solution? High-bandwidth memory (HBM).

HBM is memory chip technology that essentially shortens the information commute. It does this using ultra-wide communication lanes.

An HBM device contains vertically stacked memory chips. They’re interconnected by microscopic wires known as through-silicon vias, or TSVs for short.

HBM also provides more bandwidth per watt. And, with a smaller footprint, the technology can also save valuable data-center space.

Here’s how: A single HBM stack can contain up to eight DRAM modules, with each module connected by two channels. This makes an HBM implementation of just four stacks roughly equivalent to 30 DDR modules, and in a fraction of the space.

All this makes HBM ideal for workloads that utilize AI and machine learning, HPC, advanced graphics and data analytics.

Latest & Greatest

The latest iteration, HBM3, was introduced in 2022, and it’s now finding wide application in market-ready systems.

Compared with the previous version, HBM3 adds several enhancements:

  • Higher bandwidth: Up to 819 GB/sec., up from HBM2’s max of 460 GB/sec.
  • More memory capacity: 24GB per stack, up from HBM2’s 8GB
  • Improved power efficiency: Delivering more data throughput per watt
  • Reduced form factor: Thanks to a more compact design

However, it’s not all sunshine and rainbows. For one, HBM-equipped systems are more expensive than those fitted out with traditional memory solutions.

Also, HBM stacks generate considerable heat. Advanced cooling systems are often needed, adding further complexity and cost.

Compatibility is yet another challenge. Systems must be designed or adapted to HBM3’s unique interface and form factor.

In the Market

As mentioned above, HBM3 is showing up in new products. That very definitely includes both the AMD Instinct MI300A and MI300X series accelerators.

The AMD Instinct MI300A accelerator combines a CPU and GPU for running HPC/AI workloads. It offers HBM3 as the dedicated memory with a unified capacity of up to 128GB.

Similarly, the AMD Instinct MI300X is a GPU-only accelerator designed for low-latency AI processing. It contains HBM3 as the dedicated memory, but with a higher capacity of up to 192GB.

For both of these AMD Instinct MI300 accelerators, the peak theoretical memory bandwidth is a speedy 5.3TB/sec.
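Working backward from those published figures gives a feel for how the capacity and bandwidth break down per HBM3 stack. This is a rough derivation from the numbers above, not an official specification.

```python
# Rough per-stack breakdown derived from the MI300X figures cited above.
TOTAL_CAPACITY_GB = 192     # MI300X HBM3 capacity
GB_PER_STACK = 24           # HBM3 capacity per stack
PEAK_BANDWIDTH_TBS = 5.3    # peak theoretical memory bandwidth

stacks = TOTAL_CAPACITY_GB // GB_PER_STACK
per_stack_gbs = PEAK_BANDWIDTH_TBS * 1000 / stacks
print(f"{stacks} stacks at ~{per_stack_gbs:.0f} GB/s each")
# 8 stacks at ~663 GB/s each -- within HBM3's 819 GB/s per-stack ceiling
```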

The AMD Instinct MI300X is also the main processor in Supermicro’s AS-8125GS-TNMR2, an H13 8U 8-GPU system. This system offers a huge 1.5TB of HBM3 memory in single-server mode, and an even larger 6.144TB at rack scale.

Are your customers running AI with fast GPUs, only to have their systems held back by conventional memory? Tell them to check out HBM.

Tech Explainer: What is CXL — and how can it help you lower data-center latency?

High latency is a data-center manager’s worst nightmare. Help is here from an open-source solution known as CXL. It works by maintaining “memory coherence” between the CPU’s memory and memory on attached devices.


Latency is a crucial measure for every data center. Because latency measures the time it takes for data to travel from one point in a system or network to another, lower is generally better. A network with high latency has slower response times—not good.

Fortunately, the industry has come up with an open-source solution that provides a low-latency link between processors, accelerators and memory devices such as RAM and SSD storage. It’s known as Compute Express Link, or CXL for short.

CXL is designed to solve a couple of common problems. Once a processor uses up the capacity of its direct-attached memory, it relies on an SSD. This introduces a three-order-of-magnitude latency gap (DRAM access takes roughly 100 nanoseconds, while an SSD read takes on the order of 100 microseconds) that can hurt both performance and total cost of ownership (TCO).

Another problem is that multicore processors are starving for memory bandwidth. This has become an issue because processors have been scaling in terms of cores and frequencies faster than their main memory channels. The resulting deficit leads to suboptimal use of the additional processor cores, as the cores have to wait for data.

CXL overcomes these issues by introducing a low-latency, memory cache coherent interconnect. CXL works for processors, memory expansion and AI accelerators such as the AMD Instinct MI300 series. The interconnect provides more bandwidth and capacity to processors, which increases efficiency and enables data-center operators to get more value from their existing infrastructure.

Cache coherence refers to an architecture in which multiple processors and devices share the same memory while each retains its own local caches, with the hardware keeping those caches consistent. The CXL interconnect reduces latency and increases performance throughout the data center.

The latest iteration of CXL, version 3.1, adds features to help data centers keep up with high-performance computational workloads. Notable upgrades include new peer-to-peer direct memory access, enhancements to memory pooling, and CXL Fabric improvements.

3 Ways to CXL

Today, there are three main types of CXL devices:

  • Type 1: Any device without integrated local memory. CXL protocols enable these devices to communicate and transfer memory capacity from the host processor.
  • Type 2: These devices include integrated memory, but also share CPU memory. They leverage CXL to enable coherent memory-sharing between the CPU and the CXL device.
  • Type 3: A class of devices designed to augment existing CPU memory. CXL enables the CPU to access external sources for increased bandwidth and reduced latency.

Hardware Support

As data-center architectures evolve, more hardware manufacturers are supporting CXL devices. One such example is Supermicro’s All-Flash EDSFF and NVMe servers.

Supermicro’s cutting-edge appliances are optimized for resource-intensive workloads, including data-center infrastructure, data warehousing, hyperscale/hyperconverged and software-defined storage. To facilitate these workloads, Supermicro has included support for up to eight CXL 2.0 devices for advanced memory-pool sharing.

Of course, CXL can be utilized only on server platforms designed to support communication between the CPU, memory and CXL devices. That’s why CXL is built into the 4th gen AMD EPYC server processors.

These AMD EPYC processors include up to 96 ‘Zen 4’ 5nm cores, up to 32MB of L3 cache per core complex die (CCD), and up to 12 DDR5 channels supporting as much as 12TB of memory.

CXL memory expansion is built into the AMD EPYC platform. That makes these CPUs ideally suited for advanced AI and GenAI workloads.

Crucially, AMD also includes 256-bit AES-XTS and secure multikey encryption. This enables hypervisors to encrypt address space ranges on CXL-attached memory.

The Near Future of CXL

Like many add-on devices, CXL devices are often connected via the PCI Express (PCIe) bus. However, implementing CXL over PCIe 5.0 in large data centers has some drawbacks.

Chief among them is the way its memory pools remain isolated from each other. This adds latency and hampers significant resource-sharing.

The next generation of PCIe, version 6.0, is coming soon and will offer a solution. CXL over PCIe 6.0 will offer twice the throughput of PCIe 5.0.

The new PCIe standard will also add new memory-sharing functionality within the transaction layer. This will help reduce system latency and improve accelerator performance.

CXL is also paving the way for disaggregated computing, in which resources residing in different physical enclosures can be made available to multiple applications.

Are your customers suffering from too much latency? The solution could be CXL.

Supermicro intros MicroCloud server powered by AMD EPYC 4004 CPUs

Supermicro’s latest 3U server, the Supermicro MicroCloud, supports up to 10 nodes of AMD’s entry-level server processor. With this server and its high-density enclosure, Supermicro offers an efficient and affordable solution for SMBs, corporate departments and branches, and hosted IT service providers.


Supermicro’s latest H13 server is powered by the AMD EPYC 4004 series processors introduced last month. Designated the Supermicro MicroCloud AS-3015MR-H10TNR, this server is designed to run cloud-native workloads for small and midsized businesses (SMBs), corporate departments and branch offices, and hosted IT service providers.

Intended workloads for the new server include web hosting, cloud gaming and content-delivery networks.

10 Nodes, 3U Form

This new Supermicro MicroCloud server supports up to 10 nodes in a 3U form factor. In addition, as many as 16 enclosures can be loaded into a single rack, providing a total of 160 individual nodes.

Supermicro says customers using the new MicroCloud server can increase their computing density by 3.3X compared with industry-standard 1U rackmount servers at rack scale.
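Here is roughly how that 3.3X figure works out, assuming 48U of usable rack space (the rack height is an assumption for illustration, not a Supermicro specification).

```python
# How the 3.3x density claim works out, assuming a 48U usable rack (our assumption).
RACK_UNITS = 48
NODES_PER_3U_ENCLOSURE = 10

microcloud_nodes = (RACK_UNITS // 3) * NODES_PER_3U_ENCLOSURE  # 16 enclosures x 10 = 160
standard_1u_nodes = RACK_UNITS                                 # one node per 1U server

print(f"{microcloud_nodes} vs {standard_1u_nodes} nodes "
      f"-> {microcloud_nodes / standard_1u_nodes:.1f}x the density")  # 3.3x
```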

The new server also supports high-performance peripherals with either two PCIe 4.0 x8 add-on cards or one x16 full-height, full-width GPU accelerator. System memory maxes out at 192GB. And the unit gets air-cooled by five heavy-duty fans.

4004 for SMBs

The AMD EPYC 4004 series processors bring an entry-level family of CPUs to AMD’s EPYC line. They’re designed for use in entry-level servers used by organizations that typically don’t require either hosting on the public cloud or more powerful server processors.

The new AMD EPYC 4004 series is initially offered as eight SKUs, all designed for use in single-processor systems. They offer from 8 to 16 ‘Zen 4’ cores with up to 32 threads; up to 128MB of L3 cache; two DDR5 channels with a memory capacity of up to 192GB; and 28 lanes of PCIe 5.0 connectivity.

More Than One

Supermicro is also using the new AMD EPYC 4004 series processors to power three other server lines.

That includes a 1U server designed for web hosting and SMB applications. A 2U server aimed specifically at companies in financial services. And towers intended for content creation, entry-level servers, workstations and even desktops.

All are designed to be high-density, efficient and affordable. Isn’t that what your SMB customers are looking for?

Meet AMD's new Alveo V80 Compute Accelerator Card

AMD’s new Alveo V80 Compute Accelerator Card has been designed to overcome performance bottlenecks in compute-intensive workloads that include HPC, data analytics and network security.


Are you or your customers looking for an accelerator for memory-bound applications with large data sets that require FPGA hardware adaptability? If so, then check out the new AMD Alveo V80 Compute Accelerator Card.

It was introduced by AMD at ISC High Performance 2024, an event held recently in Hamburg, Germany.

The thinking behind the new component is that for large-scale data processing, raw computational power is only half the equation. You also need lots of memory bandwidth.

Indeed, AMD’s new hardware adaptable accelerator is purpose-built to overcome performance bottlenecks for compute-intensive workloads with large data sets common to HPC, data analytics and network security applications. It’s powered by AMD’s 7nm Versal HBM Series adaptive system-on-chip (SoC).

Substantial gains

AMD says that compared with the previous-generation Alveo U55C, the new Alveo V80 offers up to 2x the memory bandwidth (820GB/sec.), 2x the PCIe bandwidth, 2x the logic density and 4x the network bandwidth.

The card also features 4x200G networking, PCIe Gen4 and Gen5 interfaces, and DDR4 DIMM slots for memory expansion.

Appropriate workloads for the new AMD Alveo V80 include HPC, data analytics, FinTech/Blockchain, network security, computational storage, and AI compute.

In addition, the AMD Alveo V80 can scale to hundreds of nodes over Ethernet, creating compute clusters for HPC applications that include genomic sequencing, molecular dynamics and sensor processing.

Developers, too

A production board in a PCIe form factor, the AMD Alveo V80 is designed to offer a faster path to production than designing your own PCIe card.

Indeed, for FPGA developers, the V80 is fully enabled for traditional development via the Alveo Versal Example Design (AVED), which is available on GitHub.

This example design provides an efficient starting point using a pre-built subsystem implemented on the AMD Versal adaptive SoC. More specifically, it targets the new AMD Alveo V80 accelerator.

Supermicro offering

The new AMD accelerator is already shipping in volume, and you can get it from either AMD or an authorized distributor.

In addition, you can get the Alveo V80 already integrated into a partner-provided server.

Supermicro is integrating the new AMD Alveo V80 with its AMD EPYC processor-powered A+ servers. These include the Supermicro AS-4125GS-TNRT, a compact 4U server for deployments where compute density and memory bandwidth are critical.

Early user

AMD says one early customer for the new accelerator card is the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the national research agency of Australia.

CSIRO plans to upgrade an older setup with 420 previous-generation AMD Alveo U55C accelerator cards, replacing them with the new Alveo V80.

 Because the new part is so much more powerful than its predecessor, the organization expects to reduce the number of cards it needs by two-thirds. That, in turn, should shrink the data-center footprint required and lower system costs.

If those sound like benefits you and your customers would find attractive, check out the AMD Alveo V80 links below.
