Sponsored by: AMD and Supermicro

Tech Explainer: What is CXL — and how can it help you lower data-center latency?

High latency is a data-center manager’s worst nightmare. Help is here from an open industry standard known as CXL. It works by maintaining “memory coherence” between the CPU’s memory and memory on attached devices.

Latency is a crucial measure for every data center. Because latency measures the time it takes for data to travel from one point in a system or network to another, lower is generally better. A network with high latency has slower response times—not good.

Fortunately, the industry has come up with an open standard that provides a low-latency link between processors, accelerators and memory devices such as RAM and SSD storage. It’s known as Compute Express Link, or CXL for short.

CXL is designed to solve a couple of common problems. Once a processor uses up the capacity of its direct-attached memory, it relies on an SSD. This introduces a three-order-of-magnitude latency gap that can hurt both performance and total cost of ownership (TCO).
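
To make that gap concrete, here is a minimal sketch of the arithmetic. The two latency figures are rough ballpark assumptions (about 100 ns for direct-attached DRAM and about 100 µs for an SSD read), not measurements from this article, and real values vary by platform.

```python
import math

# Assumed ballpark latencies, for illustration only (not from the article).
DRAM_LATENCY_NS = 100        # direct-attached DRAM access, roughly 100 ns
SSD_LATENCY_NS = 100_000     # SSD read, roughly 100 microseconds

def orders_of_magnitude(slow_ns: float, fast_ns: float) -> float:
    """Return how many powers of ten separate two latencies."""
    return math.log10(slow_ns / fast_ns)

slowdown = SSD_LATENCY_NS // DRAM_LATENCY_NS
gap = orders_of_magnitude(SSD_LATENCY_NS, DRAM_LATENCY_NS)
print(f"SSD is {slowdown}x slower than DRAM, about {gap:.0f} orders of magnitude")
```

Under these assumptions, anything that sits between the two tiers closes a very wide gap.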

Another problem is that multicore processors are starving for memory bandwidth. This has become an issue because processors have been scaling in terms of cores and frequencies faster than their main memory channels. The resulting deficit leads to suboptimal use of the additional processor cores, as the cores have to wait for data.

CXL overcomes these issues by introducing a low-latency, cache-coherent memory interconnect. It works with processors, memory-expansion devices and AI accelerators such as the AMD Instinct MI300 series. The interconnect provides more bandwidth and capacity to processors, which increases efficiency and lets data-center operators get more value from their existing infrastructure.

Cache coherence means that when processors and attached devices each hold cached copies of the same memory, the hardware keeps those copies consistent, so every participant sees the latest value. Because CXL maintains this coherence across the link, the CPU and attached devices can share memory directly rather than copying data back and forth, which helps cut latency and improve performance.

The latest iteration of CXL, version 3.1, adds features to help data centers keep up with high-performance computational workloads. Notable upgrades include new peer-to-peer direct memory access, enhancements to memory pooling, and CXL Fabric improvements.

3 Ways to CXL

Today, there are three main types of CXL devices:

  • Type 1: Devices, such as accelerators and smart NICs, that have no local memory exposed to the host. CXL lets these devices coherently cache and work on the host processor’s memory.
  • Type 2: These devices include integrated memory, but also share CPU memory. They leverage CXL to enable coherent memory-sharing between the CPU and the CXL device.
  • Type 3: Memory-expansion devices designed to augment existing CPU memory. CXL lets the CPU reach this external memory for added capacity and bandwidth, at far lower latency than falling back to an SSD.
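
The three device types can be summarized as a small lookup table. This is an illustrative sketch only: the sub-protocol names (CXL.io, CXL.cache, CXL.mem) come from the CXL specification rather than from this article, and the example devices and helper functions are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CxlDeviceType:
    name: str
    protocols: tuple[str, ...]   # sub-protocols defined by the CXL specification
    example: str                 # illustrative example device (an assumption)

DEVICE_TYPES = {
    1: CxlDeviceType("Type 1", ("CXL.io", "CXL.cache"), "smart NIC"),
    2: CxlDeviceType("Type 2", ("CXL.io", "CXL.cache", "CXL.mem"), "GPU or FPGA accelerator"),
    3: CxlDeviceType("Type 3", ("CXL.io", "CXL.mem"), "memory expander"),
}

def can_cache_host_memory(device_type: int) -> bool:
    """True if the device may coherently cache the host's memory."""
    return "CXL.cache" in DEVICE_TYPES[device_type].protocols

def exposes_memory_to_host(device_type: int) -> bool:
    """True if the host can load from and store to the device's memory."""
    return "CXL.mem" in DEVICE_TYPES[device_type].protocols

for number, dev in DEVICE_TYPES.items():
    print(dev.name, dev.protocols, "e.g.", dev.example)
```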

Hardware Support

As data-center architectures evolve, more hardware manufacturers are supporting CXL devices. One example is Supermicro’s All-Flash EDSFF and NVMe servers.

Supermicro’s cutting-edge appliances are optimized for resource-intensive workloads, including data-center infrastructure, data warehousing, hyperscale/hyperconverged and software-defined storage. To facilitate these workloads, Supermicro has included support for up to eight CXL 2.0 devices for advanced memory-pool sharing.

Of course, CXL can be utilized only on server platforms designed to support communication between the CPU, memory and CXL devices. That’s why CXL is built into the 4th gen AMD EPYC server processors.

These AMD EPYC processors include up to 96 ‘Zen 4’ 5nm cores. Each core complex die (CCD) carries 32MB of L3 cache, and the processors support up to 12 DDR5 channels and as much as 12TB of memory.

CXL memory expansion is built into the AMD EPYC platform. That makes these CPUs ideally suited for advanced AI and GenAI workloads.

Crucially, AMD also includes 256-bit AES-XTS and secure multikey encryption. This enables hypervisors to encrypt address space ranges on CXL-attached memory.

The Near Future of CXL

Like many add-on devices, CXL devices are often connected via the PCI Express (PCIe) bus. However, implementing CXL over PCIe 5.0 in large data centers has some drawbacks.

Chief among them is the way its memory pools remain isolated from each other. This adds latency and hampers significant resource-sharing.

The next generation of PCIe, version 6.0, is coming soon and will offer a solution. CXL over PCIe 6.0 will offer twice the throughput of CXL over PCIe 5.0.
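
The doubling follows from the per-lane signaling rate. Here is a rough sketch, assuming the published rates of 32 GT/s for PCIe 5.0 and 64 GT/s for PCIe 6.0 and a x16 link. It computes raw, one-direction bandwidth and deliberately ignores encoding and protocol overhead, so delivered throughput will be somewhat lower.

```python
# Per-lane signaling rates in GT/s, per the PCIe 5.0 and 6.0 specifications.
RATE_GT_PER_S = {"PCIe 5.0": 32, "PCIe 6.0": 64}

def raw_bandwidth_gb_per_s(generation: str, lanes: int = 16) -> float:
    """Raw one-direction bandwidth in GB/s, ignoring encoding and protocol overhead."""
    return RATE_GT_PER_S[generation] * lanes / 8  # 8 bits per byte

gen5 = raw_bandwidth_gb_per_s("PCIe 5.0")
gen6 = raw_bandwidth_gb_per_s("PCIe 6.0")
print(f"x16 raw bandwidth: {gen5:.0f} GB/s vs {gen6:.0f} GB/s ({gen6 / gen5:.0f}x)")
```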

The new PCIe standard will also add new memory-sharing functionality within the transaction layer. This will help reduce system latency and improve accelerator performance.

CXL is also paving the way for disaggregated computing, in which resources that reside in different physical enclosures can be shared by several applications.

Are your customers suffering from too much latency? The solution could be CXL.

At Computex, AMD & Supermicro CEOs describe AI advances you’ll be adopting soon

At Computex Taiwan, Lisa Su of AMD and Charles Liang of Supermicro delivered keynotes that focused on AI, liquid cooling and energy efficiency.

The chief executives of both AMD and Supermicro used their Computex keynote addresses to describe their companies’ AI products and, in the case of AMD, pre-announce important forthcoming products.

Computex 2024 was held this past week in Taipei, Taiwan, with the conference theme of “connecting AI.” Some 1,500 companies from around the world exhibited, and keynotes were delivered by some of the IT industry’s top executives.

That included Lisa Su, chairman and CEO of AMD, and Charles Liang, founder and CEO of Supermicro. Here’s some of what they previewed at Computex 2024.

Lisa Su, AMD: Top priority is AI

Su of AMD presented one of this Computex’s first keynotes. Anyone who thought she might discuss topics other than AI was quickly set straight.

“AI is our number one priority,” Su told the crowd. “We’re at the beginning of an incredibly exciting time for the industry as AI transforms virtually every business, improves our quality of life, and reshapes every part of the computing market.”

AMD intends to lead in AI solutions by focusing on three priorities, she added: delivering a broad portfolio of high-performance, energy-efficient compute engines (including CPUs, GPUs and NPUs); enabling an open and developer-friendly ecosystem; and co-innovating with partners.

This last point was supported during Su’s keynote by brief visits from several partner leaders. They included Pavan Davuluri, corporate VP of Windows devices at Microsoft; Christian Laforte, CTO of Stability AI; and (via a video link) Microsoft CEO Satya Nadella.

Fairly late in Su’s hour-plus keynote, she held up AMD’s forthcoming 5th gen EPYC server processor, codenamed Turin. It’s scheduled to ship by year’s end.

As Su explained, Turin will feature up to 192 cores and 384 threads, up from the current generation’s max of 128 cores and 256 threads. Turin will contain 13 chiplets built in both 3-nm and 6-nm process technology. Yet it will be available as a drop-in replacement for existing EPYC platforms, Su said.

Turin processors will use AMD’s new ‘Zen 5’ cores, which Su also announced at Computex. She described AMD’s ‘Zen 5’ as “the highest performance and most energy-efficient core we’ve ever built.”

Su also discussed AMD’s MI3xx family of accelerators. The MI300, introduced this past December, has become the fastest ramping product in AMD’s history, she said. Microsoft’s Nadella, during his short presentation, bragged that his company’s cloud was the first to deliver general availability of virtual machines using the AMD MI300X accelerator.

Looking ahead, Su discussed three forthcoming Instinct accelerators on AMD’s road map: The MI325, MI350 and MI400 series.

The AMD Instinct MI325, set to launch later this year, will feature more memory (up to 288GB) and higher memory bandwidth (6TB/sec.) than the MI300. But the new component will still use the same infrastructure as the MI300, making it easy for customers to upgrade.

The next series, MI350, is set for launch next year, Su said. It will use AMD’s new CDNA4 architecture, which Su said “will deliver the biggest generational AI leap in our history.” The MI350 will be built on 3nm process technology, but will still offer a drop-in upgrade from both the MI300 and MI325.

The last of the three, the MI400 series, is set to start shipping in 2026. That’s also when AMD will deliver a new generation of CDNA, according to Su.

Both the MI325 and MI350 series will leverage the same industry standard universal baseboard OCP server design used by MI300. Su added: “What that means is, our customers can adopt this new technology very quickly.”

Charles Liang, Supermicro: Liquid cooling is the AI future

Liang dedicated his Computex keynote to the topics of liquid cooling and “green” computing.

“Together with our partners,” he said, “we are on a mission to build the most sustainable data centers.”

Liang predicted a big change from the present, where direct liquid cooling (DLC) has a less-than-1% share of the data center market. Supermicro is targeting 15% of new data center deployments in the next year, and Liang hopes that will hit 30% in the next two years.

Driving this shift, he added, are several trends. One, of course, is the huge uptake of AI, which requires high-capacity computing.

Another is the improvement of DLC technology itself. Where DLC system installations used to take 4 to 12 months, Supermicro is now doing them in just 2 to 4 weeks, Liang said. Where liquid cooling used to be quite expensive, now—when TCO and energy savings are factored in—“DLC can be free, with a big bonus,” he said. And where DLC systems used to be unreliable, now they are high performing with excellent uptime.

Supermicro now has capacity to ship 1,000 rack scale solutions with liquid cooling per month, Liang said. In fact, the company is shipping over 50 liquid-cooled racks per day, with installations typically completed within just 2 weeks.

“DLC,” Liang said, “is the wave of the future.”

Research Roundup: AI edition

Catch up on the latest research and analysis around artificial intelligence.

Generative AI is the No. 1 AI solution being deployed. Three in 4 knowledge workers are already using AI. The supply of workers with AI skills can’t meet the demand. And supply chains can be helped by AI, too.

Here’s your roundup of the latest in AI research and analysis.

GenAI is No. 1

Generative AI isn’t just a good idea, it’s now the No. 1 type of AI solution being deployed.

In a survey recently conducted by research and analysis firm Gartner, more than a quarter of respondents (29%) said they’ve deployed and are now using GenAI.

That was a higher percentage than any other type of AI in the survey, including natural language processing, machine learning and rule-based systems.

The most common way of using GenAI, the survey found, is embedding it in existing applications, for example by using Microsoft Copilot for 365. This was cited by about 1 in 3 respondents (34%).

Other approaches mentioned by respondents included prompt engineering (cited by 25%), fine-tuning (21%) and using standalone tools such as ChatGPT (19%).

Yet respondents said only about half of their AI projects (48%) make it into production. Even when that happens, it’s slow. Moving an AI project from prototype to production took respondents an average of 8 months.

Other challenges loom, too. Nearly half the respondents (49%) said it’s difficult to estimate and demonstrate an AI project’s value. They also cited a lack of talent and skills (42%), lack of confidence in AI technology (40%) and lack of data (39%).

Gartner conducted the survey in last year’s fourth quarter and released the results earlier this month. In all, valid responses were culled from 644 executives working for organizations in the United States, the UK and Germany.

AI ‘gets real’ at work

Three in 4 knowledge workers (75%) now use AI at work, according to the 2024 Work Trend Index, a joint project of Microsoft and LinkedIn.

Among these users, nearly 8 in 10 (78%) are bringing their own AI tools to work. That’s inspired a new acronym: BYOAI, short for Bring Your Own AI.

“2024 is the year AI at work gets real,” the Work Trend report says.

2024 is also a year of real challenges. Like the Gartner survey, the Work Trend report finds that demonstrating AI’s value can be tough.

In the Microsoft/LinkedIn survey, nearly 8 in 10 leaders agreed that adopting AI is critical to staying competitive. Yet nearly 6 in 10 said they worry about quantifying the technology’s productivity gains. About the same percentage also said their organization lacks an AI vision and plan.

The Work Trend report also highlights the mismatch between AI skills demand and supply. Over half the leaders surveyed (55%) say they’re concerned about having enough AI talent. And nearly two-thirds (65%) say they wouldn’t hire someone who lacked AI skills.

Yet fewer than 4 in 10 users (39%) have received AI training from their company. And only 1 in 4 companies plan to offer AI training this year.

The Work Trend report is based on a mix of sources: a survey of 31,000 people in 31 countries; labor and hiring trends on the LinkedIn site; Microsoft 365 productivity signals; and research with Fortune 500 customers.

AI skills: supply-demand mismatch

The mismatch between AI skills supply and demand was also examined recently by market watcher IDC. It expects that by 2026, 9 of every 10 organizations will be hurt by an overall IT skills shortage. This will lead to delays, quality issues and revenue loss that IDC predicts will collectively cost these organizations $5.5 trillion.

To be sure, AI skills are currently the most in-demand skills at most organizations. The good news, IDC finds, is that more than half of organizations are now using or piloting training for GenAI.

“Getting the right people with the right skills into the right roles has never been more difficult,” says IDC researcher Gina Smith. Her prescription for success: Develop a “culture of learning.”

AI helps supply chains, too

Did you know AI is being used to solve supply-chain problems?

It’s a big issue. Over 8 in 10 global businesses (84%) said they’ve experienced supply-chain disruptions in the last year, finds a survey commissioned by Blue Yonder, a vendor of supply-chain solutions.

In response, supply-chain executives are making strategic investments in AI and sustainability, Blue Yonder finds. Nearly 8 in 10 organizations (79%) said they’ve increased their investments in supply-chain operations. Their 2 top areas of investment were sustainability (cited by 48%) and AI (41%).

The survey also identified the top supply-chain areas for AI investment. They are planning (cited by 56% of those investing in AI), transportation (53%) and order management (50%).

In addition, 8 in 10 respondents to the survey said they’ve implemented GenAI in their supply chains at some level. And more than 90% said GenAI has been effective in optimizing their supply chains and related decisions.

The survey, conducted by an independent research firm with sponsorship by Blue Yonder, was fielded in March, with the results released earlier this month. The survey received responses from more than 600 C-suite and senior executives, all of them employed by businesses or government agencies in the United States, UK and Europe.

AMD and Supermicro: Pioneering AI Solutions

Bringing AMD Instinct to the Forefront

In the constantly evolving landscape of AI and machine learning, the synergy between hardware and software is paramount. Enter AMD and Supermicro, two industry titans who have joined forces to empower organizations in the new world of AI with cutting-edge solutions. Their shared vision? To enable organizations to unlock the full potential of AI workloads, from training massive language models to accelerating complex simulations.

The AMD Instinct MI300 Series: Changing The AI Acceleration Paradigm

At the heart of this collaboration lies the AMD Instinct MI300 Series, a family of accelerators designed to redefine performance boundaries. These accelerators combine high-performance AMD EPYC™ 9004 series CPUs with the powerful AMD Instinct™ MI300X GPU accelerators and 192GB of HBM3 memory, creating a formidable force for AI, HPC, and technical computing.

Supermicro’s H13 Generation of GPU Servers

Supermicro’s H13 generation of GPU Servers serves as the canvas for this technological masterpiece. Optimized for leading-edge performance and efficiency, these servers integrate seamlessly with the AMD Instinct MI300 Series. Let’s explore the highlights:

8-GPU Systems for Large-Scale AI Training:

  • Supermicro’s 8-GPU servers, equipped with the AMD Instinct MI300X OAM accelerator, offer raw acceleration power. The AMD Infinity Fabric™ Links enable up to 896GB/s of peak theoretical P2P I/O bandwidth, while the 1.5TB HBM3 GPU memory fuels large-scale AI models.
  • These servers are ideal for LLM inference and for training language models with trillions of parameters. They minimize training time and inference latency, lower TCO and maximize throughput.
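
The memory figures above can be checked with simple arithmetic. The sketch below uses the article's numbers (8 GPUs, 192GB of HBM3 each) and adds one assumption of its own: model weights stored at 2 bytes per parameter (FP16 or BF16). It counts weights only, ignoring activations, KV cache and optimizer state, so real requirements are higher.

```python
import math

GPUS_PER_SERVER = 8
HBM3_PER_GPU_GB = 192      # per MI300X, as stated in the article

def total_hbm_gb(gpus: int = GPUS_PER_SERVER, per_gpu_gb: int = HBM3_PER_GPU_GB) -> int:
    """Aggregate HBM3 capacity across all GPUs in one server."""
    return gpus * per_gpu_gb

def weights_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight size in GB; 2 bytes per parameter is an FP16/BF16 assumption."""
    return params_billions * bytes_per_param

def min_gpus_for_weights(params_billions: float) -> int:
    """Fewest GPUs whose combined HBM can hold the weights alone."""
    return math.ceil(weights_gb(params_billions) / HBM3_PER_GPU_GB)

total = total_hbm_gb()
print(f"Server HBM3: {total} GB ({total / 1024:.1f} TB)")
for name, size_b in [("70B model", 70), ("176B model", 176)]:
    print(f"{name}: ~{weights_gb(size_b):.0f} GB of weights, "
          f"at least {min_gpus_for_weights(size_b)} GPU(s)")
```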

Benchmarking Excellence

But what about real-world performance? Fear not! Supermicro’s ongoing testing and benchmarking efforts have yielded remarkable results. The continued engagement between AMD and Supermicro performance teams enabled Supermicro to test pre-release ROCm versions with the latest performance optimizations, as well as publicly released optimizations such as Flash Attention 2 and vLLM. The Supermicro AMD-based system AS -8125GS-TNMR2 showcases AI inference prowess, especially on models like Llama-2 70B, Llama-2 13B, and Bloom 176B. The performance? Equal to or better than AMD’s published results from the Dec. 6 Advancing AI event.

Charles Liang’s Vision

In the words of Charles Liang, President and CEO of Supermicro:

“We are very excited to expand our rack scale Total IT Solutions for AI training with the latest generation of AMD Instinct accelerators. Our proven architecture allows for fully integrated liquid cooling solutions, giving customers a competitive advantage.”

Conclusion

The AMD-Supermicro partnership isn’t just about hardware and software stacks; it’s about pushing boundaries, accelerating breakthroughs, and shaping the future of AI. So, as we raise our virtual glasses, let’s toast to innovation, collaboration, and the relentless pursuit of performance and excellence.

10 best practices for scaling the CSP data center — Part 1

Cloud service providers, here are best practices—courtesy of Supermicro—to help you design and deploy rack-scale data centers. 

Cloud service providers, here are 10 best practices—courtesy of Supermicro—that you can follow for designing and deploying rack-scale data centers. All are based on Supermicro’s real-world experience with customers around the world.

Best Practice No. 1: First standardize, then scale

First, select a configuration of compute, storage and networking. Then scale these configurations up and down into setups you designate as small, medium and large.

Later, you can deploy these standard configurations at various data centers with different numbers of users, workload sizes and growth estimates.
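
One way to picture "standardize, then scale" is a single base configuration with fixed multipliers for each tier. The sketch below is purely illustrative: every number, field name and threshold is a hypothetical placeholder, not a Supermicro recommendation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RackConfig:
    compute_nodes: int
    storage_tb: int
    network_gbps: int

    def scaled(self, factor: int) -> "RackConfig":
        """Return the same configuration multiplied by a whole-number factor."""
        return RackConfig(
            self.compute_nodes * factor,
            self.storage_tb * factor,
            self.network_gbps * factor,
        )

# Hypothetical base building block; real values depend on your workloads.
BASE = RackConfig(compute_nodes=8, storage_tb=100, network_gbps=100)
TIERS = {"small": BASE, "medium": BASE.scaled(2), "large": BASE.scaled(4)}

def pick_tier(expected_users: int) -> str:
    """Choose a tier from an assumed user-count threshold."""
    if expected_users < 1_000:
        return "small"
    if expected_users < 10_000:
        return "medium"
    return "large"

print(pick_tier(5_000), TIERS[pick_tier(5_000)])
```

Because every tier derives from the same base, a change to the base configuration propagates to all deployments.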

Best Practice No. 2: Optimize the configuration

Good as Best Practice No. 1 is, it may not work if you handle a very wide range of workloads. If that’s the case, then you may want to instead optimize the configuration.

Here’s how. First, run the software on the rack configuration to determine the best mix of CPU cores, memory, storage and I/O. Then consider setting up different sets of optimized configurations.

For example, you might send AI training workloads to GPU-optimized servers, but run a database application on a standard 2-socket CPU system.

Best Practice No. 3: Plan for tech refreshes 

When it comes to technology, the only constant is change itself. That doesn’t mean you can just wait around for the latest, greatest upgrade. Instead, do some strategic planning.

That might mean talking with key suppliers about their road maps. What are their plans for transitions, costs, supply chains and more?

Also consider that leading suppliers now let you upgrade some server components without having to replace the entire chassis. That reduces waste. It could also help you get more performance from your current racks, servers and power budget.

Best Practice No. 4: Look for new architectures

New architectures can help you increase compute power at lower cost. For example, AMD and Supermicro offer data-center accelerators that let you run AI workloads on a mix of GPUs and CPUs, a less costly alternative to all-GPU setups.

To find out if you could benefit from new architectures, talk with your suppliers about running proof-of-concept (PoC) trials of their new technologies. In other words, try before you buy.

Best Practice No. 5: Create a support plan

Sure, you need to run 24x7, but that doesn’t mean you have to pay third parties for all of that. Instead, determine what level of support you can provide in-house. For what remains, you can either staff up or outsource.

When you do outsource, make sure your supplier has tested your software stack before. You want to be sure that, should you have a problem, the supplier will be able to respond quickly and correctly.

10 best practices for scaling the CSP data center — Part 2

Cloud service providers, here are more best practices—courtesy of Supermicro—that you can follow for designing and deploying rack-scale data centers. 

Cloud service providers, here are 5 more best practices—courtesy of Supermicro—that you can follow for designing and deploying rack-scale data centers. All are based on Supermicro’s real-world experience with customers around the world.

Best Practice No. 6: Design at the data-center level

Consider your entire data center as a single unit, complete with its range of both strengths and weaknesses. This will help you tackle such macro-level issues as the separation of hot and cold aisles, forced air cooling, and the size of chillers and fans.

If you’re planning an entirely new data center, remember to include a discussion of cooling tech. Why? Because the physical infrastructure needed for an air-cooled center is quite different from that needed for liquid cooling.

Best Practice No. 7: Understand & consider liquid cooling

We’re approaching the limits of air cooling. A new approach, one based on liquid cooling, promises to keep processors and accelerators running within their design limits.

Liquid cooling can also reduce a data center’s Power Usage Effectiveness (PUE) ratio, which compares the facility’s total energy use with the energy used by its computing equipment alone. The closer the ratio is to 1.0, the less energy goes to overhead such as cooling. This cooling tech can also minimize the need for HVAC cooling power.
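
PUE is conventionally calculated as total facility energy divided by IT equipment energy. A minimal sketch follows; the energy figures are hypothetical, chosen only to show how lower cooling overhead moves the ratio toward 1.0.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh

# Hypothetical figures: the same 1,000 kWh IT load with different overhead.
air_cooled = pue(total_facility_kwh=1_500, it_equipment_kwh=1_000)
liquid_cooled = pue(total_facility_kwh=1_150, it_equipment_kwh=1_000)
print(f"Air-cooled PUE: {air_cooled:.2f}, liquid-cooled PUE: {liquid_cooled:.2f}")
```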

Best Practice No. 8: Measure what matters

You can’t improve what you don’t measure. So make sure you are measuring such important factors as your data center’s CPU, storage and network utilization.

Good tools are available that can take these measurements at the cluster level. These tools can also identify both bottlenecks and levels of component over- or under-use.
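
As a rough illustration of what such tools do, the sketch below averages utilization samples per resource and flags bottlenecks and under-use. The sample data and the 20% and 90% thresholds are hypothetical; a real monitoring stack would collect these values from the cluster.

```python
from statistics import mean

# Hypothetical utilization samples (fraction of capacity in use) per resource.
SAMPLES = {
    "cpu": [0.92, 0.95, 0.97],
    "storage": [0.40, 0.45, 0.50],
    "network": [0.10, 0.12, 0.08],
}

def classify(samples: dict[str, list[float]],
             low: float = 0.20, high: float = 0.90) -> dict[str, str]:
    """Label each resource by its average utilization against assumed thresholds."""
    labels = {}
    for resource, values in samples.items():
        average = mean(values)
        if average >= high:
            labels[resource] = "bottleneck"
        elif average <= low:
            labels[resource] = "under-used"
        else:
            labels[resource] = "ok"
    return labels

report = classify(SAMPLES)
print(report)
```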

Best Practice No. 9: Manage jobs better

A CSP’s data center is typically used simultaneously by many customers. That pretty much means using a job-management scheduler tool.

One tricky issue is over-demand. That is, what do you do if you lack enough resources to satisfy all requests for compute, storage or networking? A job scheduler can help here, too.
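
To show how a scheduler can handle over-demand, here is a minimal sketch of priority-based admission: jobs are admitted in priority order while capacity lasts, and the rest wait. It is a toy model under stated assumptions (a single GPU pool, hypothetical job names and priorities), not how any particular scheduler product works.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    priority: int                        # lower number = more urgent
    name: str = field(compare=False)
    gpus: int = field(compare=False)     # GPUs the job requests

def schedule(jobs: list[Job], free_gpus: int) -> tuple[list[str], list[str]]:
    """Admit jobs in priority order while capacity lasts; queue the rest."""
    heap = list(jobs)
    heapq.heapify(heap)
    running, waiting = [], []
    while heap:
        job = heapq.heappop(heap)
        if job.gpus <= free_gpus:
            free_gpus -= job.gpus
            running.append(job.name)
        else:
            waiting.append(job.name)
    return running, waiting

# Hypothetical requests totaling 16 GPUs against a pool of 8.
requests = [Job(2, "batch-report", 4), Job(1, "inference", 4), Job(3, "training", 8)]
running, waiting = schedule(requests, free_gpus=8)
print("running:", running, "waiting:", waiting)
```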

Best Practice No. 10: Simplify your supply chain

Sure, competition across the industry is a good thing, driving higher innovation and lower prices. But within a single data center, standardizing on just a single supplier could be the winning ticket.

This approach simplifies ordering, installation and support. And if something should go wrong, then you’ll have only the proverbial “one throat to choke.”

Can you still use third-party hardware as appropriate? Sure. And with a single main supplier, that integration should be simpler, too.

Data-center service providers: ready for transformation?

An IDC researcher argues that providers of data-center hosting services face new customer demands that require them to create new infrastructure stacks. Key elements will include rack-scale integration, accelerators and new CPU cores. 

If your organization provides data-center hosting services, brace yourself. Due to changing customer demands, you’re about to need an entirely new infrastructure stack.

So argues Chris Drake, a senior research director at market watcher IDC, in a recently published white paper sponsored by Supermicro and AMD, The Power of Now: Accelerate the Datacenter.

In his white paper, Drake asserts that this new data center infrastructure stack will include new CPU cores, accelerated computing, rack-scale integration, a software-defined architecture, and the use of a micro-services application environment.

Key drivers

That’s a challenging list. So what’s driving the need for this new infrastructure stack? According to Drake, changing customer requirements.

More specifically, a growing need for hosted IT. For reasons related to cost, security and performance, many IT shops are choosing to retain proprietary workloads on premises and in private-cloud environments.

While some of these IT customers have sufficient capacity in their data centers to host these workloads on prem, many don’t. They’ll rely instead on service providers for a range of hosted IT requirements. To meet this demand, Drake says, service providers will need to modernize.

Another driver: growing customer demand for raw compute power, a direct result of their adoption of new, advanced computing tools. These include analytics, media streaming, and of course the various flavors of artificial intelligence, including machine learning, deep learning and generative AI.

IDC predicts that spending on servers ranging in price from $10K to $250K will rise from a global total of $50.9 billion in 2022 to $97.4 billion in 2027. That would mark a 5-year compound annual growth rate of nearly 14%.
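
The growth rate can be reproduced from the two spending totals with the standard compound annual growth rate (CAGR) formula. A minimal sketch, using only the figures quoted above:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

# IDC figures quoted in the article, in billions of US dollars.
rate = cagr(start_value=50.9, end_value=97.4, years=5)
print(f"5-year CAGR: {rate:.1%}")
```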

Under the hood

What will building this new infrastructure stack entail? Drake points to 5 key elements:

  • Higher-performing CPU cores: These include chiplet-based CPU designs. Combined with distributed and composable hardware architectures, they can enable more efficient use of shared resources and more scalable compute performance.
  • Accelerated computing: Core CPU processing will increasingly be supplemented by hardware accelerators, including those for AI. They’ll be needed to support today’s—and tomorrow’s—increasingly diverse range of high-performance and data-intensive workloads.
  • Rack-scale integration: Pre-tested racks can facilitate faster deployment, integration and expansion. They can also enable a converged-infrastructure approach to building and scaling a data center.
  • Software-defined data center technology: In this approach, virtualization concepts such as abstraction and pooling are extended to a data center’s compute, storage, networking and other resources. The benefits include increased efficiency, better management and more flexibility.
  • A microservices application architecture: This approach divides large applications into smaller, independently functional units. In so doing, it enables a highly modular and agile way for applications to be developed, maintained and upgraded.

Plan for change

Rome wasn’t built in a day. Modernizing a data center will take time, too.

To help service providers implement a successful modernization, Drake of IDC offers this 6-point action plan:

1. Develop a transformation road map: Aim to strike a balance between harnessing new technology opportunities on the one hand and being realistic about your time frames, costs and priorities on the other.

2. Work with a full-stack portfolio vendor: You want a solution that’s tailored for your needs, not just an off-the-rack package. “Full stack” here means a complete offering of servers, hardware accelerators, storage and networking equipment—as well as support services for all of the above.

3. Match accelerators to your workloads: You don’t need a Formula 1 race car to take the kids to school. Same with your accelerators. Sure, you may have workloads that require super-low latency and equally high throughput. But you’re also likely to be supporting workloads that can take advantage of more affordable CPU-GPU combos. Work with your vendors to match their hardware with your workloads.

4. Seek suppliers with the right experience: Work with tech vendors that know what you need. Look for those with proven track records of helping service providers to transform and scale their infrastructures.

5. Select providers with supply-chain ownership: Ideally, your tech vendors will fully own their supply chains for boards, systems and rack designs, including liquid-cooling systems. That includes managing the vertical integration needed to combine these elements. The right supplier could help you save costs and get to market faster.

6. Create a long-term plan: Plan for the short term, but also look ahead into the future. Technology isn’t sitting still, and neither should you. Plan for technology refreshes. Ask your vendors for their road maps, and review them. Decide what you can support in-house versus what you’ll probably need to hand off to partners.

AMD CTO: ‘AI across our entire portfolio’

In a presentation for industry analysts, AMD chief technology officer Mark Papermaster laid out the company’s vision for artificial intelligence everywhere, from PC and edge endpoints to the largest hyperscaler servers.

The current buildout of the artificial intelligence infrastructure is an event as big as the original launch of the internet.

AI, now mainly an expense, will soon be monetized. Thousands of AI applications are coming.

And AMD plans to embed AI across its entire product portfolio. That will include components and software on everything from PCs and edge sensors to the largest servers used by the big cloud hyperscalers.

These were among the comments of Mark Papermaster, AMD’s executive VP and CTO, during a recent fireside chat hosted by stock research firm Arete Research. During the hour-long virtual presentation, Papermaster answered questions from moderator Brett Simpson of Arete and attending stock analysts. Here are the highlights.

The overall AI market

AMD has said it believes the total addressable market (TAM) for AI through 2027 is $400 billion. “That surprised a lot of people,” Papermaster said, but AMD believes a huge AI infrastructure is needed.

That will begin with the major hyperscalers. AWS, Google Cloud and Microsoft Azure are among those looking at massive AI buildouts.

But there’s more. AI is not only in the domain of these massive clusters. Individual businesses will be looking for AI applications that can drive productivity and enhance the customer experience.

The models for these kinds of AI systems are typically smaller. They can be run on smaller clusters, too, whether on-premises or in the cloud.

AI will also make its way into endpoint devices. They’ll include PCs, embedded devices, and edge sensors.

Also, AI is more than just compute. AI systems also require robust memory, storage and networking.

“We’re thrilled to bring AI across our entire product portfolio,” Papermaster said.

Looking at the overall AI market, AMD expects to see a compound annual growth rate of 70%. “I know that seems huge,” Papermaster said. “But we are investing to capture that growth.”
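To make the compounding behind a 70% annual growth rate concrete, here is a minimal Python sketch. It treats the $400 billion figure as the end-of-window value and assumes a four-year window ending in 2027. The 2023 base year and the function names are illustrative assumptions, not details from the presentation.

```python
def compound(value: float, rate: float, years: int) -> float:
    """Grow `value` at a fixed annual `rate` for `years` years."""
    return value * (1.0 + rate) ** years


def implied_start(end_value: float, rate: float, years: int) -> float:
    """Back out the starting value that compounds to `end_value`."""
    return end_value / (1.0 + rate) ** years


if __name__ == "__main__":
    CAGR = 0.70        # growth rate cited in the article
    TAM_2027 = 400.0   # $ billions, cited in the article
    YEARS = 4          # assumption: window runs 2023 -> 2027

    start = implied_start(TAM_2027, CAGR, YEARS)
    print(f"Implied starting market: ${start:.1f}B")
    for n in range(YEARS + 1):
        print(f"year {n}: ${compound(start, CAGR, n):.1f}B")
```

Under those assumptions, a 70% rate multiplies the market by 1.7 each year, which is a bit more than an eightfold increase over four years.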

AI pricing

Pricing considerations need to take into account more than just the price of a GPU, Papermaster argued. You really have to look at the total cost of ownership (TCO).

The market is operating with an underlying premise: Demand for AI compute is insatiable. That will drive more and more compute into a smaller area, delivering better power efficiency per FLOP (floating-point operation), the most common measure of AI compute performance.

Right now, the AI compute model is dominated by a single player. But AMD is now bringing the competition. That includes the recently announced MI300 accelerator. But as Papermaster pointed out, there’s more, too. “We have the right technology for the right purpose,” he said.

That includes using not only GPUs, but also (where appropriate) CPUs. These workloads can include AI inference, edge computing, and PCs. In this way, user organizations can better manage their overall CapEx spend.

As moderator Simpson reminded him, Papermaster is fond of saying that customers buy road maps. So naturally he was asked about AMD’s plans for the AI future. Papermaster mainly deferred, saying more details will be forthcoming. But he also reminded attendees that AMD’s investments in AI go back several years and include its ROCm software enablement stack.

Training vs. inference

Training and inference are currently the two biggest AI workloads. Papermaster believes we’ll see the AI market bifurcate along those two lines.

Training depends on raw computational power in a vast cluster. For example, the popular ChatGPT generative AI tool uses a model with over a trillion parameters. That’s where AMD’s MI300 comes into play, Papermaster said, “because it scales up.”

This trend will continue, because for large language models (LLMs), the issue is latency. How quickly can you get a response? That requires not only fast processors, but also equally fast memory.

More specific inferencing applications, typically run after training is completed, are a different story, Papermaster said, adding: “Essentially, it’s ‘I’ve trained my model; now I want to organize it.’” These workloads are more concise and less demanding of both power and compute, meaning they can run on more affordable GPU-CPU combinations.

Power needs for AI

User organizations face a challenge: While running an AI system requires a lot of power, many data centers are what Papermaster called “power-gated.” In other words, they’re unable to drive up compute capacity to AI levels using current technology.

AMD is on the case. In 2020, the company committed itself to driving a 30x improvement in power efficiency for its products by 2025. Papermaster said the company is still on track to deliver that.

To do so, he added, AMD is thinking in terms of “holistic design.” That means not just hardware, but all the way through an application to include the entire stack.

One promising area involves AI workloads that can use AI approximation. These are applications that, unlike HPC workloads, do not need incredible levels of accuracy. As a result, performance is better for lower-precision arithmetic than it is for high-precision. “Not all AI models are created equally,” Papermaster said. “You’ll need smaller models, too.”
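The trade-off behind lower-precision arithmetic can be illustrated with a small Python sketch. It stores one value in 16-, 32- and 64-bit floating-point formats using the standard library's `struct` module, then reports the bytes used and the rounding error introduced. This is only an illustration of the storage-versus-accuracy trade-off, not a model of how any AMD accelerator handles precision; the function names are hypothetical.

```python
import struct

# struct format codes for IEEE 754 binary16, binary32 and binary64
FORMATS = {"fp16": "<e", "fp32": "<f", "fp64": "<d"}


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def precision_report(value: float) -> dict:
    """Return bytes per value and absolute rounding error for each format."""
    report = {}
    for name, fmt in FORMATS.items():
        report[name] = {
            "bytes": struct.calcsize(fmt),
            "error": abs(round_trip(value, fmt) - value),
        }
    return report


if __name__ == "__main__":
    for name, row in precision_report(0.1).items():
        print(f"{name}: {row['bytes']} bytes, error {row['error']:.3e}")
```

Halving the bytes per value means twice as many values fit in the same memory and bandwidth budget, at the cost of a larger rounding error that many AI models can tolerate.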

AMD is among those who have been surprised by the speed of AI adoption. In response, AMD has increased its projection of AI sales this year from $2 billion to $3.5 billion, what Papermaster called the fastest ramp AMD has ever seen.

AMD Instinct MI300 Series: Take a deeper dive in this advanced technology

Take a look at the innovative technology behind the new AMD Instinct MI300 Series accelerators.

Earlier this month, AMD took the wraps off its highly anticipated AMD Instinct MI300 Series of generative AI accelerators and data-center acceleration processing units (APUs). During the announcement event, AMD president Victor Peng said the new components had been “designed with our most advanced technologies.”

Advanced technologies indeed. With the AMD Instinct MI300 Series, AMD is writing a brand-new chapter in the story of AI-adjacent technology.

Early AI developments relied on the equivalent of a hastily thrown-together stock car constructed of whichever spare parts happened to be available at the time. But those days are over.

Now the future of computing has its very own Formula 1 race car. It’s extraordinarily powerful and fine-tuned to nanometer tolerances.

A new paradigm

At the heart of this new accelerator series is AMD’s CDNA 3 architecture. This third generation employs advanced packaging that tightly couples CPUs and GPUs to bring high-performance processing to AI workloads.

AMD’s new architecture also uses 3D packaging technologies that integrate up to 8 vertically stacked accelerator complex dies (XCDs) and 4 I/O dies (IODs) that contain system infrastructure. The various systems are linked via AMD Infinity Fabric technology and are connected to 8 stacks of high-bandwidth memory (HBM).

High-bandwidth memory can provide far more bandwidth at much lower power consumption than the GDDR memory found in standard GPUs. Like many of AMD’s notable innovations, its HBM employs a 3D design.

In this case, the memory modules are stacked vertically to shorten the distance the data needs to travel. This also allows for smaller form factors.

AMD has implemented the HBM using a unified memory architecture. This is an increasingly popular design in which a single array of main-memory modules supports both the CPU and GPU simultaneously, speeding tasks and applications.

Unified memory is more efficient than traditional memory architecture. It offers the advantage of faster speeds along with lower power consumption and ambient temperatures. Also, data need not be copied from one set of memory to another.

Greater than the sum of its parts

What really makes AMD CDNA 3 unique is its chiplet-based architecture. The design employs a single logical processor that contains a dozen chiplets.

Each chiplet, in turn, is fabricated for either compute or memory. To communicate, all the chiplets are connected via the AMD Infinity Fabric network-on-chip.

The primary 5nm XCDs contain the computational elements of the processor along with the lowest levels of the cache hierarchy. Each XCD includes a shared set of global resources, including the scheduler, hardware queues and 4 asynchronous compute engines (ACEs).

The 6nm IODs are dedicated to the memory hierarchy. These chiplets carry a newly redesigned AMD Infinity Cache and an HBM3 interface to the on-package memory. The AMD Infinity Cache boosts generational performance and efficiency by increasing cache bandwidth and reducing the number of off-chip memory accesses.

Scaling ever upward

System architects are constantly in the process of designing and building the world’s largest exascale-class supercomputers and AI systems. As such, they are forever reaching for more powerful processors capable of astonishing feats.

The AMD CDNA 3 architecture is an obvious step in the right direction. The new platform takes communication and scaling to the next level.

In particular, the advent of AMD’s 4th Gen Infinity Architecture Fabric offers architects a new level of connectivity that could help produce a supercomputer far more powerful than anything we have access to today.

It’s reasonable to expect that AMD will continue to iterate its new line of accelerators as time passes. AI research is moving at a breakneck pace, and enterprises are hungry for more processing power to fuel their R&D.

What will researchers think of next? We won’t have to wait long to find out.

Supermicro debuts 3 GPU servers with AMD Instinct MI300 Series APUs

Supermicro didn’t waste any time.

The same day that AMD introduced its new AMD Instinct MI300 series accelerators, Supermicro debuted three GPU rackmount servers that use the new AMD accelerated processing units (APUs). One of the three new systems also offers energy-efficient liquid cooling.

Here’s a quick look, plus links for more technical details:

Supermicro 8-GPU server with AMD Instinct MI300X: AS -8125GS-TNMR2

This big 8U rackmount system is powered by a pair of AMD EPYC 9004 Series CPUs and 8 AMD Instinct MI300X accelerator GPUs. It’s designed for training and inference on massive AI models with a total of 1.5TB of HBM3 memory per server node.

The system also supports 8 high-speed 400G networking cards, which provide direct connectivity for each GPU; 128 PCIe 5.0 lanes; and up to 16 hot-swap NVMe drives.

It’s an air-cooled system with 5 fans up front and 5 more in the rear.

Quad-APU systems with AMD Instinct MI300A accelerators: AS -2145GH-TNMR and AS -4145GH-TNMR

These two rackmount systems are aimed at converged HPC-AI and scientific computing workloads.

They’re available in the user’s choice of liquid or air cooling. The liquid-cooled version comes in a 2U rack format, while the air-cooled version is packaged as a 4U.

Either way, these servers are powered by four AMD Instinct MI300A accelerators, which combine CPUs and GPUs in an APU. That gives each server a total of 96 AMD ‘Zen 4’ cores, 912 compute units, and 512GB of HBM3 memory. Also, PCIe 5.0 expansion slots allow for high-speed networking, including RDMA to APU memory.
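The per-server totals follow from multiplying one APU's resources by four. A minimal Python sketch of that arithmetic is below. The per-APU figures (24 CPU cores, 228 compute units, 128GB of HBM3) are assumptions not stated in this article, chosen because they are consistent with the totals quoted above; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApuSpec:
    cpu_cores: int
    compute_units: int
    hbm3_gb: int


# Assumed per-APU figures for the AMD Instinct MI300A; not stated in the article
MI300A = ApuSpec(cpu_cores=24, compute_units=228, hbm3_gb=128)


def server_totals(apu: ApuSpec, count: int) -> ApuSpec:
    """Scale per-APU resources by the number of APUs in the server."""
    return ApuSpec(
        cpu_cores=apu.cpu_cores * count,
        compute_units=apu.compute_units * count,
        hbm3_gb=apu.hbm3_gb * count,
    )


if __name__ == "__main__":
    totals = server_totals(MI300A, count=4)
    print(f"{totals.cpu_cores} cores, {totals.compute_units} compute units, "
          f"{totals.hbm3_gb}GB HBM3")
```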

Supermicro says the liquid-cooled 2U system cuts data-center energy costs by more than 50%. Another difference: The air-cooled 4U server provides more storage and an extra 8 to 16 PCIe acceleration cards.
