Developing AI and HPC solutions? Check out the new AMD ROCm 6.2 release

The latest release of AMD’s free and open software stack for developing AI and HPC solutions delivers 5 important enhancements. 


If you develop AI and HPC solutions, you’ll want to know about the most recent release of AMD ROCm software, version 6.2.

ROCm, in case you’re unfamiliar with it, is AMD’s free and open software stack. It’s aimed at developers of artificial intelligence and high-performance computing (HPC) solutions on AMD Instinct accelerators, including the AMD Instinct-powered servers from Supermicro.

First introduced in 2016, ROCm open software now includes programming models, tools, compilers, libraries, runtimes and APIs for GPU programming.

ROCm version 6.2, announced recently by AMD, delivers 5 key enhancements:

  • Improved vLLM support 
  • Boosted memory efficiency & performance with Bitsandbytes
  • New Offline Installer Creator
  • New Omnitrace & Omniperf Profiler Tools (beta)
  • Broader FP8 support

Let’s look at each separately and in more detail.

vLLM Support

To enhance the efficiency and scalability of its Instinct accelerators, AMD is expanding vLLM support. vLLM is an easy-to-use library for the large language models (LLMs) that power Generative AI.

ROCm 6.2 lets AMD Instinct developers integrate vLLM into their AI pipelines. The benefits include improved performance and efficiency.
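As a rough illustration, here’s what that integration can look like with vLLM’s offline inference API. This is a minimal sketch, assuming a ROCm-enabled vLLM build; the model name is just an example.

```python
# Minimal vLLM offline-inference sketch. Assumes a ROCm-enabled vLLM
# install and access to the (illustrative) model named below.
from vllm import LLM, SamplingParams

prompts = ["What is high-performance computing?"]
sampling = SamplingParams(temperature=0.8, max_tokens=128)

# vLLM handles batching, paged attention and GPU memory management internally.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```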

Bitsandbytes

Developers can now integrate Bitsandbytes with ROCm for AI model training and inference, reducing their memory and hardware requirements on AMD Instinct accelerators. 

Bitsandbytes is an open-source Python library that brings 8-bit and 4-bit quantization to LLM training and inference, boosting memory efficiency and performance. AMD says this will let AI developers work with larger models on limited hardware, broadening access, saving costs and expanding opportunities for innovation.
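Here’s a sketch of what that integration looks like through Hugging Face Transformers, assuming transformers plus a ROCm-enabled bitsandbytes build are installed; the model name is illustrative.

```python
# Sketch: loading a model with 4-bit bitsandbytes quantization via
# Hugging Face Transformers. Assumes ROCm-enabled bitsandbytes;
# the model name is illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",  # place layers on available accelerators
)
```

Quantizing weights this way cuts the model’s memory footprint to roughly a quarter of its FP16 size, which is exactly the “larger models on limited hardware” benefit described above.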

Offline Installer Creator

The new ROCm Offline Installer Creator aims to simplify the installation process. This tool creates a single installer file that includes all necessary dependencies.

That makes deployment straightforward with a user-friendly GUI that allows easy selection of ROCm components and versions.

As the name implies, the Offline Installer Creator can be used on developer systems that lack internet access.

Omnitrace and Omniperf Profiler

The new Omnitrace and Omniperf Profiler Tools, both now in beta release, provide comprehensive performance analysis and a streamlined development workflow.

Omnitrace offers a holistic view of system performance across CPUs, GPUs, NICs and network fabrics. This helps developers identify and address bottlenecks.

Omniperf delivers detailed GPU kernel analysis for fine-tuning.

Together, these tools help to ensure efficient use of developer resources, leading to faster AI training, AI inference and HPC simulations.

FP8 Support

Broader FP8 support can improve the performance of AI inferencing.

FP8 is an 8-bit floating point format that provides a common, interchangeable format for both AI training and inference. It lets AI models operate and perform consistently across hardware platforms.

In ROCm, FP8 support improves the process of running AI models, particularly in inferencing. It does this by addressing key challenges such as the memory bottlenecks and high latency associated with higher-precision formats. In addition, FP8’s reduced-precision calculations can decrease the latency involved in data transfers and computations with little to no loss of accuracy.

ROCm 6.2 expands FP8 support across its ecosystem, from frameworks to libraries and more, enhancing performance and efficiency.
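To get a feel for the format itself, here’s a minimal PyTorch sketch, assuming a recent PyTorch build that exposes the float8 dtypes; FP8 operator coverage varies by hardware and stack version, so treat it as illustrative.

```python
# Minimal FP8 sketch: cast a tensor to an 8-bit float format.
# Assumes a recent PyTorch build exposing float8 dtypes.
import torch

x = torch.randn(1024, 1024, dtype=torch.float32)
x_fp8 = x.to(torch.float8_e4m3fn)  # 1 sign, 4 exponent, 3 mantissa bits

# FP8 uses one byte per element vs. FP32's four, which is where the
# memory and transfer-latency savings come from.
print(x.element_size(), x_fp8.element_size())  # 4 1
```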


Research Roundup, AI Edition: platform power, mixed signals on GenAI, smarter PCs


Catch the latest AI insights from leading researchers and market analysts.


Sales of artificial intelligence platform software show no sign of a slowdown. The road to true Generative AI disruption could be bumpy. And PCs with built-in AI capabilities are starting to sell.

That’s some of the latest AI insights from leading market researchers, analysts and pollsters. And here’s your research roundup.

AI Platforms Maintain Momentum

Is the excitement around AI overblown? Not at all, says market watcher IDC.

“The AI platforms market shows no sign of slowing down,” says IDC VP Ritu Jyoti.

IDC now believes that the market for AI platform software will maintain its momentum through at least 2028.

By that year, IDC expects, worldwide revenue for AI software will reach $153 billion. If so, that would mark a five-year compound annual growth rate (CAGR) of nearly 41%.
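Those two figures are consistent with IDC’s 2023 baseline of $27.9 billion (mentioned just below); here’s a quick back-of-the-envelope check in Python.

```python
# Back-of-the-envelope check of IDC's projection:
# $27.9B (2023) growing to $153B (2028) over five years.
base, target, years = 27.9, 153.0, 5
cagr = (target / base) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # ~40.5%, i.e., "nearly 41%"
```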

The market really got underway last year. That’s when worldwide AI platform software revenue hit $27.9 billion, an annual increase of 44%, IDC says.

Since then, lots of progress has been made. Fully half the organizations now deploying GenAI in production have already selected an AI platform. And IDC says most of the rest will do so in the next six months.

All that has AI software suppliers looking pretty smart.

Mixed Signals on GenAI

There’s no question that GenAI is having a huge impact. The question is how difficult it will be for GenAI-using organizations to achieve their desired results.

GenAI use is already widespread. In a global survey conducted earlier this year by management consultants McKinsey & Co., 65% of respondents said they use GenAI on a regular basis.

That was nearly double the percentage from McKinsey’s previous survey, conducted just 10 months earlier.

Also, three quarters of McKinsey’s respondents said they expect GenAI will lead their industries to significant or disruptive changes.

However, the road to GenAI could be bumpy. Separately, researchers at Gartner are predicting that by the end of 2025, at least 30% of all GenAI projects will be abandoned after their proof-of-concept (PoC). 

The reason? Gartner points to several factors: poor data quality, inadequate risk controls, unclear business value, and escalating costs.

“Executives are impatient to see returns on GenAI investments,” says Gartner VP Rita Sallam. “Yet organizations are struggling to prove and realize value.”

One big challenge: Many organizations investing in GenAI want productivity enhancements. But as Gartner points out, those gains can be difficult to quantify.

Further, implementing GenAI is far from cheap. Gartner’s research finds that a typical GenAI deployment costs anywhere from $5 million to $20 million.

That wide range of costs is due to several factors. These include the use cases involved, the deployment approaches used, and whether an organization seeks to be a market disruptor.

Clearly, an intelligent approach to GenAI can be a money-saver.

PCs with AI? Yes, Please

Leading PC makers hope to boost their hardware sales by offering new, built-in AI capabilities. It seems to be working.

In the second quarter of this year, 8.8 million PCs—that’s 14% of all shipped globally in the quarter—were AI-capable, says market analyst Canalys.

Canalys defines “AI-capable” pretty simply: It’s any desktop or notebook system that includes a chipset or block for one or more dedicated AI workloads.

By operating system, nearly 40% of the AI-capable PCs shipped in Q2 were Windows systems, 60% were Apple macOS systems, and just 1% ran ChromeOS, Canalys says.

For the full year 2024, Canalys expects some 44 million AI-capable PCs to be shipped worldwide. In 2025, the market watcher predicts, these shipments should more than double, rising to 103 million units worldwide. There's nothing artificial about that boost.


Why Lamini offers LLM tuning software on Supermicro servers powered by AMD processors


Lamini, provider of an LLM platform for developers, turns to Supermicro’s high-performance servers powered by AMD CPUs and GPUs to run its new Memory Tuning stack.


Generative AI systems powered by large language models (LLMs) have a serious problem: Their answers can be inaccurate—and sometimes, in the case of AI “hallucinations,” even fictional.

For users, the challenge is equally serious: How do you get precise factual accuracy—that is, correct answers with zero hallucinations—while upholding the generalization capabilities that make LLMs so valuable?

A California-based company, Lamini, has come up with an innovative solution. And its software stack runs on Supermicro servers powered by AMD CPUs and GPUs.

Why Hallucinations Happen

Here’s the premise underlying Lamini’s solution: Hallucinations happen because the right answer is clustered with other, incorrect answers. As a result, the model doesn’t know that a nearly right answer is in fact wrong.

To address this issue, Lamini’s Memory Tuning solution teaches the model that getting the answer nearly right is the same as getting it completely wrong. Its software does this by tuning literally millions of expert adapters with precise facts on top of any open-source LLM, such as Llama 3 or Mistral 3.

The Lamini model retrieves only the most relevant experts from an index at inference time. The goal is high accuracy, high speed and low cost.

More than Fine-Tuning

Isn’t this just LLM fine-tuning? Lamini says no, its Memory Tuning is fundamentally different.

Fine-tuning can’t ensure that a model’s answers are faithful to the facts in its training data. By contrast, Lamini says, its solution has been designed to deliver output probabilities that are not just close, but exactly right.

More specifically, Lamini promises its solution can deliver 95% LLM accuracy with 10x fewer hallucinations.

In the real world, Lamini says one large customer used its solution and raised LLM accuracy from 50% to 95%, and reduced the rate of AI hallucinations from an unreliable 50% to just 5%.

Investors are certainly impressed. Earlier this year Lamini raised $25 million from an investment group that included Amplify Partners, Bernard Arnault and AMD Ventures. Lamini plans to use the funding to accelerate its expert AI development and expand its cloud infrastructure.

Supermicro Solution

As part of its push to offer superior LLM tuning, Lamini chose Supermicro’s GPU server — model number AS -8125S-TNMR2 — to train LLM models in a reasonable time.

This Supermicro 8U system is powered by dual AMD EPYC 9000 series CPUs and eight AMD Instinct MI300X GPUs.

The GPUs connect with CPUs via a standard PCIe 5 bus. This gives fast access when the CPU issues commands or sends data from host memory to the GPUs.

Lamini has also benefited from Supermicro’s capacity and quick delivery schedule. With other GPU makers facing serious capacity issues, that’s an important benefit for both Lamini and its customers.

“We’re thrilled to be working with Supermicro,” says Lamini co-founder and CEO Sharon Zhou.

Could your customers be thrilled by Lamini, too? Check out the “do more” links below.


Why CSPs Need Hyperscaling


Today’s cloud service providers need IT infrastructures that can scale like never before.


Hyperscaling IT infrastructure may be one of the toughest challenges facing cloud service providers (CSPs) today.

The term hyperscale refers to an IT architecture’s ability to scale in response to increased demand.

Hyperscaling is tricky, in large part because demand is a constantly moving target. Without much warning, a data center’s IT demand can increase exponentially due to a myriad of factors.

That could mean a public emergency, the failure of another CSP’s infrastructure, or simply the rampant proliferation of data—a common feature of today’s AI environment.

To meet this growing demand, CSPs have a lot to manage. That includes storage measured in exabytes, AI workloads of massive complexity, and whatever hardware is needed to keep system uptime as close to 100% as possible.

The hardware alone can be a real challenge. CSPs now oversee both air- and liquid-based cooling systems, redundant power sources, diverse networking gear, and miles of copper and fiber-optic cabling. It’s a real handful.

Design with CSPs in Mind

To help CSPs cope with this seemingly overwhelming complexity, Supermicro offers purpose-built hardware designed to tackle the world’s most demanding workloads.

Enterprise-class servers like Supermicro’s H13 and A+ server series offer CSPs powerful platforms built to handle the rigors of resource-intensive AI workloads. They’ve been designed to scale quickly and efficiently as demand and data inevitably increase.

Take the Supermicro GrandTwin. This innovative solution puts the power and flexibility of multiple independent servers in a single enclosure.

The design helps lower operating expenses by enabling shared resources, including a space-saving 2U enclosure, heavy-duty cooling system, backplane and N+1 power supplies.

To help CSPs tackle the world’s most demanding AI workloads, Supermicro offers GPU server systems. These include a massive—and massively powerful—8U eight-GPU server.

Supermicro H13 GPU servers are powered by 4th-generation AMD EPYC processors. These cutting-edge chips are engineered to help high-end applications perform better and return results faster.

To make good on those lofty promises, AMD included more and faster cores, higher bandwidth to GPUs and other devices, and the ability to address vast amounts of memory.

Theory Put to Practice

Capable and reliable hardware is a vital component for every modern CSP, but it’s not the only one. IT infrastructure architects must consider not just their present data center requirements but how to build a bridge to the requirements they’ll face tomorrow.

To help build that bridge, Supermicro offers an invaluable list: 10 essential steps for scaling the CSP data center.

A few highlights include:

  • Standardize and scale: Supermicro suggests CSPs standardize around a preferred configuration that offers the best compute, storage and networking capabilities.
  • Plan ahead for support: To operate a sophisticated data center 24/7 is to embrace the inevitability of technical issues. IT managers can minimize disruption and downtime when something goes wrong by choosing a support partner who can solve problems quickly and efficiently.
  • Simplify your supply chain: Hyperscaling means maintaining the ability to move new infrastructure into place fast and without disruption. CSPs can stack the odds in their favor by choosing a partner that is ever ready to deliver solutions that are integrated, validated, and ready to work on day one.

Do More:

Hyperscaling for CSPs will be the focus of a session at the upcoming Supermicro Open Storage Summit ’24, which streams live Aug. 13 - Aug. 29.

The CSP session, set for Aug. 20, will cover the ways in which CSPs can seamlessly scale their AI operations across thousands of GPUs while ensuring industry-leading reliability, security and compliance capabilities. The speakers will feature representatives from Supermicro, AMD, Vast Data and Solidigm.

Learn more and register now to attend the 2024 Supermicro Open Storage Summit.

 


Tech Explainer: What is CXL — and how can it help you lower data-center latency?


High latency is a data-center manager’s worst nightmare. Help is here from an open-source solution known as CXL. It works by maintaining “memory coherence” between the CPU’s memory and memory on attached devices.


Latency is a crucial measure for every data center. Because latency measures the time it takes for data to travel from one point in a system or network to another, lower is generally better. A network with high latency has slower response times—not good.

Fortunately, the industry has come up with an open-source solution that provides a low-latency link between processors, accelerators and memory devices such as RAM and SSD storage. It’s known as Compute Express Link, or CXL for short.

CXL is designed to solve a couple of common problems. Once a processor uses up the capacity of its direct-attached memory, it falls back on SSD storage. Because DRAM access times are measured in nanoseconds while SSD access times are measured in microseconds, that fallback introduces a three-order-of-magnitude latency gap that can hurt both performance and total cost of ownership (TCO).

Another problem is that multicore processors are starving for memory bandwidth. This has become an issue because processors have been scaling in terms of cores and frequencies faster than their main memory channels. The resulting deficit leads to suboptimal use of the additional processor cores, as the cores have to wait for data.

CXL overcomes these issues by introducing a low-latency, memory cache coherent interconnect. CXL works for processors, memory expansion and AI accelerators such as the AMD Instinct MI300 series. The interconnect provides more bandwidth and capacity to processors, which increases efficiency and enables data-center operators to get more value from their existing infrastructure.

Cache-coherence refers to IT architecture in which multiple processor cores share the same memory hierarchy, yet retain individual L1 caches. The CXL interconnect reduces latency and increases performance throughout the data center.

The latest iteration of CXL, version 3.1, adds features to help data centers keep up with high-performance computational workloads. Notable upgrades include new peer-to-peer direct memory access, enhancements to memory pooling, and CXL Fabric improvements.

3 Ways to CXL

Today, there are three main types of CXL devices:

  • Type 1: Devices without integrated local memory. CXL protocols let these devices coherently access and cache memory from the host processor.
  • Type 2: These devices include integrated memory, but also share CPU memory. They leverage CXL to enable coherent memory-sharing between the CPU and the CXL device.
  • Type 3: A class of devices designed to augment existing CPU memory. CXL enables the CPU to access external sources for increased bandwidth and reduced latency.

Hardware Support

As data-center architectures evolve, more hardware manufacturers are supporting CXL devices. One such example is Supermicro’s All-Flash EDSFF and NVMe servers.

Supermicro’s cutting-edge appliances are optimized for resource-intensive workloads, including data-center infrastructure, data warehousing, hyperscale/hyperconverged and software-defined storage. To facilitate these workloads, Supermicro has included support for up to eight CXL 2.0 devices for advanced memory-pool sharing.
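On the software side, a Linux host with a CXL-aware kernel exposes attached devices through sysfs. Here’s a small illustrative Python sketch; the exact entries depend on kernel version and hardware.

```python
# Sketch: enumerating CXL devices exposed by the Linux CXL subsystem.
# Assumes a kernel built with CXL support; entries vary by platform.
from pathlib import Path

cxl_bus = Path("/sys/bus/cxl/devices")
if cxl_bus.exists():
    for dev in sorted(cxl_bus.iterdir()):
        print(dev.name)  # e.g., memory expanders appear as memX entries
else:
    print("No CXL devices registered (or kernel lacks CXL support).")
```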

Of course, CXL can be utilized only on server platforms designed to support communication between the CPU, memory and CXL devices. That’s why CXL is built into the 4th gen AMD EPYC server processors.

These AMD EPYC processors include up to 96 ‘Zen 4’ 5nm cores and 32MB of L3 cache per CCD, along with up to 12 DDR5 memory channels supporting as much as 12TB of memory.

CXL memory expansion is built into the AMD EPYC platform. That makes these CPUs ideally suited for advanced AI and GenAI workloads.

Crucially, AMD also includes 256-bit AES-XTS and secure multikey encryption. This enables hypervisors to encrypt address space ranges on CXL-attached memory.

The Near Future of CXL

Like many add-on devices, CXL devices are often connected via the PCI Express (PCIe) bus. However, implementing CXL over PCIe 5.0 in large data centers has some drawbacks.

Chief among them is the way its memory pools remain isolated from each other. This adds latency and hampers significant resource-sharing.

The next generation of PCIe, version 6.0, is coming soon and will offer a solution. CXL over PCIe 6.0 will offer twice as much throughput as PCIe 5.0.

The new PCIe standard will also add new memory-sharing functionality within the transaction layer. This will help reduce system latency and improve accelerator performance.

CXL is also enabling the start of disaggregated computing, in which resources residing in different physical enclosures can be made available to several applications.

Are your customers suffering from too much latency? The solution could be CXL.


AMD and Supermicro: Pioneering AI Solutions



Bringing AMD Instinct to the Forefront

In the constantly evolving landscape of AI and machine learning, the synergy between hardware and software is paramount. Enter AMD and Supermicro, two industry titans who have joined forces to empower organizations in the new world of AI with cutting-edge solutions. Their shared vision? To enable organizations to unlock the full potential of AI workloads, from training massive language models to accelerating complex simulations.

The AMD Instinct MI300 Series: Changing The AI Acceleration Paradigm

At the heart of this collaboration lies the AMD Instinct MI300 Series—a family of accelerators designed to redefine performance boundaries. These accelerators combine high-performance AMD EPYC™ 9004 series CPUs with the powerful AMD Instinct™ MI300X GPU accelerators and 192GB of HBM3 memory, creating a formidable force for AI, HPC, and technical computing.

Supermicro’s H13 Generation of GPU Servers

Supermicro’s H13 generation of GPU Servers serves as the canvas for this technological masterpiece. Optimized for leading-edge performance and efficiency, these servers integrate seamlessly with the AMD Instinct MI300 Series. Let’s explore the highlights:

8-GPU Systems for Large-Scale AI Training:

  • Supermicro’s 8-GPU servers, equipped with the AMD Instinct MI300X OAM accelerator, offer raw acceleration power. The AMD Infinity Fabric™ Links enable up to 896GB/s of peak theoretical P2P I/O bandwidth, while the 1.5TB HBM3 GPU memory fuels large-scale AI models.
  • These servers are ideal for LLM Inference and training language models with trillions of parameters, minimizing training time and inference latency, lowering the TCO and maximizing throughput.

Benchmarking Excellence

But what about real-world performance? Fear not! Supermicro’s ongoing testing and benchmarking efforts have yielded remarkable results. Continued engagement between the AMD and Supermicro performance teams enabled Supermicro to test pre-release ROCm versions with the latest performance optimizations, including publicly released optimizations like Flash Attention 2 and vLLM. The Supermicro AMD-based system AS -8125GS-TNMR2 showcases AI inference prowess, especially on models like Llama-2 70B, Llama-2 13B, and Bloom 176B. The performance? Equal to or better than AMD’s published results from the Dec. 6 Advancing AI event.


Charles Liang’s Vision

In the words of Charles Liang, President and CEO of Supermicro:

“We are very excited to expand our rack scale Total IT Solutions for AI training with the latest generation of AMD Instinct accelerators. Our proven architecture allows for fully integrated liquid cooling solutions, giving customers a competitive advantage.”

Conclusion

The AMD-Supermicro partnership isn’t just about hardware and software stacks; it’s about pushing boundaries, accelerating breakthroughs, and shaping the future of AI. So, as we raise our virtual glasses, let’s toast to innovation, collaboration, and the relentless pursuit of performance and excellence.

Featured videos



Find AMD & Supermicro Elsewhere

Related Content

Supermicro Adds AI-Focused Systems to H13 JumpStart Program


Supermicro is now letting you validate, test and benchmark AI workloads on its AMD-based H13 systems right from your browser. 


Supermicro has added new AI-workload-optimized GPU systems to its popular H13 JumpStart program. This means you and your customers can validate, test and benchmark AI workloads on a Supermicro H13 system right from your PC’s browser.

The JumpStart program offers remote sessions to fully configured Supermicro systems with SSH, VNC, and web IPMI. These systems feature the latest AMD EPYC 9004 Series Processors with up to 128 ‘Zen 4c’ cores per socket, DDR5 memory, PCIe 5.0, and CXL 1.1 peripherals support.

In addition to previously available models, Supermicro has added the H13 4U GPU System with dual AMD EPYC 9334 processors and Nvidia L40S AI-focused universal GPUs. This H13 configuration is designed for heavy AI workloads, including applications that leverage machine learning (ML) and deep learning (DL).

3 simple steps

The engineers at Supermicro know the value of your customer’s time. So, they made it easy to initiate a session and get down to business. The process is as simple as 1, 2, 3:

  • Select a system: Go to the main H13 JumpStart page, then scroll down and click one of the red “Get Access” buttons to browse available systems. Then click “Select Access” to pick a date and time slot. On the next page, select the configuration and press “Schedule” and then “Confirm.”
  • Sign in: Log in with a Supermicro SSO account to access the JumpStart program. If you or your customers don’t already have an account, creating a new account is both free and easy.
  • Initiate secure access: When the scheduled time arrives, begin the session by visiting the JumpStart page. Each server will include documentation and instructions to help you get started quickly.

So very secure

Security is built into the program. For instance, the server is not on a public IP address. Nor is it directly addressable to the Internet. Supermicro sets up the jump server as a proxy, and this provides access to only the server you or your customer are authorized to test.

And there’s more. After your JumpStart session ends, the server is manually secure-erased, the BIOS and firmware are re-flashed, and the OS is reinstalled with new credentials. That way, you can be sure any data you’ve sent to the H13 system will disappear once the session ends.

Supermicro is serious about its security policies. However, the company still warns users to keep sensitive data to themselves. The JumpStart program is meant for benchmarking, testing and validation only. In their words, “processing sensitive data on the demo server is expressly prohibited.”

Keep up with the times

Supermicro’s expertly designed H13 systems are at the core of the JumpStart program, with new models added regularly to address typical workloads.

In addition to the latest GPU systems, the program also features hardware focused on evolving data center roles. This includes the Supermicro H13 CloudDC system, an all-in-one rackmount platform for cloud data centers. Supermicro CloudDC systems include single AMD EPYC 9004 series processors and up to 10 hot-swap NVMe/SATA/SAS drives.

You can also initiate JumpStart sessions on Supermicro Hyper Servers. These multi-use machines are optimized for tasks including cloud, 5G core, edge, telecom and hyperconverged storage.

Supermicro Hyper Servers included in the company’s JumpStart program offer single or dual processor configurations featuring AMD EPYC 9004 processors and up to 8TB of DDR5 memory in a 1U or 2U form factor.

Helping your customers test and validate a Supermicro H13 system for AI is now easy. Just get a JumpStart.


Supermicro debuts 3 GPU servers with AMD Instinct MI300 Series APUs



Supermicro didn’t waste any time.

The same day that AMD introduced its new AMD Instinct MI300 series accelerators, Supermicro debuted three GPU rackmount servers that use the new AMD accelerated processing units (APUs). One of the three new systems also offers energy-efficient liquid cooling.

Here’s a quick look, plus links for more technical details:

Supermicro 8-GPU server with AMD Instinct MI300X: AS -8125GS-TNMR2

This big 8U rackmount system is powered by a pair of AMD EPYC 9004 Series CPUs and 8 AMD Instinct MI300X accelerator GPUs. It’s designed for training and inference on massive AI models with a total of 1.5TB of HBM3 memory per server node.

The system also supports 8 high-speed 400G networking cards, which provide direct connectivity for each GPU; 128 PCIe 5.0 lanes; and up to 16 hot-swap NVMe drives.

It’s an air-cooled system with 5 fans up front and 5 more in the rear.

Quad-APU systems with AMD Instinct MI300A accelerators: AS -2145GH-TNMR and AS -4145GH-TNMR

These two rackmount systems are aimed at converged HPC-AI and scientific computing workloads.

They’re available in the user’s choice of liquid or air cooling. The liquid-cooled version comes in a 2U rack format, while the air-cooled version is packaged as a 4U.

Either way, these servers are powered by four AMD Instinct MI300A accelerators, which combine CPUs and GPUs in an APU. That gives each server a total of 96 AMD ‘Zen 4’ cores, 912 compute units, and 512GB of HBM3 memory. Also, PCIe 5.0 expansion slots allow for high-speed networking, including RDMA to APU memory.

Supermicro says the liquid-cooled 2U system provides a 50%+ cost savings on data-center energy. Another difference: The air-cooled 4U server provides more storage and an extra 8 to 16 PCIe acceleration cards.


Tech Explainer: How does design simulation work? Part 1


Design simulation lets designers and engineers create, test and improve designs of real-world airplanes, cars, medical devices and more while working safely and quickly in virtual environments. This workflow also reduces the need for physical tests and allows designers to investigate more alternatives and optimize their products.


Design simulation is a type of computer-aided engineering used to create new products, reducing the need for physical prototypes. The result is a faster, more efficient design process in which complex physics and math do much of the heavy lifting.

Rapid advances in the CPUs and GPUs that perform simulations, along with the simulation software they run, have made it possible to shift product design from the physical world to a virtual one.

In this virtual space, engineers can create and test new designs as quickly as their servers can calculate the results and then render them with visualization software.

Getting better all the time

Designing via AI-powered virtual simulation offers significant improvements over older methods.

Back in the day, it might have taken a small army of automotive engineers years to produce a single new model. Prototypes were often sculpted from clay and carted into a wind tunnel to test aerodynamics.

Each new model went through a seemingly endless series of time-consuming physical simulations. The feedback from those tests would literally send designers back to the drawing board.

It was an arduous and expensive process. And the resources necessary to accomplish these feats of engineering often came at the expense of competition. Companies whose pockets weren’t deep enough might fail to keep up.

Fast-forward to the present. Now, we’ve got smaller design teams aided by increasingly powerful clusters of high-performance systems.

These engineers can tweak a car’s crumple zone in the morning … run the new version through a virtual crash test while eating lunch … and send revised instructions to the design team before day’s end.

Changing designs, saving lives

Faster access to this year’s Ford Mustang is one thing. But if you really want to know how design simulation is changing the world, talk to someone whose life was saved by a mechanical heart valve.

Using the latest tech, designers can simulate new prosthetics in relation to the physiology they’ll inhabit. Many factors come into play here, including size, shape, materials, fluid dynamics, failure models and structural integrity over time.

What’s more, it’s far better to theorize how a part will interact with the human body before the doctor installs it. Simulations can warn medical pros about potential infections, rejections and physical mismatches. AI can play a big part in these types of simulations and manufacturing.

Sure, perfection may be unattainable. But the closer doctors get to a perfect match between a prosthetic and its host body, the better the patient will fare after the procedure.

Making the business case

Every business wants to cut costs, increase efficiency and get an edge over the competition. Here, too, design simulation offers a variety of ways to achieve those lofty goals.

As mentioned above, simulation can drastically reduce the need for expensive physical prototypes. Creating and testing a new airplane design virtually means not having to come within 100 miles of a runway until the first physical prototype is ready to take flight. 

The aerospace and automotive industries rely heavily not only on the structural integrity of an assembly but also on computational fluid dynamics. In this way, simulation can potentially save an aerospace company billions of dollars over the long run.

What’s more, virtual airplanes don’t crash. They can’t be struck by lightning. And in a virtual passenger jet, test pilots don’t need to worry about their safety.

By the time a new aircraft design rolls onto the tarmac, it’s already been proven air-worthy—at least to the extent that a virtual simulation can make those kinds of guarantees.

Greater efficiency

Simulation makes every aspect of design more efficient. For instance, iteration, a vital element of the design process, becomes infinitely more manageable in a simulated environment.

Want to find out how a convertible top will affect your new supercar’s 0-to-60 time? Simulation allows engineers to quickly replace the hard-top with some virtual canvas and then create a virtual drag race against the original model.

Simulation can take a product to the manufacturing phase, too. Once a design is finished, engineers can simulate its journey through a factory environment.

This virtual factory, or digital twin, can help determine how long it will take to build a product and how it will react to various materials and environmental conditions. It can even determine how many moves a robot arm will need to make and when human intervention might become necessary. This process helps engineers optimize the manufacturing process.

In countless ways, simulation has never been more real.

In Part 2 of this 2-part blog, we’ll explore the digital technology behind design simulation. This cutting-edge technology is made possible by the latest silicon, vast swaths of high-speed storage, and sophisticated blade servers that bring it all together.


Tech Explainer: What’s the difference between Machine Learning and Deep Learning? Part 2


In Part 1 of this 2-part Tech Explainer, we explored the difference between how machine learning and deep learning models are trained and deployed. Now, in Part 2, we’ll get deeper into deep learning to discover how this advanced form of AI is changing the way we work, learn and create.


Where Machine Learning is designed to reduce the need for human intervention, Deep Learning—an extension of ML—removes much of the human element altogether.

If ML were a driver-assistance feature that helped you parallel park and avoid collisions, DL would be an autonomous, self-driving car.

The human intervention we’re talking about has much to do with categorizing and labeling the data used by ML models. Producing this structured data is both time-consuming and expensive.

DL shortens the time and lowers the cost by learning from unstructured data. This eliminates much of the data pre-processing performed by humans for ML.

That’s good news for modern businesses. Market watcher IDC estimates that as much as 90% of corporate data is unstructured.

DL is particularly good at processing unstructured data. That includes information coming from the edge, the core and millions of both personal and IoT devices.

Like a brain, but digital

Deep Learning systems “think” with a neural network—multiple layers of interconnected nodes designed to mimic the way the human brain works. A DL system processes data inputs in an attempt to recognize, classify and accurately describe objects within data.

The layers of a neural network are stacked vertically. Each layer builds on the work performed by the one below it. By pushing data through each successive layer, the overall system improves its predictions and categorizations.

For instance, imagine you’ve tasked a DL system to identify pictures of junk food. The system would quickly learn—on its own—how to differentiate Pringles from Doritos.

It might do this by learning to recognize Pringles’ iconic tubular packaging. Then the system would categorize Pringles differently than the family-size sack of Doritos.

What if you fed this hypothetical DL system with more pictures of chips? Then it could begin to identify varying angles of packaging, as well as colors, logos, shapes and granular aspects of the chips themselves.

As this example illustrates, the longer a DL system operates, the more intelligent and accurate it becomes.
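To make those stacked layers concrete, here’s a toy PyTorch sketch of a small image classifier; the architecture and the two chip-brand classes are purely illustrative, not any production model.

```python
# Toy deep-learning sketch: stacked layers that map an image to one of
# two classes (think "Pringles" vs. "Doritos"). Purely illustrative;
# a real model would be far deeper and trained on labeled images.
import torch
import torch.nn as nn

model = nn.Sequential(                            # layers stacked in order
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low level: edges, colors
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid level: shapes, logos
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),                   # top level: class scores
)

fake_batch = torch.randn(8, 3, 64, 64)            # 8 RGB images, 64x64 pixels
print(model(fake_batch).shape)                    # torch.Size([8, 2])
```

Each successive layer builds on the features extracted by the one before it, which is the “stacked” behavior described above.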

Things we used to do

DL tends to be deployed when it’s time to pull out the big guns. This isn’t tech you throw at a mere spam filter or recommendation engine.

Instead, it’s the tech that powers the world’s finance, biomedical advances and law enforcement. For these verticals, failure is simply not an option.

For these verticals, here are some of the ways DL operates behind the scenes:

  • BioMed: DL helps healthcare staff analyze medical imaging such as X-rays and CT scans. In many cases, the technology is more accurate than well-trained physicians with decades of experience.
  • Finance: For those seeking a market edge (read: everyone), DL employs powerful, algorithm-based predictive analytics. This helps modern-day robber barons manage their portfolios based on insights from data so vast, they couldn’t leverage it themselves. DL also helps financial institutions assess loans, detect fraud and manage credit.
  • Law Enforcement: In the 2002 movie “Minority Report,” Tom Cruise played a police officer who could arrest people before they committed a crime. With DL, this fiction could turn into an unsettling reality. DL can be used to analyze millions of data points, then predict who is most likely to break the law. It might even give authorities an idea of where, when and how it could happen.

The future…?

Looking into a crystal ball—which these days probably uses DL—we can see a long succession of similar technologies coming. Just as ML begat DL, so too will DL beget the next form of AI—and the one after that.

The future of DL isn’t a question of if, but when. Clearly, DL will be used to advance a growing number of industries. But just when each sector will come to be ruled by our new smarty-pants robots is less clear.

Keep in mind: Even as you read this, DL systems are working tirelessly to help data scientists make AI more accurate and able to provide more useful assessments of datasets for specific outcomes. And as the science progresses, neural networks will continue to become more complex—and more like human brains.

That means the next generation of DL will likely be far more capable than the current one. Future AI systems could figure out how to reverse the aging process, map distant galaxies, even produce bespoke food based on biometric feedback from hungry diners.

For example, the upcoming AMD Instinct MI300 accelerators promise to usher in a new era of computing capabilities. That includes the ability to handle large language models (LLMs), the key approach behind generative AI systems such as ChatGPT.

Yes, the robots are here, and they want to feed you custom Pringles. Bon appétit!

 
