Tech Explainer: What are CPU Cores, Threads, Cache & Nodes?

Today’s CPUs are complex. Find out what the key components actually do—and why, in an age of AI, they still matter.

In the age of artificial intelligence, CPUs still matter. A central processor’s parts—cores, threads, cache and nodes—are as important as any AI accelerator.

But what exactly do those CPU parts do? And why, in an age of AI, do they still matter?

These questions are easy to overlook given AI’s focus on the GPU. To be sure, graphical processors are important for today’s AI workloads. But the humble CPU also remains a vital component.

If the GPU is AI’s turbocharger, then the CPU is the engine that makes the whole car go. As Dan McNamara, AMD’s GM of compute and enterprise AI business, said at the recent AMD Financial Analysts Day, “AI requires leadership CPUs.”

So here’s a look at the most important components of today’s data-center x86 CPUs. And an explanation of why they matter.

Cores: Heavy Lifting

The central processing unit is the brain of any PC or server. It reads instructions, does the complex math, and coordinates the system’s every task.

Zooming into the architecture of a CPU, it’s the individual cores that put the “PU” in CPU. Each fully independent processing unit can run its own task, virtual machine (VM) or container.

Modern enterprise-class CPUs such as AMD’s EPYC 9005 Series offer anywhere from 8 to 192 cores each. They operate at up to 5GHz.

These cores are built using AMD’s ‘Zen’ architecture. It’s a fundamental core design that offers enhancements vital to data centers, including improved instructions-per-clock (IPC), branch prediction, caches and efficiency.

Performance like that is a must-have when it comes to a data center’s most demanding tasks. That’s especially true for compute-intensive database operations and API-heavy microservices such as authentication, payment gateways and search.

Having more cores in each CPU also enables IT managers to run more workloads per server. That, in turn, helps organizations lower their hardware and operating costs, simplify IT operations, and more easily scale operations.

Threads: Helping Cores Do More

A modern CPU core needs to multitask, and that’s where having multiple threads is essential. With simultaneous multithreading (SMT), a single CPU core with two threads can juggle two tasks by switching between them very quickly. In a CPU with a high core count, that per-core multiplier adds up fast.

This capability delivers two important benefits. One, it helps ensure that each CPU core stays productive, even if one task stalls. And two, it boosts the CPU’s overall output.

For example, the AMD EPYC 9965 processor boasts 192 cores with a total of 384 threads. That kind of multitasking horsepower helps smooth request handling for web services and microservices. It also improves VM responsiveness and helps AI workloads run more efficiently under heavy loads.
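
To see the core/thread split on an actual machine, here’s a minimal sketch in Python. It assumes a Linux host, where the kernel exposes per-CPU topology in /proc/cpuinfo; other platforms report this information differently.

```python
# Minimal sketch: compare logical CPUs (threads) with physical cores on Linux.
# Assumes a Linux host; /proc/cpuinfo fields may differ on other platforms.
import os

def physical_core_count() -> int:
    """Count unique (physical id, core id) pairs listed in /proc/cpuinfo."""
    cores = set()
    physical_id = core_id = None
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("physical id"):
                physical_id = line.split(":")[1].strip()
            elif line.startswith("core id"):
                core_id = line.split(":")[1].strip()
                cores.add((physical_id, core_id))
    return len(cores) or os.cpu_count()  # fall back if the fields are absent

logical = os.cpu_count()          # threads the OS can schedule
physical = physical_core_count()  # actual cores
print(f"Logical CPUs (threads): {logical}")
print(f"Physical cores:         {physical}")
if physical:
    print(f"Threads per core (SMT): {logical // physical}")
```

On a dual-socket server with two 192-core EPYC 9965 CPUs and SMT enabled, you’d expect a report of 384 physical cores and 768 logical CPUs.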

Cache: Speedy but Short-Term Memory

The unsung hero of CPU design? That would be cache.

The main job of a CPU cache is to help the cores juggle data with low latency. Remember, less latency is always better.

As a result, CPU cache helps databases run faster, improves VM density and keeps overall latency low.

Your average CPU cache is arranged in three layers:

  • L1 cache is very small and very fast. Each core has its own L1 cache, which holds around 32 KB of instructions and data. The L1 cache feeds that data into the core’s registers—tiny, ultra-fast storage locations that hold the values the core uses for its calculations.
  • L2 cache is also exclusive to each core. At around 1MB, this cache is bigger than L1, but it’s also a little slower. L2 cache holds any data that doesn’t fit in the L1 cache. Working together, the L1 and L2 caches can quickly pass data back and forth until ultimately, the L1 cache passes the data to the core.
  • L3 cache is shared by all cores in a CPU, and it acts as a buffer for passing data between the CPU and main memory. Sizes vary widely. In an 8-core AMD EPYC processor, the L3 cache is just 64MB. But in AMD’s 192-core CPU, the L3 cache grows to 384MB. (The sketch after this list shows one way to inspect these sizes on a Linux server.)
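
Those sizes vary by CPU model, and on a Linux server you can read the hierarchy directly from sysfs. Here’s a minimal sketch, assuming the standard /sys/devices/system/cpu layout:

```python
# Minimal sketch: list the cache levels visible to CPU core 0 on a Linux host.
# Uses the standard sysfs layout; the L3 entry shows the slice shared by core 0.
from pathlib import Path

cache_dir = Path("/sys/devices/system/cpu/cpu0/cache")
for index in sorted(cache_dir.glob("index*")):
    level = (index / "level").read_text().strip()    # 1, 2 or 3
    ctype = (index / "type").read_text().strip()     # Data, Instruction or Unified
    size = (index / "size").read_text().strip()      # e.g. "32K", "1024K"
    shared = (index / "shared_cpu_list").read_text().strip()
    print(f"L{level} {ctype:<11} {size:>8}  shared with CPUs {shared}")
```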

Some AMD CPUs, including the AMD EPYC 9845, also include a 3D V-Cache. This AMD innovation stacks an additional cache on top of the L3 cache (hence the name 3D). Stacking the two caches vertically adds storage without increasing the overall size of the CPU.

The added 3D V-Cache also improves performance for workloads that benefit from a larger cache. Examples include scientific simulations and big data.

Nodes: Power & Efficiency

When it comes to CPU nodes, smaller is better. A smaller node size can deliver benefits that include lower power consumption, increased efficiency, and more compute performance per watt.

Nodes are expressed in nanometers (nm)—that’s one billionth of a meter—and describe the tiny size of the transistors on a chip.

The latest AMD EPYC 9005-series architectures, ‘Zen 5’ and ‘Zen 5c,’ are built on 4nm and 3nm nodes, respectively.

Each of these individual performance gains may seem tiny when considered on a per-chip basis. But in the aggregate, they can make a huge difference. That’s especially true for resource-intensive workloads such as AI training and inferencing.

Coming Soon: Smaller, Faster CPUs

AMD’s near-term roadmap tells us to expect EPYC CPUs that keep getting smaller, faster and more efficient.

Those manufacturing and performance gains will likely come from more cores per CPU socket and bigger, more efficient caches. Earlier this year, AMD said the next generation of its EPYC processors, codenamed Venice, will be built on TSMC’s advanced 2nm process technology.

Enterprises will be able to parlay those improvements into better performance under multi-tenant loads and reduced latency overall. The latter is particularly vital for modern operations.

The bottom line: Denser CPU cores mean big business, both for processor makers such as AMD and for the server vendors such as Supermicro that rely on these CPUs.

Denser CPUs are also vital for enterprises now transforming their data centers for AI. Because adding space is so slow and costly, these organizations are instead looking to pack more compute power per rack. Smaller, more powerful CPUs are an important part of their solution.

Minimum CPU size with maximum power? It’s coming soon to a data center near you.

Supermicro adds MicroBlade for CSPs powered by AMD EPYC 4005 series processors

To serve cloud service providers, Supermicro adds a 6U, 20-node MicroBlade server powered by AMD EPYC 4005 series processors.

Not every cloud service provider is as big or deep-pocketed as the big three—AWS, Google and Microsoft. And to serve those smaller and midsize CSPs, Supermicro recently added a 6U, 20-node server to its MicroBlade family powered by AMD EPYC 4005 series processors.

Smaller CSPs represent a big market. To be sure, AWS, Google and Microsoft collectively drew nearly 65% of total worldwide cloud market revenue in this year’s second quarter, according to Techopedia. But for both smaller CSPs and their suppliers, the remaining 35% or so is still quite valuable.

Techopedia estimates worldwide cloud services revenue in Q2 totaled $99 billion. That means the roughly 35% share left to smaller and midsize CSPs was worth more than $34 billion.

MicroBlade, Macro Benefits

To serve these smaller CSPs, Supermicro recently introduced a 6U, 20-node MicroBlade (model number MBA-315R-1G), with each node powered by a single AMD EPYC 4005 series processor.

This MicroBlade system delivers a cost-effective, green computing solution. It’s intended for workloads that include not only cloud computing, but also web hosting, dedicated hosting, virtual desktop infrastructure (VDI), AI inferencing, and enterprise workloads.

Supermicro CEO Charles Liang calls the new servers “a very cost-effective, green computing solution for cloud service providers.”

Key benefits of the new Supermicro system include up to 95% cable reduction with two integrated Ethernet switches per server; 70% space savings; and 30% energy savings over traditional 1U servers.

The system offers 3.3x higher density than a traditional 1U server. As a result, users can pack as many as 160 servers with 2,560 CPU cores, as well as 16 Ethernet switches, in a single 48U rack.

Under the hood, each MicroBlade server blade supports a single AMD EPYC 4005 CPU with up to 16 cores and 192GB of DDR5 memory. Also supported is a dual-slot, full-height/full-length (FHFL) GPU.
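
The density figures above follow from simple arithmetic. Here’s a back-of-the-envelope sketch using the numbers already cited; the assumption that eight 6U enclosures fill a 48U rack is illustrative rather than a Supermicro rack specification:

```python
# Back-of-the-envelope rack math for the 6U, 20-node MicroBlade.
# Inputs come from the figures cited above; the rack layout is illustrative.
rack_units = 48
enclosure_units = 6
blades_per_enclosure = 20
cores_per_blade = 16          # max cores for the AMD EPYC 4005 CPU in each blade
switches_per_enclosure = 2

enclosures = rack_units // enclosure_units        # 8 enclosures
blades = enclosures * blades_per_enclosure        # 160 server nodes
cores = blades * cores_per_blade                  # 2,560 CPU cores
switches = enclosures * switches_per_enclosure    # 16 Ethernet switches

print(f"{enclosures} enclosures -> {blades} nodes, {cores:,} cores, {switches} switches")
print(f"Density vs. 1U servers: {blades / rack_units:.1f}x")  # roughly 3.3x
```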

Also, this Supermicro system contains a dual-port 10GbE network switch. It’s designed to simplify topologies and enable more server instances per rack.

The 6U MicroBlade chassis can hold up to 20 individual server blades, two Ethernet switches and two management modules.

To protect workloads such as dedicated hosting, VDI, online gaming and AI inferencing, the Supermicro system also offers N+N redundancy. This setup configures two sets of independent components to provide high levels of reliability.

The MicroBlade system will also be available as a motherboard (model number BH4SRG) for Supermicro A+ servers.

Inside the AMD EPYC 4005 Series

The processors powering the new Supermicro server, AMD’s EPYC 4005 series, offer powerful performance for AI, cloud and hosting workloads. Yet they’re attractively priced for smaller businesses and hosting services.

The processors are based on the same core generation, ‘Zen 5,’ as are AMD’s more powerful data center processors, the AMD EPYC 9005 series. Yet the 4005 series processors have been designed for smaller operations, offering a combination of affordability, efficiency and ease of use.

AMD’s corporate VP for enterprise and HPC, Derek Dicker, says the AMD EPYC 4005 series processors “give our technology partners the flexibility to create powerful yet affordable systems that meet the specific needs of growing businesses and dedicated hosters.”

Do you have CSP clients looking for affordable yet powerful servers? Tell them about these new AMD-powered Supermicro servers, coming soon.

Tech Explainer: What’s a short-depth server?

Do your customers have locations that need server compute power, but lack data centers? Short-depth servers to the rescue!

There are times when a standard-sized server just won’t do. Maybe your customer’s branch office or retail store has space constraints. Maybe they have concerns over portability. Or maybe their sustainability goals demand a solution that requires low power and efficient cooling.

For these and other related situations, short-depth servers can fit the bill. These relatively diminutive boxes are designed for use in less-than-ideal physical spaces that nevertheless demand high-performance IT infrastructure.

What kinds of organizations could benefit from short-depth servers? Consider your local retail store. It’s likely been laid out using a calculus that prioritizes profit per square inch. This means the store’s best spots are dedicated to attracting buyers and generating revenue.

While that’s smart in terms of retail finance, it may not leave much room for vital infrastructure. That includes the servers that power the store’s point of sale (POS), security, advertising and data-collection systems.

This is a case where short-depth servers can help. These systems provide high levels of compute, storage and networking—without needing tall data center racks, elaborate cooling systems or other supporting infrastructure.

Other good candidates for using short-depth servers include remote branch offices, telco edge installations and industrial environments. In other words, any location that needs enterprise-level servers, but is short on space.

Small but Mighty

What’s more, today’s short-depth servers can handle some serious workloads.

Consider, for instance, the Supermicro WIO A+ Server (AS -1115SV-WTNRT), powered by AMD EPYC 8004 series processors. This short-depth server is engineered to tackle a variety of workloads, including virtualization, firewall applications, database, storage, edge and cloud computing.

The WIO A+ ships as a 1U form factor with a depth of just 23.5 inches. Compared with one of Supermicro’s big 8U multi-GPU servers, which has a depth of more than 33 inches, the short-depth server is short indeed.

Yet despite its diminutive size, this Supermicro server is packed with a ton of power—and room to grow. A single AMD EPYC processor sits at the center of the action, aided by either one double-width or two single-width GPUs.

This server also has room for up to 768GB of ECC DDR5 memory. And it can accommodate up to 10 hot-swap drives for NVMe, SAS or SATA storage.

As if that weren’t enough, Supermicro also includes room in this server cabinet for two PCIe 5.0 x16 full-height, full-length (FHFL) expansion cards. There’s also space for a single PCIe 5.0 x16 low-profile (LP) card.

More Power for Smaller Space

Fitting enough tech into a short-depth server can be a challenge. To do this, Supermicro’s designers had a few tricks up their sleeves.

For one, they used a custom motherboard instead of the more common ATX or EEB types. This creates more space in the smaller chassis. It also lets the designers employ a high-density component layout. The processors, GPUs, drives and other elements are placed closer to each other than they could be in a standard server.

Supermicro’s designers also deployed low-profile heat sinks. These use heat pipes that direct heat toward the fans. To save space, the fans are smaller than usual, but they make up the difference by running faster. Sure, faster fans can create more noise. But it’s a worthy trade-off to avoid system failure due to overheating.

Are there downsides to the smaller form factor? There can be. For one, constrained airflow could force a system to throttle both processor and GPU performance in an effort to prevent heat-related issues. This could be an issue when running highly resource-intensive VM workloads.

For another, the smaller power supply units (PSUs) used in many short-depth servers may necessitate a less-powerful configuration than a user might prefer. For example, Supermicro’s short-depth server includes two 860-watt power supplies. That’s far less available power than the company’s multi-GPU powerhouse, which comes with six 5,250-watt PSUs. Of course, from another perspective, the need for less power can be seen as a benefit, especially at remote edge locations.

Short-depth servers represent a useful trade-off. While they give up some power and expandability, their reduced sizes can help IT pros make the most of tight spaces.

How Supermicro/AMD servers boost AI performance with MangoBoost

Supermicro and MangoBoost are together delivering an optimized end-to-end GenAI stack. It’s based on Supermicro servers powered by AMD Instinct GPUs and running MangoBoost’s LLMBoost software.

While many organizations are implementing AI for business, many are also discovering that deploying and operating large language models (LLMs) at scale isn’t easy.

They’re finding that the hardware demands are intense. And so are the performance and cost trade-offs. Also, with AI workloads increasingly demanding multi-node GPU clusters, orchestration and tuning can be complex.

To address these challenges, Supermicro and MangoBoost Inc. are working together to deliver an optimized end-to-end GenAI stack. They’ve combined Supermicro’s robust AMD Instinct GPU server portfolio with MangoBoost’s LLMBoost software.

Meet MangoBoost

If you’re unfamiliar with MangoBoost, the company offers programmable solutions that improve data-center application performance while lowering CPU overhead. MangoBoost was founded three years ago; today it operates in the United States, Canada and South Korea.

MangoBoost’s core product is called the Data Processing Unit. It ensures full compatibility with general-purpose GPUs, accelerators and storage devices, enabling cost-efficient and standardized AI infrastructures.

MangoBoost also offers a ready-to-deploy, full-stack AI inference server. Known as Mango LLMBoost, it’s available from the Big Three cloud providers—AWS, Microsoft Azure and Google Cloud.

LLMBoost helps organizations accelerate both the training and the deployment of LLMs at scale. Why is this so challenging? Because once a model is ready for inference, developers face what’s known as a “productization tax.”

Integrating the machine-learning processing pipeline into the rest of the application often requires additional time and engineering effort. And this can lead to delays.

Mango LLMBoost addresses these challenges by creating an easy-to-use container. This lets LLM experts optimize their models, then select suitable GPUs on demand.

MangoBoost’s inference engine uses three forms of GPU parallelism, allowing GPUs to balance their compute, memory and network-resource usage. In addition, the software’s intelligent job scheduling optimizes cluster-wide GPU resources, ensuring that the load is balanced equally across GPU nodes.

LLMBoost also ensures the effective use of low-latency GPU caches and high-bandwidth memory through quantization. This reduces the data footprint, but without lowering accuracy.
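
To see why quantization shrinks the footprint, consider the weight memory of a 70-billion-parameter model at different precisions. This is a rough, illustrative calculation only; real deployments also budget for the KV cache, activations and runtime overhead:

```python
# Rough weight-memory footprint for a 70B-parameter model at common precisions.
# Illustrative only: ignores KV cache, activations and framework overhead.
params = 70e9
bytes_per_param = {"FP16": 2, "INT8": 1, "INT4": 0.5}

for precision, nbytes in bytes_per_param.items():
    gigabytes = params * nbytes / 1e9
    print(f"{precision}: ~{gigabytes:,.0f} GB of weights")
# FP16 ~140 GB, INT8 ~70 GB, INT4 ~35 GB. Lower precision leaves far more
# HBM and cache capacity for the data that actually moves at inference time.
```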

Complementing Hardware

MangoBoost’s LLMBoost software complements the powerful hardware with a full-stack, production-ready AI MLOps platform. It includes:

  • Plug-and-play deployment: Pre-built Docker images and an intuitive command-line interface (CLI) both help developers to launch LLM workloads quickly.
  • OpenAI-compatible API: Lets developers integrate LLM endpoints with minimal code changes (see the sketch after this list).
  • Kubernetes-native orchestration: Provides automated deployment and management of autoscaling, load balancing and job scheduling for seamless operation across both single- and multi-node clusters.
  • Full-stack performance auto-tuning: Unlike conventional auto-tuners that handle model hyper-parameters only, LLMBoost optimizes every layer from the inference and training back-ends to network configurations and GPU runtime parameters. This ensures maximum hardware utilization, yet without requiring any manual tuning.
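
As a concrete illustration of the OpenAI-compatible API noted above, here’s a minimal sketch using the standard openai Python client. The endpoint URL and model name are placeholders rather than documented LLMBoost values; check MangoBoost’s documentation for the details of an actual deployment.

```python
# Minimal sketch: calling an OpenAI-compatible inference endpoint.
# The base_url and model name are illustrative placeholders, not documented
# LLMBoost values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local endpoint
    api_key="not-needed-for-local-use",   # many local servers ignore the key
)

response = client.chat.completions.create(
    model="llama-2-70b",                  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize our Q2 cloud revenue."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the interface matches the OpenAI API, existing application code can usually be pointed at the new endpoint by changing only the base URL and model name.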

Proof of Performance

Supermicro and MangoBoost collaborating to deliver an optimized end-to-end Generative AI stack sounds good. But how does the combined solution actually perform?

To find out, Supermicro, AMD and MangoBoost recently tested their combined solution using real-world GenAI workloads. Here are the results:

  • LLMBoost reduced training time by 40% for two-node training, down to 13.3 minutes on a dual-node AMD Instinct MI325X configuration. The training was done running Llama 2 70B, an LLM with 70 billion parameters, with LoRA (low-rank adaptation).
  • LLMBoost achieved 1.96X higher throughput for multi-node inference on Supermicro AMD servers, reaching more than 61,000 tokens/sec. on a dual-node AMD Instinct MI325X configuration.
  • In-house LLM inference with Llama 4 Maverick and Scout models achieved near-linear scaling on AMD Instinct MI325X nodes. (Maverick is designed for fast responses at low cost; Scout, for long-document analysis.) This shows that Supermicro systems are ready for real-time GenAI deployment.
  • Load balancing: The researchers used LLaVA, an image-captioning model, on three setups. The heterogeneous dual-node configuration—eight AMD Instinct MI300X GPUs and eight AMD Instinct MI325X GPUs—achieved 96% of the sum of the individual single-node runs. This demonstrates minimal overhead and high efficiency.

Are your customers looking for a turnkey GenAI cluster solution that’s high-performance, flexible and easy to operate? Then tell them that Supermicro, AMD and MangoBoost have their solution—and the proof that it works.

Validate, test and benchmark the latest AMD-powered servers with Supermicro JumpStart

Get a free test drive on cutting-edge Supermicro servers powered by the latest AMD CPUs and GPUs.

How would you like free access to Supermicro’s first-to-market, high-end H14 servers powered by the latest AMD EPYC CPUs and Instinct GPUs?

Now it’s yours via your browser—and the Supermicro JumpStart program.

JumpStart offers you remote access to Supermicro servers. There, you can validate, test and benchmark your workloads. And assuming you qualify, using JumpStart is absolutely free.

While JumpStart has been around for some time, Supermicro has recently refreshed the program by including some of its latest H14 servers:

  • 8U server with eight AMD Instinct MI325X GPUs, dual AMD EPYC 9005 Series CPUs, 2TB of HBM3 memory (Supermicro model AS -8126GS-TNMR)
  • 2U server with dual AMD EPYC 9005 Series processors and up to 1.5TB of DDR5 memory (AS -2126HS-TN).
  • 1U cloud server with a single AMD EPYC 9005 Series processor (AS -1116CS-TN)

Supermicro has also updated JumpStart systems with its 1U E3.S all-Flash storage systems powered by a single AMD EPYC processor, so you can also test-drive the latest PCIe drives. In addition, several of Supermicro’s H13 AMD-powered systems are available for remote access on JumpStart.

How It Works

Getting started with JumpStart is easy:

Step 1: On the main JumpStart page, browse the available systems, then click the “get access” or “request access” button for the system you want to try. Then select your preferred system and time slot.

Step 2: Sign in. You can either log in with your Supermicro single sign-on (SSO) account or create a new free account. Supermicro will then qualify your account and reach out with further instructions.

Step 3: When your chosen time arrives, secure access to your system. Most JumpStart sessions last for one week. If you need more time, that can often be negotiated with your Supermicro sales reps.

It's that simple.

Once you’re connected to a server via JumpStart, you can have up to three sessions open: one VNC (virtual network computing), one SSH (secure shell), and one IPMI (intelligent platform management interface).

JumpStart also protects your privacy. After your JumpStart trial is completed, the server and storage devices are manually erased. In addition, the BIOS and firmware are reflashed, and the operating system is re-installed with new credentials.

More protection is offered, too. A jump server is used as a proxy. This means that the server you’re testing can use the internet to get files, but it is not directly addressable via the internet.

That said, it’s recommended that you do not use the test servers for processing sensitive or confidential data. Instead, Supermicro advises the use of anonymized data only—mainly because the servers may follow security policies that differ from your own.

So what are you waiting for? Try out JumpStart and get free remote access to Supermicro’s cutting-edge servers powered by the latest AMD CPUs and GPUs.

Tech Explainer: What is the AMD “Zen” core architecture?

Originally launched in 2017, this CPU architecture now delivers high performance and efficiency with ever-thinner processes.

The recent release of AMD’s 5th Gen EPYC processors—formerly codenamed Turin—also heralded the introduction of the company’s “Zen 5” core architecture.

“Zen” is AMD’s name for a design ethos that prioritizes performance, scalability and efficiency. As any CTO will tell you, these three aspects are crucial for success in today’s AI era.

AMD originally introduced its “Zen” architecture in 2017 as part of a broader campaign to steal market share and establish dominance in the all-important enterprise IT space.

Subsequent generations of the “Zen” design have markedly increased performance and efficiency while delivering ever-thinner manufacturing processes.

Now and Zen

Since the “Zen” core’s original appearance in AMD Ryzen 1000-series processors, the architecture’s design philosophy has maintained its focus on a handful of vital aspects. They include:

  • A modular design. Its Infinity Fabric interconnect facilitates efficient connectivity among multiple CPU cores and other components. This modular architecture enhances scalability and performance, both of which are vital for modern enterprise IT infrastructure.
  • High core counts and multithreading. Both are common to EPYC and Ryzen CPUs built using the AMD “Zen” core architecture. Simultaneous multithreading enables each core to process 2 threads. In the case of EPYC processors, this makes AMD’s CPUs ideal for multithreaded workloads that include Generative AI, machine learning, HPC and Big Data.
  • Advanced manufacturing processes. These allow faster, more efficient communication among individual CPU components, including multithreaded cores and multilevel caches. Back in 2017, the original “Zen” architecture was manufactured using a 14-nanometer (nm) process. Today’s new “Zen 5” and “Zen 5c” architectures (more on these below) reduce the lithography to just 4nm and 3nm, respectively.
  • Enhanced efficiency. This enables IT staff to better manage complex enterprise IT infrastructure. Reducing heat and power consumption is crucial, too, both in data centers and at the edge. The AMD “Zen” architecture makes this possible with enterprise-grade EPYC processors that deliver up to 192 cores, yet require a maximum thermal design power (TDP) of only 500W.

The Two-Fold Path

The latest, fifth generation “Zen” architecture is divided into two segments: “Zen 5” and “Zen 5c.”

“Zen 5” employs a 4-nanometer (nm) manufacturing process to deliver up to 128 cores operating at up to 4.1GHz. It’s optimized for high per-core performance.

“Zen 5c,” by contrast, offers a 3nm lithography that’s reserved for AMD EPYC 96xx, 97xx, 98xx, and 99xx series processors. It’s optimized for high density and power efficiency.

The most powerful of these CPUs—the AMD EPYC 9965—includes an astonishing 192 cores, a maximum boost clock speed of 3.7GHz, and an L3 cache of 384MB.

Both “Zen 5” and “Zen 5c” are key components of the 5th gen AMD EPYC processors introduced earlier this month. Both have also been designed to achieve double-digit increases in instructions per clock cycle (IPC) and equip the core with the kinds of data handling and processing power required by new AI workloads.
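
IPC itself is just the ratio of instructions retired to clock cycles consumed. A tiny sketch of that calculation follows; the counter values are invented for illustration, and on Linux you would gather real numbers with a hardware profiler such as perf before drawing any conclusions:

```python
# IPC = instructions retired / CPU clock cycles.
# The counts below are invented for illustration; collect real values with a
# hardware profiler (for example, `perf stat` on Linux).
def instructions_per_clock(instructions: int, cycles: int) -> float:
    return instructions / cycles

baseline = instructions_per_clock(instructions=4_200_000_000, cycles=2_000_000_000)
new_core = instructions_per_clock(instructions=4_900_000_000, cycles=2_000_000_000)

print(f"Baseline IPC: {baseline:.2f}")
print(f"New-core IPC: {new_core:.2f}")
print(f"Generational uplift: {(new_core / baseline - 1) * 100:.0f}%")  # ~17%
```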

Supermicro’s Satori

AMD isn’t the only brand offering bold, new tech to harried enterprise IT managers.

Supermicro recently introduced its new H14 servers, GPU-accelerated systems and storage servers powered by AMD EPYC 9005 Series processors and AMD Instinct MI325X Accelerators. A number of these servers also support the new AMD “Turin” CPUs.

The new product line features updated versions of Supermicro’s vaunted Hyper system, Twin multinode servers, and AI-inferencing GPU systems. All are now available with the user’s choice of either air or liquid cooling.

Supermicro says its collection of purpose-built powerhouses represents one of the industry’s most extensive server families. That should be welcome news for organizations intent on building a fleet of machines to meet the highly resource-intensive demands of modern AI workloads.

By designing its next-generation infrastructure around AMD 5th Generation components, Supermicro says it can dramatically increase efficiency by reducing customers’ total data-center footprints by at least two-thirds.

Enlightened IT for the AI Era

While AMD and Supermicro’s advances represent today’s cutting-edge technology, tomorrow is another story entirely.

Keeping up with customer demand and the dizzying pace of AI-based innovation means these tech giants will soon return with more announcements, tools and design methodologies. AMD has already promised that a new accelerator, the AMD Instinct MI350, will be formally announced in the second half of 2025.

As far as enterprise CTOs are concerned, the sooner, the better. To survive and thrive amid heavy competition, they’ll need an evolving array of next-generation technology. That will help them cut costs even as they expand their product offerings—a kind of technological nirvana.

Do your customers need more room for AI? AMD has an answer

If your customers are looking to add AI to already-crowded, power-strapped data centers, AMD is here to help. 

How can your customers make room for AI in data centers that are already full?

It’s a question that’s far from academic. Nine in 10 tech vendors surveyed recently by the Uptime Institute expect AI to be widely used in data centers in the next 5 years.

Yet data center space is both hard to find and costly to rent. Vacancy rates have hit new lows, according to real-estate services firm CBRE Group.

Worse, this combination of supply shortages and high demand is driving up data center pricing and rents. Across North America, CBRE says, pricing is up by 20% year-on-year.

Getting enough electric power is an issue, too. Some utilities have told prospective data-center customers they won’t get the power they requested until the next decade, reports The Wall Street Journal. In other cases, strapped utilities are simply giving customers less power than they asked for.

So how to help your customers get their data centers ready for AI? AMD has some answers. And a free software tool to help.

The AMD Solution

AMD’s solution is simple, with just two points:

  • Make the most of existing data-center real estate and power by consolidating existing workloads.
  • Replace the low-density compute of older, inefficient and out-of-warranty systems with compute that’s newer, denser and more efficient.

AMD is making the case that your customers can do both by moving from older Intel-based systems to newer ones that are AMD-based.

For example, the company says, replacing servers based on Intel Xeon 6143 Sky Lake processors with those based on AMD EPYC 9334 CPUs can result in the need for 73% fewer servers, 70% fewer racks and 69% less power.

That could include Supermicro servers powered by AMD EPYC processors. Supermicro H13 servers using AMD EPYC 9004 Series processors offer capabilities for high-performance data centers.

AMD hasn’t yet done comparisons with either its new 5th gen EPYC processors (introduced last week) or Intel’s 86xx CPUs. But the company says the results should be similar.

Consolidating processor-based servers can also make room in your customers’ racks for AMD Instinct MI300 Series accelerators designed specifically for AI and HPC workloads.

For example, if your customer has older servers based on Intel Xeon Cascade Lake processors, migrating them to servers based on AMD EPYC 9754 processors instead can gain them as much as a 5-to-1 consolidation.

The result? Enough power and room to accommodate a new AI platform.
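
Here’s a simple sketch of that consolidation math. Every input below is hypothetical; the point is how a consolidation ratio translates into freed rack space and power headroom for new AI systems.

```python
# Illustrative consolidation math: what a 5-to-1 server consolidation frees up.
# All input values are hypothetical; substitute your customer's real numbers.
import math

old_servers = 100             # aging 2U systems (hypothetical count)
consolidation_ratio = 5       # 5-to-1, as described above
old_power_per_server_w = 500  # hypothetical draw per old server
new_power_per_server_w = 800  # denser replacement node draws more per box

new_servers = math.ceil(old_servers / consolidation_ratio)     # 20 servers
freed_rack_units = (old_servers - new_servers) * 2             # assuming 2U boxes
power_freed_w = (old_servers * old_power_per_server_w
                 - new_servers * new_power_per_server_w)       # 34,000 W

print(f"Servers: {old_servers} -> {new_servers}")
print(f"Rack units freed: {freed_rack_units}U")
print(f"Power headroom gained: {power_freed_w / 1000:.1f} kW")
```

That freed space and power is what makes room for GPU-accelerated platforms in the same facility.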

Questions Answered

Simple doesn’t always mean easy. And you and your customers may have concerns.

For example, isn’t switching from one vendor to another difficult?

No, says AMD. The company cross-licenses the x86 instruction set, so on its processors, most workloads and applications will just work.

What about all those cores on AMD processors? Won’t they raise a customer’s failure domain too high?

No, says AMD. Its CPUs are scalable enough to handle any failure domain from 8 to 256 cores per server.

Wouldn’t moving require a cold migration? And if so, wouldn’t that disrupt the customer’s business?

Again, AMD says no. While moving virtual machines (VMs) to a new architecture does require a cold migration, the job can be done without any application downtime.

That’s especially true if you use AMD’s free open-source tool known as VAMT, short for VMware Architecture Migration Tool. VAMT automates cold migration. In one AMD test, it migrated hundreds of VMs in just an hour.

So if your customers are among those struggling to find room for AI systems in their already-crowded and power-strapped data centers, tell them to consider a move to AMD.

The AMD Instinct MI300X Accelerator draws top marks from leading AI benchmark

In the latest MLPerf testing, the AMD Instinct MI300X Accelerator with ROCm software stack beat the competition with strong GenAI inference performance. 

New benchmarks using the AMD Instinct MI300X Accelerator show impressive performance that surpasses the competition.

This is great news for customers operating demanding AI workloads, especially those underpinned by large language models (LLMs) that require super-low latency.

Initial platform tests using MLPerf Inference v4.1 measured AMD’s flagship accelerator against the Llama 2 70B benchmark. This test is a good indicator of real-world performance for applications including natural language processing (NLP) and large-scale inferencing.

MLPerf is the industry’s leading benchmarking suite for measuring the performance of machine learning and AI workloads from domains that include vision, speech and NLP. It offers a set of open-source AI benchmarks, including rigorous tests focused on Generative AI and LLMs.

Gaining high marks from the MLPerf Inference benchmarking suite represents a significant milestone for AMD. It positions the AMD Instinct MI300X accelerator as a go-to solution for enterprise-level AI workloads.

Superior Instincts

The results of the LLaMA2-70B test are particularly significant. That’s due to the benchmark’s ability to produce an apples-to-apples comparison of competitive solutions.

In this benchmark, the AMD Instinct MI300X was compared with NVIDIA’s H100 Tensor Core GPU. The test concluded that AMD’s full-stack inference platform was better than the H100 at achieving high-performance LLMs, a workload that requires both robust parallel computing and a well-optimized software stack.

The testing also showed that because the AMD Instinct MI300X offers the largest GPU memory available—192GB of HBM3 memory—it was able to fit the entire LLaMA2-70B model into memory. Doing so helped to avoid network overhead by preventing model splitting. This, in turn, maximized inference throughput, producing superior results.
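
The memory math behind that result is straightforward. Here’s a rough sketch covering the model weights only, at 16-bit precision; a real deployment also needs headroom for the KV cache and activations:

```python
# Rough check: do the Llama 2 70B weights fit in a single MI300X's HBM3?
# Weights-only estimate at FP16; KV cache and activations need extra headroom.
params = 70e9
bytes_per_param = 2          # FP16
hbm3_capacity_gb = 192       # AMD Instinct MI300X

weights_gb = params * bytes_per_param / 1e9   # ~140 GB
print(f"FP16 weights: ~{weights_gb:.0f} GB vs. {hbm3_capacity_gb} GB of HBM3")
print("Fits on one GPU" if weights_gb < hbm3_capacity_gb else "Needs model splitting")
```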

Software also played a big part in the success of the AMD Instinct series. The AMD ROCm software platform accompanies the AMD Instinct MI300X. This open software stack includes programming models, tools, compilers, libraries and runtimes for AI solution development on the AMD Instinct MI300 accelerator series and other AMD GPUs.

The testing showed that the scaling efficiency from a single AMD Instinct MI300X, combined with the ROCm software stack, to a complement of eight AMD Instinct accelerators was nearly linear. In other words, the system’s performance improved proportionally by adding more GPUs.
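
“Nearly linear” can be quantified as scaling efficiency: the measured multi-GPU throughput divided by the single-GPU throughput times the GPU count. A small sketch with made-up throughput figures shows the calculation:

```python
# Scaling efficiency = multi-GPU throughput / (GPU count * single-GPU throughput).
# The tokens/sec figures below are hypothetical; substitute measured values.
def scaling_efficiency(single_gpu_tps: float, multi_gpu_tps: float, gpus: int) -> float:
    return multi_gpu_tps / (gpus * single_gpu_tps)

eff = scaling_efficiency(single_gpu_tps=3_000, multi_gpu_tps=23_000, gpus=8)
print(f"Scaling efficiency across 8 GPUs: {eff:.0%}")  # ~96% of ideal linear scaling
```

An efficiency close to 100% means each added GPU contributes nearly its full share of throughput.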

That test demonstrated the AMD Instinct MI300X’s ability to handle the largest MLPerf inference models to date, containing over 70 billion parameters.

Thinking Inside the Box

Benchmarking the AMD Instinct MI300X required AMD to create a complete hardware platform capable of addressing strenuous AI workloads. For this task, AMD engineers chose as their testbed the Supermicro AS -8125GS-TNMR2, a massive 8U complete system.

Supermicro’s GPU A+ Server systems are designed for both versatility and redundancy. Designers can outfit the system with an impressive array of hardware, starting with two AMD EPYC 9004-series processors and up to 6TB of ECC DDR5 main memory.

Because AI workloads consume massive amounts of storage, Supermicro has also outfitted this 8U server with 12 front hot-swap 2.5-inch NVMe drive bays. There’s also the option to add four more drives via an additional storage controller.

The Supermicro AS -8125GS-TNMR2 also includes room for two hot-swap 2.5-inch SATA bays and two M.2 drives, each with a capacity of up to 3.84TB.

Power for all those components is delivered courtesy of six 3,000-watt redundant titanium-level power supplies.

Coming Soon: Even More AI Power

AMD engineers continually push the limits of silicon and human ingenuity to expand the capabilities of their hardware. So it should come as little surprise that new iterations of the AMD Instinct series are expected to be released in the coming months. This past May, AMD officials said they plan to introduce AMD Instinct MI325, MI350 and MI400 accelerators.

Forthcoming Instinct accelerators, AMD says, will deliver advances including additional memory, support for lower-precision data types, and increased compute power.

New features are also coming to the AMD ROCm software stack. Those changes should include software enhancements including kernel improvements and advanced quantization support.

Are your customers looking for a high-powered, low-latency system to run their most demanding HPC and AI workloads? Tell them about these benchmarks and the AMD Instinct MI300X accelerators.

Why Lamini offers LLM tuning software on Supermicro servers powered by AMD processors

Lamini, provider of an LLM platform for developers, turns to Supermicro’s high-performance servers powered by AMD CPUs and GPUs to run its new Memory Tuning stack.

Generative AI systems powered by large language models (LLMs) have a serious problem: Their answers can be inaccurate—and sometimes, in the case of AI “hallucinations,” even fictional.

For users, the challenge is equally serious: How do you get precise factual accuracy—that is, correct answers with zero hallucinations—while upholding the generalization capabilities that make LLMs so valuable?

A California-based company, Lamini, has come up with an innovative solution. And its software stack runs on Supermicro servers powered by AMD CPUs and GPUs.

Why Hallucinations Happen

Here’s the premise underlying Lamini’s solution: Hallucinations happen because the right answer is clustered with other, incorrect answers. As a result, the model doesn’t know that a nearly right answer is in fact wrong.

To address this issue, Lamini’s Memory Tuning solution teaches the model that getting the answer nearly right is the same as getting it completely wrong. Its software does this by tuning literally millions of expert adapters with precise facts on top of any open-source LLM, such as Llama 3 or Mistral 3.

The Lamini model retrieves only the most relevant experts from an index at inference time. The goal is high accuracy, high speed and low cost.

More than Fine-Tuning

Isn’t this just LLM fine-tuning? Lamini says no, its Memory Tuning is fundamentally different.

Fine-tuning can’t ensure that a model’s answers are faithful to the facts in its training data. By contrast, Lamini says, its solution has been designed to deliver output probabilities that are not just close, but exactly right.

More specifically, Lamini promises its solution can deliver 95% LLM accuracy with 10x fewer hallucinations.

In the real world, Lamini says one large customer used its solution and raised LLM accuracy from 50% to 95%, and reduced the rate of AI hallucinations from an unreliable 50% to just 5%.

Investors are certainly impressed. Earlier this year Lamini raised $25 million from an investment group that included Amplify Partners, Bernard Arnault and AMD Ventures. Lamini plans to use the funding to accelerate its expert AI development and expand its cloud infrastructure.

Supermicro Solution

As part of its push to offer superior LLM tuning, Lamini chose Supermicro’s GPU server — model number AS -8125GS-TNMR2 — to train LLM models in a reasonable amount of time.

This Supermicro 8U system is powered by dual AMD EPYC 9000 series CPUs and eight AMD Instinct MI300X GPUs.

The GPUs connect with CPUs via a standard PCIe 5 bus. This gives fast access when the CPU issues commands or sends data from host memory to the GPUs.

Lamini has also benefited from Supermicro’s capacity and quick delivery schedule. With other GPU makers facing serious capacity issues, that’s an important benefit for both Lamini and its customers.

“We’re thrilled to be working with Supermicro,” says Lamini co-founder and CEO Sharon Zhou.

Could your customers be thrilled by Lamini, too? Check out the “do more” links below.

Why CSPs Need Hyperscaling

Today’s cloud service providers need IT infrastructures that can scale like never before.

Hyperscaling IT infrastructure may be one of the toughest challenges facing cloud service providers (CSPs) today.

The term hyperscale refers to an IT architecture’s ability to scale in response to increased demand.

Hyperscaling is tricky, in large part because demand is a constantly moving target. Without much warning, a data center’s IT demand can increase exponentially due to a myriad of factors.

That could mean a public emergency, the failure of another CSP’s infrastructure, or simply the rampant proliferation of data—a common feature of today’s AI environment.

To meet this growing demand, CSPs have a lot to manage. That includes storage measured in exabytes, AI workloads of massive complexity, and whatever hardware is needed to keep system uptime as close to 100% as possible.

The hardware alone can be a real challenge. CSPs now oversee air- and liquid-based cooling systems, redundant power sources, diverse networking gear, and miles of copper and fiber-optic cabling. It’s a real handful.

Design with CSPs in Mind

To help CSPs cope with this seemingly overwhelming complexity, Supermicro offers purpose-built hardware designed to tackle the world’s most demanding workloads.

Enterprise-class servers like Supermicro’s H13 and A+ server series offer CSPs powerful platforms built to handle the rigors of resource-intensive AI workloads. They’ve been designed to scale quickly and efficiently as demand and data inevitably increase.

Take the Supermicro GrandTwin. This innovative solution puts the power and flexibility of multiple independent servers in a single enclosure.

The design helps lower operating expenses by enabling shared resources, including a space-saving 2U enclosure, heavy-duty cooling system, backplane and N+1 power supplies.

To help CSPs tackle the world’s most demanding AI workloads, Supermicro offers GPU server systems. These include a massive—and massively powerful—8U eight-GPU server.

Supermicro H13 GPU servers are powered by 4th-generation AMD EPYC processors. These cutting-edge chips are engineered to help high-end applications perform better and return faster.

To make good on those lofty promises, AMD included more and faster cores, higher bandwidth to GPUs and other devices, and the ability to address vast amounts of memory.

Theory Put to Practice

Capable and reliable hardware is a vital component for every modern CSP, but it’s not the only one. IT infrastructure architects must consider not just their present data center requirements but how to build a bridge to the requirements they’ll face tomorrow.

To help build that bridge, Supermicro offers an invaluable list: 10 essential steps for scaling the CSP data center.

A few highlights include:

  • Standardize and scale: Supermicro suggests CSPs standardize around a preferred configuration that offers the best compute, storage and networking capabilities.
  • Plan ahead for support: To operate a sophisticated data center 24/7 is to embrace the inevitability of technical issues. IT managers can minimize disruption and downtime when something goes wrong by choosing a support partner who can solve problems quickly and efficiently.
  • Simplify your supply chain: Hyperscaling means maintaining the ability to move new infrastructure into place fast and without disruption. CSPs can stack the odds in their favor by choosing a partner that is ever ready to deliver solutions that are integrated, validated, and ready to work on day one.

Do More:

Hyperscaling for CSPs will be the focus of a session at the upcoming Supermicro Open Storage Summit ‘24, which streams live Aug. 13 - Aug. 29.

The CSP session, set for Aug. 20, will cover the ways in which CSPs can seamlessly scale their AI operations across thousands of GPUs while ensuring industry-leading reliability, security and compliance capabilities. The speakers will feature representatives from Supermicro, AMD, Vast Data and Solidigm.

Learn more and register now to attend the 2024 Supermicro Open Storage Summit.