Sponsored by:

Visit AMD Visit Supermicro

Performance Intensive Computing

Capture the full potential of IT

Understanding the New Core Architecture of the AMD EPYC 9004 Series Processors


AMD’s announcement of its fourth-generation EPYC 9004 Series processors includes major advances in how these chips are designed and produced. Part 2 of 4.

AMD’s announcement of its fourth-generation EPYC 9004 Series processors includes major advances in how these chips are designed and produced to deliver the highest performance levels. These advances involve a hybrid multi-die architecture.
 
This architecture uses two different production processes for cores and I/O pathways. The cores use 5-nanometer dies, while the I/O pathways use 6-nanometer dies. Each processor package can have up to 12 CPU dies, each with eight cores, for a total of 96 cores in the maximum configuration. Each of the eight cores on a die has its own dedicated 1 MB L2 cache, and all eight cores share a 32 MB L3 cache, as shown in the diagram.
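The totals for the maximum configuration follow directly from the per-die figures above. The short sketch below just does that arithmetic; it assumes all 12 CPU dies are present and fully populated, and the variable names are illustrative only.

```python
# Per-die figures quoted in the article (maximum configuration).
CPU_DIES = 12
CORES_PER_DIE = 8
L2_MB_PER_CORE = 1    # dedicated L2 cache per core
L3_MB_PER_DIE = 32    # L3 cache shared by the eight cores on one die

total_cores = CPU_DIES * CORES_PER_DIE
total_l2_mb = total_cores * L2_MB_PER_CORE
total_l3_mb = CPU_DIES * L3_MB_PER_DIE

print(f"cores: {total_cores}")          # cores: 96
print(f"total L2: {total_l2_mb} MB")    # total L2: 96 MB
print(f"total L3: {total_l3_mb} MB")    # total L3: 384 MB
```

Processors with fewer dies or fewer active cores per die would give smaller totals.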
 
[Diagram: 32 MB L3 cache]
 
In addition to these changes, AMD announced core improvements, called Zen 4, that boost instructions-per-clock counts and raise overall clock speeds. AMD promises roughly 29 percent faster single-core CPU performance in Zen 4 relative to Zen 3, a claim affirmed by Ars Technica’s tests earlier this fall. (Zen 3 chips used the older 7-nanometer dies.)
 
 
This configuration provides a great deal of flexibility in how the CPU, memory channels and I/O paths are arranged. The multi-die setup can reduce fabrication waste and offer better parallel processing support. In addition, AMD EPYC processors are produced in single- and dual-socket configurations, with the latter offering more I/O pathways and dedicated PCIe generation 5 I/O connections.
 

Featured videos


Events




Find AMD & Supermicro Elsewhere

Related Content

AMD Announces Fourth-Generation EPYC™ CPUs with the 9004 Series Processors


AMD announces its fourth-generation EPYC™ CPUs. The new EPYC 9004 Series processors demonstrate advances in hybrid, multi-die architecture by decoupling core and I/O processes. Part 1 of 4.

AMD very recently announced its fourth-generation EPYC™ CPUs. This generation will provide innovative solutions that can satisfy the most demanding performance-intensive computing requirements for cloud computing, AI and highly parallelized data analytics applications. The design decisions AMD made for this processor generation strike a good balance among specifications, including higher CPU power and I/O performance, latency reductions and improvements in overall data throughput. This lets a single CPU socket address an ever-larger range of complex workloads.
 
The new AMD EPYC™ 9004 Series processors demonstrate advances in hybrid, multi-die architecture by decoupling core and I/O processes. The new chip dies support 12 DDR5 memory channels, doubling the I/O throughput of previous generations. The new CPUs also increase core counts from 64 cores in the previous EPYC 7003 chips to 96 cores in the new chips using 5-nanometer processes. The new generation of chips also increases the maximum memory capacity from 4TB of DDR4-3200 to 6TB of DDR5-4800 memory.
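To put the memory figures in perspective, here is a rough calculation of theoretical peak memory bandwidth per socket. It is a sketch under stated assumptions, not a benchmark: the eight DDR4 channels for the previous generation and the 8-byte (64-bit) data width per channel are assumptions that do not appear in the article, and real-world throughput will be lower than these peaks.

```python
def peak_bandwidth_gbs(channels: int, megatransfers_per_s: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (assumes a 64-bit data path per channel)."""
    return channels * megatransfers_per_s * bytes_per_transfer / 1000

# Assumption: previous-generation parts used 8 channels of DDR4-3200.
previous_gen = peak_bandwidth_gbs(channels=8, megatransfers_per_s=3200)
# From the article: 12 channels of DDR5-4800.
new_gen = peak_bandwidth_gbs(channels=12, megatransfers_per_s=4800)

print(f"previous generation: {previous_gen:.1f} GB/s")   # previous generation: 204.8 GB/s
print(f"EPYC 9004:           {new_gen:.1f} GB/s")        # EPYC 9004:           460.8 GB/s
print(f"ratio:               {new_gen / previous_gen:.2f}x")  # ratio:               2.25x

# Maximum capacity quoted in the article: 4TB -> 6TB.
print(f"capacity ratio:      {6 / 4:.2f}x")              # capacity ratio:      1.50x
```

Under these assumptions, the theoretical peak slightly more than doubles, which is in line with the doubling described above.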
 
 
 
There are three major innovations evident in the AMD EPYC™ 9004 processor series:
  1. A new hybrid multi-die chip architecture, coupled with multi-processor server innovations, a new and more advanced Zen 4 instruction set, and support for more dedicated L2 and shared L3 cache
  2. Security enhancements to AMD’s Infinity Guard
  3. Advances in system-on-chip designs that extend and enhance AMD Infinity switching fabric technology
Taken together, the new AMD EPYC™ 9004 series processors can offer plenty of innovation and performance advantage. The new processors offer better performance per watt of power consumed and better per core performance, too.
 


Are Your App Workloads Running in Parallel?


To be effective at delivering performance-intensive applications, it pays to split up your workloads and run them simultaneously, a.k.a. in parallel. In the past, we didn’t really think about the resources required to run workloads, because many business computers were all-purpose machines. There was also a tendency to run loads serially to avoid bogging down due to heavy CPU utilization, heavy I/O and so on.

 

But computers have become much more capable of late. What were once thought of as “desktop” computers now approach the territory once occupied by minicomputers and mainframes. Like the larger systems, they serve multiple concurrent users and more demanding applications. As a result, we need to think more carefully about how their various components – processor, memory, storage and network connections – interact, and we need to find and eliminate the bottlenecks between these components to make them useful for higher-end workloads.
 

Straighten out Bottlenecks


One way to eliminate bottlenecks is to break your apps into smaller, more digestible pieces that can run concurrently. As new processors employ more cores and more sophisticated components, more of your code can be spread across the entire CPU package. This is the inherent nature of parallel processing, and it is why the world’s fastest supercomputers now routinely span thousands (and in some cases millions) of cores.
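As a minimal illustration of breaking work into pieces that run concurrently, the sketch below splits a list into chunks, hands each chunk to a worker and combines the partial results. The function names and the sum-of-squares workload are hypothetical stand-ins. It uses Python's standard `ThreadPoolExecutor`; for CPU-bound pure-Python work, `ProcessPoolExecutor` offers the same `map` interface and is usually the better choice for spreading work across cores.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks roughly equal pieces."""
    if not items:
        return []
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def process_chunk(chunk):
    """Stand-in for one independent piece of work."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(items, workers=4):
    """Process the chunks concurrently, then combine the partial results."""
    chunks = split_into_chunks(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)

if __name__ == "__main__":
    data = list(range(1_000))
    print(parallel_sum_of_squares(data))
```

The key design point is that each chunk can be processed without knowing anything about the others, so the only serial step left is the final combination.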


A company called Weka has developed a file system designed to provide higher-speed data ingestion, making it better suited to machine learning and advanced mathematical modeling applications. Understanding the particular type of data storage you need – whether it is a parallel file system such as Weka, more scratch space for computations or better backups – can make a big difference in overall performance.


How your apps work across the network is also important. Is there a lot of back-and-forth between clients and servers, or sending a small chunk of data and waiting for a reply? This introduces a lot of idle time for the app, and these “wait states” should be identified and potentially eliminated.
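The cost of these wait states can be sketched with a small simulation. Here `fake_request` is a hypothetical stand-in for a network round trip (it simply sleeps), and the 50-millisecond latency is an arbitrary assumption. Issuing the requests one at a time adds the waits together, while issuing them concurrently with `asyncio.gather` lets the waits overlap.

```python
import asyncio
import time

async def fake_request(name, latency_s):
    """Hypothetical stand-in for a request that waits on a remote reply."""
    await asyncio.sleep(latency_s)
    return f"{name}: done"

async def run_serially(names, latency_s):
    # Each request waits for the previous reply before starting.
    return [await fake_request(n, latency_s) for n in names]

async def run_concurrently(names, latency_s):
    # All requests are in flight at once, so their wait states overlap.
    return list(await asyncio.gather(*(fake_request(n, latency_s) for n in names)))

def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    names = ["a", "b", "c", "d"]
    _, serial_s = timed(run_serially(names, 0.05))
    _, concurrent_s = timed(run_concurrently(names, 0.05))
    print(f"serial: {serial_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

With four simulated requests, the serial version takes roughly four times the single-request latency, while the concurrent version takes roughly one.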
 

Offload Workloads


Does your application do a lot of calculation? As discussed in an earlier story appearing on Performance-Intensive Computing, complementary processors, such as co-processors and GPUs, can provide a big performance boost so long as the main processor can move on to its next task, working in parallel, instead of waiting for data returned from the offloaded computation.

 

Working in parallel can be a challenge when your apps frequently pause to wait for data from another process or are highly monolithic and designed to run in a serial fashion. Such apps may be challenging to rewrite to take advantage of cloud-native or parallel operations. At some point, you are going to have to make that break and put in the programming effort to modernize your apps, but only you or your company can decide when it’s right to do that.

 

But if you can modify your workloads for this parallel structure and your hardware was designed to support it, you will see big benefits.


Unlocking the Value of the Cloud for Mid-size Enterprises


Featured Companies: Microsoft Azure

Organizations around the world are requiring new options for their next-generation computing environments. Mid-size organizations, in particular, are facing increasing pressure to deliver cost-effective, high-performance solutions within their hyperconverged infrastructures (HCI). Recent collaboration between Supermicro, Microsoft Azure and AMD, leveraging their collective technologies, has created a fresh approach that lets enterprises maintain performance at a lower operational cost while helping to reduce the organization’s carbon footprint in support of sustainability initiatives. This cost-effective 1U system (a 2U version is available) offers power, flexibility and modularity in large-scale GPU deployments.

The results of the collaboration combine the latest technologies, supporting multiple CPU, GPU, storage and networking options optimized to deliver uniquely configured and highly scalable systems. The product can be optimized for SQL and Oracle databases, VDI, productivity applications and database analytics. This white paper explores why this universal GPU architecture is an intriguing and cost-effective option for CTOs and IT administrators who are planning to rapidly implement hybrid cloud, data center modernization, branch office/edge networking or Kubernetes deployments at scale.

Get the 7-page white paper that provides the detail to assess the solution for yourself, including the new Azure Stack HCI certified system, specifications, cost justification and more.

 


Register to Watch Supermicro's Sweeping A+ Launch Event on Nov. 10


Join Supermicro online Nov. 10th to watch the unveiling of the company’s new A+ systems, featuring next-generation AMD EPYC™ processors. They can't tell us any more right now. But you can register for a link to the event by scrolling down and signing up on this page.


Energy-Efficient AMD EPYC™ Processors Bring Significant Savings


Cut electricity consumption by up to half with AMD's power-saving EPYC™ processors.

Featured Companies: Ateme, DBS, Nokia

Nokia was able to target up to a 40% reduction in server power consumption using EPYC. DBS and Ateme each experienced a 50% drop in energy costs. AMD’s EPYC™ processors can provide big energy-saving benefits, so you can meet your most demanding application performance requirements and still provide planetary and environmental efficiencies.

For example, to provide a collection of 1,200 virtual machines, AMD-based systems would require 10 servers, compared with 15 built using equivalent Intel CPUs. This translates into a 41% lower total cost of ownership over a three-year period, with a third less energy consumption, saving on carbon emissions too. For deeper detail and links to case studies by the companies mentioned above, and to find out how they saved significantly on energy costs while reducing their carbon footprints, check out the infographic.
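The consolidation arithmetic behind that example is easy to check. The sketch below uses only the server and VM counts quoted above; it does not attempt to reproduce the 41% cost figure, which depends on pricing and power data not given here.

```python
# Figures quoted in the article.
VIRTUAL_MACHINES = 1200
AMD_SERVERS = 10
INTEL_SERVERS = 15

vms_per_amd_server = VIRTUAL_MACHINES / AMD_SERVERS
vms_per_intel_server = VIRTUAL_MACHINES / INTEL_SERVERS
server_reduction = 1 - AMD_SERVERS / INTEL_SERVERS

print(f"VMs per AMD server:   {vms_per_amd_server:.0f}")    # VMs per AMD server:   120
print(f"VMs per Intel server: {vms_per_intel_server:.0f}")  # VMs per Intel server: 80
print(f"fewer servers:        {server_reduction:.0%}")      # fewer servers:        33%
```

If each server drew similar power (an assumption, not a claim from the article), one-third fewer servers would line up with the one-third energy reduction cited.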

 


The Perfect Combination: The Weka Next-Gen File System, Supermicro A+ Servers and AMD EPYC™ CPUs


Weka’s file system, WekaFS, unifies your entire data lake into a shared global namespace where you can more easily access and manage trillions of files stored in multiple locations from one directory.

Featured Companies: Weka.io

One of the challenges of building machine learning (ML) models is managing data. Your infrastructure must be able to process very large data sets rapidly as well as ingest both structured and unstructured data from a wide variety of sources.

 

That kind of data is typically generated in performance-intensive computing areas like GPU-accelerated applications, structural biology and digital simulations. Such applications typically have three problems: how to efficiently fill a data pipeline, how to easily integrate data across systems and how to manage rapid changes in data storage requirements. That’s where Weka.io comes into play, providing higher-speed data ingestion and avoiding unnecessary copies of your data while making it available across the entire ML modeling space.

 

Weka’s file system, WekaFS, has been developed just for this purpose. It unifies your entire data lake into a shared global namespace where you can more easily access and manage trillions of files stored in multiple locations from one directory. It works across both on-premises and cloud storage repositories and is optimized for cloud-intensive storage so that it will provide the lowest possible network latencies and highest performance.

 

This next-generation data storage file system has several other advantages: it is easy to deploy, entirely software-based, plus it is a storage solution that provides all-flash level performance, NAS simplicity and manageability, cloud scalability and breakthrough economics. It was designed to run on any standard x86-based server hardware and commodity SSDs or run natively in the public cloud, such as AWS.

 

Weka’s file system is designed to scale to hundreds of petabytes, thousands of compute instances and billions of files. Read and write latency for file operations against active data is as low as 200 microseconds in some instances.

 

Supermicro has produced its own NVMe Reference Architecture that supports WekaFS on some of its servers, including the Supermicro A+ AS-1114S-WN10RT and AS-2114S-WN24RT using the AMD EPYC™ 7402P processors with at least 2TB of memory, expandable to 4TB. Both servers support hot-swappable NVMe storage modules for ultimate performance. Also check out the Supermicro WekaFS A/I and HPC Solution Bundle.

 

 


Microsoft Azure’s More Capable Compute Instances Take Advantage of the Latest AMD EPYC™ Processors


Azure HBv3 series virtual machines (VMs) are optimized for HPC applications, such as fluid dynamics, explicit and implicit finite element analysis, weather modeling, seismic processing, and various simulation tasks. HBv3 VMs feature up to 120 Third-Generation AMD EPYC™ 7v73X-series CPU cores with more than 450 GB of RAM.

Featured Companies: Azure

Increasing demand for higher-performance computing means that cloud-based computing needs to ratchet up its performance too. Microsoft Azure has introduced more capable compute virtual machines (VMs) that take advantage of the latest AMD EPYC™ processors. This means that developers can easily spin up VMs whose physical equivalents would cost thousands of dollars to purchase.

 

This story focuses on two of Azure's series: HBv3 and NVv4. In most cases, a single virtual machine is used to take advantage of all its resources. Azure HBv3-series VMs are optimized for HPC applications, such as fluid dynamics, explicit and implicit finite element analysis, weather modeling, seismic processing, and various simulation tasks. HBv3 VMs feature up to 120 Third-Generation AMD EPYC™ 7v73X-series CPU cores with more than 450 GB of RAM. This series of VMs has processor clock frequencies up to 3.5GHz. All HBv3-series VMs feature 200Gb/sec HDR InfiniBand switches to enable supercomputer-scale HPC workloads. The VMs are connected and optimized to deliver the most consistent performance. Get more information about AMD EPYC and Microsoft Azure virtual machines.

 

A Dutch construction company, TBI, is using the Azure NVv4 to run computer-aided design and building modeling tasks on a series of virtual Windows desktops. The NVv4 VMs are available only running Windows and are powered by four to 32 AMD EPYC™ vCPUs, offering a partial to full AMD Instinct™ MI25 GPU with memory ranging from 2GB to 17GB. Previous generations of NV instances used Intel CPUs and NVIDIA GPUs that offer less performance.

 

TBI chose this solution because it was cheaper and easier to support, and because it made it simpler to keep its software collection updated. Using virtual desktops meant that no client data was stored on any laptops, making things more secure. Also, these instances delivered equivalent performance, taking advantage of the SR-IOV technology.

 

Supermicro offers a wide range of servers that incorporate the AMD EPYC™ CPU and a number of servers optimized for applications that use GPUs. These servers range from 1U rackmount servers to high end 4U GPU optimized systems. Whether you’re using it on-prem or you’re building your own cloud, Supermicro’s Aplus servers are optimized for performance and technical computing applications and they run Azure and other systems well. Get more information about Supermicro servers with AMD’s EPYC™ CPUs.


Supermicro SuperBlades®: Designed to Power Through Distributed AI/ML Training Models


Running heavy AI/ML workloads can be a challenge for any server, but the SuperBlade has extremely fast networking options, upgradability, the ability to run two AMD EPYC™ 7000-series 64-core processors and the Horovod open-source framework for scaling deep-learning training across multiple GPUs.


Running the largest artificial intelligence (AI) and machine learning (ML) workloads is a job for the higher-performing systems. Such loads are often tough for even more capable machines. Supermicro’s SuperBlade combines blades using AMD EPYC™ CPUs with competing GPUs into a single rack-mounted enclosure (such as the Supermicro SBE-820H-822). The enclosure leverages an extremely fast networking architecture for these demanding applications that need to communicate with other servers to complete a task.

 

The Supermicro SuperBlade fits everything into an 8U chassis that can host up to 20 individual servers. This means a single chassis can be divided into separate training and model processing jobs. The components are key: servers can take advantage of the 200G HDR InfiniBand network switch without losing any performance. Think of this as delivering a cloud-in-a-box, providing both easier management of the cluster along with higher performance and lower latencies.

 

The Supermicro SuperBlade is also designed as a disaggregated server, meaning that components can be upgraded with newer and more efficient CPUs or memory as technology progresses. This feature significantly reduces E-waste.


The SuperBlade line supports a wide selection of configurations, including both CPU-only and mixed CPU/GPU models, such as the SBA-4119SG, which comes with up to two AMD EPYC™ 7000-series 64-core CPUs. These components are delivered on blades that slide right in, and they slide out just as easily when you need to replace the blades or the enclosure. The SuperBlade servers support a wide network selection as well, ranging from 10G to 200G Ethernet connections.

 

The SuperBlade employs the Horovod distributed model-training, message-passing interface to let multiple ML sessions run in parallel, maximizing performance. In a sample test of two SuperBlade nodes, the solution was able to process 3,622 GoogleNet images/second, and eight nodes were able to scale up to 13,475 GoogleNet images/second.
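One way to read those two throughput figures is as a scaling efficiency: how close the eight-node result comes to a perfect fourfold increase over the two-node result. The helper below is a hypothetical illustration that uses only the numbers quoted above.

```python
def scaling_efficiency(base_nodes, base_rate, scaled_nodes, scaled_rate):
    """Measured speedup divided by the ideal (linear) speedup."""
    measured_speedup = scaled_rate / base_rate
    ideal_speedup = scaled_nodes / base_nodes
    return measured_speedup / ideal_speedup

# Figures quoted in the article (GoogleNet images/second).
efficiency = scaling_efficiency(base_nodes=2, base_rate=3622,
                                scaled_nodes=8, scaled_rate=13475)

print(f"per node at 2 nodes: {3622 / 2:.0f} images/s")   # per node at 2 nodes: 1811 images/s
print(f"per node at 8 nodes: {13475 / 8:.0f} images/s")  # per node at 8 nodes: 1684 images/s
print(f"scaling efficiency:  {efficiency:.1%}")          # scaling efficiency:  93.0%
```

By this measure, going from two to eight nodes retained about 93 percent of ideal linear scaling in the sample test.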


As you can see, Supermicro’s SuperBlade improves performance-intensive computing and boosts AI and ML use cases, enabling larger models and data workloads. The combined solution enables higher operational efficiency to automatically streamline processes, monitor for potential breakdowns, apply fixes, more efficiently facilitate the flow of accurate and actionable data and scale up training across multiple nodes.


Supermicro and Qumulo Deliver High-Performance File Data Management Solution


Featured Companies: Qumulo

One of the issues that’s key to delivering higher-performing computing solutions is something that predates the PC itself: managing distributed file systems. The challenge becomes more acute when the applications involve manipulating large quantities of data. The tricky part is in how they scale to support these data collections, which might consist of video security footage, life sciences data collections and other research projects.

 

Storage systems from Qumulo integrate well into a variety of existing environments, such as those involving multiple storage protocols and file systems. The company supports a wide variety of use cases that allow for scaling up and out to handle petabyte-scale data quantities. Qumulo can run at the network edge, in the data center and in various cloud environments. Its systems run on Supermicro’s all non-volatile memory express (NVMe) platform; NVMe is the highest performing protocol designed for manipulating data stored on SSDs. The servers are built on 24-core 2.8 GHz AMD EPYC™ processors.


 

Qumulo provides built-in near real-time data analytics that let IT administrators predict storage trends and better manage storage capacity so that they can proactively plan and optimize workflows.

 

The product handles seamless file and object data storage, is hardware agnostic, and supports single data namespace and burstable computing running on the three major cloud providers (AWS, Google and Azure) with nearly instant data replication. Its distributed file system is designed to handle billions of files and works equally well on both small and large file sizes.

 

Qumulo also works on storage clusters, such as those created with Supermicro AS-1114S servers, which can accommodate up to 150TB per storage node. Qumulo Shift for Amazon S3 is a feature that lets users copy data to the Amazon S3 native format for easy access to AWS services if the required services are not available in an on-prem data center. 

For more information, see the white paper on the Supermicro and Qumulo High-Performance File Data Management and Distributed Storage solution, powered by AMD EPYC™ processors.
