Capture the full potential of IT
Cut electricity consumption by up to half with AMD's power-saving EPYC™ processors.
Nokia was able to target up to a 40% reduction in server power consumption using EPYC. DBS and Ateme each experienced a 50% drop in energy costs. AMD’s EPYC™ processors can provide big energy-saving benefits, so you can meet your most demanding application performance requirements and still provide planetary and environmental efficiencies.
For example: To host a collection of 1,200 virtual machines, you would need 10 AMD-based servers compared with 15 servers built on equivalent Intel CPUs. This translates into a 41% lower total cost of ownership over a three-year period, with a third less energy consumption, saving on carbon emissions too. For deeper detail, links to case studies from the companies mentioned above and a look at how they saved significantly on energy costs while reducing their carbon footprints, check out the infographic.
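The consolidation arithmetic in that example can be sketched in a few lines of Python. This is only an illustration: the variable and function names are made up here, and the link between fewer servers and a third less energy assumes roughly equal power draw per server, which the example does not state. The 41% total-cost figure cannot be derived from these numbers alone and is left out.

```python
# Figures from the example above; all names are illustrative.
TOTAL_VMS = 1200
AMD_SERVERS = 10
INTEL_SERVERS = 15


def vms_per_server(total_vms: int, servers: int) -> float:
    """Average number of virtual machines hosted on each server."""
    return total_vms / servers


def fractional_reduction(baseline: float, new: float) -> float:
    """How much smaller `new` is than `baseline`, as a fraction of baseline."""
    return (baseline - new) / baseline


amd_density = vms_per_server(TOTAL_VMS, AMD_SERVERS)
intel_density = vms_per_server(TOTAL_VMS, INTEL_SERVERS)
server_reduction = fractional_reduction(INTEL_SERVERS, AMD_SERVERS)

print(f"VMs per AMD-based server:   {amd_density:.0f}")    # 120
print(f"VMs per Intel-based server: {intel_density:.0f}")  # 80
# Assumption: equal power draw per server, so energy scales with server count.
print(f"Fewer servers needed:       {server_reduction:.0%}")  # 33%
```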
Weka’s file system, WekaFS, unifies your entire data lake into a shared global namespace where you can more easily access and manage trillions of files stored in multiple locations from one directory.
One of the challenges of building machine learning (ML) models is managing data. Your infrastructure must be able to process very large data sets rapidly as well as ingest both structured and unstructured data from a wide variety of sources.
That kind of data is typically generated in performance-intensive computing areas like GPU-accelerated applications, structural biology and digital simulations. Such applications typically have three problems: how to efficiently fill a data pipeline, how to easily integrate data across systems and how to manage rapid changes in data storage requirements. That’s where Weka.io comes into play, providing higher-speed data ingestion and avoiding unnecessary copies of your data while making it available across the entire ML modeling space.
Weka’s file system, WekaFS, has been developed just for this purpose. It unifies your entire data lake into a shared global namespace where you can more easily access and manage trillions of files stored in multiple locations from one directory. It works across both on-premises and cloud storage repositories and is optimized for cloud-intensive storage so that it will provide the lowest possible network latencies and highest performance.
This next-generation file system has several other advantages: it is easy to deploy and entirely software-based, and it provides all-flash-level performance, NAS simplicity and manageability, cloud scalability and breakthrough economics. It was designed to run on any standard x86-based server hardware with commodity SSDs, or to run natively in a public cloud such as AWS.
Weka’s file system is designed to scale to hundreds of petabytes, thousands of compute instances and billions of files. Read and write latency for file operations against active data is as low as 200 microseconds in some instances.
Supermicro has produced its own NVMe Reference Architecture that supports WekaFS on some of its servers, including the Supermicro A+ AS-1114S-WN10RT and AS-2114S-WN24RT, which use AMD EPYC™ 7402P processors with at least 2TB of memory, expandable to 4TB. Both servers support hot-swappable NVMe storage modules for ultimate performance. Also check out the Supermicro WekaFS AI and HPC Solution Bundle.
Running heavy AI/ML workloads can be a challenge for any server, but the SuperBlade has extremely fast networking options, upgradability, the ability to run two AMD EPYC™ 7000-series 64-core processors and the Horovod open-source framework for scaling deep-learning training across multiple GPUs.
Running the largest artificial intelligence (AI) and machine learning (ML) workloads is a job for higher-performing systems, and such loads are often tough even for those more capable machines. Supermicro’s SuperBlade combines blades using AMD EPYC™ CPUs with competing GPUs into a single rack-mounted enclosure (such as the Supermicro SBE-820H-822). This design leverages an extremely fast networking architecture for demanding applications that need to communicate with other servers to complete a task.
The Supermicro SuperBlade fits everything into an 8U chassis that can host up to 20 individual servers. This means a single chassis can be divided into separate training and model processing jobs. The components are key: servers can take advantage of the 200G HDR InfiniBand network switch without losing any performance. Think of this as delivering a cloud-in-a-box, providing easier management of the cluster along with higher performance and lower latencies.
The Supermicro SuperBlade is also designed as a disaggregated server, meaning that components can be upgraded with newer and more efficient CPUs or memory as technology progresses. This feature significantly reduces E-waste.
The SuperBlade line supports a wide selection of configurations, including both CPU-only and mixed CPU/GPU models, such as the SBA-4119SG, which comes with up to two AMD EPYC™ 7000-series 64-core CPUs. These components are delivered on blades that slide right in, and they slide out just as easily when you need to replace the blades or the enclosure. The SuperBlade servers support a wide network selection as well, ranging from 10G to 200G Ethernet connections.
The SuperBlade employs Horovod, a distributed model-training framework built on message-passing interface (MPI) concepts, to let multiple ML sessions run in parallel, maximizing performance. In a sample test, two SuperBlade nodes were able to process 3,622 GoogleNet images/second, and eight nodes scaled up to 13,475 GoogleNet images/second.
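To put those two throughput figures in context, here is a rough scaling-efficiency calculation. It is only a sketch: the function name is made up for illustration, and it assumes that perfect scaling would be linear from the two-node result, since no single-node figure is given.

```python
def scaling_efficiency(base_nodes: int, base_rate: float,
                       scaled_nodes: int, scaled_rate: float) -> float:
    """Measured throughput divided by ideal linear throughput.

    Assumption: ideal throughput grows in proportion to node count,
    starting from the smaller (base) configuration.
    """
    ideal_rate = base_rate * (scaled_nodes / base_nodes)
    return scaled_rate / ideal_rate


# Figures from the sample test above (GoogleNet images/second).
speedup = 13_475 / 3_622
efficiency = scaling_efficiency(2, 3_622, 8, 13_475)

print(f"Speedup from 2 to 8 nodes: {speedup:.2f}x")   # 3.72x
print(f"Scaling efficiency:        {efficiency:.1%}")  # 93.0%
```

In other words, quadrupling the node count delivered about 3.7 times the throughput, or roughly 93% of what ideal linear scaling would give.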
As you can see, Supermicro’s SuperBlade improves performance-intensive computing and boosts AI and ML use cases, enabling larger models and data workloads. The combined solution enables higher operational efficiency to automatically streamline processes, monitor for potential breakdowns, apply fixes, more efficiently facilitate the flow of accurate and actionable data and scale up training across multiple nodes.
The AMD Threadripper™ CPU may be a desktop processor, but desktop computing was never like this. The new processor comes in a variety of multi-core versions, with a maximum of 64 cores running up to 128 threads, 256MB of L3 cache and 2TB of 8-channel DDR4 memory. The newest Threadrippers are built with AMD’s latest 7 nanometer dies.
Content creators, designers, video animators and digital FX experts make much higher demands of their digital workstations than typical PC users. These disciplines often make use of heavily threaded applications such as Adobe After Effects, Unreal Engine or CAD apps such as Autodesk. What is needed is a corresponding increase in computing power to handle these applications.
That’s where one solution comes in handy for this type of power user: the AMD Ryzen Threadripper™ CPU, which now has a PRO 5000 update. One advantage of these newer chips is that they fit the same WRX80 motherboards that supported the earlier Threadripper series. Other configurations are available, including the ProMagix HD150 workstation sold by Velocity Micro. The solution provider is looking at testing overclocking on both the MSI and ASRock motherboards that it will include in its HD150 workstations. That’s right, a chip that’s designed from the get-go to be overclocked. Benchmarks using the sample apps mentioned above ran about twice as fast as on competitors’ less-capable hardware. (Supermicro offers the M12SWA-TF motherboard, which supports the Threadripper.)
Desktop Was Never Like This
The Threadripper CPUs are not just fast; they are built on AMD’s Zen 3 architecture and come with several built-in security features, including Shadow Stack. Zen 3 is the overall name for a series of improvements to AMD’s higher-end CPU line that have shown a 19% improvement in instructions per clock. Zen 3 chips also offer lower latency and double the cache delivery compared with the earlier Zen 2 architecture chips.
These processors also support Microsoft’s Hardware-enforced Stack Protection to help detect and thwart control-flow attacks by checking the normal program stack against a secured hardware-stored copy. This helps to boot securely, protect the computer from firmware vulnerabilities, shield the operating system from attacks, and prevent unauthorized access to devices and data with advanced access controls and authentication systems.
One of the issues that’s key to delivering higher-performing computing solutions is something that predates the PC itself: managing distributed file systems. The challenge becomes more acute when the applications involve manipulating large quantities of data. The tricky part is in how they scale to support these data collections, which might consist of video security footage, life sciences data collections and other research projects.
Storage systems from Qumulo integrate well into a variety of existing environments, such as those involving multiple storage protocols and file systems. The company supports a wide variety of use cases that allow for scaling up and out to handle petabyte-scale data quantities. Qumulo can run at the network edge, in the data center and in various cloud environments. Its systems run on Supermicro’s all-NVMe (non-volatile memory express) platform; NVMe is the highest-performing protocol designed for accessing data stored on SSDs. The servers are built on 24-core 2.8 GHz AMD EPYC™ processors.
Qumulo provides built-in near real-time data analytics that let IT administrators predict storage trends and better manage storage capacity so that they can proactively plan and optimize workflows.
The product handles file and object data storage seamlessly, is hardware agnostic, and supports a single data namespace and burstable computing on the three major cloud providers (AWS, Google and Azure) with nearly instant data replication. Its distributed file system is designed to handle billions of files and works equally well with both small and large file sizes.
Qumulo also works on storage clusters, such as those created with Supermicro AS-1114S servers, which can accommodate up to 150TB per storage node. Qumulo Shift for Amazon S3 is a feature that lets users copy data to the Amazon S3 native format for easy access to AWS services if the required services are not available in an on-prem data center.
For more information, see the white paper on the Supermicro and Qumulo High-Performance File Data Management and Distributed Storage solution, powered by AMD EPYC™ processors.
The Supermicro SuperBlade's advantage for the Red Hat OCP environment is that it supports a higher-density infrastructure and lower-latency network configuration, along with benefits from reduced cabling, power and shared cooling features. SuperBlades feature multiple AMD EPYC™ processors using fast DDR4 3200MHz memory modules.
Red Hat’s OpenShift Container Platform (OCP) provides enterprise Kubernetes-bundled devops pipelines. It automates builds and container deployments and lets developers focus on application logic while leveraging best-of-class enterprise infrastructure.
OpenShift supports a broad range of programming languages, web frameworks, databases, connectors to mobile devices and external back ends. OCP supports cloud-native, stateless applications and traditional applications. Because of its flexibility and utility in running advanced applications, OCP has become one of the go-to places that support high-performance computing.
Red Hat’s OCP comes in several deployment packages, including as a managed service running on the major cloud platforms, as virtual machines, and on “bare metal” servers, meaning a user installs all the software needed for the platform and is the sole tenant of the server.
It’s that last use case in which Supermicro’s SuperBlade servers are especially useful. Their advantage is that they support a higher-density infrastructure and lower-latency network configuration, along with benefits from reduced cabling, power and shared cooling features.
The SuperBlade comes in an 8U chassis with room to accommodate up to 20 hot-pluggable nodes (processor, network and storage) in a variety of more than a dozen models that support serial-attached SCSI, ordinary SATA drives, and GPU processor modules. It sports multiple AMD EPYC™ processors using fast DDR4 3200MHz memory modules.
A chief advantage of the SuperBlade is that it can support a variety of higher-capacity OCP workload configurations and do so within a single server chassis. This is critical because OCP requires a variety of server roles to deliver its overall functionality, and having these roles working inside of a chassis means performance and latency benefits. For example, you could partition a SuperBlade’s 20 nodes into various OCP components such as administrative, management, storage, worker, infrastructure and load balancer nodes, all operating within a single chassis. For deeper detail about running OCP on the SuperBlade, check out this Supermicro white paper.
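As a planning aid, the partitioning idea can be sketched as a simple capacity check. The per-role node counts below are purely hypothetical, chosen only so that they add up to 20; they are not a Supermicro or Red Hat recommendation, and the function name is made up for illustration.

```python
CHASSIS_NODES = 20  # maximum nodes in one 8U SuperBlade chassis, per the text

# Hypothetical split of the chassis across OCP roles (illustrative only).
plan = {
    "administrative": 1,
    "management": 3,
    "storage": 3,
    "worker": 9,
    "infrastructure": 2,
    "load_balancer": 2,
}


def free_nodes(role_counts: dict, capacity: int = CHASSIS_NODES) -> int:
    """Return the number of unassigned nodes, or raise if the plan is invalid."""
    if any(count < 0 for count in role_counts.values()):
        raise ValueError("node counts must not be negative")
    used = sum(role_counts.values())
    if used > capacity:
        raise ValueError(f"plan needs {used} nodes but the chassis holds {capacity}")
    return capacity - used


print(f"Unassigned nodes: {free_nodes(plan)}")  # 0
```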
Single-root I/O virtualization (SR-IOV) is an interesting standard for performance-intensive computing because it lets a single PCIe device, such as a network adapter, present itself as multiple virtual devices so that its resources can be shared directly across the PCIe bus. It lets data traffic be routed directly to a particular virtual machine (VM) without interrupting the flow of other traffic across the bus. It does that by bypassing the software switching layer of the virtualization stack, thereby reducing input/output overhead and improving network performance, stability and reliability. (Get more information about SR-IOV in VMware and Microsoft contexts, for example.)
What this means, especially in GPU-based computing, is that each VM has its own dedicated share of the GPU and isn’t forced to compete with other VMs for its share of resources. The feature also helps isolate each VM and is the basic building block for modern VM hyperscale technologies.
Tests of SR-IOV have found big benefits, such as lowering processor utilization by 50% and boosting network throughput by up to 30%. This allows for more VMs per host and heavier workloads on each VM.
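If you want to check whether a network adapter on a Linux host exposes SR-IOV virtual functions, the kernel publishes two counters for the underlying PCI device: sriov_totalvfs (how many virtual functions the device supports) and sriov_numvfs (how many are currently enabled). The following is a minimal sketch that reads them; the interface name "eth0" and the function name are assumptions for illustration, and hypervisors such as VMware and Hyper-V expose this information through their own tools instead.

```python
from pathlib import Path
from typing import Optional, Tuple


def sriov_vf_counts(interface: str,
                    sysfs_root: str = "/sys/class/net") -> Optional[Tuple[int, int]]:
    """Return (supported VFs, enabled VFs) for a network interface.

    Returns None when the interface does not exist or its device does not
    expose the SR-IOV attributes (for example, on non-Linux hosts).
    """
    device = Path(sysfs_root) / interface / "device"
    try:
        total = int((device / "sriov_totalvfs").read_text().strip())
        enabled = int((device / "sriov_numvfs").read_text().strip())
    except (OSError, ValueError):
        return None
    return total, enabled


# "eth0" is a placeholder; substitute the adapter you want to inspect.
counts = sriov_vf_counts("eth0")
if counts is None:
    print("SR-IOV information not available for this interface")
else:
    print(f"Supported virtual functions: {counts[0]}, enabled: {counts[1]}")
```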
An excellent server for any virtualization platform is the Supermicro BigTwin® server. With up to 4 servers in just 2U, the Supermicro BigTwin is a versatile and powerful multi-node system that is environmentally friendly due to its shared components. Plus it can handle a wide range of workloads. Learn more about the Supermicro BigTwin model AS -2124BT-HTR.
Not a New Idea
The technology isn’t new: Scott Lowe wrote about it back in 2009 and SR-IOV was initially supported by Microsoft Windows Server 2012 and with AMD chipsets in 2016. This support has been extended with Azure NVv4 and AWS EC2 G4ad virtual machine instances, which are based on the AMD EPYC™ 7002 CPU and Radeon Pro™ GPU processor families.
The standard is supported by both VMware and Microsoft’s Hyper-V hosts and in various AMD EPYC™ CPU chipsets with MxGPU technology that is built into the actual silicon. This enables sharing a GPU’s power across multiple users or VMs while providing a performance level similar to that of a discrete processor.
The SR-IOV technology is a big benefit for immersive cloud-based gaming, desktop-as-a-service, machine learning models and 3D rendering applications.
“AMD EPYC™ processors are now a part of the world’s hyperscale data centers,” said Lisa Su, AMD’s CEO. Meta/Facebook is now building its servers with powerful third-generation AMD EPYC™ CPUs.
If you're making plans to build a high-performance data center, be sure to take a close look at the latest version of AMD's EPYC™ CPUs, which are code-named “Milan-X.”
Servers that employ AMD’s third-generation EPYC™ CPUs are so powerful that Meta/Facebook is now building its servers with them, using the new single-socket cloud-scale design, which is a part of their Open Compute Project. “AMD EPYC™ processors are now a part of the world’s hyperscale data centers,” said Lisa Su, AMD’s CEO, in the presentation at which she debuted the processors.
This latest generation of AMD EPYC™ CPUs uses an innovative packaging option, 3D stacking of chiplets, for high-performance computing applications. Higher-density cache memory is stacked on top of the processor to deliver more than 200 times the interconnect density of prior chiplet packaging designs. “It is the most flexible active-on-active silicon technology available in the world,” Su said. “It consumes much less energy and fits into existing CPU sockets, too.” AMD's latest chips satisfy the higher demands of cloud computing and electronic circuit design applications.
Jason Zander, EVP Microsoft Azure, said that Microsoft's partnership with AMD has let the cloud computing company deliver cloud instances that can run up to 12 times the speed of earlier offerings. “That rivals some supercomputers,” he said. Azure has configured some of the most powerful virtual instances, which are running on the latest AMD EPYC™ processors. They are available from 16 cores up to 120 cores and can share 448 GB of memory and 480 MB of L3 cache among the processors. For deeper information, see this Microsoft blog.
Circuit design demands the fastest processors. “The next step for AMD is to deliver more differentiation in value with a focus on performance per core,” said Dan McNamara, general manager of AMD’s Server Business Unit. “In our tests comparing Synopsys VCS chip-design simulation software running on older and newer AMD EPYC™ CPUs, engineers were able to complete 66% more jobs in the same elapsed time, thanks to having a larger L3 cache. This means that more data can be kept closer to the processor for better performance.” These faster product design lifecycles mean faster times to market since designers can save time in the testing process.
A white paper from IDC projects a new role for IT leaders in preparing the infrastructure required to properly power performance-intensive computing (PIC) for enterprise workloads, such as data-driven insights, AI/machine learning, big data, modeling and simulation and more. Get the full white paper to learn best practices and avoid pitfalls when implementing performance-intensive computing infrastructure.
Organizations use data-driven insights to gain competitive advantage over their rivals. Competitive differentiation is often realized through the delivery of new products and services or enhancements to existing products and services. It can also be achieved by streamlining and optimizing business operations. A data-driven business reduces the time needed to realize business advantage by creating an environment conducive to forming business-differentiating insights.
As a result, IDC projects that a new chapter in the relationship between IT and the business is about to begin. The new phase will push IT further in a strategic direction, increasing its influence on business outcomes.
IDC expects the trend to play out over the next four to five years, thrusting IT into a new role implementing a foundational infrastructure designed to foster timely, data-driven insights at scale. The new infrastructure will be designed to support performance-intensive computing (PIC). Investments in new performance-intensive workloads will be more significant than those used for corporate IT and other business applications.
IDC defines performance-intensive computing as the process of performing large-scale, mathematically intensive computations, commonly used in analytics, machine learning and technical computing — and now increasingly required for artificial intelligence and big data and analytics in the commercial space.
Performance-intensive computing workloads have evolved at an accelerated pace. An overwhelming majority of respondents in IDC's 2021 IT Enterprise Infrastructure Survey agreed that PIC workloads are important or even critically important to their business.
But a general-purpose infrastructure won’t get the job done. Common pitfalls organizations encounter have to do with people, organizational models, business process and access to the technology required to succeed.
Get the full IDC white paper: Gaining Deep and Timely Insights with Performance-Intensive Computing Infrastructure.