Supermicro debuts 3 GPU servers with AMD Instinct MI300 Series APUs


Supermicro didn’t waste any time.

The same day that AMD introduced its new AMD Instinct MI300 series accelerators, Supermicro debuted three GPU rackmount servers that use the new AMD accelerated processing units (APUs). One of the three new systems also offers energy-efficient liquid cooling.

Here’s a quick look, plus links for more technical details:

Supermicro 8-GPU server with AMD Instinct MI300X: AS -8125GS-TNMR2

This big 8U rackmount system is powered by a pair of AMD EPYC 9004 Series CPUs and 8 AMD Instinct MI300X accelerator GPUs. It’s designed for training and inference on massive AI models with a total of 1.5TB of HBM3 memory per server node.

The system also supports 8 high-speed 400G networking cards, which provide direct connectivity for each GPU; 128 PCIe 5.0 lanes; and up to 16 hot-swap NVMe drives.

It’s an air-cooled system with 5 fans up front and 5 more in the rear.

Quad-APU systems with AMD Instinct MI300A accelerators: AS -2145GH-TNMR and AS -4145GH-TNMR

These two rackmount systems are aimed at converged HPC-AI and scientific computing workloads.

They’re available in the user’s choice of liquid or air cooling. The liquid-cooled version comes in a 2U rack format, while the air-cooled version is packaged as a 4U.

Either way, these servers are powered by four AMD Instinct MI300A accelerators, which combine CPUs and GPUs in an APU. That gives each server a total of 96 AMD ‘Zen 4’ cores, 912 compute units, and 512GB of HBM3 memory. Also, PCIe 5.0 expansion slots allow for high-speed networking, including RDMA to APU memory.

Supermicro says the liquid-cooled 2U system provides a 50%+ cost savings on data-center energy. Another difference: The air-cooled 4U server offers more storage and room for 8 to 16 additional PCIe acceleration cards.


Research Roundup: GenAI, 10 IT trends, cybersecurity, CEOs, and privacy

Catch up on the latest IT research and analysis from leading market watchers.

Generative AI is booming. Ten trends will soon rock your customers’ world. While cybersecurity spending is up, CEOs lack cyber confidence. And Americans worry about their privacy.

That’s some of the latest from leading IT market watchers. And here’s your Performance Intensive Computing roundup.

GenAI market to hit $143B by 2027

Generative AI is quickly becoming a big business.

Market watcher IDC expects that worldwide spending on GenAI software and related hardware and services will reach nearly $16 billion this year.

Looking ahead, IDC predicts GenAI spending will reach $143 billion by 2027. That would represent a compound annual growth rate (CAGR) over the years 2023 to 2027 of 73%—more than twice the growth rate in overall AI spending.
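That projection is easy to sanity-check. Here’s a quick back-of-envelope calculation in Python, using IDC’s rounded figures from above: this year’s ~$16 billion compounded at 73% for four years.

```python
# Sanity-check IDC's projection: ~$16B this year, compounding at 73% through 2027.
start_spend_billions = 16.0   # approximate GenAI spending this year (IDC)
cagr = 0.73                   # compound annual growth rate
years = 4                     # 2023 through 2027

projected = start_spend_billions * (1 + cagr) ** years
print(f"Projected 2027 spending: ${projected:.0f}B")  # ~$143B
```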

“GenAI is more than a fleeting trend or mere hype,” says IDC group VP Ritu Jyoti.

Initially, IDC expects, the largest GenAI investments will go to infrastructure, including hardware, infrastructure as a service (IaaS), and system infrastructure software. Then, once the foundation has been laid, spending is expected to shift to AI services.

Top 10 IT trends

What will be top-of-mind for your customers next year and beyond? Researchers at Gartner recently made 10 predictions:

1. AI productivity will be a primary economic indicator of national power.

2. Generative AI tools will reduce modernization costs by 70%.

3. Enterprises will collectively spend over $30 billion fighting “malinformation.”

4. Nearly half of all CISOs will expand their responsibilities beyond cybersecurity, driven by regulatory pressure and expanding attack surfaces.

5. Unionization among knowledge workers will increase by 1,000%, motivated by fears of job loss due to the adoption of GenAI.

6. About one in three workers will leverage “digital charisma” to advance their careers.

7. One in four large corporations will actively recruit neurodivergent talent—including people with conditions such as autism and ADHD—to improve business performance.

8. Nearly a third of large companies will create dedicated business units or sales channels for machine customers.

9. Due to labor shortages, robots will soon outnumber human workers in three industries: manufacturing, retail and logistics.

10. Monthly electricity rationing will affect fully half the G20 nations. One result: Energy efficiency will become a serious competitive advantage.

Cybersecurity spending in Q2 rose nearly 12%

Heightened threat levels are leading to heightened cybersecurity spending.

In the second quarter of this year, global spending on cybersecurity products and services rose 11.6% year-on-year, reaching a total of $19 billion worldwide, according to Canalys.

A mere 12 vendors received nearly half that spending, Canalys says. They include Palo Alto Networks, Fortinet, Cisco and Microsoft.

One factor driving the spending is fear, the result of a 50% increase in the number of publicly reported ransomware attacks. Also, the number of breached data records more than doubled in the first 8 months of this year, Canalys says.

All this increased spending should be good for channel sellers. Canalys finds that nearly 92% of all cybersecurity spending worldwide goes through the IT channel.

CEOs lack cyber confidence

Here’s another reason why cybersecurity spending should be rising: Roughly three-quarters of CEOs (74%) say they’re concerned about their organizations’ ability to avert or minimize damage from a cyberattack.

That’s according to a new survey, conducted by Accenture, of 1,000 CEOs from large organizations worldwide.

Two findings from the Accenture survey really stand out:

  • Three in five CEOs (60%) say their organizations do not incorporate cybersecurity into their business strategies, products or services.
  • Nearly half (44%) believe cybersecurity can be handled with episodic interventions rather than with ongoing, continuous attention.

Despite those weaknesses, nearly all the surveyed CEOs (96%) say they believe cybersecurity is critical to their organizations’ growth and stability. Mind the gap!

How do Americans view data privacy?

Fully eight in 10 Americans (81%) are concerned about how companies use their personal data. And seven in 10 (71%) are concerned about how their personal data is used by the government.

So finds a new Pew Research Center survey of 5,100 U.S. adults. The study, conducted in May and published this month, sought to discover how Americans think about privacy and personal data.

Pew also found that Americans don’t understand how their personal data is used. In the survey, nearly eight in 10 respondents (77%) said they have little to no understanding of how the government uses their personal data. And two-thirds (67%) said the same thing about businesses, up from 59% a year ago.

Another key finding: Americans don’t trust social media CEOs. Over three-quarters of Pew’s respondents (77%) say they have very little or no trust that leaders of social-media companies will publicly admit mistakes and take responsibility.

And about the same number (76%) believe social-media companies would sell their personal data without their consent.


Tech Explainer: How does design simulation work? Part 2

Cutting-edge technology powers the virtual design process.

The market for simulation software is hot, growing at a compound annual growth rate (CAGR) of 13.2%, according to MarketsandMarkets. The research firm predicts that the global market for simulation software, worth an estimated $18.1 billion this year, will rise to $33.5 billion by 2027.

No surprise, then, that tech titans AMD and Supermicro would design an advanced hardware platform to meet the demands of this burgeoning software market.

AMD and Supermicro have teamed up with Ansys Inc., a U.S.-based designer of engineering simulation software. One result of this three-way collaboration is the Supermicro SuperBlade.

Shanthi Adloori, senior director of product management at Supermicro, calls the SuperBlade “one of the fastest simulation-in-a-box solutions.”

Adloori adds: “With a high core count, large memory capacity and faster memory bandwidth, you can reduce the time it takes to complete a simulation.”

One very super blade

Adloori isn’t overstating the case.

Supermicro’s SuperBlade can house up to 20 hot-swappable nodes in its 8U chassis. Each of those blades can be equipped with AMD EPYC CPUs and AMD Instinct GPUs. In fact, SuperBlade is the only platform of its kind designed to support both GPU and non-GPU nodes in the same enclosure.

Supermicro SuperBlade’s other tech specs may be less glamorous, but they’re no less impressive. When it comes to memory, each blade can address up to 8TB or 16TB of DDR5-4800 memory, depending on the blade model.

Each node can also house 2 NVMe/SAS/SATA drives, and the chassis holds as many as eight 3000W Titanium Level power supplies.

Because networking is an essential element of enterprise-grade design simulation, SuperBlade includes redundant 25Gb/10Gb/1Gb Ethernet switches and up to 200Gbps/100Gbps InfiniBand networking for HPC applications.

For smaller operations, the Supermicro SuperBlade is also available in more compact 6U and 4U configurations. These versions pack fewer nodes, which ultimately means they’re able to bring less power to bear. But, hey, not every design team makes passenger jets for a living.

It’s all about the silicon

If Supermicro’s SuperBlade is the tractor-trailer of design simulation technology, then AMD CPUs and GPUs are the engines under the hood.

The differing designs of these chips lend themselves to specific core competencies. CPUs can focus tremendous power on a few tasks at a time. Sure, they can multitask. But there’s a limit to how many simultaneous operations they can address.

AMD bills its EPYC 7003 Series CPUs as the world’s highest-performing server processors for technical computing. The addition of AMD 3D V-Cache technology delivers an expanded L3 cache to help accelerate simulations.

GPUs, on the other hand, are required for simulations in which many operations must be performed simultaneously. The AMD Instinct MI250X Accelerator contains 220 compute units with 14,080 stream processors.

Instead of throwing a ton of processing power at a small number of operations, the AMD Instinct can address thousands of less resource-intensive operations simultaneously. It’s that capability that makes GPUs ideal for HPC and AI-enabled operations, an increasingly essential element of modern design simulation.
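The difference is easy to demonstrate in miniature. The NumPy sketch below contrasts looping over values one at a time (the serial pattern a CPU core handles) with applying a single operation across an entire array at once (the data-parallel pattern GPUs are built for). NumPy runs on the CPU, so treat this purely as an illustration of the concept:

```python
import time
import numpy as np

data = np.random.rand(5_000_000)

# Serial pattern: handle one value at a time, as a CPU-bound loop would.
t0 = time.perf_counter()
serial_result = [x * 2.0 + 1.0 for x in data]
serial_time = time.perf_counter() - t0

# Data-parallel pattern: one operation applied to every element at once.
t0 = time.perf_counter()
parallel_result = data * 2.0 + 1.0
parallel_time = time.perf_counter() - t0

print(f"serial: {serial_time:.2f}s  vectorized: {parallel_time:.3f}s")
```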

The future of design simulation

The development of advanced hardware like SuperBlade and the AMD CPUs and GPUs that power it will continue to progress as more organizations adopt design simulation as their go-to product development platform.

That progression will continue to manifest in global companies like Boeing and Volkswagen. But it will also find its way into small startups and single users.

Also, as the required hardware becomes more accessible, simulation software should become more efficient.

This confluence of market trends could empower millions of independent designers with the ability to perform complex design, testing and validation functions.

The result could be nothing short of a design revolution.

Part 1 of this two-part Tech Explainer explores the many ways design simulation is used to create new products, from tiny heart valves to massive passenger aircraft. Read Part 1 now.


Why M&E content creators need high-end VDI, rendering & storage

Content creators in media and entertainment need serious compute, storage and networking. Supermicro servers powered by AMD EPYC processors support that work with faster rendering and high-speed storage.

When content creators at media and entertainment (M&E) organizations create videos and films, they’re also competing for attention. And today that requires a lot of technology.

Making a full-length animated film involves no fewer than 14 complex steps, including 3D modeling, texturing, animating, visual effects and rendering. The whole process can take years. And it requires a serious quantity of high-end compute, storage and software.

From an IT perspective, three of the most compute-intensive activities for M&E content creators are VDI, rendering and storage. Let’s take a look at each.

* Virtual desktop infrastructure (VDI): While content creators work on personal workstations, they need the kind of processing power and storage capacity available from a rackmount server. That’s what they get with VDI.

VDI separates the desktop and associated software from the physical client device by hosting the desktop environment and applications on a central server. These assets are then delivered to the desktop workstation over a network.

To power VDI setups, Supermicro offers a 4U GPU server with up to 8 PCIe GPUs. The Supermicro AS -4125GS-TNRT server packs a pair of AMD EPYC 9004 processors, Nvidia RTX 6000 GPUs, and 6TB of DDR5 memory.

* Rendering: The last stage of film production, rendering is where the individual 3D images created on a computer are transformed into the stream of 2D images ready to be shown to audiences. This process, conducted pixel by pixel, is time-consuming and resource-hungry. It requires powerful servers, lots of storage capacity and fast networking.

For rendering, Supermicro offers its 2U Hyper system, the AS -2125HS-TNR. It’s configured with dual AMD EPYC 9004 processors, up to 6TB of memory, and your choice of NVMe, SATA or SAS storage.

* Storage: Content creation involves creating, storing and manipulating huge volumes of data. So the first requirement is simply having a great deal of storage capacity. But it’s also important to be able to retrieve and access that data quickly.

For these kinds of storage challenges, Supermicro offers Petascale storage servers based on AMD EPYC processors. They can pack up to 16 hot-swappable E3.S (7.5mm) NVMe drive bays. And they’ve been designed to store, process and move vast amounts of data.

M&E content creators are always looking to attract more attention. They’re getting help from today’s most advanced technology.


Tech Explainer: What’s the difference between Machine Learning and Deep Learning? Part 2

In Part 1 of this 2-part Tech Explainer, we explored the difference between how machine learning and deep learning models are trained and deployed. Now, in Part 2, we’ll get deeper into deep learning to discover how this advanced form of AI is changing the way we work, learn and create.

Where Machine Learning is designed to reduce the need for human intervention, Deep Learning—an extension of ML—removes much of the human element altogether.

If ML were a driver-assistance feature that helped you parallel park and avoid collisions, DL would be an autonomous, self-driving car.

The human intervention we’re talking about has much to do with categorizing and labeling the data used by ML models. Producing this structured data is both time-consuming and expensive.

DL shortens the time and lowers the cost by learning from unstructured data. This eliminates much of the data pre-processing performed by humans for ML.

That’s good news for modern businesses. Market watcher IDC estimates that as much as 90% of corporate data is unstructured.

DL is particularly good at processing unstructured data. That includes information coming from the edge, the core and millions of both personal and IoT devices.

Like a brain, but digital

Deep Learning systems “think” with a neural network—multiple layers of interconnected nodes designed to mimic the way the human brain works. A DL system processes data inputs in an attempt to recognize, classify and accurately describe objects within data.

The layers of a neural network are stacked vertically. Each layer builds on the work performed by the one below it. By pushing data through each successive layer, the overall system improves its predictions and categorizations.

For instance, imagine you’ve tasked a DL system to identify pictures of junk food. The system would quickly learn—on its own—how to differentiate Pringles from Doritos.

It might do this by learning to recognize Pringles’ iconic tubular packaging. Then the system would categorize Pringles differently than the family-size sack of Doritos.

What if you fed this hypothetical DL system with more pictures of chips? Then it could begin to identify varying angles of packaging, as well as colors, logos, shapes and granular aspects of the chips themselves.

As this example illustrates, the longer a DL system operates, the more intelligent and accurate it becomes.
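For readers who want to see the layer-stacking idea in code, here’s a minimal sketch in PyTorch. The image size, layer widths and two-class Pringles-vs.-Doritos setup are illustrative assumptions, not a real production model:

```python
import torch
import torch.nn as nn

# Each layer builds on the representation produced by the layer before it.
model = nn.Sequential(
    nn.Flatten(),                 # a 64x64 RGB image becomes a flat vector
    nn.Linear(64 * 64 * 3, 256),  # first layer: low-level features
    nn.ReLU(),
    nn.Linear(256, 64),           # deeper layer: higher-level combinations
    nn.ReLU(),
    nn.Linear(64, 2),             # output: Pringles or Doritos
)

fake_photos = torch.rand(8, 3, 64, 64)  # a batch of 8 made-up chip photos
logits = model(fake_photos)
print(logits.shape)  # torch.Size([8, 2]) -- one score per class, per photo
```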

Things we used to do

DL tends to be deployed when it’s time to pull out the big guns. This isn’t tech you throw at a mere spam filter or recommendation engine.

Instead, it’s the tech that powers the world’s financial systems, biomedical advances and law enforcement. For these verticals, failure is simply not an option.

Here are some of the ways DL operates behind the scenes:

  • BioMed: DL helps healthcare staff analyze medical imaging such as X-rays and CT scans. In many cases, the technology is more accurate than well-trained physicians with decades of experience.
  • Finance: For those seeking a market edge (read: everyone), DL employs powerful, algorithm-driven predictive analytics. This helps modern-day robber barons manage their portfolios based on insights from data so vast, they couldn’t leverage it themselves. DL also helps financial institutions assess loans, detect fraud and manage credit.
  • Law Enforcement: In the 2002 movie “Minority Report,” Tom Cruise played a police officer who could arrest people before they committed a crime. With DL, this fiction could turn into an unsettling reality. DL can be used to analyze millions of data points, then predict who is most likely to break the law. It might even give authorities an idea of where, when and how it could happen.

The future…?

Looking into a crystal ball—which these days probably uses DL—we can see a long succession of similar technologies coming. Just as ML begat DL, so too will DL beget the next form of AI—and the one after that.

The future of DL isn’t a question of if, but when. Clearly, DL will be used to advance a growing number of industries. But just when each sector will come to be ruled by our new smarty-pants robots is less clear.

Keep in mind: Even as you read this, DL systems are working tirelessly to help data scientists make AI more accurate and able to provide more useful assessments of datasets for specific outcomes. And as the science progresses, neural networks will continue to become more complex—and more like human brains.

That means the next generation of DL will likely be far more capable than the current one. Future AI systems could figure out how to reverse the aging process, map distant galaxies, even produce bespoke food based on biometric feedback from hungry diners.

For example, the upcoming AMD Instinct MI300 accelerators promise to usher in a new era of computing capabilities. That includes the ability to handle large language models (LLMs), the key approach behind generative AI systems such as ChatGPT.

Yes, the robots are here, and they want to feed you custom Pringles. Bon appétit!

 


Tech Explainer: What’s the difference between Machine Learning and Deep Learning? Part 1


As the names imply, machine learning and deep learning are types of smart software that can learn. Perhaps not the way a human does. But close enough.

What’s the difference between machine and deep learning? That’s the subject of this 2-part Tech Explainer. Here in Part 1, we’ll look in depth at machine learning. Then in Part 2, we’ll look more closely at deep learning.

Both, of course, are subsets of artificial intelligence (AI). To understand their differences, it helps to first understand something of the AI hierarchy.

At the very top is overarching AI technology. It powers both popular generative AI models such as ChatGPT and less famous but equally helpful systems such as the suggestion engine that tells you which show to watch next on Netflix.

Machine learning is a subset of AI. It can perform specific tasks without first needing explicit instructions.

As for deep learning, it’s actually a subset of machine learning. DL is powered by so-called neural networks, multiple node layers that form a system inspired by the structure of the human brain.

Machine learning for smarties

Machine learning is defined as the use and development of computer systems designed to learn and adapt without following explicit instructions.

Instead of requiring human input, ML systems use algorithms and statistical models to analyze and draw inferences from patterns they find in large data sets.

This form of AI is especially good at identifying patterns in structured data. It can then analyze those patterns to make predictions that are usually reliable.

For example, let’s say an organization wants to predict when a particular customer will unsubscribe from its service. The organization could use ML to make an educated guess based on previous data about customer churn.
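As a rough sketch of how that might look in code, here’s a toy churn predictor built with scikit-learn. The features, data and customer values are invented for illustration only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical history: [months subscribed, support tickets, monthly spend]
X = np.array([[24, 0, 50], [2, 5, 10], [36, 1, 80], [3, 4, 15],
              [18, 2, 40], [1, 6, 12], [30, 0, 70], [4, 3, 20]])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 1 = the customer unsubscribed

model = LogisticRegression().fit(X, y)

# Educated guess for a new customer: the probability they will churn.
new_customer = [[5, 4, 18]]
print(f"churn probability: {model.predict_proba(new_customer)[0][1]:.0%}")
```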

The machinery of ML

Like all forms of AI, machine learning uses lots of compute and storage resources. Enterprise-scale ML models are powered by data centers packed to the gills with cutting-edge tech. The most vital of these components are GPUs and AI data-center accelerators.

GPUs, though initially designed to process graphics, have become the preferred tool for AI development. They offer high core counts—sometimes numbering in the thousands—as well as massive parallelism. That makes them ideally suited to processing a vast number of simple calculations simultaneously.

As AI gained acceptance, IT managers sought ever more powerful GPUs. That demand led to new technologies like AMD’s Instinct MI200 Series accelerators. These purpose-built GPUs have been designed to power discoveries in mainstream servers and supercomputers, including some of the largest exascale systems in use today.

AMD’s forthcoming Instinct MI300A will go one step further, combining a GPU and AMD ‘Zen 4’ CPU cores in a single component. It’s set to ship later this year.

State-of-the-art CPUs are important for ML-optimized systems. The CPUs need as many cores as possible, running at high frequencies to keep the GPU busy. AMD’s EPYC 9004 Series processors excel at this.

In addition, the CPUs need to run the application’s other tasks and threads. When looking at a full system, PCIe 5.0 connectivity and DDR5 memory are important, too.

The GPUs that power AI are often installed in integrated servers that have the capacity to house their constituent components, including processors, flash storage, networking tech and cooling systems.

One such monster server is the Supermicro AS -4125GS-TNRT. It brings together eight direct-attached, double-width, full-length GPUs; up to 6TB of RAM; and two dozen 2.5-inch solid-state drives (SSDs). This server also supports the AMD Instinct MI210 accelerator.

ML vs. DL

The difference between machine learning and deep learning begins with their all-important training methods. ML is trained using four primary methods: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

Deep learning, on the other hand, requires more complex training methods. These include convolutional neural networks, recurrent neural networks, generative adversarial networks and autoencoders.

When it comes to performing real-world tasks, ML and DL offer different core competencies. For instance, ML is the type of AI behind the most effective spam filters, like those used by Google and Yahoo. Its ability to adapt to varying conditions allows ML to generate new rules based on previous operations. This functionality helps it keep pace with highly motivated spammers and cybercriminals.
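That rule-generating adaptability can be sketched with an incrementally trained classifier. In this toy example (invented messages, using scikit-learn’s HashingVectorizer and SGDClassifier), partial_fit lets the filter update itself as new spam tactics appear, without retraining from scratch:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16)
clf = SGDClassifier(loss="log_loss")  # logistic-regression-style classifier

# Initial training batch: 1 = spam, 0 = legitimate mail.
messages = ["win a free prize now", "meeting moved to 3pm",
            "claim your cash reward", "lunch tomorrow?"]
labels = [1, 0, 1, 0]
clf.partial_fit(vectorizer.transform(messages), labels, classes=[0, 1])

# Later, the filter adapts to a new spam pattern with an incremental update.
clf.partial_fit(vectorizer.transform(["urgent crypto offer"]), [1])

print(clf.predict(vectorizer.transform(["free crypto prize now"])))  # likely [1]
```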

More complex inferencing tasks like medical imaging recognition are powered by deep learning. DL models can capture intricate relationships within medical images, even when those relationships are nonlinear or difficult to define. In other words, deep learning can quickly and accurately identify abnormalities not visible to the human eye.

Up next: a Deep Learning deep dive

In Part 2, we’ll explore more about deep learning. You’ll find out how data scientists develop new models, how various verticals leverage DL, and what the future holds for this emerging technology.


Can liquid-cooled servers help your customers?

Liquid cooling can offer big advantages over air cooling.

The conventional thinking was that liquid cooling is only for supercomputers and high-end gaming PCs. No more.

Today, many large-scale cloud, HPC, analytics and AI servers combine CPUs and GPUs in a single enclosure, generating a lot of heat. Liquid cooling can carry that heat away, often at lower cost and with greater efficiency than air.

According to a new Supermicro solution guide, liquid’s advantages over air cooling include:

  • Up to 92% lower electricity costs for a server’s cooling infrastructure
  • Up to 51% lower electricity costs for the entire data center
  • Up to 55% less data center server noise

What’s more, the latest liquid cooling systems are turnkey solutions that support the highest GPU and CPU densities. They’re also fully validated and tested by Supermicro under demanding workloads that stress the server. And unlike some other components, they’re ready to ship to you and your customers quickly, often in mere weeks.

What are the liquid-cooling components?

Liquid cooling starts with a cooling distribution unit (CDU). It incorporates two modules: a pump that circulates the liquid coolant, and a power supply.

Liquid coolant travels from the CDU through flexible hoses to the cooling system’s next major component, the coolant distribution manifold (CDM). It’s a unit with distribution hoses to each of the servers.

There are 2 types of CDMs. A vertical manifold is placed on the rear of the rack, is directly connected via hoses to the CDU, and delivers coolant to another important component, the cold plates. The second type, a horizontal manifold, is placed on the front of the rack, between two servers; it’s used with systems that have inlet hoses on the front.

The cold plates, mentioned above, are placed on top of the CPUs and GPUs in place of their typical heat sinks. With coolant flowing through their channels, they keep these components cool.

Supermicro’s CDU offers two valuable features. First, it has a cooling capacity of 100kW, which enables very high rack compute densities. Second, it features a touchscreen for monitoring and controlling rack operation via a web interface. It’s also integrated with the company’s SuperCloud Composer data-center management software.
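To put that 100kW figure in perspective, here’s a rough back-of-envelope estimate of the water flow needed to carry away 100kW of heat. The 10 C coolant temperature rise is an assumed example value, not a Supermicro specification:

```python
# Roughly how much water flow carries away 100 kW of heat?
heat_load_watts = 100_000     # CDU cooling capacity: 100 kW
specific_heat = 4186          # J/(kg*K), specific heat of water
delta_t = 10                  # assumed coolant temperature rise, in C

flow_kg_per_sec = heat_load_watts / (specific_heat * delta_t)
print(f"{flow_kg_per_sec:.2f} kg/s (~{flow_kg_per_sec * 60:.0f} liters/min)")
```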

What does it work on?

Supermicro offers several liquid-cooling configurations to support different numbers of servers in different size racks.

Among the Supermicro servers available with liquid cooling are the company’s GPU systems, which can combine up to eight Nvidia GPUs and AMD EPYC 9004 Series CPUs. Direct-to-chip (D2C) coolers are mounted on each processor, then routed through the manifolds to the CDU.

D2C cooling is also a feature of the Supermicro SuperBlade. This system supports up to 20 blade servers, which can be powered by the latest AMD EPYC CPUs in an 8U chassis. In addition, the Supermicro Liquid Cooling solution is ideal for high-end AI servers such as the company’s 8-GPU 8125GS-TNHR.

To manage it all, Supermicro also offers its SuperCloud Composer’s Liquid Cooling Consult Module (LCCM). This tool collects information on the physical assets and sensor data from the CDU, including pressure, humidity, and pump and valve status.

This data is presented in real time, enabling users to monitor the operating efficiency of their liquid-cooled racks. Users can also employ SuperCloud Composer to set up alerts, manage firmware updates, and more.


Tech Explainer: Green Computing, Part 3 – Why you should reduce, reuse & recycle

The 3 Rs of green computing are reduce, reuse and recycle.

To help your customers meet their environmental, social and governance (ESG) goals, it pays to focus on the 3 Rs of green computing—reduce, reuse and recycle.

Sure, pursuing these goals can require some additional R&D and reorganization. But tech titans such as AMD and Supermicro are helping.

AMD, Supermicro and their vast supply chains are working to create a new virtuous circle. More efficient tech is being created using recycled materials, reused where possible, and then once again turned into recycled material.

For you and your customers, the path to green computing can lead to better corporate citizenship as well as higher efficiencies and lower costs.

Green server design

New disaggregated server technology is now available from manufacturers like Supermicro. This tech makes it possible for organizations of every size to increase their energy efficiency, better utilize data-center space, and reduce capital expenditures.

Supermicro’s SuperBlade, BigTwin and EDSFF SuperStorage are exemplars of disaggregated server design. The SuperBlade multi-node server, for instance, can house up to 20 server blades and 40 CPUs. And it’s available in 4U, 6U and 8U rack enclosures.

These efficient designs allow for larger, more efficient shared fans and power supplies. And along with the chassis itself, many elements can remain in service long past the lifespans of the silicon components they support. In some cases, an updated server blade can be used in an existing chassis.

Remote reprogramming

Innovative technologies like adaptive computing enable organizations to adopt a holistic approach to green computing at the core, the edge and in end-user devices.

For instance, AMD’s adaptive computing initiative offers the ability to optimize hardware based on applications. Then your customers can get continuous updates after production deployment, adapting to new requirements without needing new hardware.

The key to adaptive computing is the Field Programmable Gate Array (FPGA). It’s essentially a blank canvas of hardware, capable of being configured into a multitude of different functions. Even after an FPGA has been deployed, engineers can remotely access the component to reprogram various hardware elements.

The FPGA reprogramming process can be as simple as applying security patches and bug fixes—or as complex as a wholesale change in core functionality. Either way, the green computing bona fides of adaptive computing are the same.

What’s more, adaptive tech like FPGAs significantly reduces e-waste. This helps to lower an organization’s overall carbon footprint by obviating the manufacturing and transportation necessary to replace hardware already deployed.

Adaptive computing also enables organizations to increase energy efficiency. Deploying cutting-edge tech like the AMD Instinct MI250X Accelerator for AI training or inferencing can significantly reduce the overall electricity needed to complete a task.

Radical recycling

Even in organizations with the best green computing initiatives, elements of the hardware infrastructure will eventually be ready for retirement. When the time comes, these organizations have yet another opportunity to go green—by properly recycling.

Some servers can be repurposed for other, less-demanding tasks, extending their lifespan. For example, an HPC system that no longer delivers the required FP64 performance could be repurposed to host a database or email application.

Quite a lot of today’s computer hardware can be recycled. This includes glass from monitors; plastic and aluminum from cases; copper in power supplies; precious metals used in circuitry; even the cardboard, wood and other materials used in packaging.

If that seems like too much work, there are now third-party organizations that will oversee your customers’ recycling efforts for a fee. Later, if all goes according to plan, these recycled materials will find their way back into the manufacturing supply chain.

Tech suppliers are working to make recycling even easier. For example, AMD is one of the many tech leaders whose commitment to environmental sustainability extends across its entire value chain. For AMD, that includes using environmentally preferable packaging materials, such as recycled materials and non-toxic dyes.

Are you 3R?

Your customers understand that establishing and adhering to ESG goals is more than just a good idea. In fact, it’s vital to the survival of humanity.

Efforts like those of AMD and Supermicro are helping to establish a green computing revolution—and not a moment too soon.

In other words, pursuing green computing’s 3 Rs will be well worth the effort.


Interview: How NEC Germany keeps up with the changing HPC market

In an interview, Oliver Tennert, director of HPC marketing and post-sales at NEC Germany, explains how the company keeps pace with a fast-developing market.

The market for high performance computing (HPC) is changing, meaning system integrators that serve HPC customers need to change too.

To learn more, PIC managing editor Peter Krass spoke recently with Oliver Tennert, NEC Germany’s director of HPC marketing and post-sales. NEC Germany builds HPC systems with components from vendors that include AMD (processors) and Supermicro (servers). This interview has been lightly edited for clarity.

First, please tell me about NEC Germany and its relationship with parent company NEC Corp.?

I work for NEC Germany, which is a subsidiary of NEC Europe. Our parent company, NEC Corp., is a Japanese company with a focus on telecommunications, which is still a major part of our business. Today NEC has about 100,000 employees around the world.

HPC as a business within NEC is done primarily by NEC Germany and our counterparts at NEC Corp. in Japan. The Japanese operation covers HPC in Asia, and we cover EMEA, mainly Europe.

What kinds of HPC workloads and applications do your customers run?

It’s probably 60:40 — that is, about 60% of our customers are in academia, including universities, research facilities, and even DWD, Germany’s weather-forecasting service. The remaining 40% are industrial, including automotive and engineering companies. 

The typical HPC use cases of our customers come in two categories. The most important HPC category of course is simulation. That can mean simulating physical processes. For example, what does a car crash look like under certain parameters? These simulations are done in great detail.

Our other important HPC category is data analytics. For example, that could mean genomic analysis.

How do you work with AMD and Supermicro?

To understand this, you first have to understand how NEC’s HPC business works. For us, there are two aspects to the business.

One, we’ve got our own vector technology. Our NEC vector engine is a PCIe card designed and produced in Japan. The latest incarnation of our vector supercomputer is the NEC SX-Aurora TSUBASA. It was designed to run applications that are both vectorizable and benefit from high bandwidth to main memory. One of our big customers in this area is the German weather service, DWD.

The other part of the business is what we call “pizza boxes,” the x86 architecture. For this, we need industry-standard servers, including processors from AMD and servers from Supermicro.

For that second part of the business, what is NEC’s role?

The answer has to do with how the HPC business works operationally. If a customer intends to purchase a new HPC cluster, typically they need expert advice on designing an optimized HPC environment. What they do know is the application they run. And what they want to know is, ‘How do we get the best, most optimized system for this application?’

This implies doing a lot of configuration. Essentially, we optimize the design based on many different components. Even if we know that an AMD processor is the best for a particular task, still, there are dozens of combinations of processor SKUs and server model types which offer different price/performance ratios. The same applies to certain data-storage solutions. For HPC, storage is more than just picking an SSD. What’s needed is a completely different kind of technology.

Configuring and setting up such a complex solution takes a lot of expertise. We’re being asked to run benchmarks. That means the customer says, ‘Here’s my application, please run it on some specific configurations, and tell me which one offers the best price/performance ratio.’ This takes a lot of time and resources. For example, you need the systems on hand to just try it out. And the complete tender process—from pre-sales discussions to actual ordering and delivery—can take anywhere from weeks to months.

And this is just to bid, right? After all this work, you still might not get the order?

Yes, that can happen. There are lots of factors that influence your chances. In general, if you have a good working relationship with a private customer, it’s easier. They have more discretion than academic or public customers. For public bids, everything must be more transparent, because it’s more strictly regulated. Normally, that means you have more work, because you have to test more setups. Your competition will be doing the same.

When working with the second group, the private industry customers, do customers specify parts from specific vendors, such as AMD and Supermicro?

It depends on the factors that will influence the customer’s final selection. Price and performance, that’s one thing. Power consumption is another. Then, sometimes, it’s the vendors. Also, certain projects are more attractive to certain vendors because of market visibility—so-called lighthouse projects. That can have an influence on the conditions we get from vendors. Vendors also honor the amount of effort we have put into getting the customer in the first place. So there are all sorts of external factors that can influence the final system design.

Also, today, the majority of HPC solutions are similar from an architectural point of view. So the difference between competing vendors is to take all the standard components and optimize from these, instead of providing a competing architecture. As a result, the soft skills—such as the ability to implement HPC solutions in an efficient and professional way—also have a large influence on the final order.

How about power consumption and cooling? Are these important considerations for your HPC customers?

It’s become absolutely vital. As a rule of thumb, we can say that the larger an HPC project is going to be, the more likely that it is going to be cooled by liquid.

In the past, you had a server room that you cooled with air conditioning. But those times are nearly gone. Today, when you think of a larger HPC installation—say, 1,000 or 2,000 nodes—you’re talking about a megawatt of power being consumed, or even more. And that also needs to be cooled.

The challenge in cooling a large environment is to get the heat away from the server and out of the room to somewhere else, whether outside or to a larger cooling system. This cannot be done by traditional cooling with air. Air is too inefficient for transporting heat. Water is much better. It’s a more efficient means for moving heat from Point A to Point B.

How are you cooling HPC systems with liquid?

There are a few ways to do this. There’s cold-water cooling, mainly indirect. You bring in water with what’s known as an “inlet temperature” of about 10 C and it cools down the air inside the server racks, with the heat getting carried away with the water now at about 15 or 20 C. The issue is, first you need energy just to cool the water down to 10 C. Also, there’s not much you can do with water at 15 or 20 C. It’s too warm for cooling anything else, but too cool for heating a room.

That’s why the new approach is to use hot-water cooling, mainly direct. It sounds like a paradox. But what might seem hot to a human being is in fact pretty cool for a CPU. For a CPU, an ambient temperature of 50 or 60 C is fine; it would be absolutely not fine for a human being. So if you have an inlet temperature for water of, say, 40 or 45 C, that will cool the CPU, which runs at an internal temperature of 80 or 90 C. The outbound temperature of the water is then maybe 50 C. Then it becomes interesting. At that temperature, you can heat a building. You can reuse the heat, rather than just throwing it away. So this kind of infrastructure is becoming more important and more interesting.

Looking ahead, what are some of your top projects for the future?

Public customers such as research universities have to replace their HPC systems every three to five years. That’s the normal cycle. In that time the hardware becomes obsolete, especially as the vendors optimize their power consumption to performance ratio more and more. So it’s a steady flow of new projects. For our industrial customers, the same applies, though the procurement cycle may vary.

We’re also starting to see the use of computational HPC capacity from the cloud. Normally, when people think of cloud, they think of public clouds from Amazon, Microsoft, etc. But for HPC, there are interim approaches as well. A decade ago, there was the idea of a dedicated public cloud. Essentially, this meant a dedicated capacity that was for the customer’s exclusive use, but was owned by someone other than the customer. Now, between the dedicated cloud and public cloud, there are all these shades of grey. In the past two years, we’ve implemented several larger installations of this “grey-shaded” cloud approach. So more and more, we’re entering the service-oriented market.

There is a larger trend away from customers wanting to own a system, and toward customers just wanting to utilize capacity. For vendors with expertise in HPC, they have to change as well. Which means a change in the business and the way they have to work with customers. It boils down to, Who owns the hardware? And what does the customer buy, hardware or just services? That doesn’t make you a public-cloud provider. It just means you take over responsibility for this particular customer environment. You have a different business model, contract type, and set of responsibilities.

 


How AMD and Supermicro are working together to help you deliver AI


When it comes to building AI systems for your customers, a certain GPU provider with a trillion-dollar valuation isn’t the only game in town. You should also consider the dynamic duo of AMD and Supermicro, which are jointly offering high-performance AI alternatives with superior price and performance.

Supermicro’s Universal GPU systems are designed specifically for large-scale AI and high-performance computing (HPC) applications. Some of these modular designs come equipped with AMD’s Instinct MI250 Accelerator and have the option of being powered by dual AMD EPYC processors.

AMD, with a newly formed AI group led by Victor Peng, is working hard to enable AI across many environments. The company has developed an open software stack for AI, and it has also expanded its partnerships with AI software and framework suppliers that now include the PyTorch Foundation and Hugging Face.

AI accelerators

In addition, AMD’s Instinct MI300A data-center accelerator is due to ship in this year’s fourth quarter. It’s the successor to AMD’s MI200 series—based on the company’s CDNA 2 architecture and its first multi-die GPU—which powers some of today’s fastest supercomputers.

The forthcoming Instinct MI300A is based on AMD’s CDNA 3 architecture for AI and HPC workloads, which uses 5nm and 6nm process tech and advanced chiplet packaging. Under the MI300A’s hood, you’ll find 24 processor cores with Zen 4 tech, as well as 128GB of HBM3 memory that’s shared by the CPU and GPU. And it supports AMD ROCm 5, a production-ready, open source HPC and AI software stack.

Earlier this month, AMD introduced another member of the series, the AMD Instinct MI300X. It replaces three ‘Zen 4’ CPU chiplets with two CDNA 3 chiplets to create a GPU-only system. Announced at AMD’s recent Data Center and AI Technology Premiere event, the MI300X is optimized for large language models (LLMs) and other forms of AI.

To accommodate the demanding memory needs of generative AI workloads, the new AMD Instinct MI300X also adds 64GB of HBM3 memory, for a new total of 192GB. This means the system can run large models directly in memory, reducing the number of GPUs needed, speeding performance, and reducing the user’s total cost of ownership (TCO).
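The memory math behind that claim is simple: at 16-bit precision, each model parameter occupies two bytes. This sketch sizes a hypothetical 70-billion-parameter model (ignoring activations and other runtime overhead, which add more):

```python
import math

params = 70e9             # hypothetical 70B-parameter language model
bytes_per_param = 2       # fp16/bf16 weights: 2 bytes each
hbm_per_gpu_gb = 192      # AMD Instinct MI300X HBM3 capacity

weights_gb = params * bytes_per_param / 1e9
gpus_needed = math.ceil(weights_gb / hbm_per_gpu_gb)
print(f"~{weights_gb:.0f} GB of weights -> {gpus_needed} GPU(s) just to hold them")
```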

AMD also recently introduced the AMD Instinct Platform, which puts eight MI300X systems and 1.5TB of memory in a standard Open Compute Project (OCP) infrastructure. It’s designed to drop into an end user’s current IT infrastructure with only minimal changes.

All this is coming soon. The AMD MI300A started sampling with select customers earlier this quarter. The MI300X and Instinct Platform are both set to begin sampling in the third quarter. Production of the hardware products is expected to ramp in the fourth quarter.

KT’s cloud

All that may sound good in theory, but how does the AMD + Supermicro combination work in the real world of AI?

Just ask KT Cloud, a South Korea-based provider of cloud services that include infrastructure, platform and software as a service (IaaS, PaaS, SaaS). With the rise of customer interest in AI, KT Cloud set out to develop new XaaS customer offerings around AI, while also developing its own in-house AI models.

However, as KT embarked on this AI journey, the company quickly encountered three major challenges:

  • The high cost of AI GPU accelerators: KT Cloud would need hundreds of thousands of new GPU servers.
  • Inefficient use of GPU resources in the cloud: Few cloud providers offer GPU virtualization due to overhead. As a result, most cloud-based GPUs are visible to only 1 virtual machine, meaning they cannot be shared by multiple users.
  • Difficulty using large GPU clusters: KT is training Korean-language models using literally billions of parameters, requiring more than 1,000 GPUs. But this is complex: Users would need to manually apply parallelization strategies and optimization techniques.

The solution: KT worked with Moreh Inc., a South Korean developer of AI software, and AMD to design a novel platform architecture powered by AMD’s Instinct MI250 Accelerators and Moreh’s software.

Moreh developed the entire AI software stack, from the PyTorch and TensorFlow APIs down to GPU-accelerated primitive operations. This overcomes the limitations of cloud services for training large AI models.

Users of the MoAI platform do not need to insert or modify even a single line of existing source code. Nor do they need to change the way they run a PyTorch/TensorFlow program.

Did it work?

In a word, yes. To test the setup, KT developed a Korean language model with 11 billion parameters. Training was then done on two machines: one using Nvidia GPUs, the other an AMD/Moreh cluster equipped with AMD Instinct MI250 accelerators, Supermicro Universal GPU systems, and the Moreh AI platform software.

Compared with the Nvidia system, the Moreh solution with AMD Instinct accelerators delivered 116% of the throughput (as measured by tokens trained per second) and 2.05x the cost-effectiveness (measured as throughput per dollar).

Other gains are expected, too. “With cost-effective AMD Instinct accelerators and a pay-as-you-go pricing model, KT Cloud expects to be able to reduce the effective price of its GPU cloud service by 70%,” says JooSung Kim, VP of KT Cloud.

Based on this test, KT built a larger AMD/Moreh cluster of 300 nodes—with a total of 1,200 AMD MI250 GPUs—to train the next version of the Korean language model with 200 billion parameters.

It delivers a theoretical peak performance of 434.5 petaflops for fp16/bf16 (a native 16-bit format for mixed-precision training) matrix operations. That should make it one of the top-tier GPU supercomputers in the world.
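That peak figure is straightforward multiplication: 1,200 accelerators times the per-GPU peak rate. The arithmetic below assumes AMD’s published ~362.1 TFLOPS peak fp16/bf16 rate per MI250:

```python
gpus = 1_200                   # 300 nodes x 4 AMD Instinct MI250 GPUs each
peak_tflops_per_gpu = 362.1    # AMD's peak fp16/bf16 rate per MI250

cluster_petaflops = gpus * peak_tflops_per_gpu / 1_000
print(f"{cluster_petaflops:.1f} petaflops")  # 434.5
```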
