Sponsored by: AMD and Supermicro


Tech Explainer: What’s the difference between Machine Learning and Deep Learning? Part 1




As the names imply, machine learning and deep learning are types of smart software that can learn. Perhaps not the way a human does. But close enough.

What’s the difference between machine and deep learning? That’s the subject of this 2-part Tech Explainer. Here in Part 1, we’ll look in depth at machine learning. Then in Part 2, we’ll look more closely at deep learning.

Both, of course, are subsets of artificial intelligence (AI). To understand their differences, it helps to first understand something of the AI hierarchy.

At the very top is overarching AI technology. It powers both popular generative AI models such as ChatGPT and less famous but equally helpful systems such as the suggestion engine that tells you which show to watch next on Netflix.

Machine learning is a subset of AI. It can perform specific tasks without first needing explicit instructions.

As for deep learning, it’s actually a subset of machine learning. DL is powered by so-called neural networks: multiple layers of interconnected nodes that form a system inspired by the structure of the human brain.

Machine learning for smarties

Machine learning is defined as the use and development of computer systems designed to learn and adapt without following explicit instructions.

Instead of relying on explicitly programmed rules, ML systems use algorithms and statistical models to analyze and draw inferences from patterns they find in large data sets.

This form of AI is especially good at identifying patterns in structured data. It can then use those patterns to make predictions, which are often reliable.

For example, let’s say an organization wants to predict when a particular customer will unsubscribe from its service. The organization could use ML to make an educated guess based on previous data about customer churn.
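The churn example is a supervised-learning problem: past customers are labeled as having left or stayed, and the model learns from those labels. As a minimal sketch, here is logistic regression trained by plain gradient descent. The two features (tenure in years and recent support tickets) and every data point are hypothetical, invented for illustration; a real churn model would use the organization’s own records and, most likely, an established ML library.

```python
import math

# Hypothetical training data, invented for illustration only:
# features = (tenure_years, support_tickets_last_quarter), label = 1 if the customer churned.
HISTORY = [
    ((0.2, 5.0), 1), ((0.25, 4.0), 1), ((0.1, 6.0), 1), ((0.3, 3.0), 1),
    ((2.0, 0.0), 0), ((3.0, 1.0), 0), ((1.5, 1.0), 0), ((2.5, 0.0), 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(data, lr: float = 0.1, epochs: int = 5000):
    """Fit logistic-regression weights (w_tenure, w_tickets) and a bias by batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in data:
            # Derivative of the log-loss with respect to the logit is (prediction - label).
            error = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def churn_probability(model, features) -> float:
    w, b = model
    return sigmoid(w[0] * features[0] + w[1] * features[1] + b)


model = train(HISTORY)
# A new (hypothetical) customer: about 4 months in, 4 support tickets last quarter.
print(f"churn probability: {churn_probability(model, (0.33, 4.0)):.2f}")
```

The model is nothing more than learned weights on each feature; its "educated guess" is the probability it outputs for a customer it has never seen.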

The machinery of ML

Like most forms of AI, machine learning uses lots of compute and storage resources. Enterprise-scale ML models are powered by data centers packed to the gills with cutting-edge tech. The most vital of these components are GPUs and AI data-center accelerators.

GPUs, though initially designed to process graphics, have become the preferred tool for AI development. They offer high core counts, sometimes numbering in the thousands, as well as massive parallelism. That makes them ideally suited to performing a vast number of simple calculations simultaneously.
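To see why that parallelism matters, consider a matrix-vector product, a core operation inside ML models. Each output element is a dot product that depends only on one row and the shared input vector, so every row can be computed at the same time. The sketch below expresses that decomposition with Python’s standard `concurrent.futures` module. It only illustrates the idea: CPython threads will not actually speed up this pure-Python arithmetic, whereas a GPU spreads the same kind of independent work across thousands of cores.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, vector):
    """One small, independent calculation: the kind of work a single GPU core might handle."""
    return sum(a * b for a, b in zip(row, vector))


def matvec_parallel(matrix, vector, workers=4):
    """Compute matrix x vector by handing out one dot product per row.

    No row's result depends on any other row, so all of them can run at once.
    The pool only demonstrates the decomposition; it is not a performance tool here.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(lambda row: dot(row, vector), matrix))


weights = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
inputs = [1, 2, 3]
print(matvec_parallel(weights, inputs))  # [14, 32, 50, 68]
```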

As AI gained acceptance, IT managers sought ever more powerful GPUs. That demand led to purpose-built accelerators such as AMD’s Instinct MI200 Series. These GPUs have been designed to power discoveries in mainstream servers and supercomputers, including some of the largest exascale systems in use today.

AMD’s forthcoming Instinct MI300A will go one step further, combining GPU cores and Zen 4 CPU cores (the same core architecture used in AMD EPYC 9004 processors) in a single package. It’s set to ship later this year.

State-of-the-art CPUs are important for ML-optimized systems. The CPUs need as many cores as possible, running at high frequencies to keep the GPU busy. AMD’s EPYC 9004 Series processors excel at this.

In addition, the CPUs need to run other tasks and threads of the application. When looking at a full system, PCIe 5.0 connectivity and DDR5 memory are important, too.

The GPUs that power AI are often installed in integrated servers that also house the other components a system needs, including processors, flash storage, networking tech and cooling systems.

One such monster server is the Supermicro AS -4125GS-TNRT. It brings together eight direct-attached, double-width, full-length GPUs; up to 6TB of RAM; and two dozen 2.5-inch solid-state drives (SSDs). This server also supports the AMD Instinct MI210 accelerator.

ML vs. DL

The difference between machine learning and deep learning begins with their all-important training methods. ML is trained using four primary methods: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
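Unsupervised learning is the easiest of these methods to contrast with the supervised churn example: the algorithm gets no labels and must find structure on its own. Below is a minimal sketch of k-means clustering (Lloyd’s algorithm). The data points are hypothetical, and seeding the centroids with the first k points is a simplification; libraries typically use smarter seeding such as k-means++.

```python
import math


def kmeans(points, k, iterations=100):
    """Group unlabeled points into k clusters using Lloyd's algorithm.

    Simplification: the first k points seed the centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical 2-D feature vectors with no labels attached.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Nobody told the algorithm which points belong together; the two groups emerge from the distances alone.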

Deep learning, on the other hand, relies on more complex neural-network architectures. These include convolutional neural networks, recurrent neural networks, generative adversarial networks and autoencoders.

When it comes to performing real-world tasks, ML and DL offer different core competencies. For instance, ML is the type of AI behind the most effective spam filters, like those used by Google and Yahoo. Because ML adapts to changing conditions, a filter can generate new rules from the messages it has already processed. This functionality helps it keep pace with highly motivated spammers and cybercriminals.
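One classic ML technique for spam filtering is a naive Bayes classifier, which scores a message by how often its words have appeared in mail already labeled as spam or as legitimate ("ham"). The sketch below is a generic textbook version, not the filter Google or Yahoo actually runs, and the training messages are made up. It shows the adaptation described above: each newly labeled message updates the word counts, so a spam tactic the filter has never seen can be recognized once a few examples are reported.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Minimal multinomial naive Bayes filter that keeps learning as mail is labeled."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def learn(self, text, label):
        """Update counts from one labeled message (e.g., a user reporting spam)."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_score(self, text):
        """Return log P(spam, words) - log P(ham, words); positive means 'looks like spam'."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            # Add-one (Laplace) smoothing avoids log(0); the extra +1 reserves a slot for unseen words.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            denom = sum(self.word_counts[label].values()) + len(vocab) + 1
            for word in self.tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return scores["spam"] - scores["ham"]


spam_filter = NaiveBayesSpamFilter()
spam_filter.learn("win a free prize now", "spam")
spam_filter.learn("claim your free money", "spam")
spam_filter.learn("meeting agenda for monday", "ham")
spam_filter.learn("lunch on monday?", "ham")

before = spam_filter.spam_score("crypto giveaway")  # new tactic: unseen words
spam_filter.learn("crypto giveaway claim now", "spam")  # one user reports it
after = spam_filter.spam_score("crypto giveaway")
print(f"before: {before:.2f}, after: {after:.2f}")
```

Before the report, "crypto giveaway" scores slightly on the ham side; after a single labeled example, it scores as spam. No rule was written by hand.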

More complex inferencing tasks, such as medical image recognition, are powered by deep learning. DL models can capture intricate relationships within medical images, even when those relationships are nonlinear or difficult to define. As a result, deep learning can help identify abnormalities that are difficult for the human eye to spot.
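A toy example shows why stacking layers matters for nonlinear relationships. The XOR function (output 1 when exactly one input is 1) cannot be computed by any single linear threshold unit, but two layers of such units handle it easily. In the sketch below the weights are chosen by hand for clarity; a real deep-learning model, such as a convolutional network for medical images, has millions of weights and learns them from training data. The grid search is only a quick empirical check, not a proof.

```python
import itertools


def step(z):
    """Threshold activation: fires (1) when the weighted input is positive."""
    return 1 if z > 0 else 0


def xor_network(x1, x2):
    """Two-layer network with hand-picked weights that computes XOR."""
    h_or = step(x1 + x2 - 0.5)   # hidden unit 1: fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)  # hidden unit 2: fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # output: "OR but not AND"


def single_unit_can_fit_xor():
    """Search a grid of weights and biases for one linear unit that reproduces XOR."""
    grid = [i / 2 for i in range(-6, 7)]  # values from -3.0 to 3.0
    inputs = list(itertools.product((0, 1), repeat=2))
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all(step(w1 * a + w2 * c + b) == (a ^ c) for a, c in inputs):
            return True
    return False


print([xor_network(a, c) for a, c in itertools.product((0, 1), repeat=2)])  # [0, 1, 1, 0]
print(single_unit_can_fit_xor())  # False
```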

Up next: a Deep Learning deep dive

In Part 2, we’ll explore more about deep learning. You’ll find out how data scientists develop new models, how various verticals leverage DL, and what the future holds for this emerging technology.


What’s inside Supermicro’s new Petascale storage servers?




Supermicro has introduced a new class of storage servers that support E3.S Gen 5 NVMe drives. These storage servers offer up to 256TB of high-throughput, low-latency storage in a 1U enclosure, and up to half a petabyte in a 2U.

Supermicro has designed these storage servers to be used with large AI training and HPC clusters. Those workloads require that unstructured data, often in extremely large quantities, be delivered quickly to the system’s CPUs and GPUs.

To do this, Supermicro has developed a symmetrical architecture that reduces latency. It does so in 2 ways. One, by ensuring that data travels the shortest possible signal path. And two, by providing the maximum airflow over critical components, allowing them to run as fast and cool as possible.

1U and 2U for you 

Supermicro’s new lineup of optimized storage systems includes 1U servers that support up to 16 hot-swap E3.S drives. An alternate configuration could be up to eight E3.S drives, plus four E3.S 2T 16.8mm bays for CMM and other emerging modular devices.

(CMM is short for CXL Memory Module. These devices add memory capacity to a server by connecting over the Compute Express Link (CXL) interface, and they use the thicker E3.S 2T form factor.)

The E3.S form factor calls for a short, thin NVMe SSD that is 76mm high, 112.75mm long and 7.5mm thick (the double-thickness E3.S 2T variant is 16.8mm thick).

In the 2U configuration, Supermicro’s servers support up to 32 hot-swap E3.S drives. The 2U server is a single-processor system that supports the latest 4th Gen AMD EPYC processors.

Put it all together, and you can have a standard rack that stores up to an impressive 20 petabytes of data for high-throughput NVMe over fabrics (NVMe-oF) configurations.

30TB drives coming

When new 30TB drives become available, which is expected later this year, the new Supermicro storage servers will be able to handle them. Those drives will bring the storage total to roughly 1 petabyte in a compact 2U server.
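The capacity figures for these servers are easy to sanity-check with a little arithmetic. The sketch below assumes 15.36TB drives for today’s configurations and 30.72TB drives for the upcoming ones, plus a standard 42U rack filled entirely with storage servers, with no space set aside for switches or power distribution. The article doesn’t spell out the assumptions behind its 20-petabyte rack figure, so treat this as an approximation.

```python
RACK_UNITS = 42  # assumption: standard full-height rack, all of it used for storage servers

# (server height in U, E3.S bays per server) for the 1U and 2U configurations described above
CONFIGS = {"1U": (1, 16), "2U": (2, 32)}


def server_capacity_tb(bays: int, drive_tb: float) -> float:
    """Raw capacity of one server with every bay populated."""
    return bays * drive_tb


def rack_capacity_pb(height_u: int, bays: int, drive_tb: float, rack_units: int = RACK_UNITS) -> float:
    """Raw capacity of a rack filled with identical servers (decimal units: 1PB = 1,000TB)."""
    servers_per_rack = rack_units // height_u
    return servers_per_rack * server_capacity_tb(bays, drive_tb) / 1000


for drive_tb in (15.36, 30.72):  # assumed drive capacities in TB
    for name, (height_u, bays) in CONFIGS.items():
        print(f"{drive_tb}TB drives, {name}: "
              f"{server_capacity_tb(bays, drive_tb):.1f}TB per server, "
              f"{rack_capacity_pb(height_u, bays, drive_tb):.1f}PB per rack")
```

With 30.72TB drives, either configuration works out to roughly 20.6PB of raw capacity per rack, in line with the 20-petabyte figure; with 15.36TB drives, the same rack holds roughly half that.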

Two storage-drive vendors working closely with Supermicro are Kioxia America and Solidigm, both of which make E3.S solid-state drives (SSDs). Kioxia has announced a 30.72TB SSD called the Kioxia CD8P Series. And Solidigm says its D5-P5336 SSD will ship in an E3.S form factor with up to 30.72TB in the first half of 2024.

The new Supermicro Petascale storage servers are shipping now in volume worldwide.

Learn more about the Supermicro E3.S Petascale All-Flash NVMe Storage Systems.

 


Can liquid-cooled servers help your customers?


Liquid cooling can offer big advantages over air cooling. According to a new Supermicro solution guide, these benefits include up to 92% lower electricity costs for a server’s cooling infrastructure, and up to 51% lower electricity costs for an entire data center.


The previous thinking was that liquid cooling was only for supercomputers and high-end gaming PCs. No more.

Today, many large-scale cloud, HPC, analytics and AI servers combine CPUs and GPUs in a single enclosure, generating a lot of heat. Liquid cooling can carry that heat away more efficiently than air, often at lower overall cost.

According to a new Supermicro solution guide, liquid’s advantages over air cooling include:

  • Up to 92% lower electricity costs for a server’s cooling infrastructure
  • Up to 51% lower electricity costs for the entire data center
  • Up to 55% less data center server noise

What’s more, the latest liquid cooling systems are turnkey solutions that support the highest GPU and CPU densities. They’re also fully validated and tested by Supermicro under demanding workloads that stress the server. And unlike some other components, they’re ready to ship to you and your customers quickly, often in mere weeks.

What are the liquid-cooling components?

Liquid cooling starts with a cooling distribution unit (CDU). It incorporates two modules: a pump that circulates the liquid coolant, and a power supply.

Liquid coolant travels from the CDU through flexible hoses to the cooling system’s next major component, the coolant distribution manifold (CDM). It’s a unit with distribution hoses to each of the servers.

There are 2 types of CDMs. A vertical manifold is placed on the rear of the rack, is directly connected via hoses to the CDU, and delivers coolant to another important component, the cold plates. The second type, a horizontal manifold, is placed on the front of the rack, between two servers; it’s used with systems that have inlet hoses on the front.

The cold plates, mentioned above, are placed on top of the CPUs and GPUs in place of their typical heat sinks. With coolant flowing through their channels, they keep these components cool.

Supermicro’s CDU offers two valuable features. First, it has a cooling capacity of 100kW, which enables very high rack compute densities. Second, it has a touchscreen for monitoring and controlling rack operation via a web interface. The CDU is also integrated with the company’s SuperCloud Composer data-center management software.
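To get a feel for what a 100kW cooling capacity implies, the basic heat-transfer relation Q = ṁ × c_p × ΔT gives the coolant flow needed to carry a given heat load. The sketch below assumes plain water (real systems often use a water-glycol mix with a lower specific heat) and a hypothetical 10 C temperature rise between supply and return. It is a back-of-the-envelope estimate, not a Supermicro specification.

```python
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approx. for liquid water near room temperature
WATER_DENSITY_KG_PER_L = 1.0             # approx.; slightly lower at 40-50 C


def required_flow_l_per_min(heat_load_w: float, delta_t_k: float) -> float:
    """Coolant flow needed to carry heat_load_w with a supply-to-return rise of delta_t_k.

    Rearranges Q = m_dot * c_p * delta_T for the mass flow m_dot (kg/s),
    then converts it to liters per minute.
    """
    if delta_t_k <= 0:
        raise ValueError("return temperature must exceed supply temperature")
    mass_flow_kg_per_s = heat_load_w / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_per_s / WATER_DENSITY_KG_PER_L * 60.0


print(f"{required_flow_l_per_min(100_000, 10):.0f} L/min")  # 143 L/min
```

Under those assumptions, a fully loaded 100kW CDU would need to move roughly 140 liters of water per minute. Halving the allowed temperature rise doubles the required flow, which is one reason designers care about supply and return temperatures.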

What does it work on?

Supermicro offers several liquid-cooling configurations to support different numbers of servers in different size racks.

Among the Supermicro servers available for liquid cooling are the company’s GPU systems, which can combine up to eight Nvidia GPUs with AMD EPYC 9004 Series CPUs. Direct-to-chip (D2C) coolers are mounted on each processor, and their coolant lines are routed through the manifolds to the CDU.

D2C cooling is also a feature of the Supermicro SuperBlade. This system supports up to 20 blade servers, which can be powered by the latest AMD EPYC CPUs in an 8U chassis. In addition, the Supermicro Liquid Cooling solution is ideal for high-end AI servers such as the company’s 8-GPU 8125GS-TNHR.

To manage it all, Supermicro also offers its SuperCloud Composer’s Liquid Cooling Consult Module (LCCM). This tool collects information on the physical assets and sensor data from the CDU, including pressure, humidity, and pump and valve status.

This data is presented in real time, enabling users to monitor the operating efficiency of their liquid-cooled racks. Users can also employ SuperCloud Composer to set up alerts, manage firmware updates, and more.
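The kind of alerting described above boils down to comparing each sensor reading against limits. The sketch below is a generic illustration, not SuperCloud Composer’s API: the `CduReading` fields, their units and the threshold values are all hypothetical, and real limits would come from the vendor’s documentation.

```python
from dataclasses import dataclass


@dataclass
class CduReading:
    """One snapshot of CDU telemetry. Field names and units are hypothetical."""
    supply_pressure_kpa: float
    relative_humidity_pct: float
    pump_running: bool
    valve_open: bool


# Hypothetical (low, high) alert limits for the numeric sensors.
LIMITS = {
    "supply_pressure_kpa": (100.0, 300.0),
    "relative_humidity_pct": (0.0, 60.0),
}


def check_reading(reading: CduReading) -> list:
    """Return a list of human-readable alerts; an empty list means everything is in range."""
    alerts = []
    for field, (low, high) in LIMITS.items():
        value = getattr(reading, field)
        if not low <= value <= high:
            alerts.append(f"{field}={value} outside [{low}, {high}]")
    if not reading.pump_running:
        alerts.append("pump stopped")
    if not reading.valve_open:
        alerts.append("valve closed")
    return alerts


print(check_reading(CduReading(350.0, 40.0, pump_running=False, valve_open=True)))
# ['supply_pressure_kpa=350.0 outside [100.0, 300.0]', 'pump stopped']
```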


Meet Supermicro’s Petascale Storage, a compact rackmount system powered by the latest AMD EPYC processors



Your customers can now implement Supermicro Petascale Storage, an all-Flash NVMe storage system powered by the latest 4th gen AMD EPYC 9004 series processors.

The Supermicro system has been specifically designed for AI, HPC, private and hybrid cloud, in-memory computing and software-defined storage.

Now Supermicro is offering the first of these systems. It's the Supermicro H13 Petascale Storage System. This compact 1U rackmount system is powered by an AMD EPYC 97X4 processor (formerly codenamed Bergamo) with up to 128 cores.

For organizations with data-storage requirements approaching petascale capacity, the Supermicro system was designed with a new chassis and motherboard that support a single AMD EPYC processor, 24 DIMM slots for up to 6TB of main memory, and 16 hot-swap E3.S slots. E3.S is part of the E3 family of Enterprise and Datacenter Standard Form Factor (EDSFF) SSD form factors, each designed for specific use cases. E3.S is short and thin: it holds 7.5mm-thick storage media that draw up to 25W over a PCIe 5.0 interface.

The Supermicro Petascale Storage system can deliver more than 200 GB/s of bandwidth and over 25 million input/output operations per second (IOPS) from a half-petabyte of storage.
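Bandwidth and IOPS are linked by transfer size: throughput = IOPS × bytes per operation. Storage vendors commonly quote IOPS for small random transfers (often 4KiB) and bandwidth for large sequential ones, but the article doesn’t say which sizes were used, so the sizes below are assumptions. The sketch shows that the two headline numbers describe different workloads rather than being derived from one another.

```python
KIB = 1024


def throughput_gb_per_s(iops: float, transfer_bytes: int) -> float:
    """Throughput implied by an IOPS rate at a fixed transfer size, in decimal GB/s."""
    return iops * transfer_bytes / 1e9


def iops_for(throughput_gb_s: float, transfer_bytes: int) -> float:
    """IOPS needed to reach a given throughput at a fixed transfer size."""
    return throughput_gb_s * 1e9 / transfer_bytes


# 25 million IOPS at an assumed 4KiB per operation
print(f"{throughput_gb_per_s(25e6, 4 * KIB):.1f} GB/s")      # 102.4 GB/s
# 200 GB/s at an assumed 128KiB per operation
print(f"{iops_for(200, 128 * KIB) / 1e6:.2f} million IOPS")  # 1.53 million IOPS
# 200 GB/s spread evenly over the 16 E3.S drives
print(f"{200 / 16:.1f} GB/s per drive")                      # 12.5 GB/s per drive
```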

Here's why 

Why might your customers need such a storage system? Several reasons, depending on what sorts of workloads they run:

  • Training AI/ML applications requires massive amounts of data for creating reliable models.
  • HPC projects use and generate immense amounts of data, too. That’s needed for real-world simulations, such as predicting the weather or simulating a car crash.
  • Big-data environments need substantial datasets. These gain intelligence from real-world observations ranging from sensor inputs to business transactions.
  • Enterprise applications need to keep large amounts of data close to compute, accessible at NVMe over Fabrics (NVMe-oF) speeds.

Also, the Supermicro H13 Petascale Storage System offers significant performance, capacity, throughput and endurance, all while maintaining excellent power efficiency.


Interview: How NEC Germany keeps up with the changing HPC market


In an interview, Oliver Tennert, director of HPC marketing and post-sales at NEC Germany, explains how the company keeps pace with a fast-developing market.


The market for high performance computing (HPC) is changing, meaning system integrators that serve HPC customers need to change too.

To learn more, PIC managing editor Peter Krass spoke recently with Oliver Tennert, NEC Germany’s director of HPC marketing and post-sales. NEC Germany works with hardware vendors that include AMD for processors and Supermicro for servers. This interview has been lightly edited for clarity.

First, please tell me about NEC Germany and its relationship with parent company NEC Corp.?

I work for NEC Germany, which is a subsidiary of NEC Europe. Our parent company, NEC Corp., is a Japanese company with a focus on telecommunications, which is still a major part of our business. Today NEC has about 100,000 employees around the world.

HPC as a business within NEC is done primarily by NEC Germany and our counterparts at NEC Corp. in Japan. The Japanese operation covers HPC in Asia, and we cover EMEA, mainly Europe.

What kinds of HPC workloads and applications do your customers run?

It’s probably 60:40 — that is, about 60% of our customers are in academia, including universities, research facilities, and even DWD, Germany’s weather-forecasting service. The remaining 40% are industrial, including automotive and engineering companies. 

The typical HPC use cases of our customers come in two categories. The most important HPC category of course is simulation. That can mean simulating physical processes. For example, what does a car crash look like under certain parameters? These simulations are done in great detail.

Our other important HPC category is data analytics. For example, that could mean genomic analysis.

How do you work with AMD and Supermicro?

To understand this, you first have to understand how NEC’s HPC business works. For us, there are two aspects to the business.

One, we’ve got our own vector technology. Our NEC vector engine is a PCIe card designed and produced in Japan. The latest incarnation of our vector supercomputer is the NEC SX-Aurora TSUBASA. It was designed to run applications that are vectorizable and that benefit from high bandwidth to main memory. One of our big customers in this area is the German weather service, DWD.
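To illustrate what it means for an application to be vectorizable and bandwidth-bound, the sketch below uses the triad kernel from the well-known STREAM memory benchmark; it is not NEC code. Every loop iteration is independent, so a vector unit can process many elements at once. But each element needs only 2 floating-point operations while moving 24 bytes, so memory bandwidth, not arithmetic, sets the speed limit. The bandwidth figures are hypothetical, and the byte count ignores cache effects such as write-allocate traffic.

```python
FLOPS_PER_ELEMENT = 2      # one multiply and one add per element
BYTES_PER_ELEMENT = 3 * 8  # read b[i] and c[i], write a[i]; 8-byte doubles


def triad(a, b, c, scalar):
    """STREAM-style triad: a[i] = b[i] + scalar * c[i].

    No iteration depends on another, which is what lets a compiler or a
    vector engine process many elements per instruction.
    """
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def bandwidth_bound_gflops(memory_bandwidth_gb_s: float) -> float:
    """Best-case triad speed (GFLOP/s) when data movement is the bottleneck."""
    return memory_bandwidth_gb_s * FLOPS_PER_ELEMENT / BYTES_PER_ELEMENT


a = [0.0] * 4
triad(a, [1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0], 0.5)
print(a)  # [6.0, 7.0, 8.0, 9.0]

for bandwidth in (400, 1600):  # hypothetical memory bandwidths in GB/s
    print(f"{bandwidth} GB/s -> at most {bandwidth_bound_gflops(bandwidth):.1f} GFLOP/s")
```

Quadrupling memory bandwidth quadruples the ceiling for this kind of code, no matter how many arithmetic units the processor has.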

The other part of the business is what we call “pizza boxes,” the x86 architecture. For this, we need industry-standard servers, including processors from AMD and servers from Supermicro.

For that second part of the business, what is NEC’s role?

The answer has to do with how the HPC business works operationally. If a customer intends to purchase a new HPC cluster, typically they need expert advice on designing an optimized HPC environment. What they do know is the application they run. And what they want to know is, ‘How do we get the best, most optimized system for this application?’

This implies doing a lot of configuration. Essentially, we optimize the design based on many different components. Even if we know that an AMD processor is the best for a particular task, still, there are dozens of combinations of processor SKUs and server model types which offer different price/performance ratios. The same applies to certain data-storage solutions. For HPC, storage is more than just picking an SSD. What’s needed is a completely different kind of technology.

Configuring and setting up such a complex solution takes a lot of expertise. We’re being asked to run benchmarks. That means the customer says, ‘Here’s my application, please run it on some specific configurations, and tell me which one offers the best price/performance ratio.’ This takes a lot of time and resources. For example, you need the systems on hand to just try it out. And the complete tender process—from pre-sales discussions to actual ordering and delivery—can take anywhere from weeks to months.
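Once the benchmark runs are done, the price/performance comparison itself is a simple scoring step. The sketch below treats performance as 1/runtime and picks the configuration with the most performance per euro. The configuration labels, runtimes and prices are invented, and a real tender would also weigh factors such as power consumption.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    config: str       # hypothetical configuration label
    runtime_s: float  # wall-clock time of the customer's application
    price_eur: float  # quoted price of the configuration


def price_performance(result: BenchmarkResult) -> float:
    """Performance per euro, with performance defined as runs per second (1 / runtime)."""
    return 1.0 / (result.runtime_s * result.price_eur)


def best_config(results):
    """Return the result with the highest performance per euro."""
    return max(results, key=price_performance)


results = [
    BenchmarkResult("config-A", runtime_s=1000.0, price_eur=100_000.0),
    BenchmarkResult("config-B", runtime_s=800.0, price_eur=140_000.0),
    BenchmarkResult("config-C", runtime_s=1200.0, price_eur=70_000.0),
]
print(best_config(results).config)  # config-C
```

Note that the fastest system (config-B) does not win: it is 20% faster than config-A but 40% more expensive.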

And this is just to bid, right? After all this work, you still might not get the order?

Yes, that can happen. There are lots of factors that influence your chances. In general, if you have a good working relationship with a private customer, it’s easier. They have more discretion than academic or public customers. For public bids, everything must be more transparent, because it’s more strictly regulated. Normally, that means you have more work, because you have to test more setups. Your competition will be doing the same.

When working with the second group, the private industry customers, do customers specify parts from specific vendors, such as AMD and Supermicro?

It depends on the factors that will influence the customer’s final selection. Price and performance, that’s one thing. Power consumption is another. Then, sometimes, it’s the vendors. Also, certain projects are more attractive to certain vendors because of their market visibility; these are so-called lighthouse projects. That can have an influence on the conditions we get from vendors. Vendors also honor the amount of effort we have put into winning the customer in the first place. So there are all sorts of external factors that can influence the final system design.

Also, today, the majority of HPC solutions are similar from an architectural point of view. So what differentiates competing vendors is how well they take the standard components and optimize from these, instead of providing a competing architecture. As a result, soft skills, such as the ability to implement HPC solutions in an efficient and professional way, also have a large influence on the final order.

How about power consumption and cooling? Are these important considerations for your HPC customers?

It’s become absolutely vital. As a rule of thumb, we can say that the larger an HPC project is going to be, the more likely that it is going to be cooled by liquid.

In the past, you had a server room that you cooled with air conditioning. But those times are nearly gone. Today, when you think of a larger HPC installation—say, 1,000 or 2,000 nodes—you’re talking about a megawatt of power being consumed, or even more. And that also needs to be cooled.

The challenge in cooling a large environment is to get the heat away from the server and out of the room to somewhere else, whether outside or to a larger cooling system. This cannot be done by traditional cooling with air. Air is too inefficient for transporting heat. Water is much better. It’s a more efficient means for moving heat from Point A to Point B.

How are you cooling HPC systems with liquid?

There are a few ways to do this. There’s cold-water cooling, mainly indirect. You bring in water with what’s known as an “inlet temperature” of about 10 C and it cools down the air inside the server racks, with the heat getting carried away with the water now at about 15 or 20 C. The issue is, first you need energy just to cool the water down to 10 C. Also, there’s not much you can do with water at 15 or 20 C. It’s too warm for cooling anything else, but too cool for heating a room.

That’s why the new approach is to use hot-water cooling, mainly direct. It sounds like a paradox. But what might seem hot to a human being is in fact pretty cool for a CPU. For a CPU, an ambient temperature of 50 or 60 C is fine; it would be absolutely not fine for a human being. So if you have an inlet temperature for water of, say, 40 or 45 C, that will cool the CPU, which runs at an internal temperature of 80 or 90 C. The outbound temperature of the water is then maybe 50 C. Then it becomes interesting. At that temperature, you can heat a building. You can reuse the heat, rather than just throwing it away. So this kind of infrastructure is becoming more important and more interesting.

Looking ahead, what are some of your top projects for the future?

Public customers such as research universities have to replace their HPC systems every three to five years. That’s the normal cycle. In that time the hardware becomes obsolete, especially as vendors keep improving the performance per watt of their products. So it’s a steady flow of new projects. For our industrial customers, the same applies, though the procurement cycle may vary.

We’re also starting to see the use of computational HPC capacity from the cloud. Normally, when people think of cloud, they think of public clouds from Amazon, Microsoft, etc. But for HPC, there are interim approaches as well. A decade ago, there was the idea of a dedicated public cloud. Essentially, this meant a dedicated capacity that was for the customer’s exclusive use, but was owned by someone other than the customer. Now, between the dedicated cloud and public cloud, there are all these shades of grey. In the past two years, we’ve implemented several larger installations of this “grey-shaded” cloud approach. So more and more, we’re entering the service-oriented market.

There is a larger trend away from customers wanting to own a system, and toward customers just wanting to utilize capacity. Vendors with expertise in HPC have to change as well. That means a change in the business and in the way they work with customers. It boils down to: Who owns the hardware? And what does the customer buy, hardware or just services? That doesn’t make you a public-cloud provider. It just means you take over responsibility for this particular customer environment. You have a different business model, contract type, and set of responsibilities.

 


Supermicro H13 JumpStart remote access program adds latest AMD EPYC processors


Get remote access to the next generation of AMD-powered servers from Supermicro.


Supermicro’s H13 JumpStart Remote Access program—which lets you use Supermicro servers before you buy—now includes the latest Supermicro H13 systems powered by 4th gen AMD EPYC 9004 processors.

These include servers using the two new AMD EPYC processor series introduced in June. One, previously codenamed Bergamo, is optimized for cloud-native workloads. The other, previously codenamed Genoa-X, is equipped with AMD 3D V-Cache technology and is optimized for technical computing.

Supermicro’s free H13 JumpStart program lets you and your customers validate, test and benchmark workloads remotely on Supermicro H13 systems powered by these new AMD processors.

The latest Supermicro H13 systems deliver performance and density with some cool technologies. These include AMD EPYC processors with up to 128 “Zen 4c” cores per socket, DDR5 memory, PCIe 5.0, and support for CXL 1.1 peripherals.

Those AMD Zen 4c cores are designed for the sweet spot of both density and power efficiency. Compared with AMD’s standard Zen 4 core, the more compact Zen 4c design offers substantially improved performance per watt.

Get started

Getting started with Supermicro’s H13 JumpStart program is simple. Just sign up with your name, email and a brief description of what you plan to do with the system.

Next, Supermicro will verify your information and your request. Assuming you qualify, you’ll receive a welcome email from Supermicro, and you’ll be scheduled to gain access to the JumpStart server.

You’ll then receive a unique username, password and URL to access your JumpStart account. From there you can run your tests, try new features, and benchmark your application.

Once you’re done, Supermicro will ask you to complete a quick survey for your feedback on the program. That’s it.

The H13 JumpStart program now offers 3 server configurations. These include Supermicro’s dual-processor 2U Hyper (AS -2025HS-TNR); single-processor 2U Cloud DC (AS -2015CS-TNR); and single-processor 2U Hyper-U (AS -2115HS-TNR).


Interview: How German system integrator SVA serves high performance computing with AMD and Supermicro


In an interview, Bernhard Homoelle, head of the HPC competence center at German system integrator SVA, explains how his company serves customers with help from AMD and Supermicro. 


SVA System Vertrieb Alexander GmbH, better known as SVA, is among the leading IT system integrators in Germany. Headquartered in Wiesbaden, the company employs more than 2,700 people in 27 branch offices. SVA’s customers include organizations in automotive, financial services and healthcare.

To learn more about how SVA works jointly with Supermicro and AMD on advanced technologies, PIC managing editor Peter Krass spoke recently with Bernhard Homoelle, head of SVA’s high performance computing (HPC) competence center. Their interview has been lightly edited.

For readers outside of Germany, please tell us about SVA?

First of all, SVA is an owner-operated system integrator. We offer high-quality products, we sell infrastructure, we support certain types of implementations, and we offer operational support to help our customers achieve optimum solutions.

We work with partners to figure out what might be the best solution for our customers, rather than just picking one vendor and trying to convince the customer they should use them. Instead, we figure out what is really needed. Then we go in the direction where the customer can really have their requirements met. The result is a good relationship with the customer, even after a particular deal has been closed.

Does SVA focus on specific industries?

While we do support almost all the big industries—automotive, transportation, public sector, healthcare and more—we are not restricted to any specific vertical. Our main business is helping customers solve their daily IT problems, deal with the complexity of new IT systems, and implement new things like AI and even quantum computing. So we’re open to new solutions. We also offer training with some of our partners.

Germany has a robust auto industry. How do you work with these clients?

In general, they need huge HPC clusters and machine learning. For example, autonomous driving demands not only more computing power, but also more storage. We’re talking about petabytes of data, rather than terabytes. And this huge amount of data needs to be stored somewhere and finally processed. That puts pressure on the infrastructure: not just on storage, but also on the network infrastructure as well as on the compute side. On their way to the cloud, some of these customers are saying, “Okay, offer me HPC as a Service.”

How do you work with AMD and Supermicro?

It’s a really good relationship. We like working with them because Supermicro has all these various types of servers for individual needs. Customers are different, and therefore they have their own requirements. Figuring out what might be the best server for them is difficult if you have limited types of servers available. But with Supermicro, you can get what you have in mind. You don’t have to look for special implementations because they have these already at hand.

We’re also partnering with AMD, and we have access to their benchmark labs, so we can get very helpful information. We start with discussions with the customer to figure out their needs. Typically, we pick up an application from the customer and then use it as a kind of benchmark. Next, we put it on a cluster with different memory, different CPUs, and look for the best solution in terms of performance for their particular application. Based on the findings, we can recommend a specific CPU, number of cores, memory type and size, and more.

With HPC applications, memory bandwidth per core is almost as important as the number of cores. AMD’s new Genoa-X processors should help to overcome some of these limitations. And looking ahead, I’m keen to see what AMD will offer with the Instinct MI300.

Are there special customer challenges you’re solving with Supermicro and AMD solutions?

With HPC workloads, our academic customers say, “This is the amount of money available, so how many servers can you really give us for this budget?” Supermicro and AMD really help here with reasonable prices. They’re a good choice for price/performance.

With AI and machine learning, the real issue is software tools. It really depends on what kinds of models you can use and how easy it is to use the hardware with those models.

This discussion is not easy, because for many of our customers today, AI means Nvidia. But I really recommend alternatives, and AMD is bringing some alternatives that are great. They offer a fast time to solution, but they also need to be easy to switch to.

How about "green" computing? Is this an important issue for your customers now?

Yes, more and more we’re seeing customers ask for this green computing approach. Typically, a customer has a thermal budget and a power-price budget. They may say, “In five years, the expenses paid for power should not exceed a certain limit.”

In Europe, we also have a supply-chain discussion. Vendors must increasingly prove that they are addressing supply-chain issues such as child labor and working conditions. This is almost mandatory, especially in government tenders. If you’re unable to answer these questions, you’re out of the bid.

With green computing, we see that the power needed for CPUs and GPUs is going up and up. Five years ago, the maximum a CPU could burn was 200W, but now even 400W might not be enough. Some GPUs are as high as 700W, and there are super-chips beyond even that.

All this makes it difficult to use air-cooled systems. Customers can use air conditioning to a certain extent, but there’s only so much air you can push through a rack. Then you need either on-chip water cooling or some kind of immersion cooling. This can help in two dimensions: saving energy and increasing density. You can put the components closer together, and you don’t need the big heat sinks anymore.

One issue now is that each vendor offers a different cooling infrastructure. Some of our customers run multi-vendor data centers, so this could create a compatibility issue. That’s one reason we’re looking into immersion cooling. We think we could do some of our first customer implementations in 2024.

Looking ahead, what do you see as a big challenge?

One area is that we want to help customers get easier access to their HPC clusters. That’s done on the software side.

In contrast to classic HPC users, machine learning and AI engineers are not that interested in Linux stuff, compiler options or any other infrastructure details. Instead, they’d like to work on their frameworks. The challenge is getting them to their work as easily as possible—so that they can just log in, and they’re in their development environment. That way, they won’t have to care about what sort of operating system is underneath or what kind of scheduler, etc., is running.

How AMD and Supermicro are working together to help you deliver AI

AMD and Supermicro are jointly offering high-performance AI alternatives with superior price and performance.

When it comes to building AI systems for your customers, a certain GPU provider with a trillion-dollar valuation isn’t the only game in town. You should also consider the dynamic duo of AMD and Supermicro, which are jointly offering high-performance AI alternatives with superior price and performance.

Supermicro’s Universal GPU systems are designed specifically for large-scale AI and high-performance computing (HPC) applications. Some of these modular designs come equipped with AMD’s Instinct MI250 Accelerator and have the option of being powered by dual AMD EPYC processors.

AMD, with a newly formed AI group led by Victor Peng, is working hard to enable AI across many environments. The company has developed an open software stack for AI, and it has also expanded its partnerships with AI software and framework suppliers that now include the PyTorch Foundation and Hugging Face.

AI accelerators

In addition, AMD’s Instinct MI300A data-center accelerator is due to ship in this year’s fourth quarter. It’s the successor to AMD’s MI200 series, which is based on the company’s CDNA 2 architecture, was AMD’s first multi-die GPU, and powers some of today’s fastest supercomputers.

The forthcoming Instinct MI300A is based on AMD’s CDNA 3 architecture for AI and HPC workloads, which uses 5nm and 6nm process technology and advanced chiplet packaging. Under the MI300A’s hood, you’ll find 24 Zen 4 CPU cores, as well as 128GB of HBM3 memory that’s shared by the CPU and GPU. It also supports AMD ROCm 5, a production-ready, open source HPC and AI software stack.

Earlier this month, AMD introduced another member of the series, the AMD Instinct MI300X. It replaces the MI300A’s three Zen 4 CPU chiplets with two CDNA 3 GPU chiplets, creating a GPU-only accelerator. Announced at AMD’s recent Data Center and AI Technology Premiere event, the MI300X is optimized for large language models (LLMs) and other forms of AI.

To accommodate the demanding memory needs of generative AI workloads, the MI300X also adds 64GB of HBM3 memory, for a total of 192GB. This means the accelerator can run large models directly in memory, which reduces the number of GPUs needed, improves performance, and lowers the user’s total cost of ownership (TCO).

AMD also recently introduced the AMD Instinct Platform, which puts eight MI300X accelerators and 1.5TB of HBM3 memory in a standard Open Compute Project (OCP) infrastructure. It’s designed to drop into an end user’s existing IT infrastructure with only minimal changes.
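As a rough illustration of why per-accelerator memory matters, the sketch below estimates how many accelerators it takes just to hold a model’s weights, using the 128GB (MI300A) and 192GB (MI300X) capacities and the eight-accelerator platform described above. It’s a back-of-the-envelope sketch, not AMD’s sizing method: the 70-billion-parameter model is hypothetical, it assumes 2 bytes per parameter (fp16/bf16), treats a GB as 10^9 bytes, and ignores activations, caches, optimizer state and runtime overhead.

```python
# Back-of-the-envelope sizing sketch. Assumptions: GB = 10**9 bytes,
# weights only (no activations, caches, optimizer state or overhead).
MI300A_HBM_GB = 128      # shared by CPU and GPU on the MI300A
MI300X_HBM_GB = 192
ACCELERATORS_PER_PLATFORM = 8

def accelerators_for_weights(num_params, bytes_per_param, hbm_gb):
    """Smallest number of accelerators whose combined HBM holds the weights."""
    weight_bytes = num_params * bytes_per_param
    capacity = hbm_gb * 10**9
    return -(-weight_bytes // capacity)  # integer ceiling division

platform_gb = ACCELERATORS_PER_PLATFORM * MI300X_HBM_GB  # 1,536 GB, about 1.5TB

# Hypothetical 70-billion-parameter model stored in fp16/bf16 (2 bytes each)
params = 70 * 10**9
print(accelerators_for_weights(params, 2, MI300A_HBM_GB))  # 2
print(accelerators_for_weights(params, 2, MI300X_HBM_GB))  # 1
print(platform_gb)                                          # 1536
```

The same hypothetical model that spills across two 128GB devices fits on one 192GB device, which is the effect the “fewer GPUs” claim refers to.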

All this is coming soon. The AMD MI300A started sampling with select customers earlier this quarter. The MI300X and Instinct Platform are both set to begin sampling in the third quarter. Production of the hardware products is expected to ramp in the fourth quarter.

KT’s cloud

All that may sound good in theory, but how does the AMD + Supermicro combination work in the real world of AI?

Just ask KT Cloud, a South Korea-based provider of cloud services that include infrastructure, platform and software as a service (IaaS, PaaS, SaaS). With the rise of customer interest in AI, KT Cloud set out to develop new XaaS customer offerings around AI, while also developing its own in-house AI models.

However, as KT embarked on this AI journey, the company quickly encountered three major challenges:

  • The high cost of AI GPU accelerators: KT Cloud would need hundreds of thousands of new GPU servers.
  • Inefficient use of GPU resources in the cloud: Few cloud providers offer GPU virtualization due to overhead. As a result, most cloud-based GPUs are visible to only a single virtual machine, meaning they cannot be shared by multiple users.
  • Difficulty using large GPU clusters: KT is training Korean-language models using literally billions of parameters, requiring more than 1,000 GPUs. But this is complex: Users would need to manually apply parallelization strategies and optimization techniques.
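To make the third challenge concrete, here’s a minimal, hypothetical sketch of one common parallelization strategy, synchronous data parallelism: each worker computes gradients on its own slice of a batch, and the results are averaged (the “all-reduce” step a real cluster performs over its network). It uses a toy linear model in plain Python with simulated workers. It is not Moreh’s software or any framework’s API, just an illustration of the kind of logic users would otherwise have to wire up by hand.

```python
# Toy illustration of synchronous data parallelism (hypothetical example).
# Workers are simulated in a loop; no GPUs, frameworks or network involved.

def mse_gradient(w, b, xs, ys):
    """Gradient of mean squared error for the 1-D linear model y = w*x + b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def shard(data, num_workers):
    """Split a batch into equal, contiguous shards, one per worker."""
    if len(data) % num_workers:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_gradient(w, b, xs, ys, num_workers):
    """Each worker computes a gradient on its shard; averaging the results
    stands in for the all-reduce a real cluster performs over its network."""
    grads = [mse_gradient(w, b, sx, sy)
             for sx, sy in zip(shard(xs, num_workers), shard(ys, num_workers))]
    dw = sum(g[0] for g in grads) / num_workers
    db = sum(g[1] for g in grads) / num_workers
    return dw, db

xs = [float(i) for i in range(8)]
ys = [3.0 * x + 1.0 for x in xs]  # targets from y = 3x + 1

full_batch = mse_gradient(0.0, 0.0, xs, ys)
four_workers = data_parallel_gradient(0.0, 0.0, xs, ys, num_workers=4)
print(full_batch)    # (-112.0, -23.0)
print(four_workers)  # (-112.0, -23.0)
```

Because the shards are equal in size, averaging the per-shard gradients reproduces the full-batch gradient. Real training layers communication, synchronization and memory placement on top of this, which is where the manual effort grows.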

The solution: KT worked with Moreh Inc., a South Korean developer of AI software, and AMD to design a novel platform architecture powered by AMD’s Instinct MI250 Accelerators and Moreh’s software.

Moreh developed the entire AI software stack, from the PyTorch and TensorFlow APIs down to GPU-accelerated primitive operations. This overcomes the cloud and large-model training limitations described above.

Users do not need to insert or modify even a single line of existing source code for the MoAI platform. They also do not need to change the method of running a PyTorch/TensorFlow program.

Did it work?

In a word, yes. To test the setup, KT developed a Korean language model with 11 billion parameters. Training was then done on two machines: one using Nvidia GPUs, the other being the AMD/Moreh cluster equipped with AMD Instinct MI250 accelerators, Supermicro Universal GPU systems, and the Moreh AI platform software.

Compared with the Nvidia system, the Moreh solution with AMD Instinct accelerators delivered 116% of the throughput (as measured by tokens trained per second) and 2.05x the cost-effectiveness (measured as throughput per dollar).
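The two reported ratios also imply a relative cost. If cost-effectiveness is throughput divided by cost, then dividing the throughput ratio by the cost-effectiveness ratio gives the AMD/Moreh setup’s cost relative to the Nvidia system. This quick calculation assumes both figures use the same cost basis, which the reported results don’t spell out.

```python
# Ratios as reported (AMD/Moreh cluster relative to the Nvidia system)
throughput_ratio = 1.16          # 116% of tokens trained per second
cost_effectiveness_ratio = 2.05  # 2.05x throughput per dollar

# cost_effectiveness = throughput / cost, so cost = throughput / cost_effectiveness.
# Assumption: both ratios share the same cost basis.
implied_cost_ratio = throughput_ratio / cost_effectiveness_ratio
print(f"{implied_cost_ratio:.3f}")  # 0.566
```

Under that assumption, the AMD/Moreh configuration cost roughly 57% as much as the Nvidia system while delivering somewhat more throughput.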

Other gains are expected, too. “With cost-effective AMD Instinct accelerators and a pay-as-you-go pricing model, KT Cloud expects to be able to reduce the effective price of its GPU cloud service by 70%,” says JooSung Kim, VP of KT Cloud.

Based on this test, KT built a larger AMD/Moreh cluster of 300 nodes—with a total of 1,200 AMD MI250 GPUs—to train the next version of the Korean language model with 200 billion parameters.

It delivers a theoretical peak performance of 434.5 petaflops for fp16/bf16 matrix operations (fp16 and bf16 are 16-bit floating-point formats commonly used for mixed-precision training). That should make it one of the top-tier GPU supercomputers in the world.
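The 434.5-petaflop figure can be reproduced from the node and GPU counts. The sketch assumes AMD’s published peak of 362.1 TFLOPS of fp16/bf16 matrix throughput per MI250 accelerator; that per-GPU figure is our assumption and isn’t stated in this article.

```python
NODES = 300
TOTAL_GPUS = 1200
# Assumption: AMD's published peak fp16/bf16 matrix throughput for one MI250
MI250_PEAK_TFLOPS = 362.1

gpus_per_node = TOTAL_GPUS // NODES                  # 4 GPUs per node
peak_pflops = TOTAL_GPUS * MI250_PEAK_TFLOPS / 1000  # TFLOPS -> PFLOPS
print(gpus_per_node, round(peak_pflops, 1))          # 4 434.5
```

This is a theoretical peak; sustained training throughput on real models will be lower.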

Tech Explainer: Green Computing, Part 2 — Holistic strategies

Holistic green computing strategies can help both corporate and individual users make changes for the better.

Green computing allows us to align the technology that powers our lives with the sustainability goals necessary to battle the climate crisis.

In Part 1 of our Tech Explainer on green computing, we looked at data-center architecture best practices and component-level green engineering. Now we’ll investigate holistic green computing strategies that can help both corporate and individual users change for the better.

Green manufacturing and supply chain

The manufacturing process can account for up to 70% of the natural resources used in the lifecycle of a PC, server or other digital device. And an estimated 76% of all global trade passes through a supply chain. So it’s more important than ever to reform processes that could harm the environment.

AMD’s efforts to advance environmental sustainability in partnership with its suppliers are a step in the right direction. AMD is currently on track to meet two important supply-chain goals by 2025: that 80% of its suppliers source renewable energy, and that 100% make public their emissions-reduction goals.

To reduce the environmental impact of IT manufacturing, tech providers are replacing the toxic chemicals used in computer manufacturing with alternatives that are more environmentally friendly.

Materials such as the brominated flame retardants found in plastic casings are giving way to eco-friendly, non-toxic silicone compounds. Traditional non-recyclable plastic parts are being replaced by parts made from both bamboo and recyclable plastics, such as polycarbonate resins. And green manufacturers are working to eliminate other toxic chemicals, including lead in solder and cadmium and selenium in circuit boards.

Innovation in green manufacturing can identify and improve hundreds, if not thousands, of industry-standard practices. Even a small improvement, when applied across millions of devices, can make a big difference.

Green enterprise

Today’s enterprise data-center managers are working to maximize server performance while also minimizing their environmental impact. Leading-edge green methodologies include two important moves: reducing power usage at the server level and extending hardware lifecycles to create less waste.

Supermicro, an authority on energy-efficient data center design, is empowering this movement by creating new servers engineered for green computing.

One such server is Supermicro’s 4-node BigTwin. The BigTwin features disaggregated server architecture that reduces e-waste by enabling subsystem upgrades.

As technology improves, IT managers can replace components like the CPU, GPU and memory. This extends the life of the chassis, power supplies and cooling systems that might otherwise end up in a landfill.

Twin and Blade server architectures are more efficient because they share power supplies and fans. This can significantly lower their power usage, making them a better choice for green data centers.

The upgraded components that go into these servers now include high-efficiency processors like the AMD EPYC 9654. The infographic below, courtesy of AMD, shows how 4th Gen AMD EPYC processors can power 2,000 virtual machines using up to 35% fewer servers than the competition:

EPYC green infographic

As shown, the potential result is up to 29% less energy consumed annually. That kind of efficiency can save an estimated 35 tons of carbon dioxide—the equivalent of 38 acres of U.S. forest carbon sequestration every year.
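Taken together, the two “up to” figures also say something about per-server energy. This quick check assumes both percentages come from the same comparison scenario in AMD’s infographic: if the EPYC deployment uses 65% as many servers but 71% as much energy, each server draws roughly 9% more energy on average, yet the smaller fleet still comes out ahead overall.

```python
servers_ratio = 1 - 0.35   # up to 35% fewer servers
energy_ratio = 1 - 0.29    # up to 29% less energy per year

# Average annual energy per server, EPYC deployment vs. the comparison fleet
# (assumes both percentages describe the same scenario)
per_server_ratio = energy_ratio / servers_ratio
print(f"{per_server_ratio:.3f}")  # 1.092
```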

Green data centers also employ advanced cooling systems. For instance, Supermicro’s servers include optional liquid cooling. Using fluid to carry heat away from critical components allows IT managers to lower fan speeds inside each server and reduce HVAC usage in data centers.

Deploying efficient cooling systems like these lowers a data center’s Power Usage Effectiveness (PUE), thus reducing carbon emissions from power generation.
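PUE is the ratio of total facility energy to the energy consumed by IT equipment, so a value of 1.0 would mean zero overhead for cooling, power conversion and the like. The minimal sketch below uses hypothetical energy figures (not Supermicro measurements) to show how cutting cooling overhead moves the ratio.

```python
def pue(it_kwh, overhead_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    return (it_kwh + overhead_kwh) / it_kwh

IT_KWH = 1000.0  # hypothetical energy used by servers, storage and network

before = pue(IT_KWH, overhead_kwh=500.0)  # hypothetical overhead, air cooling
after = pue(IT_KWH, overhead_kwh=200.0)   # hypothetical overhead, liquid cooling
print(before, after)  # 1.5 1.2
```

In this made-up example, the same IT workload needs 300 kWh less total energy, and that avoided generation is where the emissions reduction comes from.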

Changing for the better, together

No single person, corporation or government can stave off the worst effects of the climate crisis. If we are to win this battle, we must work together.

Engineers, industrial designers and data scientists have their work cut out for them. By fueling the evolution of green computing, they—and their corporate managers—can provide us with the tools we need to go green and safeguard our environment for generations to come.

Tech Explainer: Green Computing, Part 1 - What does the data center demand?

The ultimate goal of Green Computing is net-zero emissions. To get there, organizations can and must innovate, conducting an ongoing campaign to increase efficiency and reduce waste.

The Green Computing movement has begun in earnest, and not a moment too soon. As humanity faces the existential threat of the climate crisis, technology needs to be part of the solution. Green computing is a big step in the right direction.

The ultimate goal of Green Computing is net-zero emissions. It’s a symbiotic relationship between technology and nature in which both SMBs and enterprises can offset carbon emissions, drastically reduce pollution, and reuse/recycle the materials that make up their products and services.

To get there, the tech industry will need to first take a long, hard look at the energy it uses and the waste it produces. Using that information, individual organizations can and must innovate, conducting an ongoing campaign to increase efficiency and reduce waste.

It’s a lofty goal, sure. But after all the self-inflicted damage we’ve done since the dawn of the Industrial Revolution, we simply have no choice.

The data-center conundrum

All digital technology requires electricity to operate. But data centers use more than their share.

Here’s a startling fact: Each year, the world’s data centers gobble up at least 200 terawatt-hours (TWh) of electricity. That’s roughly 2% of all the electricity used on this planet annually.

What’s more, that figure is likely to increase as new, power-hungry systems are brought online and new data centers are opened. And the number of global data centers could grow from 700 in 2021 to as many as 1,200 by 2026, predicts Supermicro.
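For a sense of scale, growing from 700 to 1,200 data centers between 2021 and 2026 works out to a compound annual growth rate of roughly 11.4%. The sketch below simply applies the standard CAGR formula to the predicted figures.

```python
start_count, end_count = 700, 1200
years = 2026 - 2021

# Compound annual growth rate: (end / start) ** (1 / years) - 1
cagr = (end_count / start_count) ** (1 / years) - 1
print(f"{cagr:.1%}")  # 11.4%
```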

At that rate, data-center energy consumption could account for up to 8% of global energy usage by 2030. That’s why tech leaders including AMD and Supermicro are rewriting the book on green computing best practices.

A Supermicro white paper, Green Computing: Top 10 Best Practices For A Green Data Center, suggests specific actions you and your customers can take now to reduce the environmental impact of your data centers:

  • Right-size systems to match workload requirements
  • Share common scalable infrastructure
  • Operate at higher ambient temperature
  • Capture heat at the source via aisle containment and liquid cooling
  • Optimize key components (e.g., CPU, GPU, SSD) for workload performance per watt
  • Optimize hardware refresh cycle to maintain efficiency
  • Optimize power delivery
  • Utilize virtualization and power management
  • Source renewable energy and green manufacturing
  • Consider climate impact when making site selection
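Two items on that list, right-sizing and performance per watt, lend themselves to a simple selection rule: discard configurations that can’t meet the workload’s requirement, then pick the one that does the most work per watt. The configurations, throughput numbers and wattages below are hypothetical placeholders, not figures from the Supermicro white paper.

```python
# Hypothetical candidates: (name, workload throughput in jobs/hour, average watts)
candidates = [
    ("config_a", 1000.0, 500.0),
    ("config_b", 1500.0, 600.0),
    ("config_c", 1800.0, 900.0),
]
REQUIRED_THROUGHPUT = 1200.0  # hypothetical workload requirement

def perf_per_watt(throughput, watts):
    return throughput / watts

# Right-size: keep only configurations that meet the requirement...
eligible = [c for c in candidates if c[1] >= REQUIRED_THROUGHPUT]
# ...then pick the one that does the most work per watt.
best = max(eligible, key=lambda c: perf_per_watt(c[1], c[2]))
print(best[0])  # config_b
```

The largest option, config_c, also meets the requirement but does less work per watt; right-sizing is about matching capacity to demand, not maximizing it.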

Green components

Rethinking data-center architectures is an excellent way to leverage green computing from a macro perspective. But to truly make a difference, the industry needs to consider green computing at the component level.

This is one area where AMD is leading the charge. Its mission: increase the energy efficiency of its CPUs and hardware accelerators. The rest of the industry should follow suit.

In 2021, AMD announced its goal to deliver a 30x increase in energy efficiency by 2025 for AMD EPYC CPUs and AMD Instinct accelerators running AI and HPC applications on accelerated compute nodes.
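To put 30x in perspective, here’s the compound annual improvement it implies. The sketch assumes a 2020 baseline, which is how AMD has framed its 30x goal; the text above only says the goal was announced in 2021.

```python
target_gain = 30
years = 2025 - 2020  # assumption: 2020 baseline year

# Compound yearly factor that multiplies out to target_gain over `years`
annual_gain = target_gain ** (1 / years)
print(f"{annual_gain:.2f}x per year")  # 1.97x per year
```

In other words, efficiency would have to nearly double every year for five years running.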

Taming AI energy usage

The golden age of AI has begun. New machine learning algorithms will give life to a population of hyper-intelligent robots that will forever alter the nature of humanity. If AI’s most beneficent promises come to fruition, it could help us live, eat, travel, learn and heal far better than ever before.

But the news isn’t all good. AI has a dark side, too. Part of that dark side is its potential impact on our climate crisis.

Researchers at the University of Massachusetts Amherst illustrated this point by performing a life-cycle assessment for training several large AI models. Their findings, cited by Supermicro, concluded that training a single large AI model can emit more than 626,000 pounds of carbon dioxide. That’s approximately 5 times the lifetime emissions of the average American car.
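The car comparison is a simple ratio. The 126,000-pound figure for an average American car’s lifetime emissions (including fuel) is our assumption, taken from the same University of Massachusetts Amherst study; it isn’t stated in this article.

```python
model_training_lbs = 626_000   # CO2 from training one large model, per the text
car_lifetime_lbs = 126_000     # assumption: same study's average-car lifetime figure, incl. fuel

ratio = model_training_lbs / car_lifetime_lbs
print(round(ratio, 2))  # 4.97
```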

A comparison like that helps put AMD’s environmental sustainability goals in perspective. Achieving a 30x energy-efficiency increase in the components that power AI could bring some much-needed light to AI’s dark side.

In fact, if the whole technology sector produces practical innovations similar to those from AMD and Supermicro, we might have a fighting chance in the battle against the climate crisis.

Continued…

Part 2 of this 3-part series will take a closer look at the technology behind green computing—and the world-saving innovations we could see soon.
