Tech Explainer: What’s the difference between AI training and AI inference?

AI training and inference may be two sides of the same coin, but their compute needs can be quite different. 

April 16, 2024 | Author: KJ Jacoby

Artificial Intelligence (AI) training and inference are two sides of the same coin. Training is the process of teaching an AI model how to perform a given task. Inference is the AI model in action, drawing its own conclusions without human intervention.

Take a theoretical machine learning (ML) model designed to detect counterfeit one-dollar bills. During the training process, AI engineers would feed the model large data sets containing thousands, or even millions, of pictures, and tell the training application which bills are real and which are counterfeit.

Then inference could kick in. The AI model could be deployed to retail locations, then run to detect bogus bills.

A deeper look at training

That’s the high level. Let’s dig in a bit deeper.

Continuing with our bogus-bill-detecting workload, during training, the pictures fed to the AI model would include annotations telling the AI how to think about each piece of data.

For instance, the AI might see a picture of a dollar bill with an embedded annotation that essentially tells the model “this is legal tender.” The annotation could also identify characteristics of a genuine dollar, such as the minute details of the printed iconography and the correct number of characters in the bill’s serial number.

Engineers might also feed the AI model pictures of counterfeit bills. That way, the model could learn the tell-tale signs of a fake. These might include examples of incomplete printing, color discrepancies and missing watermarks.
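
To make the idea concrete, here is a minimal, hypothetical sketch of that kind of supervised training in Python with PyTorch. The dataset, model size and labels are placeholders for illustration only; they are not details of any real counterfeit-detection system:

```python
# Hypothetical sketch of supervised training for a binary "real vs. counterfeit"
# image classifier. The data here is random stand-in tensors; a real system
# would use curated, annotated photographs of bills.
import torch
from torch import nn

# Placeholder dataset: 64x64 grayscale "bill images" with labels
# (1 = genuine, 0 = counterfeit), standing in for annotated training data.
images = torch.rand(256, 1, 64, 64)
labels = torch.randint(0, 2, (256,)).float()

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 1),
)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training loop: the model's parameters are adjusted to shrink the gap
# between its predictions and the human-supplied labels.
for epoch in range(5):
    logits = model(images).squeeze(1)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

In a real workflow, the annotations described above (serial-number checks, print detail, watermarks) would be reflected in a far larger labeled dataset and a far larger model; the loop itself works the same way.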

On to inference

Once the training is complete, inference can take over.

Sticking with our example of counterfeit detection, the AI model could now be uploaded to the cloud, then connected with thousands of point-of-sale (POS) devices in retail locations worldwide.

Retail workers would scan any bill they suspect might be fake. The machine learning model, in turn, would then assess the bill’s legitimacy.
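
As a rough illustration of what that POS step might look like, here is a hedged Python/PyTorch sketch in which a device loads the already-trained model and scores a single scanned bill. The file name, input size and 0.5 decision threshold are assumptions made for the example, not details of any real deployment:

```python
# Illustrative sketch of the inference side: load the trained model and
# score one scanned bill. Assumes the full model object was saved earlier
# with torch.save(model, "counterfeit_detector.pt").
import torch

model = torch.load("counterfeit_detector.pt", weights_only=False)
model.eval()

def assess_bill(scan: torch.Tensor) -> str:
    """Return a verdict for a single 1x64x64 scanned-bill tensor."""
    with torch.no_grad():  # no learning happens at inference time
        prob_genuine = torch.sigmoid(model(scan.unsqueeze(0))).item()
    return "genuine" if prob_genuine >= 0.5 else "suspect counterfeit"

# Example call with a placeholder scan:
print(assess_bill(torch.rand(1, 64, 64)))
```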

This process of AI inference is autonomous. In other words, once the AI enters inference, it’s no longer getting help from engineers and app developers.

Using our example, during inference the AI system has reached the point where it can reliably distinguish genuine bills from counterfeits. And it can do so with a high enough success rate to satisfy its human operators.

Different needs

AI training and inference also have different technology requirements. Basically, training is far more resource-intensive: the focus is on brute-force compute delivered with low-latency communication across the system.

Training a large language model (LLM) chatbot like the popular ChatGPT often forces its underlying technology to contend with more than a trillion parameters. An AI parameter is a variable learned by the LLM during training. Together, these parameters define the LLM's behavior.
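
For a sense of what "parameter" means in practice, the tiny PyTorch example below counts the learned weights and biases in a single layer. A trillion-parameter LLM is the same idea at vastly larger scale:

```python
# Illustration of "parameters": the learned weights and biases inside a model.
import torch
from torch import nn

layer = nn.Linear(in_features=4, out_features=3)    # 4*3 weights + 3 biases
total = sum(p.numel() for p in layer.parameters())
print(f"parameters in this layer: {total}")          # -> 15
```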

To meet these requirements, IT operations must deploy a system that can bring to bear raw computational power in a vast cluster.
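
One common way such a cluster is harnessed is data-parallel training, in which every GPU holds a copy of the model and gradients are averaged across the cluster after each step. The sketch below uses PyTorch's DistributedDataParallel as a generic example; it is not AMD's or any particular vendor's training stack:

```python
# Hedged sketch of data-parallel training across many GPUs/nodes.
# Illustrative launch command: torchrun --nproc_per_node=8 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")              # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank gets a replica of the model; DDP syncs gradients automatically.
    model = DDP(nn.Linear(1024, 1024).cuda(local_rank),
                device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Each rank trains on its own shard of data (random here for brevity).
    for _ in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```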

By contrast, inference applications have different compute requirements. “Essentially, it’s, ‘I’ve trained my model, now I want to organize it,’” explained AMD executive VP and CTO Mark Papermaster in a recent virtual presentation.

AMD’s dual-processor solution

Inferencing workloads are both smaller in scope and less demanding than those for training. Therefore, it makes sense to run them on more affordable GPU-CPU combination technology like the AMD Instinct MI300A.

The AMD Instinct MI300A is an accelerated processing unit (APU) that combines the capability of a standard AI accelerator with the efficiency of AMD EPYC processors. The CPU and GPU elements share a unified pool of memory, dramatically enhancing efficiency, flexibility and programmability.

A single AMD MI300A APU packs 228 GPU compute units, 24 of AMD’s ‘Zen 4’ CPU cores, and 128GB of unified HBM3 memory. Compared with the previous-generation AMD MI250X accelerators, this translates to approximately 2.6x the workload performance per watt using FP32.

That’s a significant increase in performance. It’s likely to be repeated as AI infrastructure evolves along with the proliferation of AI applications that now power our world.
