Inference with AWS Inferentia

Before you start

Prepare your environment for this section:

~$ prepare-environment aiml/inferentia

This will make the following changes to your lab environment:

Installs Karpenter in the Amazon EKS cluster
Creates an S3 bucket to store results
Creates an IAM role for the Pods to use
Installs the AWS Neuron device plugin

You can view the Terraform that applies these changes here.

AWS Trainium and AWS Inferentia are machine learning accelerators custom-built by AWS to speed up and optimize AI model training and inference, respectively.

AWS Neuron is the software development kit (SDK) and runtime that enables developers to optimize and run machine learning models on both Trainium and Inferentia chips. Neuron provides a unified software interface for these custom AI accelerators, allowing developers to take advantage of their performance benefits without having to rewrite their code for each specific chip architecture.

The Neuron device plugin exposes Neuron cores and devices to Kubernetes as extended resources. When a workload requests Neuron cores or devices, the Kubernetes scheduler places it on a node that can provide them, and Karpenter can provision such a node automatically if none is available.
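As a rough sketch of what requesting Neuron devices looks like, the script below prints a Pod manifest whose container sets a limit on the `aws.amazon.com/neuron` extended resource advertised by the Neuron device plugin (the plugin also advertises `aws.amazon.com/neuroncore` for individual cores). The Pod name, the placeholder busybox image, and the toleration for an `aws.amazon.com/neuron` taint are assumptions for illustration, not the manifest this lab uses.

```shell
#!/usr/bin/env bash
# Print a Pod manifest that requests Neuron devices through the device plugin.
# Pod name, image, and toleration are illustrative assumptions.
set -eo pipefail

neuron_pod_manifest() {
  local name="${1:-neuron-test}"                                    # hypothetical Pod name
  local image="${2:-public.ecr.aws/docker/library/busybox:latest}"  # placeholder image
  local devices="${3:-1}"                                           # Neuron devices to request
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  tolerations:
    # Tolerates a taint on Neuron nodes, assuming the node pool applies one.
    - key: aws.amazon.com/neuron
      operator: Exists
      effect: NoSchedule
  containers:
    - name: app
      image: ${image}
      command: ["sleep", "3600"]
      resources:
        limits:
          # Extended resource advertised by the Neuron device plugin.
          aws.amazon.com/neuron: ${devices}
EOF
}

neuron_pod_manifest "$@"
```

For example, `bash neuron-pod.sh demo | kubectl apply -f -` (the file name is hypothetical) would submit the Pod; it stays Pending until a node with a free Neuron device exists.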

This lab shows how to use Inferentia to accelerate deep learning inference workloads on Amazon EKS.

In this lab we will:

Create a Karpenter node pool that provisions Inferentia and Trainium EC2 instances
Compile a pre-trained ResNet-50 model for AWS Inferentia on a Trainium instance
Upload the compiled model to an S3 bucket for later use
Launch an inference Pod that loads the compiled model and runs inference with it
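For the node pool step above, a minimal sketch of a Karpenter NodePool that restricts provisioning to the `inf2` and `trn1` instance families might look like the script below. It assumes the Karpenter `karpenter.sh/v1` API and an existing `EC2NodeClass`; the pool name, node class name, on-demand capacity type, taint, and CPU limit are hypothetical choices, and the lab's own node pool may differ.

```shell
#!/usr/bin/env bash
# Print a Karpenter NodePool (karpenter.sh/v1 API assumed) limited to
# Inferentia2 (inf2) and Trainium1 (trn1) instance families.
# The NodePool and EC2NodeClass names are hypothetical.
set -eo pipefail

neuron_nodepool_manifest() {
  local name="${1:-neuron}"          # hypothetical NodePool name
  local node_class="${2:-default}"   # hypothetical EC2NodeClass name
  cat <<EOF
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: ${name}
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: ${node_class}
      requirements:
        # Only launch Inferentia2 or Trainium1 instances.
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["inf2", "trn1"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      taints:
        # Keep Pods without a matching toleration off these nodes.
        - key: aws.amazon.com/neuron
          value: "true"
          effect: NoSchedule
  limits:
    cpu: "256"   # arbitrary cap on total provisioned CPU; adjust as needed
EOF
}

neuron_nodepool_manifest "$@"
```

Piping the output to `kubectl apply -f -` would create the pool. The taint is a design choice that keeps ordinary Pods off these comparatively expensive accelerator nodes.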

Let's get started.