Amazon EMR on EKS is a deployment option for Amazon EMR that automates the provisioning and management of open-source big data frameworks on Amazon EKS. Running the optimized Spark runtime provided by Amazon EMR on EKS has several advantages: up to 3x faster performance, a fully managed job lifecycle, built-in monitoring and logging, secure integration with Kubernetes, and more. Because Spark runs natively on Kubernetes, jobs submitted to a multi-tenant EKS environment (one shared with other microservices) are deployed in seconds, compared with minutes for EC2-based deployments.

In this module, we will review how to set up your EKS cluster and run a sample Spark job, set up monitoring and logging for these jobs, configure autoscaling, use Kubernetes node selectors for jobs that must meet certain constraints (such as running in a single Availability Zone), apply Spot best practices for running EMR on EKS, and use the serverless compute engine AWS Fargate with Amazon EKS to support EMR workloads.
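
To make the job submission and node selector steps more concrete, here is a minimal Python sketch that builds the request for the EMR on EKS `StartJobRun` API and optionally pins the job to one Availability Zone. It is an illustration under assumptions, not the workshop's own code: the helper name `build_job_run_request`, the job name, the executor settings, the release label, and every ID and ARN are hypothetical placeholders. The zone pinning relies on Spark's `spark.kubernetes.node.selector.[labelKey]` setting combined with the standard Kubernetes `topology.kubernetes.io/zone` node label.

```python
import json


def build_job_run_request(virtual_cluster_id, execution_role_arn,
                          entry_point, release_label, zone=None):
    """Build keyword arguments for the EMR on EKS StartJobRun API.

    Hypothetical helper for illustration; all values passed in are
    supplied by the caller and are not taken from the workshop.
    """
    # Example executor sizing (assumed values, adjust for your cluster).
    spark_conf = {
        "spark.executor.instances": "2",
        "spark.executor.memory": "2G",
    }
    if zone is not None:
        # Schedule driver and executor pods only on nodes labeled with
        # this zone, so the whole job runs in a single Availability Zone.
        key = "spark.kubernetes.node.selector.topology.kubernetes.io/zone"
        spark_conf[key] = zone

    submit_params = " ".join(
        f"--conf {name}={value}" for name, value in spark_conf.items()
    )
    return {
        "name": "sample-spark-job",  # placeholder job name
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": release_label,
        "jobDriver": {
            "sparkSubmitJobDriver": {
                "entryPoint": entry_point,
                "sparkSubmitParameters": submit_params,
            }
        },
    }


if __name__ == "__main__":
    request = build_job_run_request(
        virtual_cluster_id="<your-virtual-cluster-id>",    # placeholder
        execution_role_arn="<your-job-execution-role-arn>",  # placeholder
        entry_point="local:///usr/lib/spark/examples/src/main/python/pi.py",
        release_label="emr-6.15.0-latest",  # assumed; use your release
        zone="us-east-1a",                  # assumed zone
    )
    print(json.dumps(request, indent=2))
    # With boto3 installed and AWS credentials configured, the request
    # could then be submitted like this:
    #   import boto3
    #   boto3.client("emr-containers").start_job_run(**request)
```

Building the request separately from the API call keeps the sketch runnable without AWS credentials and makes it easy to inspect the generated `sparkSubmitParameters` before submitting anything.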

For more hands-on labs, see the dedicated EMR on EKS Workshop.