Provisioning an EKS Cluster with Terraform from Scratch
Build a production-ready EKS cluster with Terraform: VPC with private subnets, managed node groups, IRSA for pod IAM, and OIDC provider — all in reproducible, reviewable infrastructure code.
Before you begin
- Terraform >= 1.6 installed
- AWS CLI configured with sufficient IAM permissions
- kubectl installed
- Basic Terraform knowledge (init, plan, apply)
Creating an EKS cluster through the AWS console is fine for learning. Doing it with Terraform means you can reproduce it, review changes before applying, and destroy it cleanly. This tutorial builds a cluster you'd actually run in production.
What You'll Build
VPC
├── 3 public subnets (one per AZ) — load balancers
├── 3 private subnets (one per AZ) — EKS nodes
└── NAT Gateway — outbound internet from private subnets
EKS Cluster (Kubernetes 1.30)
├── Managed node group — 2–10 nodes, t3.medium
├── OIDC provider — enables IRSA for pods
├── vpc-cni add-on — pod networking
├── coredns add-on — cluster DNS
└── kube-proxy add-on — service networking
Step 1: Project Structure
mkdir eks-cluster && cd eks-cluster
# Create files
touch main.tf vpc.tf eks.tf outputs.tf variables.tf versions.tf
Step 2: versions.tf — Provider Pins
# versions.tf
terraform {
required_version = ">= 1.6"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.25"
}
tls = {
source = "hashicorp/tls"
version = "~> 4.0"
}
}
# Remote state — replace with your bucket
backend "s3" {
bucket = "my-terraform-state-bucket"
key = "eks/terraform.tfstate"
region = "ap-south-1"
encrypt = true
}
}
Step 3: variables.tf
# variables.tf
variable "cluster_name" {
description = "EKS cluster name"
type = string
default = "my-cluster"
}
variable "aws_region" {
description = "AWS region"
type = string
default = "ap-south-1"
}
variable "kubernetes_version" {
description = "Kubernetes version"
type = string
default = "1.30"
}
variable "node_instance_type" {
description = "EC2 instance type for worker nodes"
type = string
default = "t3.medium"
}
variable "node_min_size" {
type = number
default = 2
}
variable "node_max_size" {
type = number
default = 10
}
variable "node_desired_size" {
type = number
default = 3
}
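Variables with defaults accept any string, so a typo like "1.3.0" only fails once AWS rejects the cluster call. Terraform validation blocks catch this at plan time. A sketch of what the kubernetes_version variable could look like with one added (the accepted version range in the regex is an assumption about what you support):

```hcl
# Hypothetical extension of the kubernetes_version variable above:
# the validation block rejects malformed version strings at plan time.
variable "kubernetes_version" {
  description = "Kubernetes version"
  type        = string
  default     = "1.30"

  validation {
    condition     = can(regex("^1\\.(2[4-9]|3[0-9])$", var.kubernetes_version))
    error_message = "kubernetes_version must be a minor version string like \"1.30\"."
  }
}
```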
Step 4: main.tf — AWS Provider
# main.tf
provider "aws" {
region = var.aws_region
default_tags {
tags = {
ManagedBy = "terraform"
Cluster = var.cluster_name
Environment = "production"
}
}
}
# Data sources
data "aws_availability_zones" "available" {
state = "available"
}
data "aws_caller_identity" "current" {}
Step 5: vpc.tf — Network Foundation
# vpc.tf
locals {
azs = slice(data.aws_availability_zones.available.names, 0, 3)
vpc_cidr = "10.0.0.0/16"
public_cidrs = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
private_cidrs = ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"]
}
# VPC
resource "aws_vpc" "main" {
cidr_block = local.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.cluster_name}-vpc"
# Required for EKS to discover the VPC
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = { Name = "${var.cluster_name}-igw" }
}
# Public subnets — for load balancers
resource "aws_subnet" "public" {
count = 3
vpc_id = aws_vpc.main.id
cidr_block = local.public_cidrs[count.index]
availability_zone = local.azs[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.cluster_name}-public-${count.index + 1}"
"kubernetes.io/role/elb" = "1" # Required for AWS Load Balancer Controller
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
}
# Private subnets — for EKS nodes
resource "aws_subnet" "private" {
count = 3
vpc_id = aws_vpc.main.id
cidr_block = local.private_cidrs[count.index]
availability_zone = local.azs[count.index]
tags = {
Name = "${var.cluster_name}-private-${count.index + 1}"
"kubernetes.io/role/internal-elb" = "1" # For internal load balancers
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
}
# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
count = 3
domain = "vpc"
tags = { Name = "${var.cluster_name}-nat-eip-${count.index + 1}" }
}
# NAT Gateways — one per AZ for HA
resource "aws_nat_gateway" "main" {
count = 3
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = { Name = "${var.cluster_name}-nat-${count.index + 1}" }
depends_on = [aws_internet_gateway.main]
}
# Route tables
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = { Name = "${var.cluster_name}-public-rt" }
}
resource "aws_route_table" "private" {
count = 3
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
tags = { Name = "${var.cluster_name}-private-rt-${count.index + 1}" }
}
# Route table associations
resource "aws_route_table_association" "public" {
count = 3
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = 3
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
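Three NAT Gateways buy AZ-level fault tolerance but triple the hourly cost. A common dev-cluster variant routes all private subnets through a single NAT Gateway. A hedged sketch of that trade-off (var.single_nat_gateway is a hypothetical variable, and these blocks would replace the EIP, NAT Gateway, and private route table resources above, not sit alongside them):

```hcl
# Hypothetical cost-saving variant: one shared NAT Gateway instead of one per AZ.
# Trade-off: an outage in that NAT's AZ cuts outbound internet for every node.
locals {
  nat_count = var.single_nat_gateway ? 1 : 3 # var.single_nat_gateway is assumed
}

resource "aws_eip" "nat" {
  count  = local.nat_count
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  count         = local.nat_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  depends_on    = [aws_internet_gateway.main]
}

resource "aws_route_table" "private" {
  count  = 3
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    # min() points every private route table at NAT 0 when only one exists
    nat_gateway_id = aws_nat_gateway.main[min(count.index, local.nat_count - 1)].id
  }
}
```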
Step 6: eks.tf — Cluster and Node Groups
# eks.tf
# IAM role for the EKS control plane
resource "aws_iam_role" "eks_cluster" {
name = "${var.cluster_name}-cluster-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = { Service = "eks.amazonaws.com" }
}]
})
}
resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.eks_cluster.name
}
# Security group for the cluster API endpoint
resource "aws_security_group" "cluster" {
name = "${var.cluster_name}-cluster-sg"
description = "EKS cluster security group"
vpc_id = aws_vpc.main.id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = { Name = "${var.cluster_name}-cluster-sg" }
}
# EKS Cluster
resource "aws_eks_cluster" "main" {
name = var.cluster_name
role_arn = aws_iam_role.eks_cluster.arn
version = var.kubernetes_version
vpc_config {
subnet_ids = concat(aws_subnet.private[*].id, aws_subnet.public[*].id)
security_group_ids = [aws_security_group.cluster.id]
endpoint_private_access = true
endpoint_public_access = true # Set to false and use VPN for production
public_access_cidrs = ["0.0.0.0/0"] # Restrict to your office IP for production
}
enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
depends_on = [aws_iam_role_policy_attachment.eks_cluster_policy]
}
# OIDC Provider — enables IRSA (IAM Roles for Service Accounts)
data "tls_certificate" "eks" {
url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}
resource "aws_iam_openid_connect_provider" "eks" {
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}
# IAM role for worker nodes
resource "aws_iam_role" "node_group" {
name = "${var.cluster_name}-node-group-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = { Service = "ec2.amazonaws.com" }
}]
})
}
resource "aws_iam_role_policy_attachment" "node_group_policies" {
for_each = toset([
"arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
"arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
"arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
])
policy_arn = each.value
role = aws_iam_role.node_group.name
}
# Managed Node Group
resource "aws_eks_node_group" "main" {
cluster_name = aws_eks_cluster.main.name
node_group_name = "${var.cluster_name}-main"
node_role_arn = aws_iam_role.node_group.arn
subnet_ids = aws_subnet.private[*].id
instance_types = [var.node_instance_type]
ami_type = "AL2_x86_64"
capacity_type = "ON_DEMAND"
scaling_config {
desired_size = var.node_desired_size
min_size = var.node_min_size
max_size = var.node_max_size
}
update_config {
max_unavailable = 1
}
labels = {
role = "general"
}
depends_on = [aws_iam_role_policy_attachment.node_group_policies]
lifecycle {
ignore_changes = [scaling_config[0].desired_size] # Let Cluster Autoscaler manage this
}
}
# Core EKS Add-ons
resource "aws_eks_addon" "vpc_cni" {
cluster_name = aws_eks_cluster.main.name
addon_name = "vpc-cni"
resolve_conflicts_on_update = "PRESERVE"
depends_on = [aws_eks_node_group.main]
}
resource "aws_eks_addon" "coredns" {
cluster_name = aws_eks_cluster.main.name
addon_name = "coredns"
resolve_conflicts_on_update = "PRESERVE"
depends_on = [aws_eks_node_group.main]
}
resource "aws_eks_addon" "kube_proxy" {
cluster_name = aws_eks_cluster.main.name
addon_name = "kube-proxy"
resolve_conflicts_on_update = "PRESERVE"
depends_on = [aws_eks_node_group.main]
}
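Without an addon_version argument, Terraform installs the default add-on version for your cluster. To pin versions explicitly and keep them current, the aws_eks_addon_version data source looks up the latest release compatible with the cluster. A sketch for vpc-cni, which would replace the aws_eks_addon.vpc_cni resource above (the same pattern applies to coredns and kube-proxy):

```hcl
# Look up the newest vpc-cni release compatible with the cluster's Kubernetes version.
data "aws_eks_addon_version" "vpc_cni" {
  addon_name         = "vpc-cni"
  kubernetes_version = aws_eks_cluster.main.version
  most_recent        = true
}

resource "aws_eks_addon" "vpc_cni" {
  cluster_name                = aws_eks_cluster.main.name
  addon_name                  = "vpc-cni"
  addon_version               = data.aws_eks_addon_version.vpc_cni.version
  resolve_conflicts_on_update = "PRESERVE"
  depends_on                  = [aws_eks_node_group.main]
}
```

Pinning makes add-on upgrades show up in terraform plan instead of happening implicitly.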
Step 7: outputs.tf
# outputs.tf
output "cluster_name" {
value = aws_eks_cluster.main.name
}
output "cluster_endpoint" {
value = aws_eks_cluster.main.endpoint
}
output "cluster_certificate_authority" {
value = aws_eks_cluster.main.certificate_authority[0].data
sensitive = true
}
output "oidc_provider_arn" {
value = aws_iam_openid_connect_provider.eks.arn
}
output "oidc_provider_url" {
value = replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")
}
output "configure_kubectl" {
value = "aws eks update-kubeconfig --region ${var.aws_region} --name ${var.cluster_name}"
}
Step 8: Apply
terraform init
# Check the plan before applying
terraform plan -out=tfplan
# Review: should see ~37 resources to create
# Apply
terraform apply tfplan
The apply takes 12–20 minutes; most of that time is the EKS control plane coming up.
Step 9: Connect kubectl
$(terraform output -raw configure_kubectl)
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# ip-10-0-11-xxx.ap-south-1.compute.internal Ready <none> 5m v1.30.x
Step 10: Use the OIDC Provider for IRSA
The OIDC provider ARN is in terraform output oidc_provider_arn. Use it to create IAM roles for pods (see the AWS IRSA tutorial for the full workflow).
OIDC_PROVIDER_ARN=$(terraform output -raw oidc_provider_arn)
OIDC_PROVIDER_URL=$(terraform output -raw oidc_provider_url)
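As a concrete sketch, an IRSA role's trust policy scopes the role to a single Kubernetes service account via the OIDC token's sub claim. The namespace and service account name below (default/s3-reader) are placeholders for illustration, and you would still attach a permissions policy to the role:

```hcl
# Hypothetical IRSA role for a pod that reads from S3.
# Only the "s3-reader" service account in the "default" namespace can assume it.
locals {
  oidc_url = replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")
}

resource "aws_iam_role" "s3_reader" {
  name = "${var.cluster_name}-s3-reader"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRoleWithWebIdentity"
      Principal = { Federated = aws_iam_openid_connect_provider.eks.arn }
      Condition = {
        StringEquals = {
          "${local.oidc_url}:sub" = "system:serviceaccount:default:s3-reader"
          "${local.oidc_url}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })
}
```

The pod side then annotates its service account with eks.amazonaws.com/role-arn set to this role's ARN.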
Tear Down
# Scale down node group first (faster)
terraform destroy -target=aws_eks_node_group.main
# Then destroy everything else
terraform destroy
NAT Gateways are expensive — make sure you destroy the cluster when you're done with it.
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.