Cloud Engineering

Provisioning an EKS Cluster with Terraform from Scratch

Intermediate · 60 min to complete · 14 min read

Build a production-ready EKS cluster with Terraform: VPC with private subnets, managed node groups, IRSA for pod IAM, and OIDC provider — all in reproducible, reviewable infrastructure code.

Before you begin

  • Terraform >= 1.6 installed
  • AWS CLI configured with sufficient IAM permissions
  • kubectl installed
  • Basic Terraform knowledge (init, plan, apply)
AWS · EKS · Terraform · Kubernetes · Infrastructure as Code

Creating an EKS cluster through the AWS console is fine for learning. Doing it with Terraform means you can reproduce it, review changes before applying, and destroy it cleanly. This tutorial builds a cluster you'd actually run in production.

What You'll Build

VPC
├── 3 public subnets (one per AZ) — load balancers
├── 3 private subnets (one per AZ) — EKS nodes
└── NAT Gateway — outbound internet from private subnets

EKS Cluster (Kubernetes 1.30)
├── Managed node group — 2–10 nodes, t3.medium
├── OIDC provider — enables IRSA for pods
├── aws-vpc-cni add-on — pod networking
├── coredns add-on — cluster DNS
└── kube-proxy add-on — service networking

Step 1: Project Structure

bash
mkdir eks-cluster && cd eks-cluster

# Create files
touch main.tf vpc.tf eks.tf outputs.tf variables.tf versions.tf

Step 2: versions.tf — Provider Pins

hcl
# versions.tf
terraform {
  required_version = ">= 1.6"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.25"
    }
    # Needed for the OIDC thumbprint lookup in eks.tf
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }
  }

  # Remote state — replace with your bucket
  backend "s3" {
    bucket = "my-terraform-state-bucket"
    key    = "eks/terraform.tfstate"
    region = "ap-south-1"
  }
}

Step 3: variables.tf

hcl
# variables.tf
variable "cluster_name" {
  description = "EKS cluster name"
  type        = string
  default     = "my-cluster"
}

variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "ap-south-1"
}

variable "kubernetes_version" {
  description = "Kubernetes version"
  type        = string
  default     = "1.30"
}

variable "node_instance_type" {
  description = "EC2 instance type for worker nodes"
  type        = string
  default     = "t3.medium"
}

variable "node_min_size" {
  type    = number
  default = 2
}

variable "node_max_size" {
  type    = number
  default = 10
}

variable "node_desired_size" {
  type    = number
  default = 3
}

Step 4: main.tf — AWS Provider

hcl
# main.tf
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      ManagedBy   = "terraform"
      Cluster     = var.cluster_name
      Environment = "production"
    }
  }
}

# Data sources
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_caller_identity" "current" {}

Step 5: vpc.tf — Network Foundation

hcl
# vpc.tf
locals {
  azs            = slice(data.aws_availability_zones.available.names, 0, 3)
  vpc_cidr       = "10.0.0.0/16"
  public_cidrs   = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  private_cidrs  = ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"]
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = local.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.cluster_name}-vpc"
    # Required for EKS to discover the VPC
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  tags   = { Name = "${var.cluster_name}-igw" }
}

# Public subnets — for load balancers
resource "aws_subnet" "public" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = local.public_cidrs[count.index]
  availability_zone = local.azs[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name                     = "${var.cluster_name}-public-${count.index + 1}"
    "kubernetes.io/role/elb" = "1"   # Required for AWS Load Balancer Controller
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}

# Private subnets — for EKS nodes
resource "aws_subnet" "private" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = local.private_cidrs[count.index]
  availability_zone = local.azs[count.index]

  tags = {
    Name                              = "${var.cluster_name}-private-${count.index + 1}"
    "kubernetes.io/role/internal-elb" = "1"   # For internal load balancers
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }
}

# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
  count  = 3
  domain = "vpc"
  tags   = { Name = "${var.cluster_name}-nat-eip-${count.index + 1}" }
}

# NAT Gateways — one per AZ for HA
resource "aws_nat_gateway" "main" {
  count         = 3
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  tags          = { Name = "${var.cluster_name}-nat-${count.index + 1}" }
  depends_on    = [aws_internet_gateway.main]
}

# Route tables
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = { Name = "${var.cluster_name}-public-rt" }
}

resource "aws_route_table" "private" {
  count  = 3
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = { Name = "${var.cluster_name}-private-rt-${count.index + 1}" }
}

# Route table associations
resource "aws_route_table_association" "public" {
  count          = 3
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = 3
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
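
One NAT gateway per AZ buys cross-AZ redundancy at roughly triple the cost. For non-production clusters, a single shared gateway is a common trade-off. A sketch, assuming a hypothetical `single_nat_gateway` toggle (not part of the files above):

```hcl
# Hypothetical toggle (not defined in variables.tf above)
variable "single_nat_gateway" {
  type    = bool
  default = false
}

locals {
  nat_count = var.single_nat_gateway ? 1 : 3
}

# Set count = local.nat_count on aws_eip.nat and aws_nat_gateway.main,
# then point every private route table at whichever gateway exists:
resource "aws_route_table" "private" {
  count  = 3
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    # min() clamps the index to 0 when only one gateway is created
    nat_gateway_id = aws_nat_gateway.main[min(count.index, local.nat_count - 1)].id
  }

  tags = { Name = "${var.cluster_name}-private-rt-${count.index + 1}" }
}
```

The trade-off: if the AZ hosting the single gateway goes down, private subnets in all three AZs lose outbound internet.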

Step 6: eks.tf — Cluster and Node Groups

hcl
# eks.tf

# IAM role for the EKS control plane
resource "aws_iam_role" "eks_cluster" {
  name = "${var.cluster_name}-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "eks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

# Security group for the cluster API endpoint
resource "aws_security_group" "cluster" {
  name        = "${var.cluster_name}-cluster-sg"
  description = "EKS cluster security group"
  vpc_id      = aws_vpc.main.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = { Name = "${var.cluster_name}-cluster-sg" }
}

# EKS Cluster
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = concat(aws_subnet.private[*].id, aws_subnet.public[*].id)
    security_group_ids      = [aws_security_group.cluster.id]
    endpoint_private_access = true
    endpoint_public_access  = true   # Set to false and use VPN for production
    public_access_cidrs     = ["0.0.0.0/0"]   # Restrict to your office IP for production
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  depends_on = [aws_iam_role_policy_attachment.eks_cluster_policy]
}

# OIDC Provider — enables IRSA (IAM Roles for Service Accounts)
data "tls_certificate" "eks" {
  url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

# IAM role for worker nodes
resource "aws_iam_role" "node_group" {
  name = "${var.cluster_name}-node-group-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "node_group_policies" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
  ])

  policy_arn = each.value
  role       = aws_iam_role.node_group.name
}

# Managed Node Group
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.cluster_name}-main"
  node_role_arn   = aws_iam_role.node_group.arn
  subnet_ids      = aws_subnet.private[*].id

  instance_types = [var.node_instance_type]
  ami_type       = "AL2_x86_64"
  capacity_type  = "ON_DEMAND"

  scaling_config {
    desired_size = var.node_desired_size
    min_size     = var.node_min_size
    max_size     = var.node_max_size
  }

  update_config {
    max_unavailable = 1
  }

  labels = {
    role = "general"
  }

  depends_on = [aws_iam_role_policy_attachment.node_group_policies]

  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]  # Let Cluster Autoscaler manage this
  }
}

# Core EKS Add-ons
resource "aws_eks_addon" "vpc_cni" {
  cluster_name                = aws_eks_cluster.main.name
  addon_name                  = "vpc-cni"
  resolve_conflicts_on_update = "PRESERVE"
  depends_on                  = [aws_eks_node_group.main]
}

resource "aws_eks_addon" "coredns" {
  cluster_name                = aws_eks_cluster.main.name
  addon_name                  = "coredns"
  resolve_conflicts_on_update = "PRESERVE"
  depends_on                  = [aws_eks_node_group.main]
}

resource "aws_eks_addon" "kube_proxy" {
  cluster_name                = aws_eks_cluster.main.name
  addon_name                  = "kube-proxy"
  resolve_conflicts_on_update = "PRESERVE"
  depends_on                  = [aws_eks_node_group.main]
}
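
The node group's lifecycle block assumes the Cluster Autoscaler owns desired_size. For the autoscaler's auto-discovery mode to find the node group's ASGs, those ASGs need the discovery tags, which managed node groups don't apply automatically. A sketch using `aws_autoscaling_group_tag` (the tag keys are the Cluster Autoscaler's documented discovery keys):

```hcl
locals {
  # ASG names created by the managed node group
  node_group_asgs = toset(flatten(
    aws_eks_node_group.main.resources[*].autoscaling_groups[*].name
  ))
}

resource "aws_autoscaling_group_tag" "cluster_autoscaler_enabled" {
  for_each               = local.node_group_asgs
  autoscaling_group_name = each.value

  tag {
    key                 = "k8s.io/cluster-autoscaler/enabled"
    value               = "true"
    propagate_at_launch = false
  }
}

resource "aws_autoscaling_group_tag" "cluster_autoscaler_owned" {
  for_each               = local.node_group_asgs
  autoscaling_group_name = each.value

  tag {
    key                 = "k8s.io/cluster-autoscaler/${var.cluster_name}"
    value               = "owned"
    propagate_at_launch = false
  }
}
```

Caveat: `for_each` over values computed at apply time can require the node group to exist first (e.g. a targeted apply), since Terraform needs known keys at plan time.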

Step 7: outputs.tf

hcl
# outputs.tf
output "cluster_name" {
  value = aws_eks_cluster.main.name
}

output "cluster_endpoint" {
  value = aws_eks_cluster.main.endpoint
}

output "cluster_certificate_authority" {
  value     = aws_eks_cluster.main.certificate_authority[0].data
  sensitive = true
}

output "oidc_provider_arn" {
  value = aws_iam_openid_connect_provider.eks.arn
}

output "oidc_provider_url" {
  value = replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")
}

output "configure_kubectl" {
  value = "aws eks update-kubeconfig --region ${var.aws_region} --name ${var.cluster_name}"
}

Step 8: Apply

bash
terraform init

# Check the plan before applying
terraform plan -out=tfplan

# Review: should see ~50 resources to create
# Apply
terraform apply tfplan

The apply takes 12–20 minutes; most of that time is the EKS control plane coming up.

Step 9: Connect kubectl

bash
$(terraform output -raw configure_kubectl)

kubectl get nodes
# NAME                          STATUS   ROLES    AGE   VERSION
# ip-10-0-11-xxx.ap-south-1.compute.internal   Ready    <none>   5m    v1.30.x

Step 10: Use the OIDC Provider for IRSA

The OIDC provider ARN is available via terraform output oidc_provider_arn. Use it to create IAM roles that pods can assume (see the AWS IRSA tutorial for the full workflow).

bash
OIDC_PROVIDER_ARN=$(terraform output -raw oidc_provider_arn)
OIDC_PROVIDER_URL=$(terraform output -raw oidc_provider_url)
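
As a sketch of what that workflow looks like in Terraform, here is an IRSA trust policy for a pod running as a service account; the `app` namespace and `s3-reader` service account are hypothetical placeholders, not resources created by this tutorial:

```hcl
# Trust policy: only the app/s3-reader service account in this cluster
# may assume the role via the cluster's OIDC provider
data "aws_iam_policy_document" "s3_reader_assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")}:sub"
      values   = ["system:serviceaccount:app:s3-reader"]
    }
  }
}

resource "aws_iam_role" "s3_reader" {
  name               = "${var.cluster_name}-s3-reader"
  assume_role_policy = data.aws_iam_policy_document.s3_reader_assume.json
}
```

The pod's service account then references the role via the eks.amazonaws.com/role-arn annotation.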

Tear Down

bash
# Destroy the node group first (speeds up the rest of the teardown)
terraform destroy -target=aws_eks_node_group.main

# Then destroy everything else
terraform destroy

NAT Gateways are expensive — make sure you destroy the cluster when you're done with it.

We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.

Struggling with this in production?

We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.