Platform Engineering
15 min read · May 9, 2026

Terraform for Kubernetes: Managing EKS with Infrastructure as Code

Provisioning EKS with Terraform gives you reproducible clusters, version-controlled configuration, and a clear audit trail. It also introduces ordering problems, state drift, and IAM deadlocks that don't exist when you click through the console. Here's how to structure Terraform for EKS without creating operational debt.

Coding Protocols Team

Terraform is the standard way to provision EKS clusters. The community-maintained EKS module (terraform-aws-modules/eks/aws) handles most of the complexity, but the way you structure the Terraform around it — state boundaries, IAM management, add-on ordering, and Kubernetes resource management — determines whether your EKS Terraform is maintainable or a tangle of depends_on blocks and manual state surgery.

This guide covers a production-ready EKS Terraform structure with the architectural decisions explained, not just the code.


The Core Problem: Ordering and State

EKS Terraform has an inherent ordering challenge: some resources depend on the cluster existing before they can be created, but they also need to be managed by Terraform for reproducibility.

The specific problem:

  1. aws_eks_cluster must exist before you can configure its Kubernetes resources
  2. Some Kubernetes resources (RBAC, ConfigMaps, namespaces) must exist before your applications deploy
  3. IAM roles referenced by Kubernetes service accounts must exist before the pods that use them start
  4. EKS add-ons (VPC CNI, CoreDNS, kube-proxy) must be configured before workload pods schedule

Putting everything in one Terraform state file makes terraform apply fragile — a single failed resource can prevent everything downstream from applying. Splitting into separate state files creates explicit dependency boundaries and allows targeted applies.


terraform/
├── 00-vpc/               # VPC, subnets, NAT gateways, route tables
│   └── main.tf
├── 01-iam/               # IAM roles for EKS cluster, node groups, Karpenter
│   ├── cluster-role.tf
│   ├── node-role.tf
│   └── karpenter.tf
├── 02-eks-cluster/       # EKS cluster, managed node groups, EKS add-ons
│   └── main.tf
├── 03-cluster-config/    # Kubernetes-level config: namespaces, RBAC, aws-auth
│   └── main.tf
├── 04-platform-addons/   # Karpenter, Cert-Manager, External Secrets Operator
│   └── main.tf
└── 05-workloads/         # Application namespaces, quotas, team RBAC
    └── main.tf

Each layer reads outputs from the previous layer via remote state data sources. Changes to IAM don't trigger a plan on workloads. Cluster upgrades (layer 02) don't affect add-on configuration (layer 04).
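
The wiring between layers looks like this: a sketch assuming an S3 backend, with a placeholder bucket name and key path.

hcl
# In 02-eks-cluster: read the VPC layer's outputs.
# Bucket and key are placeholders; match your actual backend config.
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"
    key    = "00-vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Downstream references then look like:
# data.terraform_remote_state.vpc.outputs.private_subnet_ids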

The Karpenter IAM deadlock documented in Karpenter IAM Deadlock: How We Broke Our EKS Cluster with a Terraform Apply is a direct result of mixing IAM resources and live Karpenter configuration in the same state file. Separating them eliminates the race.


Layer 00: VPC

Use the terraform-aws-modules/vpc/aws module. Key settings for EKS:

hcl
1module "vpc" {
2  source  = "terraform-aws-modules/vpc/aws"
3  version = "~> 5.0"
4
5  name = "${var.cluster_name}-vpc"
6  cidr = "10.0.0.0/16"
7
8  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
9  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
10  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
11
12  enable_nat_gateway     = true
13  single_nat_gateway     = false    # One NAT per AZ for HA
14  enable_dns_hostnames   = true
15  enable_dns_support     = true
16
17  # Required tags for EKS subnet discovery
18  private_subnet_tags = {
19    "kubernetes.io/role/internal-elb"               = "1"
20    "kubernetes.io/cluster/${var.cluster_name}"     = "shared"
21    "karpenter.sh/discovery"                        = var.cluster_name
22  }
23
24  public_subnet_tags = {
25    "kubernetes.io/role/elb"                        = "1"
26    "kubernetes.io/cluster/${var.cluster_name}"     = "shared"
27  }
28}

The karpenter.sh/discovery tag on private subnets is required for Karpenter's EC2NodeClass subnet discovery. The kubernetes.io/role/internal-elb tag is required for the AWS Load Balancer Controller to find private subnets for internal load balancers.
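
For reference, here is a sketch of how Karpenter consumes that tag: a hypothetical EC2NodeClass expressed through the kubernetes_manifest resource. It belongs in a later layer, since kubernetes_manifest needs the Karpenter CRDs installed before it can plan, and the security-group selector assumes you apply the same discovery tag to the node security group.

hcl
# Hypothetical EC2NodeClass; subnetSelectorTerms matches the discovery tag
resource "kubernetes_manifest" "default_nodeclass" {
  manifest = {
    apiVersion = "karpenter.k8s.aws/v1"
    kind       = "EC2NodeClass"
    metadata   = { name = "default" }
    spec = {
      amiSelectorTerms = [{ alias = "al2023@latest" }]
      role             = "${var.cluster_name}-node-role"    # node role from layer 01
      subnetSelectorTerms = [{
        tags = { "karpenter.sh/discovery" = var.cluster_name }
      }]
      securityGroupSelectorTerms = [{
        tags = { "karpenter.sh/discovery" = var.cluster_name }
      }]
    }
  }
}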


Layer 01: IAM

Separate IAM resources from the cluster itself. This layer runs first and rarely changes:

hcl
# EKS cluster IAM role
resource "aws_iam_role" "cluster" {
  name = "${var.cluster_name}-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}

# EKS node IAM role
resource "aws_iam_role" "nodes" {
  name = "${var.cluster_name}-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "nodes_AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "nodes_AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "nodes_AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.nodes.name
}

# Karpenter controller IAM role (EKS Pod Identity)
resource "aws_iam_role" "karpenter" {
  name = "KarpenterControllerRole-${var.cluster_name}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Service = "pods.eks.amazonaws.com"
      }
      Action = ["sts:AssumeRole", "sts:TagSession"]
    }]
  })
}

# Attach Karpenter controller policy (from CloudFormation or inline)
resource "aws_iam_role_policy" "karpenter_controller" {
  name   = "KarpenterControllerPolicy"
  role   = aws_iam_role.karpenter.id
  policy = data.aws_iam_policy_document.karpenter_controller.json
}
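
These role ARNs are what later layers read via remote state, so export them as outputs. A minimal sketch; the names must match whatever your downstream data sources reference, and layer 02 additionally expects an ebs_csi_role_arn from a similarly defined EBS CSI driver role.

hcl
# Outputs consumed by layers 02 and 04 via terraform_remote_state
output "cluster_role_arn" {
  value = aws_iam_role.cluster.arn
}

output "node_role_arn" {
  value = aws_iam_role.nodes.arn
}

output "karpenter_role_arn" {
  value = aws_iam_role.karpenter.arn
}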

Layer 02: EKS Cluster

hcl
1module "eks" {
2  source  = "terraform-aws-modules/eks/aws"
3  version = "~> 20.0"
4
5  cluster_name    = var.cluster_name
6  cluster_version = "1.31"
7
8  cluster_endpoint_public_access       = true
9  cluster_endpoint_private_access      = true
10  cluster_endpoint_public_access_cidrs = var.allowed_cidrs    # Restrict to known IPs
11
12  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id
13  subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnet_ids
14
15  iam_role_arn = data.terraform_remote_state.iam.outputs.cluster_role_arn
16
17  # EKS Managed Add-ons — pin versions explicitly
18  cluster_addons = {
19    coredns = {
20      addon_version               = "v1.11.3-eksbuild.1"
21      resolve_conflicts_on_create = "OVERWRITE"
22      resolve_conflicts_on_update = "PRESERVE"
23    }
24    kube-proxy = {
25      addon_version               = "v1.31.2-eksbuild.3"
26      resolve_conflicts_on_update = "PRESERVE"
27    }
28    vpc-cni = {
29      addon_version               = "v1.19.0-eksbuild.1"
30      resolve_conflicts_on_update = "PRESERVE"
31      configuration_values = jsonencode({
32        env = {
33          ENABLE_PREFIX_DELEGATION = "true"    # Increases pod density per node
34          WARM_PREFIX_TARGET       = "1"
35        }
36      })
37    }
38    aws-ebs-csi-driver = {
39      addon_version            = "v1.37.0-eksbuild.1"
40      service_account_role_arn = data.terraform_remote_state.iam.outputs.ebs_csi_role_arn
41    }
42  }
43
44  # Initial node group — just enough to bootstrap Karpenter
45  eks_managed_node_groups = {
46    bootstrap = {
47      name           = "bootstrap"
48      instance_types = ["m5.large"]
49      min_size       = 2
50      max_size       = 4
51      desired_size   = 2
52
53      iam_role_arn = data.terraform_remote_state.iam.outputs.node_role_arn
54
55      labels = {
56        role = "bootstrap"
57      }
58
59      taints = [{
60        key    = "CriticalAddonsOnly"
61        value  = "true"
62        effect = "NO_SCHEDULE"
63      }]
64    }
65  }
66
67  # Access Entries (replaces aws-auth ConfigMap)
68  access_entries = {
69    platform_admins = {
70      kubernetes_groups = []
71      principal_arn     = "arn:aws:iam::${var.account_id}:role/PlatformAdminRole"
72      policy_associations = {
73        cluster_admin = {
74          policy_arn   = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
75          access_scope = { type = "cluster" }
76        }
77      }
78    }
79  }
80}

Key decisions here:

  • Pin add-on versions explicitly. most_recent = true can upgrade your VPC CNI on a terraform apply during a maintenance freeze.
  • resolve_conflicts_on_update = "PRESERVE" — keeps custom configurations on add-ons rather than overwriting them on every apply.
  • Bootstrap node group with a taint — system pods schedule here; Karpenter provisions all workload nodes. The bootstrap node group is small and fixed.
  • Access Entries instead of aws-auth — the new API for cluster access, avoids ConfigMap management.

Layer 03: Cluster Configuration

Kubernetes-level resources managed with the Kubernetes and Helm Terraform providers:

hcl
1provider "kubernetes" {
2  host                   = data.terraform_remote_state.eks.outputs.cluster_endpoint
3  cluster_ca_certificate = base64decode(data.terraform_remote_state.eks.outputs.cluster_ca)
4  exec {
5    api_version = "client.authentication.k8s.io/v1beta1"
6    command     = "aws"
7    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
8  }
9}
10
11# Platform namespaces
12resource "kubernetes_namespace" "monitoring" {
13  metadata {
14    name = "monitoring"
15    labels = {
16      "platform.example.com/role"                          = "monitoring"
17      "pod-security.kubernetes.io/enforce"                 = "privileged"
18    }
19  }
20}
21
22resource "kubernetes_namespace" "ingress" {
23  metadata {
24    name = "ingress-nginx"
25    labels = {
26      "platform.example.com/role"                          = "ingress"
27      "pod-security.kubernetes.io/enforce"                 = "baseline"
28    }
29  }
30}
31
32# Default StorageClass
33resource "kubernetes_storage_class" "gp3" {
34  metadata {
35    name = "gp3"
36    annotations = {
37      "storageclass.kubernetes.io/is-default-class" = "true"
38    }
39  }
40  storage_provisioner    = "ebs.csi.aws.com"
41  reclaim_policy         = "Retain"
42  volume_binding_mode    = "WaitForFirstConsumer"
43  allow_volume_expansion = true
44  parameters = {
45    type      = "gp3"
46    encrypted = "true"
47  }
48}

Layer 04: Platform Add-ons

Karpenter, Cert-Manager, External Secrets Operator — installed via Helm provider:

hcl
1provider "helm" {
2  kubernetes {
3    host                   = data.terraform_remote_state.eks.outputs.cluster_endpoint
4    cluster_ca_certificate = base64decode(data.terraform_remote_state.eks.outputs.cluster_ca)
5    exec {
6      api_version = "client.authentication.k8s.io/v1beta1"
7      command     = "aws"
8      args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
9    }
10  }
11}
12
13resource "helm_release" "karpenter" {
14  name       = "karpenter"
15  repository = "oci://public.ecr.aws/karpenter"
16  chart      = "karpenter"
17  version    = "1.3.3"
18  namespace  = "kube-system"
19
20  set {
21    name  = "settings.clusterName"
22    value = var.cluster_name
23  }
24  set {
25    name  = "settings.interruptionQueue"
26    value = var.cluster_name
27  }
28  set {
29    name  = "controller.resources.requests.cpu"
30    value = "1"
31  }
32  set {
33    name  = "controller.resources.requests.memory"
34    value = "1Gi"
35  }
36
37  depends_on = [
38    # Karpenter IAM role must exist before the Helm release
39    data.terraform_remote_state.iam
40  ]
41}
42
43# Karpenter Pod Identity association
44resource "aws_eks_pod_identity_association" "karpenter" {
45  cluster_name    = var.cluster_name
46  namespace       = "kube-system"
47  service_account = "karpenter"
48  role_arn        = data.terraform_remote_state.iam.outputs.karpenter_role_arn
49
50  depends_on = [helm_release.karpenter]
51}

Common Terraform EKS Mistakes

Using null_resource with local-exec for kubectl commands. The pattern is fragile: it runs outside Terraform's state model, isn't idempotent, and breaks plan/apply separation. Use the Kubernetes or Helm provider for Kubernetes resources.
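
For reference, the anti-pattern looks like this (the manifest path is illustrative):

hcl
# Anti-pattern: invisible to terraform plan, not idempotent, no drift detection
resource "null_resource" "apply_manifests" {
  provisioner "local-exec" {
    command = "kubectl apply -f ./manifests/"
  }
}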

Putting all resources in one state file. A failed IAM policy attachment shouldn't block a CoreDNS upgrade, and a CoreDNS upgrade shouldn't be entangled with a VPC subnet tag change. Separate states keep unrelated failures from cascading.

Letting Terraform manage desired_size in node groups. After Cluster Autoscaler (or a manual scaling event) changes the live node count, Terraform will want to reset it to the desired_size in the config on the next apply. Use ignore_changes = [scaling_config[0].desired_size] on the node group resource, as shown below, to stop Terraform from fighting the autoscaler.

hcl
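# Inside the aws_eks_node_group resource. (The community EKS module already
# applies this ignore to its managed node groups.)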
lifecycle {
  ignore_changes = [
    scaling_config[0].desired_size
  ]
}

Not pinning provider versions. AWS provider 5.x has breaking changes from 4.x for EKS resources. Pin provider versions in versions.tf and update intentionally.
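
A minimal versions.tf sketch; the constraints shown are illustrative, so pin to versions you've actually tested:

hcl
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.33"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.16"
    }
  }
}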


Frequently Asked Questions

Should I use the community EKS module or write my own?

Use terraform-aws-modules/eks/aws. It's battle-tested, widely used, and kept up to date with EKS API changes. Writing your own EKS module from scratch means maintaining a significant surface area. The community module's opinions (access entries, managed node groups, add-on management) are good defaults.

How do I handle Terraform state when rotating the cluster?

For upgrades, don't delete and recreate the cluster: bump cluster_version and run terraform apply for an in-place minor-version upgrade. For major migrations (moving to a new cluster), provision the new cluster in parallel, migrate workloads, then destroy the old one. Don't try to use terraform state mv to move a cluster between state files.

Can I manage Kyverno policies and NetworkPolicies with Terraform?

Yes, via the Kubernetes provider. Whether you should depends on your GitOps setup. If you're using Argo CD or Flux, manage Kubernetes resources in Git and let the GitOps tool apply them — don't duplicate management in Terraform. If you're not using a GitOps tool, Terraform is a reasonable way to manage cluster-level policies.
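
For example, a baseline default-deny ingress policy via the Kubernetes provider (the namespace is illustrative):

hcl
resource "kubernetes_network_policy_v1" "default_deny_ingress" {
  metadata {
    name      = "default-deny-ingress"
    namespace = "monitoring"
  }
  spec {
    # Empty pod selector matches every pod in the namespace;
    # listing Ingress with no rules denies all inbound traffic.
    pod_selector {}
    policy_types = ["Ingress"]
  }
}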

How do I handle secrets in Terraform EKS config?

Use Terraform variables backed by AWS Secrets Manager or SSM Parameter Store, not hardcoded values or .tfvars files committed to Git. The Terraform AWS provider can read from SSM:

hcl
data "aws_ssm_parameter" "db_password" {
  name = "/production/db/password"
}

For Kubernetes Secrets managed by Terraform, use sensitive = true on the variable and enable Terraform state encryption — Terraform state contains the plaintext value.
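
A sketch of wiring the two together, reusing the SSM parameter above (the secret name and namespace are illustrative):

hcl
resource "kubernetes_secret_v1" "db_credentials" {
  metadata {
    name      = "db-credentials"
    namespace = "default"
  }
  # The provider base64-encodes values in `data`; the plaintext still
  # ends up in Terraform state, so state encryption matters here.
  data = {
    password = data.aws_ssm_parameter.db_password.value
  }
}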


For a getting-started guide that covers a single-state EKS cluster with managed node groups, EKS add-ons, and Pod Identity, see Terraform for EKS: Complete Infrastructure as Code Guide. The production multi-state architecture, the versioned add-on approach, and full environment separation are the subject of this post. For the Karpenter setup that follows cluster provisioning, see How to Install Karpenter on EKS. For the IAM race condition to avoid during Terraform applies, see Karpenter IAM Deadlock: How We Broke Our EKS Cluster with a Terraform Apply.

OpenTofu compatibility: All examples in this post are compatible with OpenTofu, the open-source Terraform fork maintained by the Linux Foundation.

Building a Terraform-managed EKS platform from scratch? Talk to us at Coding Protocols — we help platform teams structure EKS Terraform that stays maintainable as the cluster and team grow.

Related Topics

Terraform
EKS
AWS
Infrastructure as Code
Platform Engineering
Kubernetes
DevOps
IaC
