Rightsizing AWS Costs: Finding and Fixing Overprovisioned Resources
Most AWS bills have 20–40% waste from overprovisioned EC2 instances, underused RDS, and forgotten resources. This tutorial shows you how to find it systematically and reduce it without impacting reliability.
Before you begin
- AWS account with billing access
- AWS CLI configured
- Basic understanding of EC2 and RDS
AWS bills grow because it's easier to overprovision than to tune. A t3.xlarge "just to be safe" instead of a t3.medium quadruples the instance cost. An RDS db.r5.large averaging 10% CPU is wasting 90% of its compute.
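The "just to be safe" premium is easy to quantify. A quick back-of-the-envelope sketch — the hourly rates below are illustrative examples, not current AWS list prices:

```shell
# Approximate monthly cost at ~730 hours/month, using illustrative rates
XLARGE_HOURLY=0.1664   # example t3.xlarge On-Demand rate (not a current quote)
MEDIUM_HOURLY=0.0416   # example t3.medium On-Demand rate (not a current quote)

xlarge_monthly=$(awk -v r="$XLARGE_HOURLY" 'BEGIN { printf "%.2f", r * 730 }')
medium_monthly=$(awk -v r="$MEDIUM_HOURLY" 'BEGIN { printf "%.2f", r * 730 }')

echo "t3.xlarge: \$${xlarge_monthly}/month, t3.medium: \$${medium_monthly}/month"
```

At these rates the gap is roughly $90/month per instance, before any RI or Savings Plan discount.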
This tutorial gives you a systematic process to find the waste and act on it.
The Four Categories of AWS Waste
- Overprovisioned resources — right type, wrong size (most common)
- Unused resources — running but doing nothing (snapshots, idle EBS, stopped instances)
- Wrong purchase type — On-Demand when Reserved or Spot would be cheaper
- Wrong storage class — S3 Standard for cold data, gp2 instead of gp3
Step 1: Get the High-Level Picture with Cost Explorer
In the AWS Console → Cost Management → Cost Explorer:
- Set date range to last 3 months
- Group by Service — see what's driving the bill
- Group by Usage Type — see EC2 instances, data transfer, storage separately
- Filter to your top spending service, group by Instance Type
Look for:
- Instance types with low utilisation (Cost Explorer shows rightsizing recommendations)
- Data transfer costs (often hidden, can be 20% of bill)
- NAT Gateway data processed (usually avoidable)
Enable Cost Explorer Rightsizing Recommendations: Cost Management → Rightsizing Recommendations. This uses CloudWatch CPU metrics to suggest instance downsizes.
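The console views above have a CLI equivalent. A sketch that builds a month-to-date Cost Explorer query grouped by service — the dates are computed portably, and the command is echoed rather than executed so you can review it first:

```shell
# Month-to-date spend grouped by service, via the Cost Explorer API.
# `date -u +%Y-%m-01` works on both GNU (Linux) and BSD (macOS) date.
START=$(date -u +%Y-%m-01)
END=$(date -u +%Y-%m-%d)

# Echoed rather than run; remove the `echo` to execute against your account
echo aws ce get-cost-and-usage \
  --time-period "Start=${START},End=${END}" \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE
```

Widen the time period to three months to match the console view described above.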
Step 2: Find Underutilised EC2 Instances
# Find instances with avg CPU < 10% over 14 days
# Note: `date -v-14d` is BSD/macOS syntax; on Linux (GNU date) use
# `date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%SZ` instead
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time $(date -u -v-14d +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 86400 \
  --statistics Average
# Better: use AWS Compute Optimizer
aws compute-optimizer get-ec2-instance-recommendations \
  --query "instanceRecommendations[?finding=='OVER_PROVISIONED'].[instanceArn,recommendationOptions[0].instanceType,utilizationMetrics[0].value]" \
  --output table
AWS Compute Optimizer uses 14 days of CloudWatch data to recommend downsizes. Enable it first:
aws compute-optimizer update-enrollment-status --status Active
Wait 24 hours for it to process your account, then check recommendations:
aws compute-optimizer get-ec2-instance-recommendations \
  --filters name=finding,values=OVER_PROVISIONED
Step 3: Rightsize EKS Pod Resource Requests
In Kubernetes, requests determine which node a pod lands on. Overprovisioned requests leave nodes underutilised — you pay for capacity that sits idle.
Find pods with low CPU utilisation using Prometheus:
# Pods using less than 20% of their CPU request (over 24h)
(
  sum by (pod, namespace) (
    rate(container_cpu_usage_seconds_total{container!=""}[24h])
  )
/
  sum by (pod, namespace) (
    kube_pod_container_resource_requests{resource="cpu", container!=""}
  )
) < 0.2
# Memory: pods using less than 30% of their memory request
(
  sum by (pod, namespace) (
    container_memory_working_set_bytes{container!=""}
  )
/
  sum by (pod, namespace) (
    kube_pod_container_resource_requests{resource="memory", container!=""}
  )
) < 0.3
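To sanity-check the thresholds, here is the same ratio the PromQL computes, worked on hypothetical numbers — a pod using about 30m of CPU against a 500m request:

```shell
# usage / request; below 0.2 means the pod is a downsizing candidate
USAGE_CORES=0.030    # ~30m of CPU actually used (hypothetical)
REQUEST_CORES=0.500  # 500m requested (hypothetical)

ratio=$(awk -v u="$USAGE_CORES" -v r="$REQUEST_CORES" \
  'BEGIN { printf "%.2f", u / r }')
echo "utilisation ratio: $ratio"
```

A ratio of 0.06 is well under the 0.2 threshold — this pod's request could likely be cut several-fold.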
Install the Vertical Pod Autoscaler (VPA) in recommendation mode to get automated suggestions:
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Create a VPA object in recommendation mode (won't change pods automatically)
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommend only, don't change pods
EOF
# After 24h, check recommendations
kubectl describe vpa my-app-vpa -n production
VPA recommendations:
Container Recommendations:
  Container Name:  app
  Lower Bound:
    Cpu:     25m
    Memory:  64Mi
  Target:                ← use this
    Cpu:     100m
    Memory:  256Mi
  Upper Bound:
    Cpu:     500m
    Memory:  512Mi
  Uncapped Target:
    Cpu:     87m
    Memory:  230Mi
Update your deployment with the Target values. Then re-check in a week.
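Applying the Target values is a one-liner with `kubectl set resources`. A sketch, using the example names from this tutorial (my-app in the production namespace) and the Target values shown above — the command is composed and echoed rather than executed, so review it before running against your cluster:

```shell
# Compose the kubectl command that sets requests to the VPA Target values.
# my-app / production are the hypothetical names used throughout this tutorial.
CMD="kubectl set resources deployment my-app -n production --requests=cpu=100m,memory=256Mi"
echo "$CMD"
```

Note this restarts the pods (it changes the pod template), so roll it out like any other deployment change.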
Step 4: Find Unused Resources
Idle Elastic IPs (AWS now bills for all public IPv4 addresses, and an unattached EIP is pure waste):
aws ec2 describe-addresses \
  --query "Addresses[?AssociationId==null].[AllocationId,PublicIp]" \
  --output table
Unattached EBS volumes:
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query "Volumes[*].[VolumeId,Size,CreateTime]" \
  --output table
EBS snapshots older than 30 days (the query filters on age only — verify nothing still depends on a snapshot before deleting it):
aws ec2 describe-snapshots \
  --owner-ids self \
  --query "Snapshots[?StartTime<='$(date -u -v-30d +%Y-%m-%d)'].[SnapshotId,StartTime,VolumeSize]" \
  --output table
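The `date -u -v-30d` used in the snapshot queries is BSD/macOS syntax and fails on Linux. A portable way to compute the cutoff, detecting which `date` flavour is installed:

```shell
# Cutoff date 30 days ago, working on both GNU (Linux) and BSD (macOS) date
if date -u -d '30 days ago' +%Y-%m-%d >/dev/null 2>&1; then
  CUTOFF=$(date -u -d '30 days ago' +%Y-%m-%d)   # GNU date
else
  CUTOFF=$(date -u -v-30d +%Y-%m-%d)             # BSD/macOS date
fi
echo "Cutoff: $CUTOFF"
```

Substitute `$CUTOFF` into the `--query` expressions in place of the inline `$(date ...)` call.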
Target groups with no registered targets (often attached to idle load balancers):
aws elbv2 describe-target-groups \
  --query "TargetGroups[*].[TargetGroupArn,TargetType]" \
  --output text | while read -r arn type; do
  count=$(aws elbv2 describe-target-health \
    --target-group-arn "$arn" \
    --query "length(TargetHealthDescriptions)" \
    --output text)
  if [ "$count" = "0" ]; then
    echo "Empty target group: $arn"
  fi
done
Old manual RDS snapshots (again using BSD/macOS date syntax; on Linux use `date -u -d '30 days ago' +%Y-%m-%d`):
aws rds describe-db-snapshots \
  --query "DBSnapshots[?SnapshotCreateTime<='$(date -u -v-30d +%Y-%m-%d)' && SnapshotType=='manual'].[DBSnapshotIdentifier,AllocatedStorage,SnapshotCreateTime]" \
  --output table
Step 5: Switch gp2 EBS to gp3
gp3 is cheaper than gp2 at the same baseline performance and gives you 3000 IOPS free (vs gp2's baseline that scales with volume size). For most volumes, gp3 is a drop-in replacement that costs 20% less.
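To see what that 20% means in absolute terms, a quick calculation at illustrative per-GB prices (us-east-1-style rates, not current quotes):

```shell
GP2_PER_GB=0.10   # illustrative $/GB-month for gp2 (not a current quote)
GP3_PER_GB=0.08   # illustrative $/GB-month for gp3 (not a current quote)
SIZE_GB=500       # hypothetical volume size

saving=$(awk -v s="$SIZE_GB" -v a="$GP2_PER_GB" -v b="$GP3_PER_GB" \
  'BEGIN { printf "%.2f", s * (a - b) }')
echo "Monthly saving for ${SIZE_GB} GB: \$${saving}"
```

Small per volume, but it compounds: a fleet with a few terabytes of gp2 recovers meaningful money for a zero-downtime change.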
# Find all gp2 volumes and convert them to gp3
aws ec2 describe-volumes \
  --filters Name=volume-type,Values=gp2 \
  --query "Volumes[*].[VolumeId,Size]" \
  --output text | while read -r vol_id size; do
  echo "Modifying $vol_id ($size GB) to gp3..."
  aws ec2 modify-volume --volume-id "$vol_id" --volume-type gp3
done
The modification happens live, with no downtime and no detaching. It takes a few minutes per volume, and each volume can only be modified once every six hours.
For EKS, update the StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
Step 6: Reserved Instances for Predictable Workloads
For EC2 instances that run 24/7, a 1-year Reserved Instance typically saves around 40% over On-Demand, and a 3-year commitment around 60%. Exact discounts vary by instance family, region, and payment option.
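As a worked example at an illustrative On-Demand rate — both the hourly price and the discount are ballpark assumptions, not quotes:

```shell
OD_HOURLY=0.1664   # illustrative On-Demand $/hour (hypothetical rate)
RI_DISCOUNT=0.40   # ballpark 1-year, no-upfront RI discount (assumption)

# 8760 hours in a year of 24/7 operation
od_annual=$(awk -v r="$OD_HOURLY" 'BEGIN { printf "%.0f", r * 8760 }')
ri_annual=$(awk -v r="$OD_HOURLY" -v d="$RI_DISCOUNT" \
  'BEGIN { printf "%.0f", r * 8760 * (1 - d) }')
echo "On-Demand: \$${od_annual}/yr, 1-yr RI: \$${ri_annual}/yr"
```

Roughly $580/year back per always-on instance at this rate; multiply across a fleet before deciding whether commitment management is worth the overhead.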
Only reserve instance types and sizes you're confident won't change. Use Compute Savings Plans instead of RIs for more flexibility — they apply to any instance type and family automatically.
Check your On-Demand spend:
aws ce get-cost-and-usage \
  --time-period Start=2026-01-01,End=2026-04-01 \
  --granularity MONTHLY \
  --filter '{"Dimensions":{"Key":"PURCHASE_TYPE","Values":["On Demand"]}}' \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE
For EKS node groups, use Spot instances for stateless workloads with a fallback to On-Demand:
# create-nodegroup also requires --node-role and --subnets for your
# cluster; they're omitted here for brevity
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name spot-workers \
  --capacity-type SPOT \
  --instance-types t3.medium t3.large t3a.medium \
  --scaling-config minSize=2,maxSize=20,desiredSize=5
A mix of three or more instance types reduces the chance that Spot interruptions hit all your capacity in a single AZ at once.
Tracking Progress
Set a monthly budget alert:
aws budgets create-budget \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --budget '{
    "BudgetName": "Monthly AWS Budget",
    "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}]
  }]'
Review Cost Explorer weekly for the first month after changes. Cost reductions take a full billing cycle to show up clearly.
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.