15 min read · May 7, 2026

AWS Cost Optimization: EC2 Pricing Models, Spot, Savings Plans, and EKS Cost Control

AWS bills go wrong in predictable ways: on-demand instances running 24/7 that could be Reserved or Spot, EBS volumes attached to stopped instances, NAT Gateway data transfer charges that VPC endpoints would eliminate, and CloudWatch metrics that nobody reads. This covers EC2 pricing models (On-Demand, Reserved, Savings Plans, Spot), right-sizing strategies, EKS-specific cost controls (Karpenter consolidation, Spot node groups, namespace cost allocation), data transfer cost reduction, and the tooling to find what's actually costing you money.

Coding Protocols Team
Platform Engineering

Most AWS bills have two parts: the cost for what you're using, and the cost for what you're not using anymore (forgotten snapshots, unused load balancers, EBS volumes attached to stopped instances). Optimization addresses both — choosing the right pricing model for sustained usage and cleaning up the waste that accumulates over time.

For EKS workloads specifically, the main levers are: instance type selection, capacity type (Spot vs on-demand), bin packing density (Karpenter consolidation), and eliminating data transfer charges with VPC endpoints.


EC2 Pricing Models

On-Demand

Full list price, no commitment, per-second billing (minimum 60 seconds). Use for:

  • Unpredictable or short-lived workloads
  • Development and testing environments
  • Workloads you're not yet sure will be sustained

On-Demand is the right baseline — start here, then move to commitment pricing once you understand your sustained usage.
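To see what on-demand actually costs over a month, annualize the hourly rate. A quick sketch (the $0.192/hour figure is an illustrative example rate, not a quoted AWS price):

```shell
# Rough monthly cost of one on-demand instance.
# The hourly rate below is an assumed example, not a quoted AWS price.
hourly=0.192           # example m5.xlarge-class on-demand rate
hours_per_month=730    # AWS's standard monthly-hours convention
monthly=$(awk "BEGIN {printf \"%.2f\", $hourly * $hours_per_month}")
echo "~\$${monthly}/month"
```

At roughly $140/month per instance under these assumptions, even a handful of forgotten dev instances is real money.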

Reserved Instances

1-year or 3-year commitment for a specific instance type in a specific region. Up to 72% discount vs on-demand.

| Type | Flexibility | Discount | Payment |
|---|---|---|---|
| Standard RI | Fixed instance type + AZ or region | Up to 72% | All upfront, partial upfront, no upfront |
| Convertible RI | Can exchange instance family, size, OS | Up to 66% | Same options |

Standard RI: commit to m5.xlarge in us-east-1. If you switch to m5.2xlarge, the RI doesn't apply.

Convertible RI: commit to an equivalent amount of compute value — you can exchange to m5.2xlarge (two m5.xlarge equivalents) at any time. More flexible, slightly less discount.

Reserved Instances are purchased per-instance, which makes them hard to manage at scale. Savings Plans solve this.

Savings Plans

Savings Plans commit to a consistent dollar amount of compute spend per hour (e.g., $10/hour) for 1 or 3 years. AWS applies the discount automatically to matching usage.

| Type | Scope | Flexibility | Discount |
|---|---|---|---|
| Compute Savings Plan | EC2, Fargate, Lambda | Any region, any instance family, any OS | Up to 66% |
| EC2 Instance Savings Plan | EC2 in a specific region | Any size within a family (e.g., any m5.* in us-east-1) | Up to 72% |

Compute Savings Plan is the most flexible: $10/hour of EC2 spend anywhere, plus Fargate and Lambda. If you switch from m5 to c5 or move to Fargate, the plan still applies.

EC2 Instance Savings Plan gives a higher discount but commits to an instance family in a region — appropriate if you know your baseline capacity will stay on a specific instance type.

bash
# List Compute Savings Plans offerings, then fetch rates for the first one
aws savingsplans describe-savings-plans-offering-rates \
  --savings-plans-offering-ids "$(aws savingsplans describe-savings-plans-offerings \
    --plan-types Compute \
    --query "searchResults[0].offeringId" \
    --output text)"

# Or use the Cost Explorer console — provides personalized recommendations based on your 7/30/60-day usage

Practical recommendation: buy a Compute Savings Plan covering your stable baseline (the minimum capacity you run 24/7). Let Spot or on-demand handle peak/burst. Start with a 1-year plan — 3-year gives a better discount but locks in longer.
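The sizing arithmetic can be sketched directly. Savings Plans commitments are stated in discounted dollars per hour, not on-demand dollars, so the commit for a stable baseline is the on-demand spend times (1 - discount). All numbers below are illustrative assumptions:

```shell
# Sketch: size a Compute Savings Plan commitment from a stable 24/7 baseline.
# Instance count, rate, and discount are illustrative assumptions.
baseline_instances=20
on_demand_rate=0.192   # assumed per-instance on-demand $/hour
sp_discount=0.30       # assumed effective 1-year, no-upfront discount
# Commit at the *discounted* rate, not the on-demand rate.
commit=$(awk "BEGIN {printf \"%.2f\", $baseline_instances * $on_demand_rate * (1 - $sp_discount)}")
echo "commit ~\$${commit}/hour"
```

Committing at the on-demand rate instead of the discounted rate is a common mistake that over-commits by the size of the discount.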

Spot Instances

Spot instances use spare EC2 capacity at up to 90% off on-demand pricing. AWS can reclaim Spot instances with a 2-minute warning when capacity is needed.

Spot works well for:

  • Stateless, fault-tolerant workloads: web servers, API gateways, workers
  • Batch jobs that can checkpoint: ML training with checkpoint/resume
  • EKS burst capacity: Karpenter with spot in the NodePool requirements

Spot doesn't work for:

  • Stateful workloads that can't be interrupted (databases, single-instance services)
  • Long-running jobs that can't checkpoint — any interruption means restarting from zero, so the cost of a reclaim scales with elapsed runtime

bash
# Check Spot price history for instance types
aws ec2 describe-spot-price-history \
  --instance-types m5.xlarge c5.xlarge m5.2xlarge \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u -d '-24 hours' +%Y-%m-%dT%H:%M:%SZ)" \
  --query "SpotPriceHistory[].{InstanceType:InstanceType,Price:SpotPrice,AZ:AvailabilityZone}" \
  --output table

Spot diversification: request multiple instance types across multiple AZs. If one instance pool runs out of capacity, the others continue. Karpenter and EC2 Auto Scaling mixed instance policies both support this.
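With EC2 Auto Scaling, diversification is expressed as a mixed-instances policy. A sketch (the launch template and group names are placeholders; `price-capacity-optimized` is the Spot allocation strategy AWS recommends):

```shell
# Mixed-instances policy: several instance types, a small on-demand base,
# everything above the base on Spot. Names below are placeholders.
policy='{
  "LaunchTemplate": {
    "LaunchTemplateSpecification": {"LaunchTemplateName": "lt-web", "Version": "$Latest"},
    "Overrides": [
      {"InstanceType": "m5.xlarge"},
      {"InstanceType": "m5a.xlarge"},
      {"InstanceType": "c5.xlarge"}
    ]
  },
  "InstancesDistribution": {
    "OnDemandBaseCapacity": 2,
    "OnDemandPercentageAboveBaseCapacity": 0,
    "SpotAllocationStrategy": "price-capacity-optimized"
  }
}'
echo "$policy" | python3 -m json.tool > /dev/null && echo "policy JSON ok"

# aws autoscaling create-auto-scaling-group \
#   --auto-scaling-group-name web-asg \
#   --mixed-instances-policy "$policy" \
#   --min-size 2 --max-size 20 \
#   --vpc-zone-identifier "subnet-aaa,subnet-bbb"
```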


Right-Sizing

Under-utilized instances are waste. An m5.4xlarge running at 10% CPU costs the same as one at 90% CPU.

Finding Right-Sizing Opportunities

bash
# List EC2 instances with low CPU utilization using CloudWatch
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-abc123 \
  --start-time "$(date -u -d '-7 days' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Average Maximum

# Use AWS Compute Optimizer (examines 14 days of CloudWatch metrics)
aws compute-optimizer get-ec2-instance-recommendations \
  --instance-arns arn:aws:ec2:us-east-1:012345678901:instance/i-abc123

# Get recommendations for all instances in the account
aws compute-optimizer get-ec2-instance-recommendations \
  --query "instanceRecommendations[?finding=='OVER_PROVISIONED']"

AWS Compute Optimizer provides specific recommendations: "this instance could use m5.xlarge instead of m5.2xlarge with your current usage pattern, saving $X/month."

RDS Right-Sizing

bash
# Get RDS instance recommendations (Compute Optimizer's RDS support)
aws compute-optimizer get-rds-database-recommendations \
  --query "rdsDBRecommendations[?instanceFinding=='OVER_PROVISIONED']"

RDS right-sizing often reveals instances sized for peak traffic that only occurs occasionally. Consider Aurora Serverless v2 for variable workloads.


EKS Cost Controls

Karpenter Consolidation

Karpenter's consolidation moves pods from underutilized nodes to fewer nodes, terminating the now-empty nodes. This is the most effective bin-packing mechanism for EKS.

yaml
# NodePool with aggressive consolidation
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized    # "WhenEmptyOrUnderutilized" in Karpenter v1+
    consolidateAfter: 1m                      # Wait 1 minute before consolidating
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]    # Prefer spot; fall back to on-demand

With consolidation enabled, Karpenter typically reduces node count by 20-40% compared to a static node group because it continuously rebalances bin packing.
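Consolidation evicts pods when it drains a node, so pair it with PodDisruptionBudgets; Karpenter respects them and will not disrupt past the budget. A minimal example (the app name is illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-api-pdb
spec:
  minAvailable: 2            # keep at least 2 replicas up during node consolidation
  selector:
    matchLabels:
      app: payments-api
```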

Spot Node Groups with Karpenter

Mix spot and on-demand at the NodePool level:

yaml
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["m", "c", "r"]    # Broad selection improves spot availability
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values: ["4"]              # Recent generation instances have better spot availability

Karpenter selects the cheapest available instance from the requirements. With broad instance type selection across multiple families, there's almost always a cheap spot option available.

For workloads that must run on on-demand (stateful services, critical jobs), use a dedicated NodePool with capacity-type: on-demand and add a matching taint:

yaml
taints:
  - key: workload-type
    value: critical
    effect: NoSchedule
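The workloads meant for that pool then need a matching toleration plus a selector pinning them to on-demand capacity (`karpenter.sh/capacity-type` is the label Karpenter sets on its nodes):

```yaml
# Pod template fragment for a critical workload
spec:
  tolerations:
    - key: workload-type
      operator: Equal
      value: critical
      effect: NoSchedule
  nodeSelector:
    karpenter.sh/capacity-type: on-demand    # never schedule onto Spot
```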

Namespace Cost Allocation

Label namespaces and Karpenter-provisioned nodes so costs can be mapped back to teams:

yaml
# In Karpenter NodePool or cluster node labels
metadata:
  labels:
    team: payments
    cost-center: cc-1234

Use AWS Cost and Usage Reports (CUR) with split cost allocation data for EKS, which breaks cluster costs down to pod-level usage:

bash
# Tag the cluster so its costs roll up to a team and cost center.
# (Split cost allocation data itself is enabled in the Billing console,
# not through the EKS CLI.)
aws eks tag-resource \
  --resource-arn arn:aws:eks:us-east-1:012345678901:cluster/prod-cluster \
  --tags team=payments,cost-center=cc-1234

For per-namespace cost visibility, tools like Kubecost or OpenCost aggregate CUR data and Kubernetes metadata to show cost per namespace, deployment, or label.


Data Transfer Cost Reduction

AWS charges for data leaving the VPC (outbound to the internet: $0.09/GB in us-east-1), crossing AZs ($0.01/GB each direction), and flowing through NAT Gateways ($0.045/GB).

VPC Endpoints (Biggest Win)

ECR image pulls from private subnets go through NAT Gateway — at ~200 MB per pull × many nodes × many deployments, this adds up.

bash
# Create ECR interface endpoints to bypass NAT Gateway
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.ecr.dkr \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-private-1a subnet-private-1b

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.ecr.api \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-private-1a subnet-private-1b

# S3 gateway endpoint (free; required for ECR, since image layers are served from S3)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.s3 \
  --vpc-endpoint-type Gateway \
  --route-table-ids rtb-private-1a rtb-private-1b

ECR interface endpoints cost $0.01/hour/AZ but typically pay for themselves within days of eliminating NAT Gateway ECR traffic.
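The break-even arithmetic is worth sketching. The traffic volume below is an assumption; the per-GB and per-hour rates are the us-east-1 figures cited above:

```shell
# Two ECR interface endpoints (dkr + api) across two AZs at ~$0.01/hr each
endpoint_monthly=$(awk 'BEGIN {printf "%.2f", 0.01 * 2 * 2 * 730}')
# NAT processing avoided: assume 500 nodes x 4 pulls/day x 0.2 GB, at $0.045/GB
nat_monthly=$(awk 'BEGIN {printf "%.2f", 500 * 4 * 0.2 * 30 * 0.045}')
echo "endpoints: \$${endpoint_monthly}/mo vs NAT avoided: \$${nat_monthly}/mo"
```

Under these assumptions the endpoints cost about $29/month while avoiding roughly $540/month of NAT processing.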

Cross-AZ Traffic Reduction

Cross-AZ traffic costs $0.01/GB in each direction. For high-throughput services, this is significant.

Kubernetes topology-aware routing keeps traffic within the same AZ when possible:

yaml
apiVersion: v1
kind: Service
metadata:
  name: payments-api
spec:
  selector:
    app: payments-api
  ports:
    - port: 8080
  trafficDistribution: PreferClose    # Kubernetes 1.31+ (beta, on by default; alpha in 1.30) — prefers local-AZ endpoints

For older clusters, use topology hints:

yaml
metadata:
  annotations:
    service.kubernetes.io/topology-mode: Auto    # Enable topology-aware routing

Finding Waste

Idle and Unused Resources

bash
# Find EBS volumes not attached to any instance
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query "Volumes[].{VolumeId:VolumeId,Size:Size,CreateTime:CreateTime}" \
  --output table

# Find target groups with no registered targets (their load balancers are idle)
aws elbv2 describe-target-groups \
  --query "TargetGroups[].TargetGroupArn" \
  --output text | tr '\t' '\n' | while read -r tg; do
    count=$(aws elbv2 describe-target-health --target-group-arn "$tg" \
      --query "length(TargetHealthDescriptions)" --output text)
    [ "$count" = "0" ] && echo "no targets: $tg"
  done

# Find Elastic IPs not associated with any resource
aws ec2 describe-addresses \
  --query "Addresses[?AssociationId==null].PublicIp" \
  --output text

# Find old snapshots (more than 90 days old)
aws ec2 describe-snapshots \
  --owner-ids self \
  --query "Snapshots[?StartTime<='$(date -d '-90 days' +%Y-%m-%d)'].{SnapshotId:SnapshotId,Size:VolumeSize,Date:StartTime}" \
  --output table

Unattached EBS volumes (available state) accumulate silently — they're charged at storage rates even when not in use. Common sources: deleted EC2 instances where the EBS volume wasn't deleted, test environments that were shut down but not fully cleaned up.
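To put a number on the waste, total the unattached gigabytes and multiply by the storage rate (the $0.08/GB-month gp3 rate used here is an assumption; check your actual volume types):

```shell
# Price unattached volume sizes (one GB figure per line on stdin)
# at an assumed $0.08/GB-month gp3 rate.
estimate_waste() {
  awk '{gb += $1} END {printf "~$%.2f/month across %d GB\n", gb * 0.08, gb}'
}

# Real usage (requires AWS credentials):
# aws ec2 describe-volumes --filters Name=status,Values=available \
#   --query "Volumes[].Size" --output text | tr '\t' '\n' | estimate_waste

# Sample data: two orphaned volumes of 100 GB and 500 GB
printf '100\n500\n' | estimate_waste
```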

Cost Explorer and Anomaly Detection

bash
# Get the top 5 services by cost in the last 30 days
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '-30 days' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --group-by Type=DIMENSION,Key=SERVICE \
  --metrics "UnblendedCost" \
  --query "ResultsByTime[0].Groups | sort_by(@, &to_number(Metrics.UnblendedCost.Amount)) | reverse(@) | [:5]"

# Enable cost anomaly detection
aws ce create-anomaly-monitor \
  --anomaly-monitor '{
    "MonitorName": "AllServices",
    "MonitorType": "DIMENSIONAL",
    "MonitorDimension": "SERVICE"
  }'

aws ce create-anomaly-subscription \
  --anomaly-subscription '{
    "SubscriptionName": "AlertOnAnomalies",
    "MonitorArnList": ["arn:aws:ce::012345678901:anomalymonitor/abc123"],
    "Subscribers": [{"Address": "platform-team@example.com", "Type": "EMAIL"}],
    "ThresholdExpression": {"Dimensions": {"Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE", "Values": ["100"], "MatchOptions": ["GREATER_THAN_OR_EQUAL"]}}
  }'

Cost Anomaly Detection alerts when a service's spend deviates significantly from its historical baseline. Set a dollar threshold (e.g., $100 absolute impact) to avoid noise from small fluctuations.


Frequently Asked Questions

Should I buy Reserved Instances or Savings Plans?

Savings Plans (specifically Compute Savings Plans) for almost all cases. Here's why:

  • Savings Plans apply across instance families and regions — if you migrate from m5 to m6i, the plan still applies
  • Compute Savings Plans also cover Fargate and Lambda
  • No complex RI management (no instance-level tracking, no expiration dates per purchase)
  • Same or better discount compared to Convertible RIs

The only case for EC2 Instance Savings Plans (or Standard RIs) is when you have very specific committed capacity in a particular region and instance family where you want the maximum possible discount and know your usage won't change.

How much can Spot instances save on EKS?

Spot instances are typically 60-80% cheaper than on-demand for common instance types (m5, c5, r5). In practice, with Karpenter managing a mixed Spot+on-demand NodePool:

  • Spot nodes: 60-80% savings on the compute they run
  • On-demand baseline retained for critical workloads
  • Overall: 30-50% reduction in EC2 costs for a typical mixed cluster

The savings depend heavily on Spot availability for your instance types and the Spot interruption rate (typically 5-10% per month for a diversified selection).

What's the fastest way to find AWS billing waste?

In order of impact-to-effort:

  1. Compute Optimizer recommendations: check immediately — it analyzes 14 days of metrics and gives specific downsizing recommendations. Often finds 20-40% waste in compute.

  2. Unattached EBS volumes: run the describe-volumes command above. These are pure waste.

  3. NAT Gateway data transfer: check the NAT Gateway metrics in CloudWatch. If you're processing significant traffic through NAT Gateway to ECR or S3, VPC endpoints will provide immediate savings.

  4. Idle load balancers: each ALB costs ~$16/month + data processing. Check for ALBs with zero request count in CloudWatch.

  5. Old snapshots: snapshot storage is $0.05/GB-month in us-east-1. Retaining a year of weekly snapshots of a 1 TB volume costs up to ~$2,600/month if each snapshot stored a full copy (52 × 1,000 GB × $0.05); EBS snapshots are incremental, so actual spend is lower, but long snapshot chains still add up.


For Karpenter-specific node consolidation that reduces idle capacity, see Kubernetes Cluster Autoscaler and Karpenter: Node Autoscaling on EKS. For VPC endpoint setup that eliminates NAT Gateway data transfer costs, see AWS VPC Design for EKS: Subnets, NAT, and Security Groups.

Running an EKS cost optimization engagement, implementing Savings Plans coverage for a large AWS account, or designing a Spot-first infrastructure for a batch workload? Talk to us at Coding Protocols — we help platform teams reduce AWS spend without sacrificing reliability or performance.
