AWS Cost Optimization: EC2 Pricing Models, Spot, Savings Plans, and EKS Cost Control
AWS bills go wrong in predictable ways: on-demand instances running 24/7 that could be Reserved or Spot, EBS volumes attached to stopped instances, NAT Gateway data transfer charges that VPC endpoints would eliminate, and CloudWatch metrics that nobody reads. This covers EC2 pricing models (On-Demand, Reserved, Savings Plans, Spot), right-sizing strategies, EKS-specific cost controls (Karpenter consolidation, Spot node groups, namespace cost allocation), data transfer cost reduction, and the tooling to find what's actually costing you money.

Most AWS bills have two parts: the cost for what you're using, and the cost for what you're not using anymore (forgotten snapshots, unused load balancers, EBS volumes attached to stopped instances). Optimization addresses both — choosing the right pricing model for sustained usage and cleaning up the waste that accumulates over time.
For EKS workloads specifically, the main levers are: instance type selection, capacity type (Spot vs on-demand), bin packing density (Karpenter consolidation), and eliminating data transfer charges with VPC endpoints.
EC2 Pricing Models
On-Demand
Full list price, no commitment, per-second billing (minimum 60 seconds). Use for:
- Unpredictable or short-lived workloads
- Development and testing environments
- Workloads you're not yet sure will be sustained
On-Demand is the right baseline — start here, then move to commitment pricing once you understand your sustained usage.
Reserved Instances
1-year or 3-year commitment for a specific instance type in a specific region. Up to 72% discount vs on-demand.
| Type | Flexibility | Discount | Payment |
|---|---|---|---|
| Standard RI | Fixed instance type + AZ or region | Up to 72% | All upfront, partial upfront, no upfront |
| Convertible RI | Can exchange instance family, size, OS | Up to 66% | Same options |
Standard RI: commit to m5.xlarge in us-east-1. If you switch to m5.2xlarge, the RI doesn't apply.
Convertible RI: commit to an equivalent amount of compute value — you can exchange to m5.2xlarge (two m5.xlarge equivalents) at any time. More flexible, slightly less discount.
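The exchange arithmetic uses EC2's size-normalization factors (xlarge = 8 units, 2xlarge = 16 units). A quick sketch of the "two m5.xlarge equal one m5.2xlarge" claim:

```shell
# EC2 size-normalization factors: xlarge = 8 units, 2xlarge = 16 units
xlarge_units=8
x2large_units=16
ri_count=2                                  # two m5.xlarge Convertible RIs
total_units=$((ri_count * xlarge_units))    # 16 normalized units of m5 capacity
echo "m5.2xlarge instances covered: $((total_units / x2large_units))"
```

Convertible RI exchanges preserve total normalized-unit value, which is why family and size swaps are possible without repurchasing.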
Reserved Instances are purchased per-instance, which makes them hard to manage at scale. Savings Plans solve this.
Savings Plans
Savings Plans commit to a consistent dollar amount of compute spend per hour (e.g., $10/hour) for 1 or 3 years. AWS applies the discount automatically to matching usage.
| Type | Scope | Flexibility | Discount |
|---|---|---|---|
| Compute Savings Plan | EC2, Fargate, Lambda | Any region, any instance family, any OS | Up to 66% |
| EC2 Instance Savings Plan | EC2 in a specific region | Any size within a family (e.g., any m5.* in us-east-1) | Up to 72% |
Compute Savings Plan is the most flexible: $10/hour of EC2 spend anywhere, plus Fargate and Lambda. If you switch from m5 to c5 or move to Fargate, the plan still applies.
EC2 Instance Savings Plan gives a higher discount but commits to an instance family in a region — appropriate if you know your baseline capacity will stay on a specific instance type.
```shell
# Get personalized Savings Plans purchase recommendations from Cost Explorer
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days THIRTY_DAYS

# Or use the Cost Explorer console — provides personalized recommendations
# based on your 7/30/60-day usage
```
Practical recommendation: buy a Compute Savings Plan covering your stable baseline (the minimum capacity you run 24/7). Let Spot or on-demand handle peak/burst. Start with a 1-year plan — 3-year gives a better discount but locks in longer.
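The sizing arithmetic is simple; the $20/hour baseline and 30% discount below are illustrative assumptions, not quoted AWS rates:

```shell
# Sizing a Compute Savings Plan commitment from a 24/7 on-demand baseline
awk -v od=20 -v disc=0.30 'BEGIN {
  printf "Commit ~$%.2f/hour; annual savings ~$%.0f\n", od * (1 - disc), od * disc * 8760
}'
```

The commitment is quoted in Savings Plans (post-discount) rates, so baseline on-demand spend times (1 - discount) approximates the right hourly commitment.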
Spot Instances
Spot instances use spare EC2 capacity at up to 90% off on-demand pricing. AWS can reclaim Spot instances with a 2-minute warning when capacity is needed.
Spot works well for:
- Stateless, fault-tolerant workloads: web servers, API gateways, workers
- Batch jobs that can checkpoint: ML training with checkpoint/resume
- EKS burst capacity: Karpenter with `spot` in the NodePool requirements
Spot doesn't work for:
- Stateful workloads that can't be interrupted (databases, single-instance services)
- Long-running jobs that can't checkpoint — any interruption means restarting from zero, so the cost of a reclaim scales with elapsed runtime
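A rough expected-cost model makes the trade-off concrete; the hourly rates and the 20% restart probability are illustrative assumptions:

```shell
# Expected cost of a 10-hour job: on-demand vs Spot with restart-from-zero risk
awk 'BEGIN {
  od   = 1.00 * 10                 # on-demand at an assumed $1.00/hour
  spot = 0.30 * 10 * (1 + 0.20)    # Spot at $0.30/hour, ~20% chance of one full restart
  printf "on-demand: $%.2f  spot (expected): $%.2f\n", od, spot
}'
```

Even with a restart priced in, Spot wins for this job length; but the restart term grows with elapsed runtime, which is why long non-checkpointing jobs are a poor fit.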
```shell
# Check Spot price history for instance types
aws ec2 describe-spot-price-history \
  --instance-types m5.xlarge c5.xlarge m5.2xlarge \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u -d '-24 hours' +%Y-%m-%dT%H:%M:%SZ)" \
  --query "SpotPriceHistory[].{InstanceType:InstanceType,Price:SpotPrice,AZ:AvailabilityZone}" \
  --output table
```
Spot diversification: request multiple instance types across multiple AZs. If one instance pool runs out of capacity, the others continue. Karpenter and EC2 Auto Scaling mixed instance policies both support this.
Right-Sizing
Under-utilized instances are waste. An m5.4xlarge running at 10% CPU costs the same as one at 90% CPU.
Finding Right-Sizing Opportunities
```shell
# Pull CPU utilization for an instance over the past 7 days from CloudWatch
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-abc123 \
  --start-time "$(date -u -d '-7 days' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Average Maximum

# Use AWS Compute Optimizer (examines 14 days of CloudWatch metrics)
aws compute-optimizer get-ec2-instance-recommendations \
  --instance-arns arn:aws:ec2:us-east-1:012345678901:instance/i-abc123

# Get recommendations for all instances in the account
aws compute-optimizer get-ec2-instance-recommendations \
  --query "instanceRecommendations[?finding=='OVER_PROVISIONED']"
```
AWS Compute Optimizer provides specific recommendations: "this instance could use m5.xlarge instead of m5.2xlarge with your current usage pattern, saving $X/month."
RDS Right-Sizing
```shell
# Get RDS recommendations from Compute Optimizer,
# then filter the output for OVER_PROVISIONED findings
aws compute-optimizer get-rds-database-recommendations
```
RDS right-sizing often reveals instances sized for peak traffic that only occurs occasionally. Consider Aurora Serverless v2 for variable workloads.
EKS Cost Controls
Karpenter Consolidation
Karpenter's consolidation moves pods from underutilized nodes to fewer nodes, terminating the now-empty nodes. This is the most effective bin-packing mechanism for EKS.
```yaml
# NodePool with aggressive consolidation
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized   # Consolidate any underutilized node
                                             # (renamed WhenEmptyOrUnderutilized in Karpenter v1)
    consolidateAfter: 1m                     # Wait 1 minute before consolidating
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]      # Prefer spot; fall back to on-demand
```
With consolidation enabled, Karpenter typically reduces node count by 20-40% compared to a static node group because it continuously rebalances bin packing.
Spot Node Groups with Karpenter
Mix spot and on-demand at the NodePool level:
```yaml
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["m", "c", "r"]   # Broad selection improves spot availability
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values: ["4"]             # Recent generation instances have better spot availability
```
Karpenter selects the cheapest available instance from the requirements. With broad instance type selection across multiple families, there's almost always a cheap spot option available.
For workloads that must run on on-demand (stateful services, critical jobs), use a dedicated NodePool with capacity-type: on-demand and add a matching taint:
```yaml
taints:
  - key: workload-type
    value: critical
    effect: NoSchedule
```
Namespace Cost Allocation
Tag Kubernetes namespaces to map costs back to teams:
```yaml
# In Karpenter NodePool or cluster node labels
metadata:
  labels:
    team: payments
    cost-center: cc-1234
```
Use AWS Cost and Usage Reports (CUR) with the EKS cost allocation tags feature:
```shell
# Tag the cluster so its costs can be grouped in Cost Explorer
# (activate the tags as cost allocation tags in the Billing console)
aws eks tag-resource \
  --resource-arn arn:aws:eks:us-east-1:012345678901:cluster/prod-cluster \
  --tags team=payments,cost-center=cc-1234
```
For per-namespace cost visibility, tools like Kubecost or OpenCost aggregate CUR data and Kubernetes metadata to show cost per namespace, deployment, or label.
Data Transfer Cost Reduction
AWS charges for data leaving the VPC (outbound to the internet: $0.09/GB in us-east-1), crossing AZs ($0.01/GB each direction), and flowing through NAT Gateways ($0.045/GB).
VPC Endpoints (Biggest Win)
ECR image pulls from private subnets go through NAT Gateway — at ~200 MB per pull × many nodes × many deployments, this adds up.
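A rough sense of scale, using assumed numbers (100 nodes, 10 images per node, 4 deploys a month, ~200 MB per pull):

```shell
# NAT data-processing cost for ECR pulls at $0.045/GB (all inputs are assumptions)
awk 'BEGIN {
  gb = 100 * 10 * 4 * 0.2
  printf "ECR traffic through NAT: %.0f GB -> $%.2f/month\n", gb, gb * 0.045
}'
```

That excludes the NAT Gateway hourly charge, and node churn from autoscaling multiplies the pull count further.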
```shell
# Create ECR interface endpoints to bypass NAT Gateway
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.ecr.dkr \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-private-1a subnet-private-1b

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.ecr.api \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-private-1a subnet-private-1b

# S3 gateway endpoint (free; required because ECR serves image layers from S3)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.s3 \
  --vpc-endpoint-type Gateway \
  --route-table-ids rtb-private-1a rtb-private-1b
```
ECR interface endpoints cost $0.01/hour/AZ plus $0.01/GB processed, but typically pay for themselves within days of eliminating NAT Gateway ECR traffic.
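A break-even sketch, assuming us-east-1 list prices ($0.01/hour per endpoint per AZ plus $0.01/GB processed for interface endpoints, $0.045/GB for NAT processing) and two ECR endpoints across two AZs:

```shell
# Break-even: fixed interface-endpoint cost vs the per-GB saving over NAT
awk 'BEGIN {
  fixed  = 2 * 2 * 0.01 * 730   # 2 endpoints x 2 AZs x $0.01/hr x ~730 hrs/month
  saving = 0.045 - 0.01         # NAT processing minus endpoint processing, per GB
  printf "Fixed cost: $%.2f/month; break-even: %.0f GB/month\n", fixed, fixed / saving
}'
```

Any month with more ECR pull traffic than the break-even volume, the endpoints come out ahead.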
Cross-AZ Traffic Reduction
Cross-AZ traffic costs $0.01/GB in each direction. For high-throughput services, this is significant.
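For example (the 5 TB/month figure is an assumption for illustration):

```shell
# A service pushing 5 TB/month across AZ boundaries
# ($0.01/GB out + $0.01/GB in = $0.02/GB round trip)
awk 'BEGIN { printf "Cross-AZ charge: $%.0f/month\n", 5 * 1000 * 0.02 }'
```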
Kubernetes topology-aware routing keeps traffic within the same AZ when possible:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments-api
spec:
  selector:
    app: payments-api
  ports:
    - port: 8080
  trafficDistribution: PreferClose  # Kubernetes 1.31+ (beta, on by default; alpha in 1.30); prefers same-AZ endpoints
```
For older clusters, use topology hints:
```yaml
metadata:
  annotations:
    service.kubernetes.io/topology-mode: Auto  # Enable topology-aware routing
```
Finding Waste
Idle and Unused Resources
```shell
# Find EBS volumes not attached to any instance
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query "Volumes[].{VolumeId:VolumeId,Size:Size,CreateTime:CreateTime}" \
  --output table

# Find target groups with no registered targets (their load balancers may be idle)
for tg in $(aws elbv2 describe-target-groups \
    --query "TargetGroups[].TargetGroupArn" --output text); do
  count=$(aws elbv2 describe-target-health --target-group-arn "$tg" \
    --query "length(TargetHealthDescriptions)")
  [ "$count" -eq 0 ] && echo "$tg has no targets"
done

# Find Elastic IPs not associated with any resource
aws ec2 describe-addresses \
  --query "Addresses[?AssociationId==null].PublicIp" \
  --output text

# Find old snapshots (more than 90 days old)
aws ec2 describe-snapshots \
  --owner-ids self \
  --query "Snapshots[?StartTime<='$(date -d '-90 days' +%Y-%m-%d)'].{SnapshotId:SnapshotId,Size:VolumeSize,Date:StartTime}" \
  --output table
```
Unattached EBS volumes (available state) accumulate silently — they're charged at storage rates even when not in use. Common sources: deleted EC2 instances where the EBS volume wasn't deleted, and test environments that were shut down but not fully cleaned up.
Cost Explorer and Anomaly Detection
```shell
# Get the top 5 services by cost in the last 30 days
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '-30 days' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --group-by Type=DIMENSION,Key=SERVICE \
  --metrics "UnblendedCost" \
  --query "ResultsByTime[0].Groups | sort_by(@, &to_number(Metrics.UnblendedCost.Amount)) | reverse(@) | [:5]"

# Enable cost anomaly detection
aws ce create-anomaly-monitor \
  --anomaly-monitor '{
    "MonitorName": "AllServices",
    "MonitorType": "DIMENSIONAL",
    "MonitorDimension": "SERVICE"
  }'

aws ce create-anomaly-subscription \
  --anomaly-subscription '{
    "SubscriptionName": "AlertOnAnomalies",
    "MonitorArnList": ["arn:aws:ce::012345678901:anomalymonitor/abc123"],
    "Subscribers": [{"Address": "platform-team@example.com", "Type": "EMAIL"}],
    "ThresholdExpression": {"Dimensions": {"Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE", "Values": ["100"], "MatchOptions": ["GREATER_THAN_OR_EQUAL"]}}
  }'
```
Cost Anomaly Detection alerts when a service's spend deviates significantly from its historical baseline. Set a dollar threshold (e.g., $100 absolute impact) to avoid noise from small fluctuations.
Frequently Asked Questions
Should I buy Reserved Instances or Savings Plans?
Savings Plans (specifically Compute Savings Plans) for almost all cases. Here's why:
- Savings Plans apply across instance families and regions — if you migrate from `m5` to `m6i`, the plan still applies
- Compute Savings Plans also cover Fargate and Lambda
- No complex RI management (no instance-level tracking, no expiration dates per purchase)
- Same or better discount compared to Convertible RIs
The only case for EC2 Instance Savings Plans (or Standard RIs) is when you have very specific committed capacity in a particular region and instance family where you want the maximum possible discount and know your usage won't change.
How much can Spot instances save on EKS?
Spot instances are typically 60-80% cheaper than on-demand for common instance types (m5, c5, r5). In practice, with Karpenter managing a mixed Spot+on-demand NodePool:
- Spot capacity: 60-80% savings on the nodes that run on Spot
- On-demand baseline for critical workloads
- Overall: 30-50% reduction in EC2 costs for a typical mixed cluster
The savings depend heavily on Spot availability for your instance types and the Spot interruption rate (typically 5-10% per month for a diversified selection).
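The blended number falls out of simple arithmetic; the 70/30 capacity split and the 70% Spot discount below are illustrative assumptions:

```shell
# Blended cluster cost: 70% of capacity on Spot at 70% off, 30% on-demand
awk 'BEGIN {
  blended = 0.7 * (1 - 0.70) + 0.3 * 1.0
  printf "Cluster compute cost vs all on-demand: %.0f%%\n", blended * 100
}'
```

That is roughly a 49% reduction, at the top of the quoted range; a smaller Spot share or discount lands lower.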
What's the fastest way to find AWS billing waste?
In order of impact-to-effort:
1. Compute Optimizer recommendations: check immediately — it analyzes 14 days of metrics and gives specific downsizing recommendations. Often finds 20-40% waste in compute.
2. Unattached EBS volumes: run the describe-volumes command above. These are pure waste.
3. NAT Gateway data transfer: check the NAT Gateway metrics in CloudWatch. If you're processing significant traffic through NAT Gateway to ECR or S3, VPC endpoints will provide immediate savings.
4. Idle load balancers: each ALB costs ~$16/month plus data processing. Check for ALBs with zero request count in CloudWatch.
5. Old snapshots: snapshots are $0.05/GB/month in us-east-1. Keeping 52 weekly snapshots of a 1 TB database can reach roughly $2,600/month in the worst case where snapshots share no blocks; because snapshots are incremental, low-churn volumes cost far less.
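The snapshot figure is just storage price times retained size, taken at the worst case where snapshots share no blocks:

```shell
# 52 retained weekly snapshots of a 1 TB volume at $0.05/GB-month, worst case
awk 'BEGIN { printf "Snapshot storage: $%.0f/month\n", 52 * 1000 * 0.05 }'
```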
For Karpenter-specific node consolidation that reduces idle capacity, see Kubernetes Cluster Autoscaler and Karpenter: Node Autoscaling on EKS. For VPC endpoint setup that eliminates NAT Gateway data transfer costs, see AWS VPC Design for EKS: Subnets, NAT, and Security Groups.
Running an EKS cost optimization engagement, implementing Savings Plans coverage for a large AWS account, or designing a Spot-first infrastructure for a batch workload? Talk to us at Coding Protocols — we help platform teams reduce AWS spend without sacrificing reliability or performance.


