May 7, 2026

AWS RDS and Aurora: Managed Database Patterns on AWS

RDS removes the operational burden of database installation, patching, backups, and replication — but choosing between RDS and Aurora, sizing the instance, configuring parameter groups, and setting up read replicas correctly still requires understanding the trade-offs. This article covers RDS PostgreSQL vs Aurora PostgreSQL, Multi-AZ and read replica architecture, RDS Proxy for connection pooling, backup and snapshot strategy, Performance Insights for query-level visibility, and when to choose Aurora Serverless v2 over provisioned.

Coding Protocols Team
Platform Engineering

RDS manages the infrastructure under your database: OS patching, storage scaling, replication, backups, and failover. What it doesn't manage is everything above the engine — schema design, query performance, connection management, and choosing the right configuration for your workload.

The two main choices on AWS are RDS (running a standard database engine on a managed EC2 instance) and Aurora (AWS's re-engineered database engine with a distributed storage layer). Aurora PostgreSQL and Aurora MySQL are compatible with their standard counterparts but have different performance characteristics, failover behavior, and pricing.


RDS vs Aurora PostgreSQL

| | RDS PostgreSQL | Aurora PostgreSQL |
|---|---|---|
| Storage | EBS (gp3 or io1), provisioned per instance | Distributed storage, auto-scales 10 GiB → 128 TiB (256 TiB on Aurora PostgreSQL 16.9+, 15.13+) |
| Write performance | EBS-bound | Up to 3× faster than standard PostgreSQL (AWS claim) |
| Replication | Physical streaming replication to replicas | Storage-level replication shared with up to 15 replicas |
| Read replicas | Up to 5, replication lag varies | Up to 15, near-zero replica lag (shared storage) |
| Failover time | 60–120 seconds (Multi-AZ promotion) | 30 seconds or less (Aurora replicas use same storage) |
| Storage cost | You pay for provisioned size | You pay per GB of actual data stored |
| Minor version upgrades | Manual or auto | Manual or auto |
| Compatibility | Native PostgreSQL | Compatible, not identical (some extensions differ) |
| Cross-region replication | Read replicas only | Aurora Global Database (sub-1-second replication lag) |
| Serverless option | No | Aurora Serverless v2 |

Choose RDS PostgreSQL when: you need exact PostgreSQL compatibility, you use extensions that Aurora doesn't support (PostGIS with certain configurations, timescaledb, etc.), or your storage needs are predictable and you want simpler cost forecasting.

Choose Aurora PostgreSQL when: you need faster failover (< 30 seconds), more than 5 read replicas, near-zero replica lag for read scaling, or you want storage that auto-scales without planning.
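Before committing to an engine, it's worth confirming which Aurora PostgreSQL versions are actually available in your region and which of them support Global Database. A quick check with the AWS CLI:

```bash
# List Aurora PostgreSQL engine versions available in the current region,
# including whether each version supports Aurora Global Database.
aws rds describe-db-engine-versions \
  --engine aurora-postgresql \
  --query 'DBEngineVersions[].{Version:EngineVersion,GlobalDB:SupportsGlobalDatabases}' \
  --output table
```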


Multi-AZ and Read Replicas

Multi-AZ Deployment

Multi-AZ runs a synchronous standby in a different AZ. Writes are committed to both the primary and standby before returning success. On primary failure, RDS automatically fails over to the standby — DNS is updated to point to the new primary.

bash
# Create RDS instance with Multi-AZ
aws rds create-db-instance \
  --db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.xlarge \
  --engine postgres \
  --engine-version 16.2 \
  --master-username postgres \
  --master-user-password "$(aws secretsmanager get-secret-value --secret-id rds/prod/postgres --query SecretString --output text | jq -r .password)" \
  --allocated-storage 100 \
  --storage-type gp3 \
  --multi-az \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod \
  --backup-retention-period 7 \
  --preferred-backup-window "02:00-03:00" \
  --preferred-maintenance-window "sun:04:00-sun:05:00" \
  --deletion-protection \
  --no-publicly-accessible

Multi-AZ failover timeline:

  1. Primary becomes unresponsive (hardware failure, AZ outage, manual reboot with failover)
  2. RDS detects failure (health check interval: 30 seconds typically)
  3. Standby is promoted to primary
  4. DNS record (prod-postgres.abc123xyz.us-east-1.rds.amazonaws.com) is updated
  5. Applications reconnect — total downtime: 60-120 seconds for RDS, 30 seconds or less for Aurora (when a reader replica is available)

Applications must handle reconnection. Connection poolers (PgBouncer, RDS Proxy) absorb the reconnection spike.
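The reconnection pattern can be sketched as a retry-with-backoff wrapper. This is a minimal bash sketch (the psql connection string in the example is a placeholder); real applications would implement the equivalent in the database driver or delegate it to a pooler:

```bash
#!/usr/bin/env bash
# Retry a command with exponential backoff -- the pattern applications
# (or health-check scripts) need to ride out a Multi-AZ failover window.
retry() {
  local max_attempts=$1; shift
  local delay=1 attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))      # exponential backoff: 1s, 2s, 4s, ...
    attempt=$(( attempt + 1 ))
  done
}

# Example (placeholder endpoint): keep probing until DNS points at the new primary.
# retry 6 psql "host=prod-postgres.abc123xyz.us-east-1.rds.amazonaws.com dbname=app" -c 'SELECT 1'
```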

Read Replicas

Read replicas handle read queries, offloading the primary. They use asynchronous streaming replication — there is always some lag.

bash
# Create read replica in same region
aws rds create-db-instance-read-replica \
  --db-instance-identifier prod-postgres-replica-1 \
  --source-db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.large \
  --availability-zone us-east-1b

# Create read replica in another region (cross-region)
aws rds create-db-instance-read-replica \
  --db-instance-identifier prod-postgres-replica-dr \
  --source-db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.large \
  --source-region us-east-1 \
  --region us-west-2

Monitor replication lag:

bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name ReplicaLag \
  --dimensions Name=DBInstanceIdentifier,Value=prod-postgres-replica-1 \
  --start-time "$(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 \
  --statistics Average

A read replica can be promoted to a standalone primary — used for disaster recovery or region migration. Promotion is irreversible and disconnects the replica from the source.
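Promotion itself is a single CLI call; the replica reboots and comes back as an independent instance. A sketch:

```bash
# Promote the DR replica to a standalone primary (irreversible).
# Setting a retention period enables automated backups on the new primary.
aws rds promote-read-replica \
  --db-instance-identifier prod-postgres-replica-dr \
  --backup-retention-period 7
```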


Aurora Architecture

Aurora replaces EBS with a distributed storage layer spanning 6 storage nodes across 3 AZs. Writes require acknowledgment from 4 of 6 nodes (write quorum); reads require 3 of 6, though in practice Aurora reads from a node it knows holds the latest data. This design means:

  • Storage is durable independently of compute: losing a reader or writer instance doesn't affect the data layer
  • Readers share the same storage as the writer: no replication lag for reads (readers read directly from the storage layer)
  • Failover promotes a reader: the reader already has the data — failover is flipping which instance handles writes, not copying data

Aurora Cluster Components

Aurora Cluster
├── Writer endpoint: prod-cluster.cluster-abc123.us-east-1.rds.amazonaws.com
│   └── Writer instance (one at a time)
│
├── Reader endpoint: prod-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com
│   └── Routes to available reader instances (load balanced)
│
└── Instance endpoints (direct)
    ├── prod-cluster-instance-1.abc123.us-east-1.rds.amazonaws.com (writer)
    ├── prod-cluster-instance-2.abc123.us-east-1.rds.amazonaws.com (reader)
    └── prod-cluster-instance-3.abc123.us-east-1.rds.amazonaws.com (reader)

Applications should use the cluster endpoint (writer) for writes and the reader endpoint for reads. Instance endpoints are for direct connection during troubleshooting.

bash
# Create Aurora PostgreSQL cluster
aws rds create-db-cluster \
  --db-cluster-identifier prod-aurora-cluster \
  --engine aurora-postgresql \
  --engine-version 16.2 \
  --master-username postgres \
  --master-user-password "$(aws secretsmanager get-secret-value ...)" \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod \
  --backup-retention-period 7 \
  --preferred-backup-window "02:00-03:00" \
  --deletion-protection

# Add writer instance
aws rds create-db-instance \
  --db-instance-identifier prod-aurora-writer \
  --db-cluster-identifier prod-aurora-cluster \
  --db-instance-class db.r6g.xlarge \
  --engine aurora-postgresql

# Add reader instance
aws rds create-db-instance \
  --db-instance-identifier prod-aurora-reader-1 \
  --db-cluster-identifier prod-aurora-cluster \
  --db-instance-class db.r6g.large \
  --engine aurora-postgresql
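Once the cluster exists, the writer and reader endpoints can be looked up rather than hardcoded in application config:

```bash
# Fetch the cluster's writer and reader endpoints.
aws rds describe-db-clusters \
  --db-cluster-identifier prod-aurora-cluster \
  --query 'DBClusters[0].{Writer:Endpoint,Reader:ReaderEndpoint}' \
  --output table
```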

Aurora Global Database

For cross-region DR or global applications, Aurora Global Database replicates from a primary region to up to 5 secondary regions with sub-second replication lag (typically < 1 second):

bash
aws rds create-global-cluster \
  --global-cluster-identifier prod-global \
  --source-db-cluster-identifier arn:aws:rds:us-east-1:012345678901:cluster:prod-aurora-cluster

# Add secondary region
aws rds create-db-cluster \
  --db-cluster-identifier prod-aurora-cluster-eu \
  --engine aurora-postgresql \
  --engine-version 16.2 \
  --global-cluster-identifier prod-global \
  --db-subnet-group-name prod-db-subnet-group-eu \
  --region eu-west-1

Failover to a secondary region requires a manual "detach and promote" operation — there's no automatic global failover in most configurations.
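The detach-and-promote step maps to a single CLI call. A sketch (the region flag and cluster ARN assume the secondary cluster created above):

```bash
# "Detach and promote": remove the secondary from the global cluster so it
# becomes a standalone, writable regional cluster. Run against the secondary's region.
aws rds remove-from-global-cluster \
  --global-cluster-identifier prod-global \
  --db-cluster-identifier arn:aws:rds:eu-west-1:012345678901:cluster:prod-aurora-cluster-eu \
  --region eu-west-1
```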


Aurora Serverless v2

Aurora Serverless v2 scales compute capacity up and down in increments of 0.5 ACUs (Aurora Capacity Units, where 1 ACU ≈ 2 GiB memory) within a configured min/max range. You pay for the ACUs consumed, not for provisioned instances.

bash
# Note: MinCapacity=0 (auto-pause) is supported on PostgreSQL 16.3+, 15.7+, 14.12+, 13.15+
aws rds create-db-cluster \
  --db-cluster-identifier prod-serverless-cluster \
  --engine aurora-postgresql \
  --engine-version 16.2 \
  --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=32 \
  --master-username postgres \
  --master-user-password "..." \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod

# Add serverless writer instance
aws rds create-db-instance \
  --db-instance-identifier prod-serverless-writer \
  --db-cluster-identifier prod-serverless-cluster \
  --db-instance-class db.serverless \
  --engine aurora-postgresql

Serverless v2 is appropriate when:

  • Traffic is variable and unpredictable (dev environments, batch workloads, SaaS multi-tenant with idle tenants)
  • You want scale-to-near-zero (minimum 0.5 ACU for older engine versions; newer versions (PostgreSQL 16.3+, 15.7+, 14.12+, 13.15+) support scaling to 0 ACUs with auto-pause)
  • You want to avoid over-provisioning for peak load

Not appropriate for:

  • Latency-sensitive OLTP with consistent high load (provisioned instances have predictable performance)
  • Workloads that scale rapidly from zero — scaling up from 0.5 ACU takes a few seconds
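To validate the Min/MaxCapacity range against real usage, the cluster's ACU consumption can be pulled from CloudWatch:

```bash
# Actual ACU consumption over the last hour -- use this to tune
# MinCapacity/MaxCapacity instead of guessing.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name ServerlessDatabaseCapacity \
  --dimensions Name=DBClusterIdentifier,Value=prod-serverless-cluster \
  --start-time "$(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average Maximum
```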

RDS Proxy

RDS Proxy manages connection pooling between applications and RDS/Aurora. It reduces connection overhead for serverless applications (e.g., Lambda functions that open a new database connection per invocation) and maintains connections during failover.

bash
aws rds create-db-proxy \
  --db-proxy-name prod-rds-proxy \
  --engine-family POSTGRESQL \
  --auth '[{"AuthScheme":"SECRETS","SecretArn":"arn:aws:secretsmanager:us-east-1:012345678901:secret:rds/prod/postgres-abc123","IAMAuth":"REQUIRED"}]' \
  --role-arn arn:aws:iam::012345678901:role/rds-proxy-role \
  --vpc-subnet-ids subnet-private-1a subnet-private-1b subnet-private-1c \
  --vpc-security-group-ids sg-rds-proxy

# Target the RDS instance
aws rds register-db-proxy-targets \
  --db-proxy-name prod-rds-proxy \
  --db-instance-identifiers prod-postgres

Applications connect to the proxy endpoint instead of the RDS endpoint. The proxy maintains a pool of connections to the database.
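Since the proxy above requires IAM authentication, connecting looks like this (the proxy endpoint and username are placeholders):

```bash
# Mint a short-lived (15 min) IAM auth token and use it as the password.
PROXY_ENDPOINT="prod-rds-proxy.proxy-abc123.us-east-1.rds.amazonaws.com"
TOKEN=$(aws rds generate-db-auth-token \
  --hostname "$PROXY_ENDPOINT" \
  --port 5432 \
  --username app_user \
  --region us-east-1)

# IAM auth requires TLS, hence sslmode=require.
PGPASSWORD="$TOKEN" psql "host=$PROXY_ENDPOINT port=5432 user=app_user dbname=app sslmode=require"
```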

RDS Proxy benefits:

  • Connection pooling: multiple application connections multiplex onto fewer database connections
  • Failover handling: the proxy maintains connections during Multi-AZ failover — application connections reconnect to the proxy (which handles the DB reconnection internally), reducing failover impact
  • IAM authentication: supports IAM-based authentication as an alternative to password auth
  • Secrets Manager integration: rotates credentials automatically via Secrets Manager

RDS Proxy limitations:

  • Adds ~1ms latency per query
  • Does not support all PostgreSQL features (e.g., SET SESSION CHARACTERISTICS and some LISTEN/NOTIFY patterns have limitations)
  • Costs $0.015 per vCPU-hour of the target database

Parameter Groups

Parameter groups configure the database engine. Every RDS instance has a parameter group — the default one cannot be modified.

bash
# Create custom parameter group
aws rds create-db-parameter-group \
  --db-parameter-group-name prod-postgres16 \
  --db-parameter-group-family postgres16 \
  --description "Production PostgreSQL 16 parameters"

# Set parameters
aws rds modify-db-parameter-group \
  --db-parameter-group-name prod-postgres16 \
  --parameters \
    "ParameterName=work_mem,ParameterValue=65536,ApplyMethod=immediate" \
    "ParameterName=max_connections,ParameterValue=200,ApplyMethod=pending-reboot" \
    "ParameterName=log_min_duration_statement,ParameterValue=1000,ApplyMethod=immediate" \
    "ParameterName=shared_preload_libraries,ParameterValue=pg_stat_statements,ApplyMethod=pending-reboot"

Key PostgreSQL parameters for production:

| Parameter | Recommended | Notes |
|---|---|---|
| max_connections | 100–300 | Higher requires more memory; use RDS Proxy instead |
| work_mem | 4–64 MB | Per sort/hash operation, multiplied by concurrent operations |
| shared_buffers | AWS default (set per instance class) | Static parameter (requires reboot) in a custom parameter group; setting it too high prevents the instance from starting |
| log_min_duration_statement | 500–2000 ms | Log slow queries (milliseconds) |
| log_autovacuum_min_duration | 0 | Log all autovacuum runs |
| shared_preload_libraries | pg_stat_statements | Required for Performance Insights query tracking |
| random_page_cost | 1.1 for gp3/io1 | Default 4.0 is for spinning disk; SSD is much lower |

Parameters with ApplyMethod=pending-reboot require a DB restart to take effect. ApplyMethod=immediate applies without restart but may still be deferred for static parameters.
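Whether a change is still waiting on a restart can be checked from either side (the psql connection string is a placeholder):

```bash
# Inside the database: parameters waiting on a reboot (PostgreSQL 9.5+).
psql "host=prod-postgres.abc123xyz.us-east-1.rds.amazonaws.com dbname=app" \
  -c "SELECT name, setting FROM pg_settings WHERE pending_restart;"

# From the API: the instance reports 'pending-reboot' until it is restarted.
aws rds describe-db-instances \
  --db-instance-identifier prod-postgres \
  --query 'DBInstances[0].DBParameterGroups[0].ParameterApplyStatus'
```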


Backup and Snapshot Strategy

Automated backups: RDS takes daily snapshots and retains transaction logs for point-in-time recovery. Retention: 1–35 days.

bash
# Modify backup retention
aws rds modify-db-instance \
  --db-instance-identifier prod-postgres \
  --backup-retention-period 14 \
  --preferred-backup-window "02:00-03:00" \
  --apply-immediately

Manual snapshots: persist until you delete them (not subject to retention period).

bash
# Create manual snapshot
aws rds create-db-snapshot \
  --db-instance-identifier prod-postgres \
  --db-snapshot-identifier prod-postgres-pre-migration-2026-05-10

# Restore to new instance from snapshot
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier prod-postgres-restored \
  --db-snapshot-identifier prod-postgres-pre-migration-2026-05-10 \
  --db-instance-class db.r6g.xlarge \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod \
  --no-publicly-accessible

Point-in-time restore: restore to any second within the backup retention window.

bash
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier prod-postgres \
  --target-db-instance-identifier prod-postgres-pitr \
  --restore-time "2026-05-10T14:30:00Z"
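Before picking a restore time, check the actual recoverable window; the upper bound is reported per instance:

```bash
# The latest point you can restore to (typically within the last 5 minutes);
# the lower bound is determined by the backup retention period.
aws rds describe-db-instances \
  --db-instance-identifier prod-postgres \
  --query 'DBInstances[0].LatestRestorableTime'
```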

Cross-region backup copy: for DR, copy snapshots to another region automatically using EventBridge and Lambda, or manually:

bash
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier arn:aws:rds:us-east-1:012345678901:snapshot:prod-postgres-snapshot \
  --target-db-snapshot-identifier prod-postgres-snapshot-copy \
  --region us-west-2 \
  --source-region us-east-1

Performance Insights

Performance Insights shows database load by wait state, SQL query, user, host, and application — it's the fastest way to identify what's causing database slowness.

bash
# Enable Performance Insights (can also set at creation time)
aws rds modify-db-instance \
  --db-instance-identifier prod-postgres \
  --enable-performance-insights \
  --performance-insights-retention-period 7 \    # 7 days free; paid options: 1–24 months
  --apply-immediately

The key metric is DBLoad (database load in AAS — average active sessions). When DBLoad > number of vCPUs, the database is CPU-bound or waiting on I/O. Drill into the SQL dimension to find the top queries by load.

Performance Insights requires the pg_stat_statements extension, loaded via shared_preload_libraries (which needs a reboot). Set this in the parameter group before enabling PI, or the per-query data won't be available.
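The same per-query view can be cross-checked directly from pg_stat_statements (connection string is a placeholder; column names assume PostgreSQL 13+):

```bash
# Top 5 queries by total execution time.
psql "host=prod-postgres.abc123xyz.us-east-1.rds.amazonaws.com dbname=app" -c "
  SELECT calls,
         round(total_exec_time::numeric, 1) AS total_ms,
         round(mean_exec_time::numeric, 2)  AS mean_ms,
         left(query, 60)                    AS query
  FROM pg_stat_statements
  ORDER BY total_exec_time DESC
  LIMIT 5;"
```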


Frequently Asked Questions

What's the RTO for RDS Multi-AZ failover?

For RDS: 60–120 seconds. The standby is promoted, DNS is updated, and applications reconnect. The actual downtime depends on how fast applications detect the DNS change and reconnect — DNS TTL for RDS is 5 seconds, but some connection pools cache DNS longer.

For Aurora: 30 seconds or less when a reader replica is available (the reader is promoted to writer). Without a reader, Aurora follows the same promotion path as RDS (~60 seconds).

Use RDS Proxy to absorb the failover — the proxy maintains connections and re-establishes them to the new primary without applications needing to reconnect.

Should I use gp3 or io1 storage?

gp3 is the right choice for most workloads:

  • Baseline 3,000 IOPS and 125 MiB/s throughput for volumes under 400 GiB; 12,000 IOPS and 500 MiB/s free baseline for 400 GiB+ (automatic striping across 4 volumes)
  • Scale IOPS up to 64,000 and throughput to 4,000 MB/s independently
  • ~20% cheaper than gp2 at the same size

io1 is appropriate for:

  • Workloads requiring > 64,000 IOPS (io1 goes to 256,000 IOPS on db.x2iedn instances)
  • Extreme latency sensitivity where predictable IOPS guarantees matter

Start with gp3. Monitor WriteIOPS, ReadIOPS, and DiskQueueDepth in CloudWatch. Switch to io1 only if gp3 consistently shows queue depth > 1 during peak load.
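The queue-depth check described above can be run from the CLI before deciding to switch storage types:

```bash
# Peak DiskQueueDepth per hour over the last 24 hours -- sustained
# values > 1 suggest the volume is IOPS-bound.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DiskQueueDepth \
  --dimensions Name=DBInstanceIdentifier,Value=prod-postgres \
  --start-time "$(date -u -d '-24 hours' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Maximum
```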

How do I handle schema migrations without downtime?

Online schema changes on production databases:

  1. Use pg_repack to rebuild tables without long locks (for index changes and table rewrites)
  2. Use additive migrations first (new columns with defaults, new tables) before destructive ones (drop column)
  3. Use feature flags to control when applications use the new schema
  4. For large table alterations, consider ALTER TABLE ... ADD COLUMN with a default value — PostgreSQL 11+ handles this without a table rewrite when the default is a non-volatile constant (e.g., a literal value or immutable function); volatile defaults like DEFAULT now() still require a full table rewrite

For RDS specifically: schedule migrations during the maintenance window or use blue/green deployments — create a new RDS instance, restore a snapshot, run migrations, redirect traffic.

What's the difference between Multi-AZ and a read replica for DR?

Multi-AZ standby:

  • Synchronous replication (no data loss on failover)
  • Automatic failover (RDS-managed, no manual intervention)
  • Cannot be used for reads (the standby is not accessible)
  • Higher cost (same instance class as primary)

Read replica:

  • Asynchronous replication (some lag, potential data loss on failover)
  • Manual promotion required (not automatic)
  • Can serve read traffic (explicit purpose)
  • Can be in another region
  • Must be promoted manually for DR

For production: Multi-AZ for automatic high availability + read replicas for read scaling. Not either/or.


For the VPC networking that isolates RDS instances in dedicated subnets, see AWS VPC Design for EKS: Subnets, NAT, and Security Groups. For secret rotation that manages RDS credentials, see AWS Secrets Manager and Parameter Store: Secrets Management on AWS.

Migrating from self-managed PostgreSQL to RDS or Aurora, investigating slow query performance, or designing a multi-region database architecture for disaster recovery? Talk to us at Coding Protocols — we help platform teams design managed database architectures that reduce operational burden without sacrificing performance or reliability.

Related Topics

AWS
RDS
Aurora
PostgreSQL
Databases
Platform Engineering
