May 7, 2026

AWS RDS and Aurora: Managed Database Patterns on AWS

RDS removes the operational burden of database installation, patching, backups, and replication — but choosing between RDS and Aurora, sizing the instance, configuring parameter groups, and setting up read replicas correctly still requires understanding the trade-offs. This article covers RDS PostgreSQL vs Aurora PostgreSQL, Multi-AZ and read replica architecture, RDS Proxy for connection pooling, backup and snapshot strategy, Performance Insights for query-level visibility, and when to choose Aurora Serverless v2 over provisioned.

Coding Protocols Team
Platform Engineering

RDS manages the infrastructure under your database: OS patching, storage scaling, replication, backups, and failover. What it doesn't manage is everything above the engine — schema design, query performance, connection management, and choosing the right configuration for your workload.

The two main choices on AWS are RDS (running a standard database engine on a managed EC2 instance) and Aurora (AWS's re-engineered database engine with a distributed storage layer). Aurora PostgreSQL and Aurora MySQL are compatible with their standard counterparts but have different performance characteristics, failover behavior, and pricing.


RDS vs Aurora PostgreSQL

| | RDS PostgreSQL | Aurora PostgreSQL |
|---|---|---|
| Storage | EBS (gp3 or io1), provisioned per instance | Distributed storage, auto-scales 10 GiB → 128 TiB (256 TiB on Aurora PostgreSQL 16.9+, 15.13+) |
| Write performance | EBS-bound | Up to 3× faster than standard PostgreSQL (AWS claim) |
| Replication | Physical streaming replication to replicas | Storage-level replication shared with up to 15 replicas |
| Read replicas | Up to 5, replication lag varies | Up to 15, near-zero replica lag (shared storage) |
| Failover time | 60–120 seconds (Multi-AZ promotion) | 30 seconds or less (Aurora replicas use same storage) |
| Storage cost | You pay for provisioned size | You pay per GB of actual data stored |
| Minor version upgrades | Manual or auto | Manual or auto |
| Compatibility | Native PostgreSQL | Compatible, not identical (some extensions differ) |
| Cross-region replication | Read replicas only | Aurora Global Database (sub-1-second replication lag) |
| Serverless option | No | Aurora Serverless v2 |

Choose RDS PostgreSQL when: you need exact PostgreSQL compatibility, you use extensions that Aurora doesn't support (PostGIS with certain configurations, timescaledb, etc.), or your storage needs are predictable and you want simpler cost forecasting.

Choose Aurora PostgreSQL when: you need faster failover (< 30 seconds), more than 5 read replicas, near-zero replica lag for read scaling, or you want storage that auto-scales without planning.
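Before committing to an engine, it's worth confirming which Aurora PostgreSQL versions are actually available in your region and which of them support Global Database. A quick check with the AWS CLI:

```bash
# List Aurora PostgreSQL engine versions available in the current region,
# including whether each version supports Aurora Global Database.
aws rds describe-db-engine-versions \
  --engine aurora-postgresql \
  --query 'DBEngineVersions[].{Version:EngineVersion,GlobalDB:SupportsGlobalDatabases}' \
  --output table
```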


Multi-AZ and Read Replicas

Multi-AZ Deployment

Multi-AZ runs a synchronous standby in a different AZ. Writes are committed to both the primary and standby before returning success. On primary failure, RDS automatically fails over to the standby — DNS is updated to point to the new primary.

bash
# Create RDS instance with Multi-AZ
aws rds create-db-instance \
  --db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.xlarge \
  --engine postgres \
  --engine-version 16.2 \
  --master-username postgres \
  --master-user-password "$(aws secretsmanager get-secret-value --secret-id rds/prod/postgres --query SecretString --output text | jq -r .password)" \
  --allocated-storage 100 \
  --storage-type gp3 \
  --multi-az \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod \
  --backup-retention-period 7 \
  --preferred-backup-window "02:00-03:00" \
  --preferred-maintenance-window "sun:04:00-sun:05:00" \
  --deletion-protection \
  --no-publicly-accessible

Multi-AZ failover timeline:

  1. Primary becomes unresponsive (hardware failure, AZ outage, manual reboot with failover)
  2. RDS detects failure (health check interval: 30 seconds typically)
  3. Standby is promoted to primary
  4. DNS record (prod-postgres.abc123xyz.us-east-1.rds.amazonaws.com) is updated
  5. Applications reconnect — total downtime: 60-120 seconds for RDS, 30 seconds or less for Aurora (when a reader replica is available)

Applications must handle reconnection. Connection poolers (PgBouncer, RDS Proxy) absorb the reconnection spike.
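The reconnection pattern can be sketched as a retry-with-backoff wrapper. This is a minimal bash sketch (the psql connection string in the example is a placeholder); real applications would implement the equivalent in the database driver or delegate it to a pooler:

```bash
#!/usr/bin/env bash
# Retry a command with exponential backoff -- the pattern applications
# (or health-check scripts) need to ride out a Multi-AZ failover window.
retry() {
  local max_attempts=$1; shift
  local delay=1 attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))      # exponential backoff: 1s, 2s, 4s, ...
    attempt=$(( attempt + 1 ))
  done
}

# Example (placeholder endpoint): keep probing until DNS points at the new primary.
# retry 6 psql "host=prod-postgres.abc123xyz.us-east-1.rds.amazonaws.com dbname=app" -c 'SELECT 1'
```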

Read Replicas

Read replicas handle read queries, offloading the primary. They use asynchronous streaming replication — there is always some lag.

bash
# Create read replica in same region
aws rds create-db-instance-read-replica \
  --db-instance-identifier prod-postgres-replica-1 \
  --source-db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.large \
  --availability-zone us-east-1b

# Create read replica in another region (cross-region)
aws rds create-db-instance-read-replica \
  --db-instance-identifier prod-postgres-replica-dr \
  --source-db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.large \
  --source-region us-east-1 \
  --region us-west-2

Monitor replication lag:

bash
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name ReplicaLag \
  --dimensions Name=DBInstanceIdentifier,Value=prod-postgres-replica-1 \
  --start-time "$(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 \
  --statistics Average

A read replica can be promoted to a standalone primary — used for disaster recovery or region migration. Promotion is irreversible and disconnects the replica from the source.
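Promotion itself is a single CLI call; the replica reboots and comes back as an independent instance. A sketch:

```bash
# Promote the DR replica to a standalone primary (irreversible).
# Setting a retention period enables automated backups on the new primary.
aws rds promote-read-replica \
  --db-instance-identifier prod-postgres-replica-dr \
  --backup-retention-period 7
```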


Aurora Architecture

Aurora replaces EBS with a distributed storage layer spanning 6 storage nodes across 3 AZs. Writes require acknowledgment from 4 of 6 nodes (write quorum); reads require 3 of 6, though in practice Aurora reads from a node it knows holds the latest data. This design means:

  • Storage is durable independently of compute: losing a reader or writer instance doesn't affect the data layer
  • Readers share the same storage as the writer: no replication lag for reads (readers read directly from the storage layer)
  • Failover promotes a reader: the reader already has the data — failover is flipping which instance handles writes, not copying data

Aurora Cluster Components

Aurora Cluster
├── Writer endpoint: prod-cluster.cluster-abc123.us-east-1.rds.amazonaws.com
│   └── Writer instance (one at a time)
│
├── Reader endpoint: prod-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com
│   └── Routes to available reader instances (load balanced)
│
└── Instance endpoints (direct)
    ├── prod-cluster-instance-1.abc123.us-east-1.rds.amazonaws.com (writer)
    ├── prod-cluster-instance-2.abc123.us-east-1.rds.amazonaws.com (reader)
    └── prod-cluster-instance-3.abc123.us-east-1.rds.amazonaws.com (reader)

Applications should use the cluster endpoint (writer) for writes and the reader endpoint for reads. Instance endpoints are for direct connection during troubleshooting.

bash
# Create Aurora PostgreSQL cluster
aws rds create-db-cluster \
  --db-cluster-identifier prod-aurora-cluster \
  --engine aurora-postgresql \
  --engine-version 16.2 \
  --master-username postgres \
  --master-user-password "$(aws secretsmanager get-secret-value ...)" \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod \
  --backup-retention-period 7 \
  --preferred-backup-window "02:00-03:00" \
  --deletion-protection

# Add writer instance
aws rds create-db-instance \
  --db-instance-identifier prod-aurora-writer \
  --db-cluster-identifier prod-aurora-cluster \
  --db-instance-class db.r6g.xlarge \
  --engine aurora-postgresql

# Add reader instance
aws rds create-db-instance \
  --db-instance-identifier prod-aurora-reader-1 \
  --db-cluster-identifier prod-aurora-cluster \
  --db-instance-class db.r6g.large \
  --engine aurora-postgresql
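Once the cluster exists, the writer and reader endpoints can be looked up rather than hardcoded in application config:

```bash
# Fetch the cluster's writer and reader endpoints.
aws rds describe-db-clusters \
  --db-cluster-identifier prod-aurora-cluster \
  --query 'DBClusters[0].{Writer:Endpoint,Reader:ReaderEndpoint}' \
  --output table
```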

Aurora Global Database

For cross-region DR or global applications, Aurora Global Database replicates from a primary region to up to 5 secondary regions with sub-second replication lag (typically < 1 second):

bash
aws rds create-global-cluster \
  --global-cluster-identifier prod-global \
  --source-db-cluster-identifier arn:aws:rds:us-east-1:012345678901:cluster:prod-aurora-cluster

# Add secondary region
aws rds create-db-cluster \
  --db-cluster-identifier prod-aurora-cluster-eu \
  --engine aurora-postgresql \
  --engine-version 16.2 \
  --global-cluster-identifier prod-global \
  --db-subnet-group-name prod-db-subnet-group-eu \
  --region eu-west-1

Failover to a secondary region requires a manual "detach and promote" operation — there's no automatic global failover in most configurations.
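The detach-and-promote step maps to a single CLI call. A sketch (the region flag and cluster ARN assume the secondary cluster created above):

```bash
# "Detach and promote": remove the secondary from the global cluster so it
# becomes a standalone, writable regional cluster. Run against the secondary's region.
aws rds remove-from-global-cluster \
  --global-cluster-identifier prod-global \
  --db-cluster-identifier arn:aws:rds:eu-west-1:012345678901:cluster:prod-aurora-cluster-eu \
  --region eu-west-1
```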


Aurora Serverless v2

Aurora Serverless v2 scales compute capacity up and down in increments of 0.5 ACUs (Aurora Capacity Units, where 1 ACU ≈ 2 GiB memory) within a configured min/max range. You pay for the ACUs consumed, not for provisioned instances.

bash
# Note: MinCapacity=0 (auto-pause) is supported on PostgreSQL 16.3+, 15.7+, 14.12+, 13.15+
aws rds create-db-cluster \
  --db-cluster-identifier prod-serverless-cluster \
  --engine aurora-postgresql \
  --engine-version 16.2 \
  --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=32 \
  --master-username postgres \
  --master-user-password "..." \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod

# Add serverless writer instance
aws rds create-db-instance \
  --db-instance-identifier prod-serverless-writer \
  --db-cluster-identifier prod-serverless-cluster \
  --db-instance-class db.serverless \
  --engine aurora-postgresql

Serverless v2 is appropriate when:

  • Traffic is variable and unpredictable (dev environments, batch workloads, SaaS multi-tenant with idle tenants)
  • You want scale-to-near-zero (minimum 0.5 ACU for older engine versions; newer versions (PostgreSQL 16.3+, 15.7+, 14.12+, 13.15+) support scaling to 0 ACUs with auto-pause)
  • You want to avoid over-provisioning for peak load

Not appropriate for:

  • Latency-sensitive OLTP with consistent high load (provisioned instances have predictable performance)
  • Workloads that scale rapidly from zero — scaling up from 0.5 ACU takes a few seconds
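To validate the Min/MaxCapacity range against real usage, the cluster's ACU consumption can be pulled from CloudWatch:

```bash
# Actual ACU consumption over the last hour -- use this to tune
# MinCapacity/MaxCapacity instead of guessing.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name ServerlessDatabaseCapacity \
  --dimensions Name=DBClusterIdentifier,Value=prod-serverless-cluster \
  --start-time "$(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average Maximum
```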

RDS Proxy

RDS Proxy manages connection pooling between applications and RDS/Aurora. It reduces connection overhead for serverless applications (e.g., Lambda functions that open a new database connection per invocation) and maintains connections during failover.

bash
aws rds create-db-proxy \
  --db-proxy-name prod-rds-proxy \
  --engine-family POSTGRESQL \
  --auth '[{"AuthScheme":"SECRETS","SecretArn":"arn:aws:secretsmanager:us-east-1:012345678901:secret:rds/prod/postgres-abc123","IAMAuth":"REQUIRED"}]' \
  --role-arn arn:aws:iam::012345678901:role/rds-proxy-role \
  --vpc-subnet-ids subnet-private-1a subnet-private-1b subnet-private-1c \
  --vpc-security-group-ids sg-rds-proxy

# Target the RDS instance
aws rds register-db-proxy-targets \
  --db-proxy-name prod-rds-proxy \
  --db-instance-identifiers prod-postgres

Applications connect to the proxy endpoint instead of the RDS endpoint. The proxy maintains a pool of connections to the database.
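Since the proxy above requires IAM authentication, connecting looks like this (the proxy endpoint and username are placeholders):

```bash
# Mint a short-lived (15 min) IAM auth token and use it as the password.
PROXY_ENDPOINT="prod-rds-proxy.proxy-abc123.us-east-1.rds.amazonaws.com"
TOKEN=$(aws rds generate-db-auth-token \
  --hostname "$PROXY_ENDPOINT" \
  --port 5432 \
  --username app_user \
  --region us-east-1)

# IAM auth requires TLS, hence sslmode=require.
PGPASSWORD="$TOKEN" psql "host=$PROXY_ENDPOINT port=5432 user=app_user dbname=app sslmode=require"
```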

RDS Proxy benefits:

  • Connection pooling: multiple application connections multiplex onto fewer database connections
  • Failover handling: the proxy maintains connections during Multi-AZ failover — application connections reconnect to the proxy (which handles the DB reconnection internally), reducing failover impact
  • IAM authentication: supports IAM-based authentication as an alternative to password auth
  • Secrets Manager integration: rotates credentials automatically via Secrets Manager

RDS Proxy limitations:

  • Adds ~1ms latency per query
  • Does not support all PostgreSQL features (e.g., SET SESSION CHARACTERISTICS and some LISTEN/NOTIFY patterns have limitations)
  • Costs $0.015 per vCPU-hour of the target database

Parameter Groups

Parameter groups configure the database engine. Every RDS instance has a parameter group — the default one cannot be modified.

bash
# Create custom parameter group
aws rds create-db-parameter-group \
  --db-parameter-group-name prod-postgres16 \
  --db-parameter-group-family postgres16 \
  --description "Production PostgreSQL 16 parameters"

# Set parameters
aws rds modify-db-parameter-group \
  --db-parameter-group-name prod-postgres16 \
  --parameters \
    "ParameterName=work_mem,ParameterValue=65536,ApplyMethod=immediate" \
    "ParameterName=max_connections,ParameterValue=200,ApplyMethod=pending-reboot" \
    "ParameterName=log_min_duration_statement,ParameterValue=1000,ApplyMethod=immediate" \
    "ParameterName=shared_preload_libraries,ParameterValue=pg_stat_statements,ApplyMethod=pending-reboot"

Key PostgreSQL parameters for production:

| Parameter | Recommended | Notes |
|---|---|---|
| max_connections | 100–300 | Higher requires more memory; use RDS Proxy instead |
| work_mem | 4–64 MB | Per sort/hash operation, multiplied by concurrent operations |
| shared_buffers | AWS default (set per instance class) | Static parameter (requires reboot) in a custom parameter group; setting it too high prevents the instance from starting |
| log_min_duration_statement | 500–2000 ms | Log slow queries (milliseconds) |
| log_autovacuum_min_duration | 0 | Log all autovacuum runs |
| shared_preload_libraries | pg_stat_statements | Required for Performance Insights query tracking |
| random_page_cost | 1.1 for gp3/io1 | Default 4.0 is for spinning disk; SSD is much lower |

Parameters with ApplyMethod=pending-reboot require a DB restart to take effect. ApplyMethod=immediate applies without restart but may still be deferred for static parameters.
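Whether a change is still waiting on a restart can be checked from either side (the psql connection string is a placeholder):

```bash
# Inside the database: parameters waiting on a reboot (PostgreSQL 9.5+).
psql "host=prod-postgres.abc123xyz.us-east-1.rds.amazonaws.com dbname=app" \
  -c "SELECT name, setting FROM pg_settings WHERE pending_restart;"

# From the API: the instance reports 'pending-reboot' until it is restarted.
aws rds describe-db-instances \
  --db-instance-identifier prod-postgres \
  --query 'DBInstances[0].DBParameterGroups[0].ParameterApplyStatus'
```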


Backup and Snapshot Strategy

Automated backups: RDS takes daily snapshots and retains transaction logs for point-in-time recovery. Retention: 1–35 days.

bash
# Modify backup retention
aws rds modify-db-instance \
  --db-instance-identifier prod-postgres \
  --backup-retention-period 14 \
  --preferred-backup-window "02:00-03:00" \
  --apply-immediately

Manual snapshots: persist until you delete them (not subject to retention period).

bash
# Create manual snapshot
aws rds create-db-snapshot \
  --db-instance-identifier prod-postgres \
  --db-snapshot-identifier prod-postgres-pre-migration-2026-05-10

# Restore to new instance from snapshot
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier prod-postgres-restored \
  --db-snapshot-identifier prod-postgres-pre-migration-2026-05-10 \
  --db-instance-class db.r6g.xlarge \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod \
  --no-publicly-accessible

Point-in-time restore: restore to any second within the backup retention window.

bash
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier prod-postgres \
  --target-db-instance-identifier prod-postgres-pitr \
  --restore-time "2026-05-10T14:30:00Z"
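Before picking a restore time, check the actual recoverable window; the upper bound is reported per instance:

```bash
# The latest point you can restore to (typically within the last 5 minutes);
# the lower bound is determined by the backup retention period.
aws rds describe-db-instances \
  --db-instance-identifier prod-postgres \
  --query 'DBInstances[0].LatestRestorableTime'
```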

Cross-region backup copy: for DR, copy snapshots to another region automatically using EventBridge and Lambda, or manually:

bash
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier arn:aws:rds:us-east-1:012345678901:snapshot:prod-postgres-snapshot \
  --target-db-snapshot-identifier prod-postgres-snapshot-copy \
  --region us-west-2 \
  --source-region us-east-1

Performance Insights

Performance Insights shows database load by wait state, SQL query, user, host, and application — it's the fastest way to identify what's causing database slowness.

bash
# Enable Performance Insights (can also set at creation time)
aws rds modify-db-instance \
  --db-instance-identifier prod-postgres \
  --enable-performance-insights \
  --performance-insights-retention-period 7 \    # 7 days free; paid options: 1–24 months
  --apply-immediately

The key metric is DBLoad (database load in AAS — average active sessions). When DBLoad > number of vCPUs, the database is CPU-bound or waiting on I/O. Drill into the SQL dimension to find the top queries by load.

Performance Insights requires the pg_stat_statements extension, loaded via shared_preload_libraries (which needs a reboot). Set this in the parameter group before enabling PI, or the per-query data won't be available.
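The same per-query view can be cross-checked directly from pg_stat_statements (connection string is a placeholder; column names assume PostgreSQL 13+):

```bash
# Top 5 queries by total execution time.
psql "host=prod-postgres.abc123xyz.us-east-1.rds.amazonaws.com dbname=app" -c "
  SELECT calls,
         round(total_exec_time::numeric, 1) AS total_ms,
         round(mean_exec_time::numeric, 2)  AS mean_ms,
         left(query, 60)                    AS query
  FROM pg_stat_statements
  ORDER BY total_exec_time DESC
  LIMIT 5;"
```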


Frequently Asked Questions

What's the RTO for RDS Multi-AZ failover?

For RDS: 60–120 seconds. The standby is promoted, DNS is updated, and applications reconnect. The actual downtime depends on how fast applications detect the DNS change and reconnect — DNS TTL for RDS is 5 seconds, but some connection pools cache DNS longer.

For Aurora: 30 seconds or less when a reader replica is available (the reader is promoted to writer). Without a reader, Aurora follows the same promotion path as RDS (~60 seconds).

Use RDS Proxy to absorb the failover — the proxy maintains connections and re-establishes them to the new primary without applications needing to reconnect.

Should I use gp3 or io1 storage?

gp3 is the right choice for most workloads:

  • Baseline 3,000 IOPS and 125 MiB/s throughput for volumes under 400 GiB; 12,000 IOPS and 500 MiB/s free baseline for 400 GiB+ (automatic striping across 4 volumes)
  • Scale IOPS up to 64,000 and throughput to 4,000 MB/s independently
  • ~20% cheaper than gp2 at the same size

io1 is appropriate for:

  • Workloads requiring > 64,000 IOPS (io1 goes to 256,000 IOPS on db.x2iedn instances)
  • Extreme latency sensitivity where predictable IOPS guarantees matter

Start with gp3. Monitor WriteIOPS, ReadIOPS, and DiskQueueDepth in CloudWatch. Switch to io1 only if gp3 consistently shows queue depth > 1 during peak load.
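The queue-depth check described above can be run from the CLI before deciding to switch storage types:

```bash
# Peak DiskQueueDepth per hour over the last 24 hours -- sustained
# values > 1 suggest the volume is IOPS-bound.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DiskQueueDepth \
  --dimensions Name=DBInstanceIdentifier,Value=prod-postgres \
  --start-time "$(date -u -d '-24 hours' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Maximum
```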

How do I handle schema migrations without downtime?

Online schema changes on production databases:

  1. Use pg_repack to rebuild tables without long locks (for index changes and table rewrites)
  2. Use additive migrations first (new columns with defaults, new tables) before destructive ones (drop column)
  3. Use feature flags to control when applications use the new schema
  4. For large table alterations, consider ALTER TABLE ... ADD COLUMN with a default value — PostgreSQL 11+ handles this without a table rewrite when the default is a non-volatile constant (e.g., a literal value or immutable function); volatile defaults like DEFAULT now() still require a full table rewrite

For RDS specifically: schedule migrations during the maintenance window or use blue/green deployments — create a new RDS instance, restore a snapshot, run migrations, redirect traffic.

What's the difference between Multi-AZ and a read replica for DR?

Multi-AZ standby:

  • Synchronous replication (no data loss on failover)
  • Automatic failover (RDS-managed, no manual intervention)
  • Cannot be used for reads (the standby is not accessible)
  • Higher cost (same instance class as primary)

Read replica:

  • Asynchronous replication (some lag, potential data loss on failover)
  • Manual promotion required (not automatic)
  • Can serve read traffic (explicit purpose)
  • Can be in another region
  • Must be promoted manually for DR

For production: Multi-AZ for automatic high availability + read replicas for read scaling. Not either/or.


For the VPC networking that isolates RDS instances in dedicated subnets, see AWS VPC Design for EKS: Subnets, NAT, and Security Groups. For secret rotation that manages RDS credentials, see AWS Secrets Manager and Parameter Store: Secrets Management on AWS.

Migrating from self-managed PostgreSQL to RDS or Aurora, investigating slow query performance, or designing a multi-region database architecture for disaster recovery? Talk to us at Coding Protocols — we help platform teams design managed database architectures that reduce operational burden without sacrificing performance or reliability.

Related Topics

AWS
RDS
Aurora
PostgreSQL
Databases
Platform Engineering
