AWS RDS and Aurora: Managed Database Patterns on AWS
RDS removes the operational burden of database installation, patching, backups, and replication — but choosing between RDS and Aurora, sizing the instance, configuring parameter groups, and setting up read replicas correctly still requires understanding the trade-offs. This covers RDS PostgreSQL vs Aurora PostgreSQL, Multi-AZ and read replica architecture, RDS Proxy for connection pooling, backup and snapshot strategy, Performance Insights for query-level visibility, and when to choose Aurora Serverless v2 over provisioned.

RDS manages the infrastructure under your database: OS patching, storage scaling, replication, backups, and failover. What it doesn't manage is everything above the engine — schema design, query performance, connection management, and choosing the right configuration for your workload.
The two main choices on AWS are RDS (running a standard database engine on a managed EC2 instance) and Aurora (AWS's re-engineered database engine with a distributed storage layer). Aurora PostgreSQL and Aurora MySQL are compatible with their standard counterparts but have different performance characteristics, failover behavior, and pricing.
RDS vs Aurora PostgreSQL
| | RDS PostgreSQL | Aurora PostgreSQL |
|---|---|---|
| Storage | EBS (gp3 or io1), provisioned per instance | Distributed storage, auto-scales 10 GiB → 128 TiB (256 TiB on Aurora PostgreSQL 16.9+, 15.13+) |
| Write performance | EBS-bound | Up to 3× faster than standard PostgreSQL (AWS claim) |
| Replication | Physical streaming replication to replicas | Storage-level replication shared with up to 15 replicas |
| Read replicas | Up to 5, replication lag varies | Up to 15, near-zero replica lag (shared storage) |
| Failover time | 60-120 seconds (Multi-AZ promotion) | 30 seconds or less (Aurora replicas use same storage) |
| Storage cost | You pay for provisioned size | You pay per GB of actual data stored |
| Minor version upgrades | Manual or auto | Manual or auto |
| Compatibility | Native PostgreSQL | Compatible, not identical (some extensions differ) |
| Cross-region replication | Read replicas only | Aurora Global Database (sub-1-second replication lag) |
| Serverless option | No | Aurora Serverless v2 |
Choose RDS PostgreSQL when: you need exact PostgreSQL compatibility, you use extensions that Aurora doesn't support (PostGIS with certain configurations, timescaledb, etc.), or your storage needs are predictable and you want simpler cost forecasting.
Choose Aurora PostgreSQL when: you need faster failover (< 30 seconds), more than 5 read replicas, near-zero replica lag for read scaling, or you want storage that auto-scales without planning.
Multi-AZ and Read Replicas
Multi-AZ Deployment
Multi-AZ runs a synchronous standby in a different AZ. Writes are committed to both the primary and standby before returning success. On primary failure, RDS automatically fails over to the standby — DNS is updated to point to the new primary.
```shell
# Create RDS instance with Multi-AZ
aws rds create-db-instance \
  --db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.xlarge \
  --engine postgres \
  --engine-version 16.2 \
  --master-username postgres \
  --master-user-password "$(aws secretsmanager get-secret-value --secret-id rds/prod/postgres --query SecretString --output text | jq -r .password)" \
  --allocated-storage 100 \
  --storage-type gp3 \
  --multi-az \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod \
  --backup-retention-period 7 \
  --preferred-backup-window "02:00-03:00" \
  --preferred-maintenance-window "sun:04:00-sun:05:00" \
  --deletion-protection \
  --no-publicly-accessible
```

Multi-AZ failover timeline:
- Primary becomes unresponsive (hardware failure, AZ outage, manual reboot with failover)
- RDS detects failure (health check interval: 30 seconds typically)
- Standby is promoted to primary
- DNS record (`prod-postgres.abc123xyz.us-east-1.rds.amazonaws.com`) is updated
- Applications reconnect. Total downtime: 60-120 seconds for RDS, 30 seconds or less for Aurora (when a reader replica is available)
Applications must handle reconnection. Connection poolers (PgBouncer, RDS Proxy) absorb the reconnection spike.
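The reconnection requirement above can be sketched as a retry loop with exponential backoff. This is a minimal illustration, not a prescribed pattern: `retry_with_backoff` is a hypothetical helper name, and the attempt count and initial delay are assumptions to tune for your own failover window.

```shell
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# delay after each failure. Returns 0 on success, 1 if every attempt failed.
retry_with_backoff() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_with_backoff 7 2 pg_isready -h "$DB_HOST" -p 5432` (with `DB_HOST` set to the instance endpoint) probes the database up to seven times and waits at most 2 + 4 + 8 + 16 + 32 + 64 = 126 seconds in total, which covers the 60-120 second RDS failover range quoted above.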
Read Replicas
Read replicas handle read queries, offloading the primary. They use asynchronous streaming replication — there is always some lag.
```shell
# Create read replica in same region
aws rds create-db-instance-read-replica \
  --db-instance-identifier prod-postgres-replica-1 \
  --source-db-instance-identifier prod-postgres \
  --db-instance-class db.r6g.large \
  --availability-zone us-east-1b

# Create read replica in another region (cross-region)
# A cross-region source must be identified by its ARN, not its short name
aws rds create-db-instance-read-replica \
  --db-instance-identifier prod-postgres-replica-dr \
  --source-db-instance-identifier arn:aws:rds:us-east-1:012345678901:db:prod-postgres \
  --db-instance-class db.r6g.large \
  --source-region us-east-1 \
  --region us-west-2
```

Monitor replication lag:
```shell
# ReplicaLag is reported in seconds; date -d is GNU date syntax (Linux)
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name ReplicaLag \
  --dimensions Name=DBInstanceIdentifier,Value=prod-postgres-replica-1 \
  --start-time "$(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 \
  --statistics Average
```

A read replica can be promoted to a standalone primary, which is useful for disaster recovery or region migration. Promotion is irreversible and disconnects the replica from the source.
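Because replica lag is never zero, a common use of the `ReplicaLag` value is deciding whether a read may go to the replica at all. The sketch below assumes a hypothetical helper, `choose_read_target`, and a tolerance you pick yourself; neither comes from AWS.

```shell
# choose_read_target LAG_SECONDS MAX_LAG_SECONDS
# Prints "replica" when the observed lag is within tolerance, else "primary".
# awk handles the comparison because ReplicaLag can be fractional.
choose_read_target() {
  awk -v lag="$1" -v max="$2" \
    'BEGIN { if (lag + 0 <= max + 0) print "replica"; else print "primary" }'
}

choose_read_target 0.4 5   # lag within a 5-second tolerance
choose_read_target 12 5    # lag too high, fall back to the primary
```

Reads that must see their own recent writes are usually sent to the primary regardless of the measured lag.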
Aurora Architecture
Aurora replaces EBS with a distributed storage layer spanning 6 storage nodes across 3 AZs. Writes go to 4 of 6 nodes for a quorum commit; reads can use any of the 6. This design means:
- Storage is durable independently of compute: losing a reader or writer instance doesn't affect the data layer
- Readers share the same storage as the writer: readers read directly from the storage layer, so replica lag is near zero (usually much less than 100 ms) instead of depending on a copied data set catching up
- Failover promotes a reader: the reader already has the data — failover is flipping which instance handles writes, not copying data
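The quorum numbers above can be checked with a little arithmetic. In this sketch the 6 copies and the write quorum of 4 come from the text; the read quorum of 3 is an assumption taken from AWS's published description of the storage design, where it matters mainly for repair rather than ordinary reads. The function names are illustrative.

```shell
# Aurora storage quorum arithmetic.
# COPIES and WRITE_QUORUM come from the text above; READ_QUORUM=3 is an
# assumption based on AWS's published description of the storage design.
COPIES=6
WRITE_QUORUM=4
READ_QUORUM=3

# can_write FAILED_COPIES: succeeds if enough copies remain for a write quorum
can_write() { [ $((COPIES - $1)) -ge "$WRITE_QUORUM" ]; }

# can_read FAILED_COPIES: succeeds if enough copies remain for a read quorum
can_read() { [ $((COPIES - $1)) -ge "$READ_QUORUM" ]; }

for failed in 0 1 2 3 4; do
  w=no; r=no
  if can_write "$failed"; then w=yes; fi
  if can_read "$failed"; then r=yes; fi
  echo "failed=$failed writes=$w reads=$r"
done
```

With 6 copies spread over 3 AZs (2 per AZ), losing a whole AZ removes 2 copies and writes still reach quorum. Losing an AZ plus one more copy blocks writes, but under the assumed read quorum the data remains readable.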
Aurora Cluster Components
Aurora Cluster
├── Writer endpoint: prod-cluster.cluster-abc123.us-east-1.rds.amazonaws.com
│ └── Writer instance (one at a time)
│
├── Reader endpoint: prod-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com
│ └── Routes to available reader instances (load balanced)
│
└── Instance endpoints (direct)
├── prod-cluster-instance-1.abc123.us-east-1.rds.amazonaws.com (writer)
├── prod-cluster-instance-2.abc123.us-east-1.rds.amazonaws.com (reader)
└── prod-cluster-instance-3.abc123.us-east-1.rds.amazonaws.com (reader)
Applications should use the cluster endpoint (writer) for writes and the reader endpoint for reads. Instance endpoints are for direct connection during troubleshooting.
```shell
# Create Aurora PostgreSQL cluster
aws rds create-db-cluster \
  --db-cluster-identifier prod-aurora-cluster \
  --engine aurora-postgresql \
  --engine-version 16.2 \
  --master-username postgres \
  --master-user-password "$(aws secretsmanager get-secret-value ...)" \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod \
  --backup-retention-period 7 \
  --preferred-backup-window "02:00-03:00" \
  --deletion-protection

# Add writer instance
aws rds create-db-instance \
  --db-instance-identifier prod-aurora-writer \
  --db-cluster-identifier prod-aurora-cluster \
  --db-instance-class db.r6g.xlarge \
  --engine aurora-postgresql

# Add reader instance
aws rds create-db-instance \
  --db-instance-identifier prod-aurora-reader-1 \
  --db-cluster-identifier prod-aurora-cluster \
  --db-instance-class db.r6g.large \
  --engine aurora-postgresql
```

Aurora Global Database
For cross-region DR or global applications, Aurora Global Database replicates from a primary region to up to 5 secondary regions, typically with less than 1 second of replication lag:
```shell
aws rds create-global-cluster \
  --global-cluster-identifier prod-global \
  --source-db-cluster-identifier arn:aws:rds:us-east-1:012345678901:cluster:prod-aurora-cluster

# Add secondary region
aws rds create-db-cluster \
  --db-cluster-identifier prod-aurora-cluster-eu \
  --engine aurora-postgresql \
  --engine-version 16.2 \
  --global-cluster-identifier prod-global \
  --db-subnet-group-name prod-db-subnet-group-eu \
  --region eu-west-1
```

Failover to a secondary region is not automatic in most configurations: an operator must initiate it, either with the managed failover operation or by detaching and promoting the secondary cluster.
Aurora Serverless v2
Aurora Serverless v2 scales compute capacity up and down in increments of 0.5 ACUs (Aurora Capacity Units, where 1 ACU ≈ 2 GiB memory) within a configured min/max range. You pay for the ACUs consumed, not for provisioned instances.
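To make the ACU range concrete, the following sketch converts a capacity setting into approximate memory using the 1 ACU ≈ 2 GiB ratio stated above. `acu_to_gib` is an illustrative helper, and the result is an approximation, not a published instance specification.

```shell
# acu_to_gib ACUS
# Prints the approximate memory in GiB for a given ACU count (1 ACU ~ 2 GiB).
acu_to_gib() {
  awk -v acu="$1" 'BEGIN { printf "%g\n", acu * 2 }'
}

echo "min 0.5 ACU ~ $(acu_to_gib 0.5) GiB"
echo "max 32 ACU  ~ $(acu_to_gib 32) GiB"
```

A range of 0.5 to 32 ACUs therefore spans roughly 1 GiB to 64 GiB of memory, which helps when comparing a serverless range against a provisioned instance class.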
```shell
# MinCapacity=0 (auto-pause) is supported on PostgreSQL 16.3+, 15.7+, 14.12+, 13.15+
aws rds create-db-cluster \
  --db-cluster-identifier prod-serverless-cluster \
  --engine aurora-postgresql \
  --engine-version 16.2 \
  --serverless-v2-scaling-configuration MinCapacity=0.5,MaxCapacity=32 \
  --master-username postgres \
  --master-user-password "..." \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod

# Add serverless writer instance
aws rds create-db-instance \
  --db-instance-identifier prod-serverless-writer \
  --db-cluster-identifier prod-serverless-cluster \
  --db-instance-class db.serverless \
  --engine aurora-postgresql
```

Serverless v2 is appropriate when:
- Traffic is variable and unpredictable (dev environments, batch workloads, SaaS multi-tenant with idle tenants)
- You want scale-to-near-zero (minimum 0.5 ACU for older engine versions; newer versions (PostgreSQL 16.3+, 15.7+, 14.12+, 13.15+) support scaling to 0 ACUs with auto-pause)
- You want to avoid over-provisioning for peak load
Not appropriate for:
- Latency-sensitive OLTP with consistent high load (provisioned instances have predictable performance)
- Workloads that scale rapidly from zero — scaling up from 0.5 ACU takes a few seconds
RDS Proxy
RDS Proxy manages connection pooling between applications and RDS/Aurora. It reduces connection overhead for serverless applications (such as Lambda functions that open a new database connection per invocation) and maintains connections during failover.
```shell
aws rds create-db-proxy \
  --db-proxy-name prod-rds-proxy \
  --engine-family POSTGRESQL \
  --auth '[{"AuthScheme":"SECRETS","SecretArn":"arn:aws:secretsmanager:us-east-1:012345678901:secret:rds/prod/postgres-abc123","IAMAuth":"REQUIRED"}]' \
  --role-arn arn:aws:iam::012345678901:role/rds-proxy-role \
  --vpc-subnet-ids subnet-private-1a subnet-private-1b subnet-private-1c \
  --vpc-security-group-ids sg-rds-proxy

# Target the RDS instance
aws rds register-db-proxy-targets \
  --db-proxy-name prod-rds-proxy \
  --db-instance-identifiers prod-postgres
```

Applications connect to the proxy endpoint instead of the RDS endpoint. The proxy maintains a pool of connections to the database.
RDS Proxy benefits:
- Connection pooling: multiple application connections multiplex onto fewer database connections
- Failover handling: the proxy maintains connections during Multi-AZ failover — application connections reconnect to the proxy (which handles the DB reconnection internally), reducing failover impact
- IAM authentication: supports IAM-based authentication as an alternative to password auth
- Secrets Manager integration: rotates credentials automatically via Secrets Manager
RDS Proxy limitations:
- Adds ~1ms latency per query
- Does not support all PostgreSQL features (e.g., `SET SESSION CHARACTERISTICS` and some `LISTEN/NOTIFY` patterns have limitations)
- Costs $0.015 per vCPU-hour of the target database
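The per-vCPU-hour price translates into a monthly figure as follows. This is a rough estimate only: the $0.015 rate is the one quoted above and may differ by region, the 730-hour month is an assumption, and `proxy_monthly_cost` is a hypothetical helper name.

```shell
# proxy_monthly_cost VCPUS [HOURS]
# Estimates RDS Proxy cost at $0.015 per vCPU-hour of the target database.
# HOURS defaults to 730, an assumed average month.
proxy_monthly_cost() {
  awk -v vcpus="$1" -v hours="${2:-730}" \
    'BEGIN { printf "%.2f\n", vcpus * hours * 0.015 }'
}

# db.r6g.xlarge has 4 vCPUs
echo "Estimated proxy cost: \$$(proxy_monthly_cost 4) per month"
```

For the `db.r6g.xlarge` primary used in the earlier examples, that works out to about $43.80 per month under these assumptions.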
Parameter Groups
Parameter groups configure the database engine. Every RDS instance has a parameter group — the default one cannot be modified.
```shell
# Create custom parameter group
aws rds create-db-parameter-group \
  --db-parameter-group-name prod-postgres16 \
  --db-parameter-group-family postgres16 \
  --description "Production PostgreSQL 16 parameters"

# Set parameters (work_mem is in kB: 65536 kB = 64 MB)
aws rds modify-db-parameter-group \
  --db-parameter-group-name prod-postgres16 \
  --parameters \
  "ParameterName=work_mem,ParameterValue=65536,ApplyMethod=immediate" \
  "ParameterName=max_connections,ParameterValue=200,ApplyMethod=pending-reboot" \
  "ParameterName=log_min_duration_statement,ParameterValue=1000,ApplyMethod=immediate" \
  "ParameterName=shared_preload_libraries,ParameterValue=pg_stat_statements,ApplyMethod=pending-reboot"
```

Key PostgreSQL parameters for production:
| Parameter | Recommended | Notes |
|---|---|---|
| `max_connections` | 100–300 | Higher requires more memory; use RDS Proxy instead |
| `work_mem` | 4–64 MB | Per sort/hash operation, multiplied by concurrent operations |
| `shared_buffers` | AWS sets a default based on instance class | Configurable in a custom parameter group (static, requires reboot); setting too high prevents the instance from starting |
| `log_min_duration_statement` | 500–2000 ms | Log slow queries (milliseconds) |
| `log_autovacuum_min_duration` | 0 | Log all autovacuum runs |
| `shared_preload_libraries` | `pg_stat_statements` | Required for Performance Insights query tracking |
| `random_page_cost` | 1.1 for gp3/io1 | Default 4.0 is for spinning disk; SSD is much lower |
Parameters with ApplyMethod=pending-reboot require a DB restart to take effect. ApplyMethod=immediate applies dynamic parameters without a restart; static parameters must use pending-reboot and only change after the next reboot.
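Because `work_mem` applies per sort or hash operation, it is worth estimating the worst case before raising it. The sketch below multiplies the values set in the parameter group example; `worst_case_work_mem_mib` is an illustrative helper, and the operations-per-query figure is an assumption about your workload.

```shell
# worst_case_work_mem_mib MAX_CONNECTIONS WORK_MEM_KB OPS_PER_QUERY
# Prints the memory (MiB) consumed if every connection runs OPS_PER_QUERY
# sort/hash operations at once, each using the full work_mem.
worst_case_work_mem_mib() {
  echo $(( $1 * $2 * $3 / 1024 ))
}

# 200 connections, work_mem = 65536 kB (64 MB), one operation each
echo "worst case: $(worst_case_work_mem_mib 200 65536 1) MiB"
```

With the example settings the worst case is 12800 MiB (12.5 GiB), a large share of the 32 GiB on a `db.r6g.xlarge`, and it doubles if queries run two such operations concurrently.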
Backup and Snapshot Strategy
Automated backups: RDS takes daily snapshots and retains transaction logs for point-in-time recovery. Retention: 1–35 days.
```shell
# Modify backup retention
aws rds modify-db-instance \
  --db-instance-identifier prod-postgres \
  --backup-retention-period 14 \
  --preferred-backup-window "02:00-03:00" \
  --apply-immediately
```

Manual snapshots: persist until you delete them (not subject to retention period).
```shell
# Create manual snapshot
aws rds create-db-snapshot \
  --db-instance-identifier prod-postgres \
  --db-snapshot-identifier prod-postgres-pre-migration-2026-05-10

# Restore to new instance from snapshot
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier prod-postgres-restored \
  --db-snapshot-identifier prod-postgres-pre-migration-2026-05-10 \
  --db-instance-class db.r6g.xlarge \
  --db-subnet-group-name prod-db-subnet-group \
  --vpc-security-group-ids sg-rds-prod \
  --no-publicly-accessible
```

Point-in-time restore: restore to any second within the backup retention window.
```shell
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier prod-postgres \
  --target-db-instance-identifier prod-postgres-pitr \
  --restore-time "2026-05-10T14:30:00Z"
```

Cross-region backup copy: for DR, copy snapshots to another region automatically using EventBridge and Lambda, or manually:
```shell
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier arn:aws:rds:us-east-1:012345678901:snapshot:prod-postgres-snapshot \
  --target-db-snapshot-identifier prod-postgres-snapshot-copy \
  --region us-west-2 \
  --source-region us-east-1
```

Performance Insights
Performance Insights shows database load by wait state, SQL query, user, host, and application — it's the fastest way to identify what's causing database slowness.
```shell
# Enable Performance Insights (can also set at creation time)
# Retention: 7 days is free; paid options cover 1-24 months
aws rds modify-db-instance \
  --db-instance-identifier prod-postgres \
  --enable-performance-insights \
  --performance-insights-retention-period 7 \
  --apply-immediately
```

The key metric is DBLoad (database load, measured in average active sessions, or AAS). When DBLoad > number of vCPUs, the database is CPU-bound or waiting on I/O. Drill into the SQL dimension to find the top queries by load.
Performance Insights requires pg_stat_statements extension loaded via shared_preload_libraries. Set this in the parameter group before enabling PI or the SQL query data won't be available.
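The DBLoad rule of thumb above reduces to a ratio of average active sessions to vCPUs. The helper below is a hypothetical sketch for turning a DBLoad reading into that ratio; the name `db_load_ratio` and the sample numbers are illustrative.

```shell
# db_load_ratio DBLOAD_AAS VCPUS
# Prints DBLoad divided by the vCPU count. Values above 1.00 mean more
# sessions are active than there are vCPUs to serve them.
db_load_ratio() {
  awk -v load="$1" -v vcpus="$2" 'BEGIN { printf "%.2f\n", load / vcpus }'
}

# 6 average active sessions on a 4-vCPU db.r6g.xlarge
echo "load ratio: $(db_load_ratio 6 4)"
```

A sustained ratio above 1.00 is the signal to drill into the SQL dimension; short spikes above it are common and less meaningful.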
Frequently Asked Questions
What's the RTO for RDS Multi-AZ failover?
For RDS: 60–120 seconds. The standby is promoted, DNS is updated, and applications reconnect. The actual downtime depends on how fast applications detect the DNS change and reconnect — DNS TTL for RDS is 5 seconds, but some connection pools cache DNS longer.
For Aurora: 30 seconds or less when a reader replica is available (the reader is promoted to writer). Without a reader there is nothing to promote: Aurora has to create a replacement writer instance, so recovery generally takes longer than a reader promotion.
Use RDS Proxy to absorb the failover — the proxy maintains connections and re-establishes them to the new primary without applications needing to reconnect.
Should I use gp3 or io1 storage?
gp3 is the right choice for most workloads:
- Baseline 3,000 IOPS and 125 MiB/s throughput for volumes under 400 GiB; 12,000 IOPS and 500 MiB/s free baseline for 400 GiB+ (automatic striping across 4 volumes)
- Scale IOPS up to 64,000 and throughput to 4,000 MB/s independently
- ~20% cheaper than gp2 at the same size
io1 is appropriate for:
- Workloads requiring > 64,000 IOPS (io1 goes to 256,000 IOPS on `db.x2iedn` instances)
- Extreme latency sensitivity where predictable IOPS guarantees matter
Start with gp3. Monitor WriteIOPS, ReadIOPS, and DiskQueueDepth in CloudWatch. Switch to io1 only if gp3 consistently shows queue depth > 1 during peak load.
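The gp3 baseline thresholds listed above can be expressed as a small lookup, which is handy when deciding whether to allocate just over 400 GiB to get the higher baseline. `gp3_baseline` is an illustrative helper that only encodes the figures quoted in this section.

```shell
# gp3_baseline ALLOCATED_GIB
# Prints "IOPS THROUGHPUT_MIBPS" for an RDS gp3 volume of the given size,
# using the thresholds quoted in this section.
gp3_baseline() {
  if [ "$1" -lt 400 ]; then
    echo "3000 125"
  else
    echo "12000 500"
  fi
}

gp3_baseline 100   # the 100 GiB volume from the earlier create-db-instance example
gp3_baseline 400
```

The 100 GiB volume created earlier gets the 3,000 IOPS baseline; additional IOPS and throughput beyond the baseline are provisioned separately.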
How do I handle schema migrations without downtime?
Online schema changes on production databases:
- Use `pg_repack` to rebuild tables without long locks (for index changes and table rewrites)
- Use additive migrations first (new columns with defaults, new tables) before destructive ones (drop column)
- Use feature flags to control when applications use the new schema
- For large table alterations, consider `ALTER TABLE ... ADD COLUMN` with a default value. PostgreSQL 11+ handles this without a table rewrite when the default is non-volatile (e.g., a literal value, or a stable function such as `now()`, which is evaluated once); volatile defaults like `DEFAULT clock_timestamp()` or `DEFAULT random()` still require a full table rewrite
For RDS specifically: schedule migrations during the maintenance window or use blue/green deployments — create a new RDS instance, restore a snapshot, run migrations, redirect traffic.
What's the difference between Multi-AZ and a read replica for DR?
Multi-AZ standby:
- Synchronous replication (no data loss on failover)
- Automatic failover (RDS-managed, no manual intervention)
- Cannot be used for reads (the standby is not accessible)
- Higher cost (same instance class as primary)
Read replica:
- Asynchronous replication (some lag, potential data loss on failover)
- Manual promotion required for DR (not automatic)
- Can serve read traffic (explicit purpose)
- Can be in another region
For production: Multi-AZ for automatic high availability + read replicas for read scaling. Not either/or.
For the VPC networking that isolates RDS instances in dedicated subnets, see AWS VPC Design for EKS: Subnets, NAT, and Security Groups. For secret rotation that manages RDS credentials, see AWS Secrets Manager and Parameter Store: Secrets Management on AWS.


