Disclaimer:
Some steps or configurations described in this post may incur charges when using AWS services. It is your responsibility to review AWS pricing and monitor your usage. After testing, ensure that all resources are properly deleted to avoid unexpected charges. Always use the AWS Free Tier where applicable and test responsibly.
Sequential Reads
| Metrics | db.m5.large | db.m5.xlarge | db.m5.4xlarge |
| --- | --- | --- | --- |
| Read IOPS | 513 | 513 | 513 |
| Read Bandwidth (MiB/s) | 513 | 514 | 514 |
| Read Bandwidth (MB/s) | 538 | 539 | 539 |
| Total Read Data (GiB) | 150 | 151 | 151 |
| Run Duration (ms) | 300,003 | 300,003 | 300,003 |
| Submit Latency Avg (usec) | 53.36 | 52.65 | 57.34 |
| Complete Latency Avg (usec) | 3,843.65 | 3,838.99 | 3,833.52 |
| Total Latency Avg (usec) | 3,897.16 | 3,891.79 | 3,891.03 |
| IOPS Min/Max/Avg | 485 / 1024 / 513.68 | 487 / 1077 / 514.05 | 496 / 1014 / 514.29 |
| Latency Percentile (usec) | 5,407 | 4,424 | 5,014 |
| CPU Usage (sys) | 1.29% | 1.41% | 1.48% |
| Queue Utilization (%) | 99.29% | 100.00% | 100.00% |
Key Observations:
Performance Consistency – Read bandwidth and IOPS are nearly identical across all instance types, with throughput hovering at 513–514 MiB/s (538–539 MB/s) and IOPS steady at 513.
Latency – Minor differences in latency averages and percentiles suggest similar read patterns and IO efficiencies.
Queue Utilization – All tests indicate almost full utilization (100%) of the storage system.
CPU System Load – System CPU usage rises slightly with instance size (1.29% → 1.48%), possibly reflecting the overhead of managing more parallel I/O operations.
Performance Summary – All three instance types (db.m5.large, db.m5.xlarge, and db.m5.4xlarge) achieved the same Read IOPS during the sequential read tests, and sequential read throughput remained stable across all instance types with no significant variation.
The results indicate that sequential read performance is limited by the characteristics of the gp2 SSD storage rather than the instance size.
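The original fio job files aren't included in this post, but the sequential read numbers above (roughly 513 IOPS at ~513 MiB/s for 300 s) are consistent with a 1 MiB, queue-depth-2 sequential read job. A sketch of what such a job file might look like — the block size, queue depth, and target device are all inferred from the results, not taken from the actual test setup:

```ini
; Hypothetical fio job -- parameters inferred from the reported results,
; not taken from the original test setup.
[global]
ioengine=libaio
direct=1           ; bypass the page cache so the storage itself is measured
time_based=1
runtime=300        ; matches the ~300,003 ms run duration in the table

[seq-read]
rw=read            ; sequential reads
bs=1M              ; ~513 IOPS at ~513 MiB/s implies ~1 MiB blocks
iodepth=2          ; ~513 IOPS x ~3.9 ms avg latency ~= 2 I/Os in flight
filename=/dev/nvme1n1   ; placeholder -- replace with the device under test
```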
Sequential Writes
| Metrics | db.m5.large | db.m5.xlarge | db.m5.4xlarge |
| --- | --- | --- | --- |
| IOPS | 513 | 514 | 513 |
| Bandwidth (MiB/s) | 514 | 514 | 514 |
| Avg Latency (usec) | 3890.53 | 3889.16 | 3890.80 |
| CPU Usage (usr/sys) | 1.37% / 1.26% | 1.06% / 1.26% | 1.15% / 1.57% |
| Disk Utilization (%) | 99.05% | 99.98% | 100.00% |
Key Observations:
Bandwidth and IOPS – All instance types exhibit similar performance for sequential writes, with approximately 514 MiB/s bandwidth and 513 to 514 IOPS.
Latency – Latency percentiles for all instances show a consistent pattern with an average latency of ~3890 usec across the board. Higher percentiles indicate occasional spikes, with the 99.99th percentile showing latencies up to 14,484 usec.
CPU Utilization – CPU usage is relatively low across all instance types, with the db.m5.large showing slightly higher user-mode CPU usage (1.37%) than the other configurations.
System CPU utilization remains consistent at around 1.26%.
Disk Utilization – Disk utilization for db.m5.xlarge and db.m5.4xlarge reaches 100%, indicating they are fully saturated during sequential write operations.
The db.m5.large instance shows 99.05% disk utilization, suggesting near-complete resource saturation.
Disk Stats – Across all instances, NVMe devices report consistent statistics, with variations in disk utilization percentages across different drives. Some drives on the larger instances operate at utilization rates between 83% to 86%, likely due to parallel I/O distribution.
Performance Summary – The three instance types demonstrate comparable sequential write performance in terms of throughput, IOPS, and latency. The db.m5.xlarge and db.m5.4xlarge instances fully utilize disk resources, which aligns with their larger resource allocations. The marginally lower CPU usage for the db.m5.xlarge could indicate a more efficient processing path for sequential write tasks.
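As with the reads, the exact job definition isn't shown in the post; a fio job consistent with these sequential write figures (~514 MiB/s at ~513 IOPS, i.e. 1 MiB blocks) might look like the following sketch. The parameters are inferred, and the target filename is only a placeholder — writing directly to a device is destructive:

```ini
; Hypothetical fio job -- parameters inferred from the reported results.
[seq-write]
rw=write           ; sequential writes
bs=1M              ; ~513 IOPS at ~514 MiB/s implies ~1 MiB blocks
ioengine=libaio
direct=1           ; bypass the page cache
iodepth=2          ; ~513 IOPS x ~3.9 ms avg latency ~= 2 in flight
time_based=1
runtime=300
filename=/dev/nvme1n1   ; placeholder -- writing here destroys data
```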
Random Reads
| Metrics | db.m5.large | db.m5.xlarge | db.m5.4xlarge |
| --- | --- | --- | --- |
| IOPS | 2991 | 2991 | 2993 |
| Bandwidth (MiB/s) | 23.4 | 23.4 | 23.4 |
| Avg Latency (usec) | 667.41 | 667.39 | 667.04 |
| 99th Percentile Latency (usec) | 1303 | 1336 | 1303 |
| CPU Usage (usr/sys) | 0.47% / 1.15% | 0.53% / 1.39% | 0.44% / 1.03% |
| Disk Utilization (%) | 98.75% | 100.00% | 100.00% |
Key Observations:
Consistent IOPS and Bandwidth – All instance types achieved nearly identical IOPS (~2991–2993) and bandwidth (~23.4 MiB/s), indicating the random read workload is not heavily dependent on instance size.
Latency Distribution – The average read latency for all instances remained consistent at approximately 667 µs, with the 99th percentile ranging between 1303 and 1336 µs.
CPU Utilization – Slightly higher system CPU usage was observed on the db.m5.xlarge instance (1.39% sys) compared to the other types.
Disk Utilization – Disk utilization reached near-saturation levels (98.75% to 100%), demonstrating the workload’s reliance on storage performance rather than compute power.
Performance Summary – The random read performance appears bottlenecked by storage IOPS capacity rather than instance compute or memory resources. Scaling to higher instance sizes (db.m5.large → db.m5.4xlarge) did not yield any improvement in IOPS or bandwidth.
Stable read latencies indicate that the underlying storage maintained consistent performance across different instance sizes.
Given these results, larger instances do not provide significant benefits for random read workloads on this storage configuration; if performance improvements are required, the focus should be on storage optimization.
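The ~2,991 IOPS ceiling observed in the random read test lines up with gp2's published IOPS model: a baseline of 3 IOPS per GiB (floored at 100, capped at 16,000), with credit-based bursting to 3,000 IOPS for volumes below 1,000 GiB. A quick sanity check of that model — the 200 GiB volume size used here is a hypothetical example, since the post doesn't state the volume size:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline IOPS: 3 per GiB, floored at 100, capped at 16,000."""
    return min(max(100, 3 * size_gib), 16_000)

def gp2_burst_iops(size_gib: int) -> int:
    """Volumes under 1,000 GiB can burst to 3,000 IOPS on I/O credits;
    above that, the baseline already meets or exceeds 3,000."""
    baseline = gp2_baseline_iops(size_gib)
    return max(baseline, 3_000) if size_gib < 1_000 else baseline

# A hypothetical 200 GiB gp2 volume: 600 IOPS baseline, bursting to
# 3,000 -- in line with the ~2,991 IOPS observed in the random read test.
print(gp2_baseline_iops(200), gp2_burst_iops(200))  # 600 3000
```

Once burst credits are exhausted, throughput would fall back to the baseline, so sustained random I/O on a small gp2 volume would look considerably worse than these 5-minute runs.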
Random Writes
| Metrics | db.m5.large | db.m5.xlarge | db.m5.4xlarge |
| --- | --- | --- | --- |
| IOPS | 2051 | 2086 | 2122 |
| Bandwidth (MiB/s) | 16.0 | 16.3 | 16.6 |
| Avg Latency (usec) | 973.33 | 957.14 | 941.02 |
| 99th Percentile Latency (usec) | 1631 | 1188 | 1037 |
| CPU Usage (usr/sys) | 0.43% / 2.09% | 0.50% / 2.21% | 0.40% / 2.16% |
| Disk Utilization (%) | 98.47% | 99.40% | 100.00% |
Key Observations:
IOPS and Bandwidth Consistency – IOPS and bandwidth are stable across all instance sizes, with only slight gains on larger instances: the difference between db.m5.large and db.m5.4xlarge is minimal in both IOPS (2051 vs. 2122) and bandwidth (16.0 vs. 16.6 MiB/s).
Latency Distribution – The average latency is relatively consistent across instances, decreasing slightly as instance size increases. The 99th percentile latency improved as instance size scaled up (1631 µs down to 1037 µs), although the reduction is modest.
CPU Utilization – CPU utilization is low across all instances, with the system CPU utilization being slightly higher on db.m5.xlarge (2.21%) compared to the db.m5.large (2.09%) and db.m5.4xlarge (2.16%).
Disk Utilization – The disk utilization is near saturation levels, peaking at 100% on the db.m5.4xlarge instance. All instances show similar disk utilization (~98–100%), which highlights the potential storage bottleneck in random write operations.
Performance Summary – The performance for random writes appears to be limited by the storage subsystem rather than the CPU or memory of the instances. All instance sizes achieved similar bandwidth and IOPS, even with the increasing instance size.
Scaling up to larger instances (db.m5.xlarge and db.m5.4xlarge) provided minimal improvements in random write performance. This suggests that disk I/O is the limiting factor in this workload.
While latency did improve slightly as we moved to larger instances, the improvement is marginal, and the overall behavior shows that random writes are not significantly affected by the instance’s compute capacity.
For workloads with heavy random writes, optimizing storage performance (e.g., choosing faster EBS volumes) rather than scaling up instances would likely lead to more significant performance gains.
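For reference, the random write figures (~2,051–2,122 IOPS at ~16 MiB/s, i.e. ~8 KiB blocks) could be reproduced with a fio job along these lines; as with the earlier sketches, the parameters are inferred from the results rather than taken from the original setup:

```ini
; Hypothetical fio job -- parameters inferred from the reported results.
[rand-write]
rw=randwrite
bs=8k              ; ~2,051 IOPS at ~16.0 MiB/s implies ~8 KiB blocks
ioengine=libaio
direct=1
iodepth=2          ; ~2,051 IOPS x ~0.97 ms avg latency ~= 2 in flight
time_based=1
runtime=300
filename=/dev/nvme1n1   ; placeholder -- writing here destroys data
```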
Happy clouding!