One of the most vicious and hard-to-detect causes of database performance deterioration is I/O. When a database's I/O is lagging, multiple unpredictable issues occur.
Some of the most common are:
The immediate reaction of someone troubleshooting a growing list of pending queries is to check the slow query log. If the slow log contains queries (it probably will), they will start investigating which of those queries caused the problem.
However, when machine I/O is the problem, it is likely that none of the queries is actually problematic.
This is why I/O issues are so difficult to detect: infrastructure is the last thing that comes to mind as the root of the problem.
When using AWS RDS, one does not have traditional OS tools such as sysstat, iostat, dstat or sar. The only tool to understand what is happening in RDS is CloudWatch metrics and the graphs provided.
The IOPS CloudWatch metrics provide great insight into how many I/O operations per second occur in your database.
You can view them by visiting CloudWatch, selecting RDS and then finding the ReadIOPS and WriteIOPS metrics for your database.
Once the graph shows up, select the 1 minute granularity and “average” from the dropdown.
By summing up the ReadIOPS and WriteIOPS you will see how many IOPS your operations consume.
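As a rough illustration of the summing step, here is a minimal Python sketch. It assumes the two metrics have already been exported as lists of datapoints with "Timestamp" and "Average" fields (the shape CloudWatch uses for its datapoints); the function name total_iops and the sample values are made up for the example.

```python
def total_iops(read_points, write_points):
    """Sum ReadIOPS and WriteIOPS per timestamp.

    Each argument is a list of dicts with "Timestamp" and "Average" keys.
    Only timestamps present in both series are kept.
    """
    reads = {p["Timestamp"]: p["Average"] for p in read_points}
    writes = {p["Timestamp"]: p["Average"] for p in write_points}
    common = sorted(reads.keys() & writes.keys())
    return [(ts, reads[ts] + writes[ts]) for ts in common]


# Hypothetical 1-minute averages, not real measurements.
read_points = [
    {"Timestamp": "2024-01-01T00:00", "Average": 400.0},
    {"Timestamp": "2024-01-01T00:01", "Average": 420.0},
]
write_points = [
    {"Timestamp": "2024-01-01T00:00", "Average": 500.0},
    {"Timestamp": "2024-01-01T00:01", "Average": 480.0},
]

for timestamp, iops in total_iops(read_points, write_points):
    print(timestamp, iops)
```

Matching on timestamp rather than on list position avoids adding up samples from different minutes if one series has a gap.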
The DiskQueueDepth metric provides the number of outstanding I/Os (read/write requests) waiting to access the disk. If this metric is frequently above 2, you should expect to face performance issues sooner or later.
By using this metric you can immediately identify how many requests are queued at your disk.
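The "frequently above 2" check can be made concrete with a small Python sketch. It assumes the DiskQueueDepth samples are already available as a plain list of numbers; the function name queue_depth_report and the sample values are hypothetical, and the threshold of 2 is the one mentioned above.

```python
def queue_depth_report(depths, threshold=2.0):
    """Summarize how often DiskQueueDepth samples exceed a threshold."""
    if not depths:
        raise ValueError("no samples given")
    above = sum(1 for d in depths if d > threshold)
    return {
        "samples": len(depths),
        "above_threshold": above,
        "fraction_above": above / len(depths),
        "max": max(depths),
    }


# Hypothetical 1-minute samples, not real measurements.
sample_depths = [0.5, 1.0, 3.0, 4.0]
print(queue_depth_report(sample_depths))
```

What counts as "frequently" is a judgment call; the fraction of samples above the threshold is just one simple way to put a number on it.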
Using the above two graphs it is easy to identify if you are under-provisioned or over-provisioned in IOPS.
To see how many IOPS are needed for steady performance, use the ReadIOPS and WriteIOPS metrics and sum up the values. Choose a decent time interval, or a day that is typical from a performance point of view, and remove outliers. Compare this value with the IOPS you have provisioned.
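One possible way to do this calculation is sketched below in Python. The outlier rule is an assumption, since the text does not prescribe one: here a fixed fraction of the lowest and highest samples is dropped before averaging. The names typical_iops and provisioning_gap, the sample values and the provisioned figure of 1000 are all hypothetical.

```python
from statistics import mean


def typical_iops(totals, trim_fraction=0.1):
    """Average total IOPS after dropping the lowest and highest samples.

    trim_fraction is the share of samples removed from EACH end
    (an assumed outlier rule, not one taken from the article).
    """
    ordered = sorted(totals)
    k = int(len(ordered) * trim_fraction)
    trimmed = ordered[k:len(ordered) - k]
    if not trimmed:
        raise ValueError("not enough samples left after trimming")
    return mean(trimmed)


def provisioning_gap(totals, provisioned_iops, trim_fraction=0.1):
    """Positive: more IOPS needed than provisioned. Negative: headroom."""
    return typical_iops(totals, trim_fraction) - provisioned_iops


# Hypothetical totals (ReadIOPS + WriteIOPS) with one low and one high outlier.
totals = [900.0] * 8 + [100.0, 5000.0]
print(typical_iops(totals))
print(provisioning_gap(totals, provisioned_iops=1000))
```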
Once you have calculated how many IOPS are needed, you have two ways to acquire them.
The first is to purchase PIOPS (Provisioned IOPS), which is more reliable but a lot more costly. The second is to use a gp2 disk for your RDS instance, which provides 3 IOPS per GB of storage.
Let's take an example.
Assuming that on a typical day we have an average of 400 ReadIOPS and 500 WriteIOPS, our disk is consuming 900 IOPS. It therefore makes sense to acquire approximately that amount of IOPS.
Given the two approaches above, one has the following options:
PIOPS is considered more reliable; however, it is more costly.
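For the gp2 route, the arithmetic of the example can be sketched in a few lines of Python. It uses only the figures given above (400 ReadIOPS, 500 WriteIOPS, 3 IOPS per GB) and ignores any minimum or maximum limits that gp2 volumes may have; the function name gp2_storage_for_iops is made up for illustration.

```python
import math

GP2_IOPS_PER_GB = 3  # figure quoted in the text above


def gp2_storage_for_iops(required_iops, iops_per_gb=GP2_IOPS_PER_GB):
    """Smallest gp2 volume size in GB that yields the required IOPS."""
    return math.ceil(required_iops / iops_per_gb)


required_iops = 400 + 500  # ReadIOPS + WriteIOPS from the example
storage_gb = gp2_storage_for_iops(required_iops)
print(f"{required_iops} IOPS needs about {storage_gb} GB of gp2 storage")
```

In other words, under these assumptions a gp2 volume of roughly 300 GB would cover the 900 IOPS of the example, whether or not that much storage is needed for the data itself.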
I hope this guide provided a good understanding of how IOPS work in RDS.
For more information please also check http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html