Storage Limits

In an IaaS environment, most cloud providers apply QoS at the storage layer in order to provide predictable and consistent performance across multiple workloads and volumes. The objective of QoS is to ensure that all workloads receive their fair share of resources, minimising the effects of victim-and-bully [1] contention. It also makes capacity planning analysis much simpler.

Storage IO Characteristics

For a given volume configuration, certain I/O characteristics drive the performance behaviour on the back end. I/O block sizes have a big impact on I/O response. To fully understand how provisioned storage will perform in your application, it is important to know what IOPS are and how they are measured.

What are IOPS?

IOPS are input/output operations per second. Most large cloud providers count I/O operations against a normalised block size [2]. For example, AWS and Azure measure I/O operations against a normalised I/O block size of 256 KiB. [3] [4] An I/O operation smaller than 256 KiB is counted as a single I/O; similarly, an I/O larger than 256 KiB is counted as multiples of 256 KiB capacity units. For example, a single 64 KiB I/O operation would count as 1 IOPS, whereas a single 1 MiB I/O operation would count as 4 IOPS.
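This counting scheme can be sketched in a few lines (the 256 KiB unit matches the AWS/Azure example above; other providers may use a different unit):

```python
import math

NORMALISED_IO_KIB = 256  # normalisation unit used in the AWS/Azure example above


def normalised_iops(io_size_kib: float) -> int:
    """Number of IOPS a single I/O of the given size counts as.

    I/Os at or below the normalisation unit count as 1; larger I/Os
    count as multiples of the unit, rounded up.
    """
    return max(1, math.ceil(io_size_kib / NORMALISED_IO_KIB))


print(normalised_iops(64))    # a 64 KiB I/O counts as 1 IOPS
print(normalised_iops(1024))  # a 1 MiB I/O counts as 4 IOPS
```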

Note

Normalising on a large block size like 256 KiB makes storage QoS easier to consume. On top of that, a storage bandwidth limit (MB/s) can then be applied to provide a consistent building block. For example, a volume with an IOPS limit of 1000 (with 256 KiB normalised I/O) allows smaller block sizes to run at the same 1000 IOPS threshold, whilst also providing a very good throughput capability of 256 MB/s.

Things do not work so well for a small normalised I/O size. For example, many storage vendors that implement storage QoS limits normalise the I/O size at 8 KiB. The problem with a small normalised I/O size is that it makes it difficult to assign an IOPS limit that satisfies both the IOPS count and the throughput requirements. For example, with a limit of 1000 IOPS and an 8 KiB normalised I/O size, the maximum throughput that the volume can deliver is 8 MB/s - most systems will struggle to run well within that limit.

I/O size and volume throughput limits

If a workload’s I/O chunks are very large, it may experience a smaller number of IOPS than provisioned because it is hitting the throughput limit of the volume. For example, consider a volume that has an IOPS limit of 200 and a volume throughput limit of 100 MiB/s. If the workload is using a 1 MiB I/O size, the volume will reach its throughput limit at 100 IOPS (100 x 1 MiB = 100 MiB/s). For smaller I/O sizes (such as 8 KiB or less), the same volume can sustain the provisioned 200 IOPS because the throughput stays well below 100 MiB/s.
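The interaction between the two limits can be sketched as follows; the achievable IOPS is simply whichever limit bites first (the 200 IOPS / 100 MiB/s figures are the example above):

```python
def effective_iops(iops_limit: int, throughput_limit_mib: float,
                   io_size_kib: float) -> float:
    """Achievable IOPS for a given I/O size under both volume limits."""
    # IOPS at which the volume hits its throughput ceiling
    throughput_capped_iops = throughput_limit_mib * 1024 / io_size_kib
    return min(iops_limit, throughput_capped_iops)


print(effective_iops(200, 100, 1024))  # 1 MiB I/Os: throughput-limited to 100.0 IOPS
print(effective_iops(200, 100, 8))     # 8 KiB I/Os: the full 200 IOPS is available
```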

For smaller I/O operations, you may even see an IOPS value that is higher than what you have provisioned (when measured on the client side); this is because the client operating system may be coalescing multiple smaller I/O operations into a smaller number of large chunks.

Ultimately, however you slice and dice it, it all comes down to the following equation,

\[Throughput = IOPS \times IO\ Block\ Size\]

which is equivalent to,

\[IOPS = \frac{Throughput}{IO\ Block\ Size}\]

When thinking about I/O limits, the relationships between bandwidth, IOPS and I/O block size are tightly coupled. When sizing for storage requirements, one cannot consider any one of these elements without considering the direct consequences for the others.

Workload Demand

Workload demand plays an important role in how we design and size volumes. For volumes to deliver the amount of IOPS that is available to them, they need to have enough I/O requests sent to them. There is a relationship between the demand on the volumes, the amount of IOPS available to them, and the latency of each request (the amount of time it takes for the I/O operation to complete).

Average Queue Length

The queue length is the number of pending I/O requests for a device. The optimal average queue length varies for every customer workload, and this value depends on your particular application’s sensitivity to IOPS and latency. If the workload does not deliver enough I/O requests to maintain its optimal average queue length, the volume might not consistently deliver the IOPS that were provisioned. However, if the workload maintains an average queue length that is higher than its optimal value, the per-request I/O latency will increase; in this case, we should provision more IOPS for the volume by moving to a higher tier or increasing the volume size.

To estimate the optimal average queue length for a workload, an example sizing exercise (based on AWS EBS storage) would be to target a queue length of 1 for every 200 provisioned IOPS, then monitor the application performance and tune that value based on the application requirements. Different storage back ends will have different parameters. The only way to really know is to generate enough test cases (or simulate real-world application workloads) to find out.
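As a starting point, the rule of thumb above can be computed like this (the 1-per-200 ratio is just the AWS EBS example from the text; treat it as a tunable assumption, not a universal constant):

```python
def initial_queue_length(provisioned_iops: int,
                         iops_per_queue_slot: int = 200) -> int:
    """Rough starting-point queue length: 1 outstanding I/O per
    200 provisioned IOPS (example ratio only; tune per workload)."""
    return max(1, round(provisioned_iops / iops_per_queue_slot))


print(initial_queue_length(3000))  # 3000 provisioned IOPS -> queue length 15
print(initial_queue_length(100))   # never go below a queue length of 1
```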

Note

Per-request I/O latency may increase with higher average queue lengths.
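The coupling between queue length, IOPS and latency follows from Little's Law (average queue length = arrival rate x response time), which is why a deeper queue at a fixed IOPS ceiling necessarily means higher per-request latency. A minimal sketch:

```python
def avg_queue_length(iops: float, latency_ms: float) -> float:
    """Little's Law: average outstanding I/Os = IOPS x latency (seconds)."""
    return iops * (latency_ms / 1000.0)


# A volume sustaining 1000 IOPS with 5 ms per-request latency keeps,
# on average, 5 I/Os in flight.
print(avg_queue_length(1000, 5))  # 5.0
```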

Latency

Latency is the measure of the actual response time as seen by the application, i.e., it measures how long it takes for an I/O request to be actioned by the storage system.

High disk latency is usually caused by workloads generating more IOPS than the underlying storage volume is capable of delivering. If the application requires a greater number of IOPS than the volume can provide, consider moving to a higher performance tier (e.g., moving from HDD-backed storage to SSD-backed storage in AWS).

[1]

A victim is a workload whose performance has decreased due to other workloads, called bullies, that are over-using the storage IOPS.

Workloads on back-end storage can share many of the cluster components, such as storage aggregates and the CPU for network and data processing. When a workload, such as a volume, increases its usage of a cluster component to the point that the component is unable to meet the demand of other workloads, the component is in contention. The workload whose activity is over-using a component is a bully. The other workloads that share those components, and whose performance is impacted by the bully, are the victims.

[2]Normalised I/O block sizes are used to simplify the measurement of IOPS by standardising on a specific chosen block size value (usually in KiB). For example, if the chosen normalised I/O block size is 256 KiB, we would count each storage I/O in 256 KiB units. Any I/O smaller than 256 KiB is counted as 1 I/O, and any I/O greater than 256 KiB is counted as multiples of 256 KiB.
[3]AWS EBS QoS normalised IO block size
[4]Azure Premium Storage QoS normalised IO block size
