I use the term spindle contention often, but I rarely take a step back and review what it actually means.
Sometimes I forget that hard drives literally spin. Each hard drive has a rotational speed - 7K, 10K, 15K RPM - and that number is literally how fast the little motor spins the platters. The higher the rate, the more I/O the drive can process per second, because the data is passing under the heads that much faster. That speed is often the difference between hitting an I/O bottleneck and avoiding one.
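To put rough numbers on that, here is a back-of-the-envelope estimate of random IOPS per spindle. The formula (one average seek plus half a rotation per I/O) is the standard textbook model; the 4 ms average seek time is an assumption I've picked for illustration, and real drives vary.

```python
def max_random_iops(rpm: int, avg_seek_ms: float = 4.0) -> float:
    """Estimate random IOPS for a spinning disk: each I/O costs an
    average seek plus half a rotation (average rotational latency)."""
    rotational_latency_ms = 60_000 / rpm / 2  # half a revolution, in ms
    return 1000 / (avg_seek_ms + rotational_latency_ms)

for rpm in (7_200, 10_000, 15_000):
    print(f"{rpm} RPM: ~{max_random_iops(rpm):.0f} IOPS")
```

With that assumed seek time, a 7.2K drive lands somewhere near 120 IOPS and a 15K drive near 170 - which is why faster spindles (or more of them) were the classic answer to I/O pressure.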
Now that we remember the physical layer, let's climb back up to the logical one.
Any logical device (often called a LUN) on a set of physical hard drives requests I/O as fast as the drives will allow. Every I/O thinks it's the most important request. When a peer VM's workload is actively and aggressively requesting I/O from its storage, it can slow down ALL peer workloads on the same disks. These VMs, affectionately called noisy neighbors, trigger a chain of events that backs up requests into write cache, increases response times on read requests and, in some cases, brings production to a halt.
Now think about it: every VM thinks its I/O is the most important, which means every VM sharing spindles under the covers is a potential noisy neighbor. Yes, you can dedicate physical spindles to a logical device to prevent this contention, but it gets expensive to dedicate spindles (and all their capacity) to a single workload.
That's spindle contention: VMs taxing the physical storage they share with their peers. The competition for reads from and writes to disk is the constant worry in virtual environments. Your workload is no longer happily writing to its LBAs without contending with other workloads that are just as prioritized.
To end on an optimistic note, not every shared workload results in spindle contention. If you have a virtual environment on top of ESXi, there are two measurements worth noting:
- CPU Ready
- Disk Latency
Both of these measurements are early indicators of a bottleneck due to spindle contention. If you find either above the acceptable rate for your environment, you're bound to see downstream impact on VM performance.
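As a minimal sketch of what "above acceptable rates" might look like in practice, here is a threshold check you could run against metrics pulled from vCenter or esxtop. The thresholds are common rules of thumb, not official limits, and the sample data, VM names, and `flag_contention` helper are all hypothetical - tune everything to your own environment.

```python
# Rule-of-thumb thresholds (assumptions, not vendor guidance):
CPU_READY_PCT_MAX = 5.0      # sustained CPU Ready above ~5% is suspect
DISK_LATENCY_MS_MAX = 20.0   # sustained disk latency above ~20 ms hints at storage strain

def flag_contention(samples):
    """Return the names of VMs whose metrics exceed either threshold.

    samples: iterable of (vm_name, cpu_ready_pct, disk_latency_ms) tuples.
    """
    return [
        vm for vm, ready_pct, latency_ms in samples
        if ready_pct > CPU_READY_PCT_MAX or latency_ms > DISK_LATENCY_MS_MAX
    ]

# Hypothetical samples: (vm_name, cpu_ready_pct, disk_latency_ms)
samples = [
    ("web01", 1.2, 8.0),
    ("db01", 6.3, 35.0),   # a likely noisy neighbor
    ("app02", 2.1, 22.5),
]
print(flag_contention(samples))  # prints ['db01', 'app02']
```

A VM that trips the disk-latency check while its neighbors' latency climbs in lockstep is exactly the spindle-contention signature described above.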