I spend most of my weeks talking to VMware customers about applications that they want to perform better. It’s probably not a surprise that the majority of these conversations revolve around databases: ERP, CRM, EMR, BI, and an assortment of other fun acronyms. What’s interesting is that even organizations that already use high-end hardware -- including all-flash arrays -- wonder where they can find the “next big thing” to aid them in IT’s ongoing saga to deliver greater business value and help their users be more productive.
The primary bottleneck that these administrators and engineers encounter is largely due to how quickly database requests can be processed, which depends on how fast the storage stack can serve up I/O. The following new and upcoming technologies should help alleviate some of the I/O bottlenecks that are common in current infrastructures.
NVMe over Fabrics (NVMe-oF)
In the past year, we have started to see all-flash arrays built from NVMe cards. While this takes performance a step in the right direction, the network and communication protocol still create a bottleneck, preventing VMs from realizing the full performance of NVMe flash. In particular, today’s arrays communicate over SCSI-based protocols, which inherently requires protocol translation between SCSI and NVMe, and that translation adds overhead (latency) to the I/O.
This is where NVMe over Fabrics (NVMe-oF) enters. NVMe-oF is a protocol that allows a server and a storage array to communicate over a network -- e.g., Ethernet or Fibre Channel -- without that protocol translation overhead. In fact, it’s supposed to allow for faster communication between the server and array as well. My expectation is that if there are databases you’ve been hesitant to virtualize due to storage performance concerns, NVMe-oF might be your ticket.
For a primer on NVMe-oF, I’d recommend checking out this blog post. It’s important to note, however, that NVMe-oF arrays are still on the way, along with vSphere support for NVMe-oF. It’s also very likely the technology will take some time to mature before being ready for primetime.
3D XPoint (Intel Optane)
Speaking of NVMe, probably the hottest thing to hit storage in recent history is the new 3D XPoint (pronounced “three-dee cross-point”) technology. You’ll primarily see it under the Intel Optane brand (the P4800X series) with a PCIe NVMe interface. But the interesting detail here is that 3D XPoint is not actually a flash technology; it’s a new form of non-volatile memory.
In their current form, Intel’s Optane cards are delivering unparalleled durability and performance, particularly in comparison to existing flash technologies. At VMworld 2018 in Las Vegas, I saw several marketing displays for “Optane with vSAN”, which actually makes a ton of sense, assuming you’re using Optane for your performance tier. Even in an all-flash vSAN, using Optane as a write-caching tier will provide a noticeable edge in performance.
Persistent Memory (PMem)
The PCIe NVMe interface for Optane isn’t its endgame though. Where Optane will really begin to sing is when placed in a DIMM slot with an NVDIMM form factor, or what the industry is generally referring to as persistent memory (PMem). This means we’ll be seeing upwards of 512GB of capacity per DIMM with access times that are closer to the speed of DRAM. Imagine booting an in-memory database off of that. Pretty cool, right?
vSphere 6.7 has already paved the way with PMem support and Intel is supposed to release Optane NVDIMMs sometime this year <crosses fingers>. The one major hurdle many of us might face, however, is that Optane NVDIMMs will require Cascade Lake processors, which are also supposed to be released this year but may get pushed to early 2019. Either way, taking advantage of this will likely require a server refresh.
While we wait for some of these other technologies to come to fruition (and mature), one of the options for delivering higher levels of database performance is to create localized caches in ESXi hosts to cut networking and protocols out of the mix. Oftentimes, this can be done through vSphere Flash Read Cache (vFRC; for vSphere Enterprise Plus users), but the past couple of years have opened other paths as well.
In particular, VMware released a new set of APIs called the vSphere APIs for I/O Filtering (VAIO), which give third-party software a safe, secure, and performant method of inserting itself into the ESXi storage stack. Various companies -- including the one I work for, Infinio -- are able to leverage these vSphere APIs to create caching layers inside ESXi that can use not only flash (or 3D XPoint) as high-performance tiers, but even DRAM.
The end result is storage performance that can be upwards of 10X faster than an all-flash array, or even an all-flash vSAN. It is perhaps also the most actionable option, because server-side caching can usually be deployed non-disruptively, using idle resources your ESXi hosts already have.
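To make the caching argument a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is not how any particular product works; it simply models average read latency as a weighted mix of cache hits and trips to the array. The function names and the latency figures (20 microseconds for a host-side cache, 500 microseconds for the array) are assumptions I made up for illustration, so plug in numbers measured in your own environment.

```python
def effective_latency_us(hit_ratio, cache_latency_us, backend_latency_us):
    """Average read latency when a fraction of reads is served from a host-side cache.

    hit_ratio is the fraction of reads (0.0 to 1.0) satisfied by the cache;
    everything else falls through to the backing array.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_latency_us + (1.0 - hit_ratio) * backend_latency_us


def speedup(hit_ratio, cache_latency_us, backend_latency_us):
    """How many times lower the average latency is compared to no cache at all."""
    return backend_latency_us / effective_latency_us(
        hit_ratio, cache_latency_us, backend_latency_us
    )


if __name__ == "__main__":
    # Hypothetical latencies, chosen only for illustration.
    CACHE_US = 20.0    # assumed host-side cache read latency
    ARRAY_US = 500.0   # assumed round trip to the shared array

    for ratio in (0.50, 0.90, 0.99):
        avg = effective_latency_us(ratio, CACHE_US, ARRAY_US)
        gain = speedup(ratio, CACHE_US, ARRAY_US)
        print(f"hit ratio {ratio:.0%}: avg {avg:.1f} us, {gain:.1f}x faster")
```

The takeaway from a model like this is that the benefit depends heavily on the hit ratio: the misses still pay the full trip to the array, so a cache sized to hold the database's hot working set matters more than the raw speed of the cache media.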
I hope this has helped some of you along in your journey to finding the “next best thing” for database performance. Feel free to comment below to share thoughts, insights, or questions you might have.