Infinite I/O

Intel Optane: Our POV

Posted by Arun Agarwal on Jul 13, 2017

Disclosure: As a leading DRAM and NAND caching company for data centers and private clouds, we’re collaborating closely with Intel on Optane testing, and it is an area of great interest to us. We’ve been closely following the discussion around the recent Register article that quotes Wikibon’s report on Optane by David Floyer. While we don’t have a subscription to Wikibon and thus can’t read the full report, we wanted to respond, as most IT decision makers will only see the Register article or the Wikibon free preview.

In David Floyer’s Wikibon report on Intel’s Optane, he says, “CIOs and CTOs should assume 3D XPoint will NOT be important to the enterprise, and not evaluate it further.” Put bluntly, we respectfully disagree with this sweeping conclusion that Optane should not be a priority for CIOs and CTOs. Our disagreement is less about challenging any technical data and more about the unique perspective we gain from working directly with customers to implement high-performance DRAM caching in their production data centers – regularly in front of all-flash arrays. That perspective goes far beyond synthetic benchmarking, and it comes down to three topics:

#1: The real-world workloads that benefit from ultra-low-latency storage I/O. We’ve studied the applications for which Optane can be highly impactful – those that really benefit from storage access that is several times faster. These workloads are not niche applications; they are the future of IT that every CIO and CTO needs to be highly focused on.

The customers who care most about microsecond latency (i.e., going 20X faster than all-flash storage) have one thing in common: they rely on data processing jobs whose value to the business increases as their time to completion decreases. For example,

  • A complex analytics / big data job that currently runs nightly but would enable the business to make better decisions if it could run hourly.
  • A search indexing or eDiscovery job that brings greater value when it can complete in interactive seconds rather than minutes.
  • An automated nightly software test run that previously couldn’t finish before developers arrived in the morning, but can now complete many more regressions in a shorter timeframe.
  • A backup job that couldn’t hit its recovery point objectives (RPOs) becoming one that exceeds them.

In the world of big data, web-scale commerce, the Internet of Things, and software development – what could possibly be more important to CIOs and CTOs than enabling the applications that power these initiatives? In short, it is exactly what CIOs and CTOs should be concerned about.

It is insufficient to continue the long-standing tradition of talking about “IT applications” as simply the likes of CRM, ERP, and e-mail. Increasingly, traditional transactional systems are becoming SaaS-hosted offerings (just look at the prevalence of Office 365). When evaluating storage technologies like Optane, NAND, and DRAM, CIOs and CTOs need to think about what differentiates their business and invest in the technologies that enhance those core competencies. That will dwarf any impact on IT costs (which I’ll discuss more below).

Arguing that Optane is niche is like arguing that Hadoop is niche because much of the data center isn’t doing “big data.” We should anticipate demand for Optane growing at rates as substantial as Hadoop’s.

Next-generation companies like Facebook and Google take this performance and data analytics edge so seriously that they often build performance infrastructure from scratch. Today’s Global 2000 CIOs and CTOs can’t do that: they inherit massive existing infrastructure with existing standards. This is where vendors like Infinio come in – we can help give that performance edge to everyone else who is still running on traditional x86 and VMware. That is what will enable retailers like Walmart to better compete with technology companies like Amazon, as Amazon acquires Whole Foods and establishes a retail presence in Walmart’s backyard.

#2: For these workloads, can you deliver sufficient performance with a DRAM-only cache? Is there no benefit to having Optane over 3D NAND as the tier after DRAM?

So what about the argument that you can just put a DRAM cache in front of NAND to achieve these results? Our description of the applications above largely answers this for us. When an organization’s primary applications revolve around data analytics and machine-generated output, the working set (i.e., the data that gets accessed regularly) will simply be too large to fit in DRAM. Here at Infinio, working set is one of the most important metrics we continuously seek to better understand.
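
To make “working set” concrete, here is a minimal sketch, in Python, of one simple way to estimate it: count the unique blocks a workload touches within each time window. This is an illustration of the concept only, not Infinio’s actual instrumentation; the trace format and 4 KB block size are assumptions.

```python
from collections import defaultdict

def working_set_bytes(trace, window_s, block_size=4096):
    """Estimate per-window working-set size from a block-I/O trace.

    trace: iterable of (timestamp_seconds, block_number) tuples.
    Returns {window_index: bytes touched at least once in that window}.
    """
    unique_blocks = defaultdict(set)
    for ts, block in trace:
        unique_blocks[int(ts // window_s)].add(block)
    return {w: len(blocks) * block_size for w, blocks in unique_blocks.items()}

# Toy trace: three I/Os in the first hour, two of them to the same block.
trace = [(10.0, 100), (35.0, 101), (59.0, 100)]
print(working_set_bytes(trace, window_s=3600))  # {0: 8192} -- two unique 4 KB blocks
```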

One of the highlights of Infinio’s technology is that we employ advanced data reduction techniques like deduplication to make the most of ALL storage media: whether it be DRAM, Optane, or NAND. And while these techniques significantly drive down the effective cost of every medium, they will not let DRAM alone contain the entire working set of these applications, and thus will not enable performance that is 20X faster than all-flash.
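
For readers who want to see the mechanics, here is a toy sketch of content-addressed deduplication, the general technique behind stretching any cache medium. It illustrates the idea only; it is not Infinio’s actual data reduction pipeline.

```python
import hashlib

class DedupCache:
    """Toy content-addressed cache: identical blocks are stored only once."""

    def __init__(self):
        self.store = {}  # fingerprint -> block contents
        self.index = {}  # logical block address -> fingerprint

    def write(self, lba, data):
        fp = hashlib.sha256(data).hexdigest()
        self.store.setdefault(fp, data)  # keep one physical copy per fingerprint
        self.index[lba] = fp

    def read(self, lba):
        return self.store[self.index[lba]]

    def reduction_ratio(self):
        return len(self.index) / max(len(self.store), 1)

cache = DedupCache()
for lba in range(10):
    cache.write(lba, b"\x00" * 4096)  # ten logical blocks, one unique pattern
print(cache.reduction_ratio())        # 10.0 -- a 10:1 logical-to-physical ratio
```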

We do have a class of customer whose goal is merely parity with all-flash (i.e., they are trying to speed up a slower HDD or hybrid array). In this case, the combination of Infinio’s data reduction techniques and DRAM is good enough: Infinio serves some percentage of I/O very quickly out of DRAM, the array serves the rest more slowly, and the blended application latency feels more flash-like than the native storage. For that case, Wikibon’s argument is fine. But it simply will not work for the next generation of IT applications.
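
The blended-latency math behind that parity story is straightforward; here is a quick sketch. The 90% hit rate, ~100 µs cache-hit latency, and ~5 ms array latency are illustrative assumptions, not customer measurements.

```python
def blended_latency_us(hit_rate, hit_us, miss_us):
    """Average latency the application sees with a cache in front of an array."""
    return hit_rate * hit_us + (1.0 - hit_rate) * miss_us

# Illustrative figures: 90% of I/O served from DRAM at ~100 us,
# the remaining 10% from a hybrid/HDD array at ~5,000 us.
print(blended_latency_us(0.90, 100, 5000))  # 590.0 us -- flash-like, not 20X faster
```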

#3: Is Optane worth the cost?

I’ll keep this section short. The difficulty in answering this question is a disconnect over whether the analysis should be one of relative cost or absolute cost. There are two things to remember:

  1. Optane combines speeds that are much faster than flash with persistence that DRAM doesn’t have. This enables new business results. The relative cost premium of Optane vs. NAND is minuscule when weighed against the absolute ROI of getting reports hourly rather than nightly, or of running software tests more quickly and enabling more frequent product releases. Driving revenue or strategic initiatives with technology usually dwarfs the cost of that technology.
  2. The value of data reduction technologies. From a relative-value perspective, because you can dedupe both Optane and NAND, one could argue data reduction doesn’t change their relative $/TB. From an absolute-value perspective, however, it makes a massive difference. Consider the ratio in the Register article: paying $6 for every $1 of NAND is a big gap, but with data deduplication the same 6:1 ratio becomes $1.20 versus $0.20 for the same size working set – a far smaller absolute premium. The business benefit needs to be measured against the absolute cost of holding a working set, not the relative cost (a quick calculation below makes this concrete).
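
Running those numbers shows the point: data reduction leaves the relative ratio untouched while shrinking the absolute gap. The 5:1 reduction factor below is an assumption chosen to match the $6-to-$1.20 figures above.

```python
optane_per_unit, nand_per_unit = 6.00, 1.00  # $ ratio cited in the Register article
dedup_ratio = 5.0                            # assumed 5:1 data reduction

for label, factor in (("raw", 1.0), ("with dedup", dedup_ratio)):
    o, n = optane_per_unit / factor, nand_per_unit / factor
    print(f"{label}: Optane ${o:.2f} vs NAND ${n:.2f} "
          f"(ratio {o / n:.0f}:1, absolute gap ${o - n:.2f})")
# raw: Optane $6.00 vs NAND $1.00 (ratio 6:1, absolute gap $5.00)
# with dedup: Optane $1.20 vs NAND $0.20 (ratio 6:1, absolute gap $1.00)
```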

Conclusion 

In short, Optane is an important, game-changing technology that CIOs and CTOs need to have top of mind. They will need enabling technologies to integrate Optane into their existing infrastructure, and companies like Intel, VMware, and Infinio are building solutions to make that as easy as possible.

As a footnote, I want to highlight Intel’s clarifying comment in the Register about device-level versus application-level latency. All too often, the software stack required for applications to use a medium gets ignored. DRAM latency is often quoted as “in the nanoseconds,” but in reality it is almost always in the microseconds once you add the software stack. Infinio is architected on VMware’s VAIO framework, giving us perhaps the most performant access to I/O possible in Global 2000 production cloud data centers, and we see latency on the order of 80 microseconds.
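
A back-of-the-envelope budget shows why the stack, not the medium, dominates at the application level. Every component figure below is an illustrative assumption; only the roughly 80-microsecond end-to-end number reflects what we observe.

```python
# Rough, illustrative latency budget for a cached read. The breakdown is
# assumed for illustration; only the ~80 us total matches our observations.
dram_device_ns = 100  # the "nanoseconds" device latency that is often quoted
stack_overhead_us = {
    "guest I/O + virtual SCSI": 30,
    "hypervisor / VAIO filter": 25,
    "cache lookup + memory copy": 25,
}
total_us = dram_device_ns / 1000 + sum(stack_overhead_us.values())
print(f"device: {dram_device_ns} ns, end-to-end: ~{total_us:.1f} us")
# device: 100 ns, end-to-end: ~80.1 us -- the software path, not the medium
```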

I recognize that this post is long, and detailed in nature. Thanks for reading.

Topics: Talking Tech, Performance, NAND, Optane, Intel, 3D XPoint, RAM