Infinite I/O

Do more with less — with content-based caching

Posted by Matt Brender on Apr 24, 2014

Me on the phone: “Infinio requires only 8GB per ESXi host, which it stitches together into a globally distributed, deduplicated cache.”

I say the above statement five to ten times a day as I introduce customers to Infinio Accelerator’s architecture. It’s always accompanied by a conversation about scaling with your environment, and this image:

[Image: Infinio's global, deduplicated, server-side cache]

I’m most often met with an excited sense of surprise, “That’s all you need?”

Yup, that’s all we need to run and see some impressive results.

Occasionally, a potential customer needs more convincing: “That's too little. How could that be enough?”

The reaction is understandable. When you compare Infinio Accelerator’s request for 8GB of cache per host with a solution that leverages SSDs, you see a handful of gigabytes next to a few hundred. With such a big difference, there must be a catch.

And there is.

There are two attributes of Infinio's cache that allow it to work with such a light footprint.

This week, Infinio SE Jonathan Klick did a wonderful job of explaining our lightweight requirements at the Silicon Valley VMUG in his breakout session, The Art of Conquering Storage Performance. I’ll walk you through one section of his talk that explained location-based versus content-based caching.

Below is an example of a few VMs actively requesting data under the common location-based caching scheme:

[Image: Location-based caching]

Most caching solutions pair content with its location. Each of these pairs is unique, and therefore needs its own space within the cache.

But if we examine the content itself, we see the same (delicious) data requested twice.
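
To make the difference concrete, here is a minimal sketch of a location-based cache in Python (illustrative only; the function and variable names are made up for this post, not Infinio's code). Because each entry is keyed by where the block lives, the same content requested from two different addresses takes up two slots:

```python
# A minimal sketch of a location-based cache (illustrative only; not Infinio's code).
# Each entry is keyed by the block's location, so identical content fetched
# from two different locations occupies two cache slots.

location_cache = {}

def read_location_based(vm, block_address, fetch_from_datastore):
    key = (vm, block_address)                  # the location is the cache key
    if key not in location_cache:
        location_cache[key] = fetch_from_datastore(vm, block_address)
    return location_cache[key]

# Two VMs request blocks that happen to hold the same (delicious) bytes:
fetch = lambda vm, addr: b"pizza"
read_location_based("vm-1", 7, fetch)
read_location_based("vm-2", 42, fetch)

print(len(location_cache))                     # 2 entries: the same content cached twice
```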

If we adapt the mapping mechanism to ignore location and key on content instead, we get something resembling a content-based cache like Infinio's:

[Image: Content-based caching]

Here we have stored the same amount of information in half the space. We can achieve that with a unique mapping process designed by the team behind Infinio.

As Peter Smith, our VP of Product Management, discusses on the Greybeards podcast, Infinio maps addresses to a digest of the data stored across the cluster. The digest then points directly to the stored content, so it can be retrieved on request without a further lookup.

[Image: Data digest]

In this example, the design allows for a 50% reduction in the space required for the exact same data.
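
Here is the same idea as a minimal Python sketch of the two-level mapping (illustrative only; the dictionaries and function names are made up for this post, not Infinio's implementation). Each address maps to a digest of its content, and each digest maps to a single stored copy, so duplicate blocks collapse into one cache entry:

```python
import hashlib

# A minimal sketch of a content-based cache (illustrative only; not Infinio's
# actual code). Addresses map to a digest of the block's content, and the
# digest maps to a single stored copy.

address_to_digest = {}   # (vm, block_address) -> content digest
digest_to_content = {}   # content digest -> cached bytes

def read_content_based(vm, block_address, fetch_from_datastore):
    key = (vm, block_address)
    digest = address_to_digest.get(key)
    if digest is None:
        content = fetch_from_datastore(vm, block_address)
        digest = hashlib.sha256(content).hexdigest()   # deterministic digest of the content
        address_to_digest[key] = digest
        digest_to_content.setdefault(digest, content)  # store the bytes only once
    return digest_to_content[digest]

# The same two requests as before now share a single cached copy:
fetch = lambda vm, addr: b"pizza"
read_content_based("vm-1", 7, fetch)
read_content_based("vm-2", 42, fetch)

print(len(digest_to_content))   # 1 entry: the duplicate collapsed, half the space
```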

Wouldn’t it be nice to cache unique content without worrying about addresses? That is exactly why content-based caching is designed into our architecture. A deterministic mapping from unique addresses to a common digest gives Infinio a powerful deduplication strategy.

____
Matt is a Sales Engineer at Infinio
