Sometimes the Obvious is Revolutionary

At EqualLogic, we were always surprised that storage automation was considered revolutionary. We thought that the idea was obvious, and that it was just a matter of changing the storage hardware and software architecture to make this happen. This was back in 2001, and by 2010, storage automation was expected. The big innovation of EqualLogic was bringing storage into the automation era.

Dial forward to 2011. As I was thinking about what was next, I started to read about the big data movement. Somewhere around that time, Hadoop and big data became the same thing. I'm not sure how that happened, but at least in the media and analyst communities, the definition had stuck. I was also reading about data as a digital currency, and how important it is to understand and get actionable insights from information at every level of the company. I was surprised no one was talking about primary storage contributing to these insights, since that is where the data is created and lives.

As I thought about this, I realized that a significant number of actionable insights should be surfaced at the point of storage. This would allow the storage array itself to contribute to operational, storage and data intelligence. It seemed obvious that all of this should happen on the array. Deeper, more targeted analytics would still need to be done in a Hadoop-like infrastructure, but that's no reason why storage shouldn't be part of the solution.

However, as I started to work on how this could happen, I quickly hit a couple of roadblocks. If you tried to do this with the existing storage software architectures, you'd kill performance. But more importantly, storage architectures at the time did not retain rich intelligence about the data they stored: the array maintained metadata about the storage it was managing, but not about the data itself. The metadata for storage tracked blocks and/or files and nothing more. Simple information like who touched the data, how, and when was not maintained, and insights into how the data related to people, content and time were not even remotely possible. I was running into the same issues that servers and ancillary install-and-crawl products have today: no way to capture the information, no place to coalesce the information, and no way to do this without impacting the primary storage.
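
To make the gap concrete, here is a minimal sketch, in Python, of the kind of per-access record a data-aware array would need to keep alongside its block and file maps. The field names and sample values are purely illustrative assumptions on my part, not any vendor's actual schema.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class AccessEvent:
        """One data-level record: who touched what, how, and when.
        Traditional array metadata stops at blocks and files; this is
        the extra layer a data-aware array would capture inline."""
        path: str        # file or object that was touched
        user: str        # who performed the operation
        operation: str   # e.g. "read", "write", "rename", "delete"
        timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    # Hypothetical events recorded as I/O flows through the array.
    events = [
        AccessEvent("/projects/q3-plan.docx", "alice", "write"),
        AccessEvent("/projects/q3-plan.docx", "bob", "read"),
    ]

Capturing this at the point of storage, rather than with an after-the-fact crawl, is what avoids the capture and coalescing problems described above.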

At the same time, there was some buzz about using storage snapshots as a replacement for backing up data. Nothing could be further from the truth. Snapshots live on the same physical media as the primary storage, so a failure of the primary storage (which happens quite a bit) means losing the snapshots and the entire backup. Not a very reliable backup system. In addition, recovery to a previous snapshot is the largest unknown sanctioned data loss I know of, since you don't know what changed between the current state of the system and the snapshot to which you are restoring. Alternatively, you could mount a bunch of snapshots on a server and start doing diffs to figure out what changed and what you want to keep.

For as long as I can remember (and I am old), backup has been in the top three pain points for customers. Restores were painful and uncertain. So I started to think perhaps it was time to tackle this as well. No one had reimagined the snapshot in a long time. I was tired of the debate over copy-on-write versus reallocate-on-write. Perhaps primary storage should do neither. Perhaps primary storage needed backup intelligence built in from the ground up.

These reflection points are what led us on the journey to what we now call data-aware storage. Although things didn't look very hopeful initially, and I understood why no one was doing the obvious, I'd participated in changing the storage software architecture once before, so I wasn't daunted by the roadblocks.

When we first started talking to potential customers and investors back in 2012, we showed a slide with a graphic indicating that storage should watch, protect, analyze and visualize the data it held. We believed these key capabilities should be inherent in the core architecture for storage, much like thin provisioning, inline compression and deduplication are today. We called this adding data intelligence to storage.

For details on how we pulled this off and fundamentally changed the storage architecture, take a look at our technical white paper.

Data-aware storage

Two-plus years later, we have stopped using what we now fondly refer to as the creepy eye picture, but the mission is the same. Storage should be data-aware.
Data-aware storage must:

  • Track who is accessing the data, what operations they are performing and when. This is used for data security and governance, but it can also show who is collaborating through the data and who the experts are for various topics. It can identify the most active users for a given mount point, and it is used to help construct the catalog used for restores (a rough sketch of this kind of rollup follows this list).
  • Provide integrated data protection with fault isolation. Remove the old metrics around RPO and RTO. RPO should be whatever matches your business needs, whether it's time-based or operationally based, such as data change rate. It should allow for near-instant data recovery at whatever granularity has the least impact on your business: file, file system, file within a VM, VM or LUN. This granular recovery means protection points, which we call DiscoveryPoints, must have a catalog and be searchable. Recovery shouldn't require an IT administrator; end users should be able to do their own recoveries.
  • Surface insights about the data stored on the array, at the content layer when possible, by providing search, discovery and visualization capabilities.
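
As a rough illustration of the first point above, the sketch below (again Python, with invented names and sample data rather than any product's actual API) shows how a stream of access records could be rolled up into simple insights: the most active users under a mount point, and who is collaborating through a given file.

    from collections import Counter, defaultdict, namedtuple

    # Lightweight stand-in for the access records described above.
    Event = namedtuple("Event", ["path", "user", "operation"])

    events = [
        Event("/projects/q3-plan.docx", "alice", "write"),
        Event("/projects/q3-plan.docx", "bob", "read"),
        Event("/projects/budget.xlsx", "alice", "read"),
    ]

    def most_active_users(events, mount_point, top_n=5):
        """Rank users by how often they touch data under a mount point."""
        counts = Counter(e.user for e in events if e.path.startswith(mount_point))
        return counts.most_common(top_n)

    def collaborators_by_file(events):
        """Map each file to the set of users who touched it; more than one user suggests collaboration."""
        collab = defaultdict(set)
        for e in events:
            collab[e.path].add(e.user)
        return {path: users for path, users in collab.items() if len(users) > 1}

    print(most_active_users(events, "/projects"))   # e.g. [('alice', 2), ('bob', 1)]
    print(collaborators_by_file(events))            # e.g. {'/projects/q3-plan.docx': {'alice', 'bob'}}

The point is not the analytics themselves, which are simple, but where they run: because the array sees every operation, these rollups can be produced at the point of storage instead of by a separate crawl.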

The primary storage array is the obvious place to provide the capabilities that embody data-aware storage, and that's what we've done with the DataGravity Discovery Series. Yet once again people are calling this advance in storage revolutionary. This makes me smile, since it feels a little like "Groundhog Day."

Storage table stakes have changed. Data-aware storage will become fundamental to storage systems, just as storage automation has become mainstream. If you believe storage should be more than a very expensive filing cabinet, we're on the same page.

 

Want to see data-aware storage in action? Stop by the DataGravity booth, #1647, at VMworld next week in San Francisco.


Paula Long

Paula Long is the CEO and co-founder of DataGravity. She previously co-founded storage provider EqualLogic, which was acquired by Dell for $1.4 billion in 2008. She remained at Dell as vice president of storage until 2010. Prior to EqualLogic, she served in engineering management positions at Allaire Corporation and oversaw the ClusterCATS product line at Bright Tiger Technologies. She is a graduate of Westfield State College.