Software-led Storage: Hyperscale Storage Requirements


Last Update: Feb 14, 2013 | 01:59


Originating Author: David Floyer

Hyperscale and Big Data are two trends that necessitate new methods of protecting data. Specifically, traditional approaches to data backup are inappropriate for large, scale-out infrastructures such as those being popularized by Internet giants (e.g. Facebook), cloud service providers, and many government agencies. Simplified storage approaches based on object stores, combined with erasure coding as a means of protecting large quantities of data, will dramatically lower storage costs. Moreover, flash will play an increasingly important role in this new storage paradigm, both to house metadata and to enable "in-time" (i.e. near real-time) analytics on large data repositories.

As Data Grows Recovery Becomes Impossible

Data today is backed up by taking copies. The number of copies, whether snapshots or physical copies, is growing out of control and increasing complexity. But as data volumes grow, the real problem becomes one of recovery.

The key questions for hyperscale storage are:

How are petabytes or exabytes of data backed up?
How are petabytes or exabytes of data restored?
How are petabytes or exabytes of data accessed?

The simplest analysis of the elapsed time and cost of either transmitting the data over telecommunication links or transporting magnetic media shows that traditional backup methods are already prohibitive and will become unsustainable over time. As Moore's Law-style improvements drive costs down and capacities up, elapsed time only becomes more problematic, because access times and transfer rates from magnetic media barely improve.

The bottom line is that as data volumes grow, it takes too long to back up and recover data, which increases both costs and the probability of data loss.
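The elapsed-time argument can be checked with simple arithmetic. The sketch below is illustrative only: the two throughput figures are assumptions chosen for the example (one sustained magnetic-media stream and one fully utilized 10 Gbit/s link), not measurements from the article.

```python
PB = 10**15  # one petabyte, in bytes (decimal)
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Elapsed days needed to stream data_bytes at a sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_s / SECONDS_PER_DAY


# Assumed sustained rates, for illustration only.
single_stream = 150e6     # about 150 MB/s from one magnetic-media stream
ten_gbit_link = 10e9 / 8  # a 10 Gbit/s link with no protocol overhead

days_single = transfer_days(PB, single_stream)
days_link = transfer_days(PB, ten_gbit_link)

print(f"1 PB over one stream:   {days_single:.1f} days")
print(f"1 PB over 10 Gbit/s:    {days_link:.1f} days")
```

Under these assumptions a single petabyte takes roughly 77 days over one stream and a little over 9 days over the link. Parallel streams shorten the elapsed time, but an exabyte is a thousand times larger, and a restore has to move the same volume back again.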

How to Protect a Petabyte

As one Wikibon practitioner phrased the problem: "How do you backup a petabyte? You don't."

Currently, the only technology available to address these issues is erasure coding, which uses compute cycles to split and transform the data into n slices such that the original can be recovered from only m slices (where n > m). The slices can be distributed locally (e.g. within a data center) or geographically, and the most advanced implementations include enough redundant slices (n - m) that the data can be recovered either locally or remotely.
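To make the n and m terminology concrete, here is a minimal sketch of the simplest possible erasure code: m data slices plus one XOR parity slice, so n = m + 1 and any single lost slice can be rebuilt. This is an illustration under stated assumptions, not the scheme used by any particular product; production systems typically use codes such as Reed-Solomon that tolerate several lost slices. The function names `encode` and `decode` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, m: int) -> List[bytes]:
    """Split data into m equal slices and append one XOR parity slice (n = m + 1)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    slice_len = -(-len(data) // m)  # ceiling division
    padded = data.ljust(slice_len * m, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(m)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def decode(slices: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one of the n slices is missing (None)."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity XOR tolerates only one lost slice")
    rebuilt = list(slices)
    if missing:
        present = [s for s in slices if s is not None]
        # XOR of all surviving slices equals the missing one.
        rebuilt[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity slice and the padding added by encode().
    return b"".join(rebuilt[:-1])[:original_len]


data = b"hyperscale object store"
stored = encode(data, m=4)   # n = 5 slices, which could sit on 5 nodes or sites
damaged = list(stored)
damaged[2] = None            # simulate losing one slice
recovered = decode(damaged, len(data))
```

The raw capacity consumed here is n/m = 1.25 times the data size, compared with 2 times for a single full copy, and the read path rebuilds the lost slice without a separate restore step.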

Traditional methods of reading hyperscale volumes of data from disk to find and analyze patterns are no longer viable. The most important challenge is how to create metadata for each object that describes where it is, what it is, when it was stored, and how it relates to other objects. This metadata must be accessible at very high speed, so it has to be held centrally, mainly in non-volatile memory. The huge benefit of getting this right is that the object store can serve multiple purposes: a single store can contain data warehouses, backups, and multiple archives.
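The four metadata questions (where, what, when, and how related) map naturally onto a small record type plus a central index. The following is a hypothetical sketch: the class names, field names, and the in-memory dictionaries standing in for flash-resident storage are all assumptions made for illustration, not an API described by the article.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass
class ObjectMetadata:
    object_id: str
    locations: List[str]   # where it is: nodes or sites holding its slices
    kind: str              # what it is: e.g. "backup", "archive", "warehouse"
    stored_at: datetime    # when it was stored
    related_to: Set[str] = field(default_factory=set)  # how it relates to other objects


class MetadataIndex:
    """Central index; plain dicts stand in for a non-volatile memory tier."""

    def __init__(self) -> None:
        self._by_id: Dict[str, ObjectMetadata] = {}
        self._by_kind: Dict[str, Set[str]] = {}

    def put(self, meta: ObjectMetadata) -> None:
        old = self._by_id.get(meta.object_id)
        if old is not None:
            # Keep the secondary index consistent when a record is replaced.
            self._by_kind[old.kind].discard(meta.object_id)
        self._by_id[meta.object_id] = meta
        self._by_kind.setdefault(meta.kind, set()).add(meta.object_id)

    def get(self, object_id: str) -> ObjectMetadata:
        return self._by_id[object_id]

    def find_by_kind(self, kind: str) -> List[str]:
        return sorted(self._by_kind.get(kind, set()))

    def related(self, object_id: str) -> List[str]:
        return sorted(self._by_id[object_id].related_to)


index = MetadataIndex()
now = datetime.now(timezone.utc)
index.put(ObjectMetadata("sales-2012", ["dc-east", "dc-west"], "warehouse", now))
index.put(ObjectMetadata("sales-2012-snap", ["dc-east", "dc-west"], "backup", now,
                         related_to={"sales-2012"}))
```

Because every lookup is answered from the index, questions such as "which backups relate to this warehouse table" never require scanning the objects themselves on disk, which is the point the paragraph above makes.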

The combination of erasure coding, object storage, and high-performance, flash-resident metadata is, at the moment, the only available approach that addresses the three questions above. These technologies will be integrated as services within a software-led storage architecture, as shown in Figure 1. The potential benefit is a five- to ten-fold reduction in the amount of data stored, dramatically improving storage efficiency while at the same time delivering substantially more business value.
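Part of the storage saving comes from the raw-capacity arithmetic of erasure coding versus keeping full copies. The sketch below compares the two; the parameters (three full copies, and a 16-slice code recoverable from 10 slices) are assumed values for illustration. It does not by itself reproduce the five- to ten-fold figure, which presumably also reflects consolidating warehouses, backups, and archives into a single store.

```python
def erasure_overhead(n: int, m: int) -> float:
    """Raw capacity consumed per unit of data when n slices are stored and m suffice."""
    if not 0 < m <= n:
        raise ValueError("require 0 < m <= n")
    return n / m


def replication_overhead(copies: int) -> float:
    """Raw capacity consumed per unit of data when full copies are kept."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return float(copies)


data_pb = 1.0  # logical data size in petabytes (assumed)

raw_replicated = data_pb * replication_overhead(3)   # three full copies
raw_erasure = data_pb * erasure_overhead(16, 10)     # assumed 16/10 code

print(f"3 copies:      {raw_replicated:.1f} PB raw, survives loss of 2 copies")
print(f"16/10 erasure: {raw_erasure:.1f} PB raw, survives loss of 6 slices")
```

With these assumed parameters the erasure-coded layout uses 1.6 PB of raw capacity per petabyte of data against 3.0 PB for triple replication, while tolerating more simultaneous failures.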

Figure 1 – Software-led Storage, a component of Software-led Infrastructure
Source: Wikibon 2013

Action Item: Hyperscale storage must use erasure coding techniques to achieve the following:

Avoid backing up petabytes/exabytes of data. Rather, redundancy should be built into how the data is stored locally and remotely.
Avoid restoring petabytes/exabytes of data. Rather, restores should be completely integrated into the storage system so that no change of access method is required.
Use an object storage foundation for storing and accessing petabytes/exabytes of data.

This software-led storage model will allow storage services to deliver, at the lowest cost, self-protecting, self-restoring data together with high-performance metadata that is itself self-protecting and self-restoring, and to allow multiple uses of historic data.


