Moderator: Peter Burris
Analyst: David Floyer
Users have been struggling for years with the challenge of reducing the amount of storage necessary to support critical applications in their organizations. One technology that has been put forward for quite some time recently received a significant boost from announcements by both Network Appliance and IBM.
Data deduplication promises potentially very high space savings (30%-50%) for storage environments that feature frequent cloning of individual pieces of data at the file, record, or block level. Data deduplication takes three basic forms: in-line, block hashing, and logical construct. Each approach has its pros and cons, but all of them seek to find circumstances in which the same piece of data has been replicated multiple times in response to often arbitrary backup and/or application activities.
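As a rough illustration of the block-hashing approach, the sketch below splits a byte stream into fixed-size blocks, stores each unique block once under its hash, and represents the original data as a list of pointers. This is a toy model, not any vendor's actual format; the function names and the 4 KB block size are illustrative assumptions.

```python
import hashlib

def dedupe(data: bytes, block_size: int = 4096):
    """Store each unique fixed-size block once, keyed by its SHA-256 hash;
    the original data becomes an ordered list of hash pointers."""
    store = {}      # hash -> block bytes, written only once per unique block
    pointers = []   # ordered hashes that reconstruct the original stream
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # skip blocks already stored
        pointers.append(digest)
    return store, pointers

def restore(store, pointers) -> bytes:
    # Follow the pointer list to rebuild the original byte stream.
    return b"".join(store[d] for d in pointers)

# Four logical blocks, three of them identical: only two blocks are stored.
data = b"A" * 4096 * 3 + b"B" * 4096
store, pointers = dedupe(data)
print(len(store), len(pointers))  # 2 unique blocks behind 4 pointers
```

The pointer list is also where the performance concern discussed below comes from: every restore is a walk through indirection rather than a sequential read.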
It is important to note that the applications receiving the largest benefit from deduplication tend to be those with very high backup and restore requirements, such as database backup and software archiving, where the notion of truth in the data becomes very important and, as a consequence, that data is repeatedly cloned across different application forms (e.g., to data warehouses).
Users evaluating data deduplication today face only a handful of concerns, but important ones nonetheless. The most significant is that data deduplication is implemented using proprietary formats. Metadata describing how the data has been deduped is written directly into file headers, along with pointers that assure applications access to the copy of the data they need. The system of pointers that results from these technologies can lead to some performance degradation; indeed, the storage environments that benefit most from data deduplication are likely also to be those that face the greatest performance concerns. Additionally, it is critical that encryption occur after data deduplication to ensure that overall integrity and other basic storage concerns can be maintained.
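A toy sketch helps show why that ordering matters. Identical plaintext blocks hash to a single copy, but properly randomized encryption makes every copy's ciphertext unique, leaving deduplication nothing to find. The `toy_encrypt` cipher below is an illustrative stand-in for that randomization, not a real or secure algorithm.

```python
import hashlib
import secrets

def toy_encrypt(block: bytes, key: bytes) -> bytes:
    """Toy randomized encryption: fresh IV plus a hash-derived keystream.
    NOT secure; it only mimics how real ciphers randomize their output."""
    iv = secrets.token_bytes(16)
    keystream = hashlib.sha256(key + iv).digest()
    stream = (keystream * (len(block) // len(keystream) + 1))[:len(block)]
    return iv + bytes(b ^ s for b, s in zip(block, stream))

key = b"illustrative-key"
blocks = [b"the same 4 KB block" for _ in range(100)]  # 100 identical copies

# Dedup before encryption: all 100 copies collapse to one unique hash.
plain_hashes = {hashlib.sha256(b).digest() for b in blocks}

# Encrypt first: random IVs make each ciphertext distinct, so dedup finds nothing.
cipher_hashes = {hashlib.sha256(toy_encrypt(b, key)).digest() for b in blocks}

print(len(plain_hashes), len(cipher_hashes))  # 1 unique block vs. 100
```

Deduplicate first and encrypt the stored unique blocks afterward, and both the space savings and the security goal are preserved.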
We will see a fair amount of discussion about how data deduplication can serve as a general-purpose replacement for tape in backup and restore scenarios. However, for a variety of reasons, not the least of which remains the cost of communicating large volumes of data over potentially great distances, tape will continue to have a viable life for the foreseeable future despite some of the advantages of data deduplication. At this juncture it is safe to say that the best current data deduplication implementation offers no major advantage over the worst tape solution for very high-volume data backup and recovery applications.
As we look forward, we see deduplication becoming a critical enabling technology that pairs successfully with other emerging storage technologies, including thin provisioning and virtualization. However, it is imperative that users look very closely at the tradeoff between the advantages of deduplication and its potential performance costs on the one hand, and on the other fully understand the consequences of buying into yet another storage technology built on relatively proprietary formats.
Action Item: Data deduplication is emerging as a critically important new arrow in the storage administrator's quiver for answering hard questions about the growing problem of storage costs. However, like all technology arrows, users must be careful in choosing which targets to shoot data deduplication at, and very sure of their aim.