Some argue that primary storage is not the best place for de-duplication. The reasoning: de-dup pays off where there is a lot of, well, duplication. Primary storage tends to hold transactional data, while secondary storage holds more duplicate data.
While this is true, there is more duplicate data on primary storage than users realize. Specifically, plenty of inert data sits on primary storage – data that has not been referenced in more than six months. Users are almost always surprised by how much we find – around 40% of capacity on average.
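As a rough illustration of how inert data can be measured, the sketch below walks a directory tree and tallies bytes in files whose last-access time is older than a six-month cutoff. This is a hypothetical example, not any vendor's assessment tool; real assessments use metadata crawls, and access times can be unreliable on filesystems mounted with `noatime`.

```python
import os
import time

SIX_MONTHS = 180 * 24 * 3600  # cutoff in seconds (~6 months)

def inert_bytes(root, now=None):
    """Return (stale_bytes, total_bytes) for files under root.

    A file counts as 'inert' when its last-access time is older
    than the six-month cutoff.
    """
    now = now or time.time()
    total = stale = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip unreadable or vanished files
            total += st.st_size
            if now - st.st_atime > SIX_MONTHS:
                stale += st.st_size
    return stale, total
```

Dividing `stale` by `total` gives the kind of percentage figure quoted above; on many installations that ratio lands near the 40% mark.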
The next question is what to do with this data – it needs to be cleaned up or moved in order to return that 40% to the free capacity pool.
One clean-up step is data de-duplication – and in some instances a significant amount of data can be de-duplicated. Why are there duplicates on primary storage at all? Poor (or absent) data management practices leave online storage littered with duplicate or wasteful data sets.
- One example: application engineers testing new applications or updates need to run tests on real data – but obviously can't run them against live production data. So they make a snap copy of the production data and test against that copy. For the next test, they make another copy, and so on. Do they remember to go back into the system and clean up their copies? Most often the answer is no – and this one practice (among many) robs a primary disk system of its precious capacity.
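To make the idea of finding such duplicates concrete, here is a minimal sketch that groups files by a SHA-256 digest of their contents. This is an assumed, whole-file illustration only; production de-duplication typically works at the block or sub-file level, which catches far more redundancy than whole-file matching.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Group files under root by content digest; return only groups
    with more than one path (i.e., exact whole-file duplicates)."""
    by_digest = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 64 KiB chunks so large files don't load into memory.
                for chunk in iter(lambda: f.read(1 << 16), b""):
                    h.update(chunk)
            by_digest[h.hexdigest()].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Each surviving group represents capacity that could be reclaimed by keeping one copy and replacing the rest with references – the essence of what de-dup does automatically.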
Data de-duplication can have a significant impact on primary storage in addition to secondary storage. But as with any storage technology, how it is implemented is the critical part of the equation.
Action Item: Users should recognize that considerable online storage capacity is wasted through poor storage management practices, including leaving multiple redundant copies of data on primary disk. Data de-duplication applied to primary storage may offer some hope; however, in this economic climate, users should start by assessing their specific installations, developing classification, data retention, and migration policies, and implementing better storage management practices before making large investments.
Footnotes: