Classifying data
Last Update: Feb 15, 2008 | 03:02
Viewed 5102 times | Community Rating: 4.5
Originating Author: David Vellante

Originating Author: Fred Moore

This article gives storage managers guidelines for categorizing data effectively and improving storage management. It is written for storage managers, storage architects, database managers and IT personnel involved in storage planning and management.

The article describes what data classification is, how classifying data improves storage management, how to classify information, the business impacts of classifying data, and how ongoing data classification processes can best be managed.

Contents

  • 1 Summary
  • 2 What is Data Classification
  • 3 How to Classify Data
    • 3.1 Classes of Data
      • 3.1.1 Mission-critical data
      • 3.1.2 Vital data
      • 3.1.3 Sensitive data
      • 3.1.4 Non-critical data
    • 3.2 Technology Dependencies
    • 3.3 Skills Dependencies
    • 3.4 Organizational Dependencies
    • 3.5 Business Benefits of Classifying Data
  • 4 How to Ensure Data Classification Schema are Adopted
  • 5 Issues Covered in this Article

Summary

A study entitled "How Much Information? 2003", produced by the School of Information Management and Systems (SIMS) at the University of California, Berkeley, estimated that 5 exabytes of information were created in 2002 and stored on print, film, magnetic, and optical media. This figure is roughly double the 1999 estimate. In 2006, EMC commissioned an IDC study entitled "The Expanding Digital Universe", which was intended to build upon the Berkeley work and estimated that 161 exabytes of digital information were created, captured and replicated.

These figures are astounding and bring up two questions:

  1. How can organizations classify information so that storage device characteristics can be matched with information requirements?
  2. How can data classification disciplines help organizations determine how much they should spend on storage?

What is Data Classification

Data and information classification refers to the policies and procedures by which stored data is categorized so the information can be accessed, updated, protected, recovered and managed more efficiently in accordance with specific application requirements.

How to Classify Data

Analysis by Horison Information Strategies indicates that data, information and applications can be grouped into four primary classes based on criticality, as measured by their RTO (recovery time objective) and other attributes:


Data Classification Estimates
Attributes                Mission-critical - 15%   Vital - 20%   Sensitive - 25%   Non-critical - 40%
Recovery Time Objective   immediate                seconds       minutes           hours, days
Availability index        99.999+                  99.99         99.9              <99.0
Retention period          hours                    days          years             infinite

Source: Horison Information Strategies
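
To make the availability index row more concrete, the following minimal sketch converts each value into the downtime it would allow per year. It assumes the index is a percentage of uptime over a 365-day year, which the table does not state explicitly; the names `AVAILABILITY_INDEX` and `downtime_minutes_per_year` are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

# Availability index values from the table above, read here as percent
# uptime (an assumption). "99.999+" and "<99.0" are treated as boundary values.
AVAILABILITY_INDEX = {
    "mission-critical": 99.999,
    "vital": 99.99,
    "sensitive": 99.9,
    "non-critical": 99.0,
}


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime allowed by a given percent availability."""
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR


for data_class, availability in AVAILABILITY_INDEX.items():
    minutes = downtime_minutes_per_year(availability)
    print(f"{data_class:17s} {availability:7.3f}%  ~{minutes:8.1f} min/year")
```

Under these assumptions, 99.999 percent allows roughly 5 minutes of downtime per year, 99.99 percent about 53 minutes, 99.9 percent just under 9 hours, and 99.0 percent about 3.65 days.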

Classes of Data

Classifying data is becoming a critical IT activity for implementing the optimal solution to store and protect data throughout its lifetime. Developing a data classification methodology for a business involves establishing criteria for classes of data or applications based on their value to the business. Four distinct levels are commonly used: mission-critical data, vital data, sensitive data and non-critical data. Determining these levels takes cooperative effort within the business; once completed, it enables the most cost-effective storage and data protection solutions to be implemented. Data classification levels also identify which backup and recovery or business resumption solution is best suited to each level to meet its RPO (Recovery Point Objective) and RTO (Recovery Time Objective) requirements. While very important, RTO and RPO are not the only parameters used to classify data. Other considerations include availability, length of data retention, service levels and performance requirements, and overall costs. The figure below illustrates an effective data classification model.

Figure: Data Classification Model

Here is a summary of each of the four data classification categories with a description of the attributes found in each:

Mission-critical data

Mission-critical data is used in key business processes or customer-facing applications. It can account for as much as 15 percent of all data stored online and typically has very fast response time requirements. Mission-critical applications have an RTO (Recovery Time Objective) of one minute or less, so that business can resume immediately after a disruption. Losing access to mission-critical data means a rapid loss of revenue and a potential loss of customers, and places the survival of the business at risk. Mirroring protects against device failures but not against data corruption, intrusion, or human or software errors. Therefore, all mission-critical data that is mirrored should also have point-in-time copies that enable full recovery to a point before the corruption event. Mission-critical data is usually classified as company secret, and some applications may be candidates for encryption. Mission-critical data is normally backed up using integrated virtual tape libraries (disk arrays and tape libraries combined) or SATA-based disk arrays. Maintaining mirrored copies of data that is not mission-critical is extremely expensive.

Vital data

Vital data accounts for about 20 percent of all data stored online. Unlike mission-critical data, it does not require instantaneous recovery for the business to remain in operation. Vital data may be classified as company secret. An RTO ranging from a few minutes to an hour or more is acceptable, and vital data is normally backed up using integrated virtual tape libraries or SATA-based disk arrays. Mirroring is not normally required for vital data, because techniques such as point-in-time copy, snapshot copy, CDP (Continuous Data Protection) and de-duplication are sufficient to meet the application's RTO while avoiding the additional hardware costs associated with disk mirroring.

Sensitive data

Sensitive data accounts for about 25 percent of all data stored online. Recovery, as measured by the RTO, can take from several minutes to several hours without causing major operational or business impact. With sensitive data, alternative sources exist for accessing or reconstructing the data in case of data loss. The growing popularity of SATA-based disk subsystems for backup now provides viable and cost-effective technology options alongside tape, which has historically been the primary choice for backup and recovery.

Non-critical data

Non-critical data represents approximately 40 percent of all data stored online, making it the largest classification category. Lost, corrupted or damaged non-critical data can be reconstructed with minimal effort, and acceptable recovery times can range from hours to several days, since this data is not essential for business survival. However, non-critical data may suddenly become valuable under unforeseen circumstances, which is giving momentum to extending the useful lifecycle of data significantly. E-mail archives, legal records, medical information, scientific data, financial transactions, security data and fixed content often fit this profile. Most non-critical data is backed up to lower-cost storage, with tape being the most popular choice.
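
As a rough illustration of how the four classes might be assigned from an application's RTO, here is a minimal sketch. The RTO ranges described above overlap, so the one-hour and eight-hour cutoffs below are assumptions chosen only to make the example concrete, and a real scheme would also weigh RPO, availability, retention, performance and cost. The names `RTO_CUTOFFS` and `classify_by_rto` are hypothetical.

```python
from datetime import timedelta

# Illustrative cutoffs only. The one-minute bound comes from the article;
# the one-hour and eight-hour bounds are assumptions.
RTO_CUTOFFS = [
    (timedelta(minutes=1), "mission-critical"),
    (timedelta(hours=1), "vital"),
    (timedelta(hours=8), "sensitive"),
]


def classify_by_rto(rto: timedelta) -> str:
    """Return the first class whose RTO cutoff the requirement fits under."""
    for cutoff, data_class in RTO_CUTOFFS:
        if rto <= cutoff:
            return data_class
    # Anything that can wait longer than the last cutoff is non-critical.
    return "non-critical"


print(classify_by_rto(timedelta(seconds=30)))
print(classify_by_rto(timedelta(days=2)))
```

Keeping the cutoffs in a single ordered list means the business owners of the classification process can adjust the boundaries without touching the logic.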

Technology Dependencies

With the advent of more advanced storage management tools and information lifecycle management initiatives, the classification of data and information becomes critical to establish initial data placement and ongoing automated management. The ingredients for a successful data classification implementation include a policy-engine and a tiered storage hierarchy.
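
The idea of a policy engine can be sketched as a lookup from data class to protection policy. The table below encodes only what this article says about each class; a value of `False` means the article does not describe that technique as normally required, not that it is forbidden. The `Policy` type and `policy_for` function are hypothetical names, not the interface of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    mirrored: bool               # disk mirroring
    point_in_time_copies: bool   # point-in-time or snapshot copies
    backup_targets: tuple        # backup media named in the article


VTL = "integrated virtual tape library"
SATA = "SATA-based disk array"

# One policy per data class, taken from the class descriptions in this article.
POLICIES = {
    "mission-critical": Policy(True, True, (VTL, SATA)),
    "vital": Policy(False, True, (VTL, SATA)),
    "sensitive": Policy(False, False, (SATA, "tape")),
    "non-critical": Policy(False, False, ("tape",)),
}


def policy_for(data_class: str) -> Policy:
    """Look up the protection policy for a data class."""
    try:
        return POLICIES[data_class]
    except KeyError:
        raise ValueError(f"unknown data class: {data_class!r}") from None


print(policy_for("vital"))
```

A production policy engine would add initial placement rules and migration between tiers over time, which this sketch leaves out.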

Skills Dependencies

Many organizations lack the tools, processes, experience and leadership needed to classify data effectively and to make potentially difficult choices. A policy-driven data classification approach provides an automated method to enforce the assignment of correct levels. Several companies now deliver data classification tools for non-mainframe systems, and each should be reviewed to determine whether its product focus meets the business requirements. Mainframe computers using DF/SMS (Data Facility Storage Management System) have enjoyed an excellent data classification capability since 1988.

Organizational Dependencies

A clear owner for the data classification process greatly facilitates the effort, class assignment process and prioritization. Businesses can get bogged down in the assignment process if a clear leader isn't established.


Business Benefits of Classifying Data

Implementing a data classification scheme enables the optimal, most cost-effective storage hierarchy to be implemented while ensuring the highest level of data availability that will meet the service levels of the business.
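
The cost argument can be illustrated with simple arithmetic. The sketch below uses the capacity shares from the Horison estimates (15, 20, 25 and 40 percent) together with relative cost-per-terabyte figures that are entirely hypothetical and chosen only to show the shape of the calculation.

```python
# Capacity share per class, in percent, from the Horison estimates.
SHARE_PERCENT = {
    "mission-critical": 15,
    "vital": 20,
    "sensitive": 25,
    "non-critical": 40,
}

# Hypothetical relative cost per terabyte for the tier serving each class.
# These numbers are assumptions, not figures from the article.
COST_PER_TB = {
    "mission-critical": 10.0,
    "vital": 6.0,
    "sensitive": 3.0,
    "non-critical": 1.0,
}


def tiered_cost(total_tb: float) -> float:
    """Cost when each class is placed on its own tier."""
    return sum(
        total_tb * SHARE_PERCENT[c] / 100 * COST_PER_TB[c] for c in SHARE_PERCENT
    )


def single_tier_cost(total_tb: float) -> float:
    """Cost when all data is placed on the most expensive tier."""
    return total_tb * max(COST_PER_TB.values())


print(tiered_cost(100), single_tier_cost(100))
```

With these assumed prices, 100 TB costs about 385 units when tiered by class versus 1,000 units when everything sits on the top tier. The actual savings depend on real tier prices and on how closely an organization's data matches the estimated shares.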

How to Ensure Data Classification Schema are Adopted

Ensuring that a data classification schema is adopted requires storage administrators and end users to agree on classification criteria. Policies can then be established to enforce the criteria on an ongoing basis.
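
One way to picture ongoing enforcement is a periodic audit that compares where each dataset actually lives with where its agreed class says it should live. This is a minimal sketch under stated assumptions: the tier names, the inventory records and the `find_violations` function are all hypothetical.

```python
# Hypothetical mapping from agreed data class to required storage tier.
REQUIRED_TIER = {
    "mission-critical": "tier-1",
    "vital": "tier-2",
    "sensitive": "tier-3",
    "non-critical": "tier-4",
}


def find_violations(datasets):
    """Return (name, actual_tier, required_tier) for each misplaced dataset."""
    violations = []
    for dataset in datasets:
        required = REQUIRED_TIER[dataset["data_class"]]
        if dataset["tier"] != required:
            violations.append((dataset["name"], dataset["tier"], required))
    return violations


# Made-up inventory for illustration.
inventory = [
    {"name": "order-entry-db", "data_class": "mission-critical", "tier": "tier-1"},
    {"name": "mail-archive", "data_class": "non-critical", "tier": "tier-1"},
]

for name, actual, required in find_violations(inventory):
    print(f"{name}: on {actual}, policy requires {required}")
```

Here the mail archive is flagged because it occupies the most expensive tier despite being classified as non-critical.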

Issues Covered in this Article

  1. How can an organization effectively and reliably classify data and information?
  2. What are the likely costs associated with data classification?
  3. What benefits will an organization see from better data and information classification?
  4. What technologies are available to help implement data classification?
  5. What common mistakes are made in undertaking data classification initiatives?
  6. What is the most reliable way to construct, implement and maintain an ongoing data and information classification discipline?


categories
Data classification, Managing storage, Storage professional alerts
Contributors

Dab4168

Wikibon Daemon

Mrgood

David Floyer

PBurris

Fmoore

DavidFloyer

Comments (1)
Comments on 'Classifying data'
  • Very nice discussion of data classification schema from a storage viewpoint. It would, of course, be nice if that same classification schema could be applied to data security, and the terminology used in this piece implies that it can. Actually, however, this is not always the case. The data that must be most available is not always the most sensitive. The obvious example is the famous case of the formula for Coca Cola. This is the single most valuable piece of data that the company owns. As a result, there is only one copy, written on paper, and kept in a bank vault in Atlanta. Obviously this does not make it very accessible, but that, after all, is the idea of security. A more common example is transactional data. This typically is Tier 0 or at least Tier 1A data from a storage viewpoint, and it also is highly sensitive data from the security standpoint. But it is often a poor candidate for encryption, simply because encrypting and decrypting this data, particularly if there is a very large data flow and the data needs to be accessed often and quickly, can seriously slow down the process first of writing it to storage and then of accessing it. Thus, unfortunately, data often has to be classified twice: once for storage tiering and again for security purposes.

    Posted By: Bert Latamore | Fri Sep 26, 2008 12:10


