Classifying Data

Last Update: Jun 15, 2009 | 02:03


Originating Author: David Vellante

Originating Author: Fred Moore

This article is intended to give storage managers guidelines for categorizing data effectively and improving storage management. It is written for storage managers, storage architects, database managers and IT personnel involved in storage planning and management.

This article describes what data classification is, how classifying data improves storage management, how to classify information, the business impacts of classifying data and how ongoing data classification processes can best be managed.

1 Summary
2 What is Data Classification
3 How to Classify Data
4 How to Ensure Data Classification Schema are Adopted
5 Issues Covered in this Article

Summary

A study entitled "How Much Information? 2003", produced by the School of Information Management and Systems (SIMS) at the University of California, Berkeley, estimated that 5 exabytes of information were created in 2002 and produced in print, film, magnetic, and optical storage media. This figure represents about a doubling from 1999. In 2006, EMC commissioned a study by IDC entitled "The Expanding Digital Universe", which was intended to build upon the Berkeley work and estimated that 161 exabytes of digital information were created, captured and replicated.

These figures are astounding and bring up two questions:

How can organizations classify information so that storage device characteristics can be matched with information requirements?
How can data classification disciplines help organizations determine how much they should spend on storage?

What is Data Classification

Data and information classification refers to the policies and procedures by which stored data is categorized so the information can be accessed, updated, protected, recovered and managed more efficiently in accordance with specific application requirements.

How to Classify Data

Analysis by Horison Information Strategies indicates that data, information and applications can be classified in four primary ways based on criticality, according to their RTO (recovery time objective) and other measures as follows:

**Data Classification Estimates**
Attribute	Mission-critical (15%)	Vital (20%)	Sensitive (25%)	Non-critical (40%)
Recovery time objective	immediate	seconds	minutes	hours to days
Availability index (%)	99.999+	99.99	99.9	<99.0
Retention period	hours	days	years	infinite

Source: Horison Information Strategies
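To make the availability index in the table above more concrete, the percentages can be converted into the yearly downtime they imply. The following is a minimal sketch, not part of the Horison analysis: it assumes a 365-day year, and it treats the non-critical figure of "<99.0" as exactly 99.0, so the result for that class is a lower bound on downtime. The function and dictionary names are illustrative.

```python
# Illustrative only: class names and percentages are copied from the table;
# the arithmetic assumes a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

# Availability index (percent) per class.
# "99.999+" is taken as 99.999 and "<99.0" as 99.0 (both are boundary values).
AVAILABILITY_PCT = {
    "mission-critical": 99.999,
    "vital": 99.99,
    "sensitive": 99.9,
    "non-critical": 99.0,
}


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the yearly downtime (in minutes) implied by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for name, pct in AVAILABILITY_PCT.items():
        minutes = downtime_minutes_per_year(pct)
        print(f"{name:17s} {pct:7.3f}%  about {minutes:8.1f} minutes per year")
```

Under these assumptions, "five nines" corresponds to a little over five minutes of downtime per year, while 99.0 percent corresponds to roughly three and a half days.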

Classes of Data

Classifying data is becoming a critical IT activity for implementing the optimal solution to store and protect data throughout its lifetime. Developing a data classification methodology for a business involves establishing criteria for classes of data or applications based on their value to the business. Four distinct levels are commonly used: mission-critical data, vital data, sensitive data and non-critical data. Determining these levels takes some cooperative effort within the business and, once completed, enables the most cost-effective storage and data protection solutions to be implemented.

Data classification levels also identify which backup and recovery or business resumption solution is best suited for each level to meet the RPO (Recovery Point Objective) and RTO (Recovery Time Objective) requirements. While very important, RTO and RPO are not the only parameters used to classify data. Other considerations include availability, length of data retention, service levels and performance requirements, and overall costs. The figure below illustrates an effective data classification model.

Figure: Data Classification Model

Here is a summary of each of the four data classification categories with a description of the attributes found in each:

Mission-critical data

Mission-critical data is used in key business processes or customer-facing applications. It can account for as much as 15 percent of all data stored online and typically has very fast response time requirements. Mission-critical applications have an RTO (Recovery Time Objective) of one minute or less, so that business can resume immediately after a disruption. Losing access to mission-critical data means a rapid loss of revenue and potential loss of customers, and places the survival of the business at risk. Mirroring protects against device failures but not against data corruption, intrusion, or human or software errors. Therefore, all mission-critical data that is mirrored should also have point-in-time copies that enable full recovery to a point before the corruption event. Mission-critical data is usually classified as company secret, and some applications may be candidates for encryption. Mission-critical data is normally backed up using integrated virtual tape libraries (disk arrays and tape libraries combined) or SATA-based disk arrays. Maintaining mirrored copies for data that is not mission-critical is extremely expensive.
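The role of point-in-time copies in recovering from corruption can be sketched in a few lines. The example below is an illustration under stated assumptions, not a description of any particular product: the function name `latest_clean_copy` and the sample timestamps are hypothetical, and it assumes the time of the corruption event is known.

```python
from datetime import datetime
from typing import Optional, Sequence


def latest_clean_copy(
    copy_times: Sequence[datetime], corruption_time: datetime
) -> Optional[datetime]:
    """Return the newest point-in-time copy taken strictly before the
    corruption event, or None if every copy is at or after it."""
    candidates = [t for t in copy_times if t < corruption_time]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical schedule: one copy every six hours on a single day.
    copies = [datetime(2009, 6, 15, hour) for hour in (0, 6, 12, 18)]
    corruption = datetime(2009, 6, 15, 13, 30)

    restore_point = latest_clean_copy(copies, corruption)
    print("Restore from copy taken at:", restore_point)
    if restore_point is not None:
        # Work done between the restore point and the event must be redone.
        print("Changes at risk span:", corruption - restore_point)
```

A mirror would have faithfully replicated the corrupted data, which is why the sketch selects from earlier copies rather than from the mirror. The interval between copies bounds how much work is at risk.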

Vital data

Vital data accounts for about 20 percent of all data stored online; however, vital data doesn’t require instantaneous recovery for the business to remain in operation. Vital data may be classified as company secret. Recovery times (the RTO) ranging from a few minutes to an hour or more are acceptable, and vital data is normally backed up using integrated virtual tape libraries or SATA-based disk arrays. Mirroring is not normally required for vital data, because techniques such as point-in-time copy, snapshot copy, CDP (Continuous Data Protection) and de-duplication are sufficient to meet the application’s RTO while avoiding the additional hardware costs associated with disk mirroring.

Sensitive data

Sensitive data accounts for about 25 percent of all data stored online. Recovery times (the RTO) can range from several minutes to several hours without causing major operational or business impact. With sensitive data, alternative sources exist for accessing or reconstructing the data in case of data loss. The growing popularity of SATA-based disk subsystems for backup now provides viable and cost-effective technology options alongside tape, which has historically been the primary choice for backup and recovery.

Non-critical data

Non-critical data represents approximately 40 percent of all data stored online, making it the largest classification category. Lost, corrupted or damaged non-critical data can be reconstructed with minimal effort, and acceptable recovery times can range from hours to several days since this data is not essential for business survival. However, non-critical data may suddenly become valuable under unforeseen circumstances, which is giving momentum to extending the useful lifecycle of data significantly. E-mail archives, legal records, medical information, scientific data, financial transactions, security data and fixed content often fit this profile. Most non-critical data is backed up to lower-cost storage solutions, with tape being the most popular choice.
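As a starting point, an application's acceptable recovery time can be mapped to one of the four classes described above. The sketch below is only an illustration: the recovery-time ranges given for the classes overlap, so the cutoffs used here (one minute, one hour, four hours) are assumptions chosen for the example, and the names `RTO_CUTOFFS` and `classify_by_rto` are hypothetical.

```python
from datetime import timedelta

# ASSUMED cutoffs, chosen for illustration. The article gives overlapping
# ranges, so a real scheme would agree these values with the business.
RTO_CUTOFFS = [
    (timedelta(minutes=1), "mission-critical"),  # one minute or less
    (timedelta(hours=1), "vital"),               # a few minutes to about an hour
    (timedelta(hours=4), "sensitive"),           # up to several hours
]


def classify_by_rto(rto: timedelta) -> str:
    """Return the first class whose cutoff the given RTO does not exceed."""
    for cutoff, name in RTO_CUTOFFS:
        if rto <= cutoff:
            return name
    return "non-critical"  # hours to several days


if __name__ == "__main__":
    for rto in (timedelta(seconds=30), timedelta(minutes=30),
                timedelta(hours=2), timedelta(days=2)):
        print(rto, "->", classify_by_rto(rto))
```

RTO is only one input. A fuller scheme would also weigh RPO, availability, retention period, service levels and cost before assigning a class.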

Technology Dependencies

With the advent of more advanced storage management tools and information lifecycle management initiatives, the classification of data and information becomes critical for establishing initial data placement and ongoing automated management. The ingredients for a successful data classification implementation include a policy engine and a tiered storage hierarchy.
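At its simplest, a policy engine is a lookup from a data class to the protection and placement rules for that class. The following sketch encodes the recommendations made for the four classes earlier in this article. It is a hypothetical illustration, not the interface of any real storage management tool: `ProtectionPolicy`, `POLICIES` and `policy_for` are made-up names, and the empty technique lists for sensitive and non-critical data simply mean the article names no specific technique for them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProtectionPolicy:
    mirrored: bool                    # whether disk mirroring is used
    techniques: Tuple[str, ...]       # copy techniques named in the article
    backup_targets: Tuple[str, ...]   # where backups are normally kept


# Values paraphrase the class descriptions in this article.
POLICIES = {
    "mission-critical": ProtectionPolicy(
        mirrored=True,
        techniques=("point-in-time copy",),
        backup_targets=("integrated virtual tape library", "SATA disk array"),
    ),
    "vital": ProtectionPolicy(
        mirrored=False,
        techniques=("point-in-time copy", "snapshot copy", "CDP", "de-duplication"),
        backup_targets=("integrated virtual tape library", "SATA disk array"),
    ),
    "sensitive": ProtectionPolicy(
        mirrored=False,
        techniques=(),
        backup_targets=("SATA disk subsystem", "tape"),
    ),
    "non-critical": ProtectionPolicy(
        mirrored=False,
        techniques=(),
        backup_targets=("tape",),
    ),
}


def policy_for(data_class: str) -> ProtectionPolicy:
    """Look up the policy for a class, rejecting unknown class names."""
    try:
        return POLICIES[data_class]
    except KeyError:
        raise ValueError(f"unknown data class: {data_class!r}") from None


if __name__ == "__main__":
    print(policy_for("vital"))
```

Rejecting unknown class names, rather than falling back to a default, keeps unclassified data visible instead of silently placing it on the wrong tier.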

Skills Dependencies

Many organizations lack the tools, processes, experience and leadership needed to classify data effectively and to make potentially difficult choices. A policy-driven data classification approach provides an automated method to enforce the assignment of correct levels. Several companies now deliver data classification tools for non-mainframe systems, and each should be reviewed to determine whether its product focus meets the business requirements. Mainframe computers using DF/SMS (Data Facility Storage Management System) have enjoyed an excellent data classification capability since 1988.

Organizational Dependencies

A clear owner for the data classification process greatly facilitates the effort, class assignment process and prioritization. Businesses can get bogged down in the assignment process if a clear leader isn't established.

Business Benefits of Classifying Data

Implementing a data classification scheme enables the most cost-effective storage hierarchy to be implemented while ensuring the highest level of data availability needed to meet the service levels of the business.
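The cost argument can be illustrated with simple arithmetic. The sketch below uses the class shares from the Horison estimates (15, 20, 25 and 40 percent) and compares a tiered hierarchy with keeping everything on the most expensive tier. The per-terabyte costs are entirely hypothetical placeholder values in arbitrary units, chosen only to show the calculation, and the function names are illustrative.

```python
# Share of stored data per class, in percent (from the Horison estimates).
SHARE_PCT = {
    "mission-critical": 15,
    "vital": 20,
    "sensitive": 25,
    "non-critical": 40,
}

# HYPOTHETICAL cost per terabyte for the tier serving each class,
# in arbitrary units. These are placeholders, not market prices.
COST_PER_TB = {
    "mission-critical": 100,
    "vital": 40,
    "sensitive": 15,
    "non-critical": 5,
}


def tiered_cost(total_tb: float) -> float:
    """Cost when each class is placed on its own tier."""
    weighted = sum(SHARE_PCT[c] * COST_PER_TB[c] for c in SHARE_PCT)
    return total_tb * weighted / 100


def single_tier_cost(total_tb: float) -> float:
    """Cost when all data is kept on the mission-critical tier."""
    return total_tb * COST_PER_TB["mission-critical"]


if __name__ == "__main__":
    capacity_tb = 100
    print("Tiered:     ", tiered_cost(capacity_tb))
    print("Single tier:", single_tier_cost(capacity_tb))
```

With these placeholder costs the tiered layout comes to well under a third of the single-tier figure. The actual ratio depends entirely on real prices and on how data is really distributed across the classes.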

How to Ensure Data Classification Schema are Adopted

Ensuring that a data classification schema is implemented requires storage administrators and end users to agree on classification criteria. Policies can then be established to enforce the criteria on an ongoing basis.

Issues Covered in this Article

How can an organization effectively and reliably classify data and information?
What are the likely costs associated with data classification?
What benefits will an organization see from better data and information classification?
What technologies are available to help implement data classification?
What common mistakes are made in undertaking data classification initiatives?
What is the most reliable way to construct, implement and maintain an ongoing data and information classification discipline?

Comments on 'Classifying data'

Very nice discussion of data classification schema from a storage viewpoint. It would, of course, be nice if that same classification schema could be applied to data security, and the terminology used in this piece implies that it can. Actually, however, this is not always the case. The data that must be most available is not always the most sensitive. The obvious example is the famous case of the formula for Coca-Cola. This is the single most valuable piece of data that the company owns. As a result, there is only one copy, written on paper, and kept in a bank vault in Atlanta. Obviously this does not make it very accessible, but that, after all, is the idea of security.
A more common example is transactional data. This typically is Tier 0 or at least Tier 1A data from a storage viewpoint, and it also is highly sensitive data from the security standpoint. But it is often a poor candidate for encryption simply because the encryption and decryption of this data, particularly if there is a very large data flow and if the data needs to be accessed often and quickly, can seriously slow down the process first of writing it to storage and then of accessing it.
Thus, unfortunately, data often has to be classified twice: once for storage tiering and again for security purposes.

Posted by: Bert Latamore | Fri Sep 26, 2008 12:10

Wikibon is a professional community solving technology and business problems through an open source sharing of free advisory knowledge.