Originating Author: Fred Moore
This article is intended to give storage managers guidelines for categorizing data effectively and improving storage management. It is written for storage managers, storage architects, database managers and IT personnel involved in storage planning and management.
This article describes data classification: what it is, how classifying data improves storage management, how to classify information, the business impact of classifying data, and how ongoing data classification processes can best be managed.
In a study entitled "How Much Information? 2003", produced by the School of Information Management and Systems (SIMS) at the University of California, Berkeley, it was estimated that 5 exabytes of information was created in 2002 and produced in print, film, magnetic, and optical storage media. This figure represents about a doubling from 1999. In 2006, EMC commissioned a study by IDC entitled "The Expanding Digital Universe", which was intended to build upon the Berkeley work and estimated that 161 exabytes of digital information was created, captured and replicated.
These figures are astounding and bring up two questions:
- How can organizations classify information so that storage device characteristics can be matched with information requirements?
- How can data classification disciplines help organizations determine how much they should spend on storage?
What is Data Classification
Data and information classification refers to the policies and procedures by which stored data is categorized so the information can be accessed, updated, protected, recovered and managed more efficiently in accordance with specific application requirements.
How to Classify Data
Analysis by Horison Information Strategies indicates that data, information and applications can be grouped into four primary classes based on criticality, according to their RTO (Recovery Time Objective) and other measures, as follows:
| Attribute | Mission-critical | Vital | Sensitive | Non-critical |
|---|---|---|---|---|
| Share of online data | 15% | 20% | 25% | 40% |
| Recovery Time Objective | immediate | seconds | minutes | hours, days |
Source: Horison Information Strategies
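The RTO row of the table can be read as a simple lookup from a recovery time objective to a class. The following is a minimal Python sketch of that reading, not a method taken from the Horison analysis. The table gives only "immediate", "seconds", "minutes" and "hours, days", so the exact cut-offs (under one second, under one minute, under one hour) and the names `RTO_BOUNDS` and `classify_by_rto` are assumptions made for illustration.

```python
from datetime import timedelta

# Assumed upper bounds for each RTO band. The source table only names the
# bands ("immediate", "seconds", "minutes", "hours, days"); the precise
# boundaries below are illustrative, not taken from the source.
RTO_BOUNDS = [
    (timedelta(seconds=1), "mission-critical"),  # "immediate"
    (timedelta(minutes=1), "vital"),             # "seconds"
    (timedelta(hours=1), "sensitive"),           # "minutes"
]


def classify_by_rto(rto: timedelta) -> str:
    """Return the class whose RTO band contains the given objective."""
    if rto < timedelta(0):
        raise ValueError("RTO cannot be negative")
    for upper_bound, class_name in RTO_BOUNDS:
        if rto < upper_bound:
            return class_name
    return "non-critical"                        # "hours, days"


if __name__ == "__main__":
    for rto in (timedelta(0), timedelta(seconds=30),
                timedelta(minutes=10), timedelta(hours=5)):
        print(rto, "->", classify_by_rto(rto))
```

In practice RTO is only one input. Availability, retention period, service levels and cost would also have to be weighed, so a lookup like this is at best a starting point for discussion.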
Classes of Data
Classifying data is becoming a critical IT activity for implementing the optimal solution to store and protect data throughout its lifetime. Developing a data classification methodology for a business involves establishing criteria for classes of data or applications based on their value to the business. Four distinct levels are commonly used: mission-critical data, vital data, sensitive data and non-critical data. Determining these levels takes some cooperative effort within the business and, when completed, enables the most cost-effective storage and data protection solutions to be implemented. Data classification levels also identify which backup and recovery or business resumption solution is best suited to each level to meet its RPO (Recovery Point Objective) and RTO (Recovery Time Objective) requirements. While very important, RTO and RPO are not the only parameters used to classify data. Other considerations include availability, length of data retention, service levels and performance requirements, and overall costs. The figure below illustrates an effective data classification model.
Here is a summary of each of the four data classification categories with a description of the attributes found in each:
Mission-critical data is used in key business processes or customer-facing applications. It can account for as much as 15 percent of all data stored online and typically has very fast response time requirements. Mission-critical applications have an RTO (Recovery Time Objective) of one minute or less, so that business can resume immediately after a disruption. Losing access to mission-critical data means a rapid loss of revenue and a potential loss of customers, and places the survival of the business at risk. Mirroring protects against device failures but not against data corruption, intrusion, or human or software errors. Therefore, all mission-critical data that is mirrored should also have point-in-time copies that enable full recovery to a point before the corruption event. Mission-critical data is usually classified as company secret, and some applications may be candidates for encryption. Mission-critical data is normally backed up using integrated virtual tape libraries (disk arrays and tape libraries combined) or SATA-based disk arrays. Maintaining mirrored copies for data that is not mission-critical is extremely expensive.
Vital data accounts for about 20 percent of all data stored online; however, vital data doesn't require instantaneous recovery for the business to remain in operation. Vital data may be classified as company secret. Recovery times (the RTO) ranging from a few minutes to an hour or more are acceptable, and vital data is normally backed up using integrated virtual tape libraries or SATA-based disk arrays. Mirroring is not normally required for vital data, as techniques such as point-in-time copy, snapshot copy, CDP (Continuous Data Protection) and de-duplication are sufficient to meet the application's RTO while avoiding the additional hardware costs associated with disk mirroring.
Sensitive data accounts for about 25 percent of all data stored online. Recovery (the RTO) can take from several minutes to several hours without causing major operational or business impact. With sensitive data, alternative sources exist for accessing or reconstructing the data in case of data loss. The growing popularity of SATA-based disk subsystems for backup now provides viable and cost-effective technology options alongside tape, which has historically been the primary choice for backup and recovery.
Non-critical data represents approximately 40 percent of all data stored online, making it the largest classification category. Lost, corrupted or damaged non-critical data can be reconstructed with minimal effort, and acceptable recovery times can range from hours to several days since this data is not essential for business survival. Non-critical data may, however, suddenly become valuable under unforeseen circumstances, which gives momentum to extending the useful lifecycle of data significantly. E-mail archives, legal records, medical information, scientific data, financial transactions, security data and fixed content often fit this profile. Most non-critical data is backed up to lower-cost storage solutions, with tape being the most popular choice.
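The percentages in the four descriptions above can be turned into a rough capacity estimate per class. The Python sketch below is illustrative only: the shares and protection techniques are the ones named in this article (with the mission-critical share treated as its stated upper figure of 15 percent), while the names `DataClass`, `CLASSES` and `estimate_capacity_tb` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataClass:
    share_percent: int           # approximate share of online data (from the article)
    protection: Tuple[str, ...]  # techniques the article associates with the class


# Shares and techniques as described in the article; structure is hypothetical.
CLASSES: Dict[str, DataClass] = {
    "mission-critical": DataClass(15, ("mirroring", "point-in-time copy")),
    "vital": DataClass(20, ("point-in-time copy", "snapshot copy",
                            "CDP", "de-duplication")),
    "sensitive": DataClass(25, ("SATA disk backup", "tape backup")),
    "non-critical": DataClass(40, ("tape backup",)),
}


def estimate_capacity_tb(total_tb: float) -> Dict[str, float]:
    """Split a total online capacity (in TB) across the four classes."""
    if total_tb < 0:
        raise ValueError("capacity cannot be negative")
    return {name: total_tb * cls.share_percent / 100
            for name, cls in CLASSES.items()}


if __name__ == "__main__":
    for name, tb in estimate_capacity_tb(200).items():
        print(f"{name:17s} {tb:6.1f} TB  {', '.join(CLASSES[name].protection)}")
```

For a 200 TB estate this yields 30, 40, 50 and 80 TB respectively. Real proportions vary by organization, so the figures are better treated as a planning baseline than as a target.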
With the advent of more advanced storage management tools and information lifecycle management initiatives, the classification of data and information becomes critical to establish initial data placement and ongoing automated management. The ingredients for a successful data classification implementation include a policy-engine and a tiered storage hierarchy.
Many organizations lack the tools, processes, experience and leadership needed to classify data effectively and to make potentially difficult choices. A policy-driven data classification approach provides an automated method to enforce the assignment of correct levels. Several companies are now delivering data classification tools for non-mainframe systems, and each should be reviewed to determine whether its product focus meets the business requirements. Mainframe computers using DF/SMS (Data Facility Storage Management System) have enjoyed an excellent data classification capability since 1988.
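To make the idea of a policy engine paired with a tiered storage hierarchy more concrete, here is a minimal Python sketch of first-match rule evaluation. It does not represent DF/SMS or any vendor product. The attributes on `DataSet`, the rule order, the default class and the tier names are all hypothetical and would need to be agreed by the business.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class DataSet:
    # Hypothetical metadata a classification tool might collect.
    name: str
    customer_facing: bool
    needed_for_operations: bool
    has_alternative_source: bool


Rule = Tuple[Callable[[DataSet], bool], str]

# Rules are evaluated in order and the first match wins (an assumed design).
RULES: List[Rule] = [
    (lambda d: d.customer_facing, "mission-critical"),
    (lambda d: d.needed_for_operations, "vital"),
    (lambda d: d.has_alternative_source, "sensitive"),
]

# Hypothetical tier names; a real hierarchy would name actual devices.
TIER_BY_CLASS = {
    "mission-critical": "tier-1",
    "vital": "tier-2",
    "sensitive": "tier-3",
    "non-critical": "tier-4",
}


def classify(data_set: DataSet) -> str:
    """Return the class of the first matching rule, else 'non-critical'."""
    for predicate, class_name in RULES:
        if predicate(data_set):
            return class_name
    return "non-critical"


def place(data_set: DataSet) -> Tuple[str, str]:
    """Return the (class, storage tier) pair for initial data placement."""
    class_name = classify(data_set)
    return class_name, TIER_BY_CLASS[class_name]


if __name__ == "__main__":
    print(place(DataSet("orders", True, True, False)))
    print(place(DataSet("old-logs", False, False, False)))
```

Keeping the rules in an ordered list means the policy can be reviewed and changed without touching the placement logic, which is one reason a policy-driven approach is easier to enforce on an ongoing basis.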
A clear owner for the data classification process greatly facilitates the effort, class assignment process and prioritization. Businesses can get bogged down in the assignment process if a clear leader isn't established.
Business Benefits of Classifying Data
Implementing a data classification scheme allows the business to deploy the most cost-effective storage hierarchy while ensuring the highest level of data availability that will meet its service levels.
How to Ensure Data Classification Schemas Are Adopted
Ensuring that a data classification schema is implemented requires storage administrators and end users to agree on classification criteria. Policies can then be established to enforce the criteria on an ongoing basis.
Issues Covered in this Article
- How can an organization effectively and reliably classify data and information?
- What are the likely costs associated with data classification?
- What benefits will an organization see from better data and information classification?
- What technologies are available to help implement data classification?
- What common mistakes are made in undertaking data classification initiatives?
- What is the most reliable way to construct, implement and maintain an ongoing data and information classification discipline?