Data Classification
Last Update: Jun 04, 2008 | 09:49
Originating Author: David Butler

Storage Peer Incite: Notes from Wikibon’s March 27, 2007 Research Meeting

Dave Vellante presents "Data classification: Brains or brawn?" New business value drivers necessitate a break from historical methods of classifying data. Auto-classification is a key prerequisite that must be designed into data architectures early in the process.

Contents

  • 1 Data classification: Brains or brawn?
  • 2 Data classification value transcends storage efficiencies
  • 3 Data classification: So much more than storage optimization
  • 4 Data classification: Managing metadata
  • 5 Auto-classification of metadata means truckloads of terabytes
  • 6 How much meta in the data

Data classification: Brains or brawn?

Dave Vellante and David Floyer

The current state of data classification is largely a byproduct of historical hierarchical storage management (HSM) implementations, where data age is the primary classification criterion. Early visions of classifying data based on business value never fully came to fruition because they required a manual, brute-force approach and were too hard to automate. Age-based classification was far easier to automate and became the de facto standard.

A new emphasis on compliance, discovery, archiving and provenance substantially challenges existing data classification taxonomies. New business value drivers include 'never delete' retention policies as well as performance, availability and recovery attributes, which underpin resurgent data classification efforts. While age-based schemas generally predominate, they must more aggressively incorporate richer classification attributes. However, this extension should be accomplished with an eye toward automation, where data set metadata is auto-classified upon creation and/or use of the data set. Future data classification efforts will involve much broader perspectives and serve as the mainspring of multiple enterprise initiatives, including ILM, tiered storage, email archiving, decision support, data mining, electronic content management and compliance. In short, data classification will serve as the foundation for information value management. While the manual development of business categories is always necessary, without auto-classification there is no chance of success.

Action Item: IT organizations must break with the past and make business process, not the age of data sets, the defining catalyst for classification schemas. This approach will not scale without auto-classification capabilities that assign metadata to data sets at the point of creation or use. Emerging tagging methodologies borrowed from social networking may provide a complementary user-driven approach, but these will not suffice for compliance and legal requirements.
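As a minimal sketch of what assigning metadata at the point of creation could look like, the Python below attaches classification attributes to a data set when it is created, keyed by the originating business process rather than by age. The process names, policy attributes and the `create_data_set` helper are all hypothetical and are not taken from the meeting notes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping from business process to classification attributes.
# A real schema would be agreed with legal, audit and the business lines.
POLICY_BY_PROCESS = {
    "accounts_payable": {"retention": "7y", "tier": "tier2", "discoverable": True},
    "marketing_drafts": {"retention": "1y", "tier": "tier3", "discoverable": False},
}

# Conservative fallback for processes nobody has categorized yet (assumption).
DEFAULT_POLICY = {"retention": "review", "tier": "tier2", "discoverable": True}


@dataclass
class DataSet:
    name: str
    business_process: str
    created_at: datetime
    classification: dict = field(default_factory=dict)


def create_data_set(name, business_process, now=None):
    """Create a data set and classify it automatically at creation time."""
    created_at = now or datetime.now(timezone.utc)
    policy = POLICY_BY_PROCESS.get(business_process, DEFAULT_POLICY)
    # Copy the policy so later changes to one data set do not leak into others.
    return DataSet(name, business_process, created_at, dict(policy))
```

The manual work is confined to maintaining the policy table. Every data set is then classified without user input, which is the property the action item argues is needed for the approach to scale.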

Data classification value transcends storage efficiencies

Dave Vellante

From a storage point of view, the traditional drivers of data classification have been to improve efficiencies, map data to device characteristics and better serve application users. Now, with compliance, legal discovery and audit initiatives influencing corporate agendas, a new value proposition is emerging in which classification can enable the reconstruction of a continuum of organizational activities performed and decisions made over a period of time.

What this means is that the traditional reliance on a 'corporate memory' to piece together a series of events, or to conduct a cumbersome discovery, has the potential to be supplanted by a much more reliable and auditable system of infrastructure, metadata, applications and business processes. To be sure, the justification, internal arm-twisting and development of this capability will not be trivial. However, the technologies, regulatory imperatives and competitive pressures are coming together in a sort of perfect storm that will dictate investment in this area for the next several years. At the heart of this opportunity is the automatic creation of classification metadata and the enticement of users to provide meaningful input into the process.

Action Item: IT must sell the vision of how automating metadata creation will drive huge improvements in productivity and facilitate the exploitation of untapped corporate knowledge. Application owners must be persuaded to develop metadata creation functions and supporting architectures. Finally, metadata creation must be simplified so that end users can participate in the process and add incremental value.

Data classification: So much more than storage optimization

David Floyer

Storage executives have traditionally been responsible for data classification implementations. Data classification is a fundamental building block for effective ILM and archiving initiatives, and the potential benefits to the organization go far beyond storage optimization. However, out-of-scope organizational requirements can disrupt the initial objectives of data classification projects, and managers must guard carefully against scope creep.

For IT to implement a full data classification architecture, detailed assessments will be needed with legal, audit, risk management and the business lines; with architects regarding metadata architecture; with application developers and owners to determine metadata automation requirements; and with operations professionals. In the meantime, storage executives need to limit the scope of any data classification project to what can be achieved in the immediate term.

Action Item: Executives responsible for storage must keep data classification schemas simple and limited to data that is system generated (e.g., date of creation and date of last use). Expanding the scope of classification efforts is necessary, but it should not proceed until data classification schemas are defined and automated methods of generation are in place. Relying on any manual entry of classification information will doom data classification projects to failure.
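To illustrate a schema limited to system-generated attributes, the following Python sketch classifies a file using only what the file system already records. The class names and the 30-day and 365-day thresholds are illustrative assumptions, not recommendations from the meeting. Creation time is left out because it is not reported consistently across platforms.

```python
import os
import time

SECONDS_PER_DAY = 86400


def classify_by_last_use(days_since_last_use):
    """Map idle time to a class; the thresholds are hypothetical."""
    if days_since_last_use < 30:
        return "active"
    if days_since_last_use < 365:
        return "inactive"
    return "archive"


def system_metadata(path, now=None):
    """Collect only system-generated attributes for a file; no manual entry."""
    st = os.stat(path)
    now = time.time() if now is None else now
    # Treat the later of last access and last modification as "last use".
    last_use = max(st.st_atime, st.st_mtime)
    days_idle = (now - last_use) / SECONDS_PER_DAY
    return {
        "path": path,
        "size_bytes": st.st_size,
        "modified": st.st_mtime,
        "last_use": last_use,
        "class": classify_by_last_use(days_idle),
    }
```

Because every input comes from `os.stat`, the classification can be regenerated at any time without asking a user for anything.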

Data classification: Managing metadata

David Floyer

Metadata is data about data, and it enables data management. Provenance and respect for order are guiding principles for data management. Metadata includes when data was created, who and/or what created it, where the data was used, and when it was destroyed. We need to be confident that the data was not changed without a record of the change. Metadata is a key enabler for data classification.

Applications and users create data, and should create the metadata at the time of creation or use. Metadata is additive in nature, and does not need a single point of control. Operating systems, applications, system management software, databases, storage management software and storage hardware are all important contributors to the creation and storage of metadata. The creation of metadata has to be automated for applications, and made as simple as possible for end-users.

Action Item: The key imperative for enabling data classification is automation of the creation of metadata. The first and most important step is to agree on metadata types, and on the layout and structure of each type of metadata.
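One way to picture an agreed set of metadata types with a fixed layout is sketched below in Python. The four entry types mirror the life-cycle events named above (creation, use, change, destruction). The field names, the append-only log and the SHA-256 content digest are assumptions made for illustration. The log is additive, so any contributor can append to it, and the digest gives a simple check that data has not changed without a record.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical agreed metadata types.
METADATA_TYPES = {"created", "used", "modified", "destroyed"}


@dataclass(frozen=True)
class MetadataEntry:
    entry_type: str                 # one of METADATA_TYPES
    source: str                     # who or what produced the entry
    timestamp: datetime
    content_sha256: Optional[str]   # digest of the data at that moment, if known


@dataclass
class MetadataLog:
    entries: list = field(default_factory=list)

    def record(self, entry_type, source, content=None, timestamp=None):
        """Append an entry; existing entries are never edited or removed."""
        if entry_type not in METADATA_TYPES:
            raise ValueError(f"unknown metadata type: {entry_type}")
        digest = hashlib.sha256(content).hexdigest() if content is not None else None
        entry = MetadataEntry(
            entry_type, source, timestamp or datetime.now(timezone.utc), digest
        )
        self.entries.append(entry)
        return entry

    def changed_without_record(self, content):
        """True if content does not match the most recently recorded digest."""
        current = hashlib.sha256(content).hexdigest()
        for entry in reversed(self.entries):
            if entry.content_sha256 is not None:
                return entry.content_sha256 != current
        return True  # nothing recorded, so the content cannot be vouched for
```

An application would call `record("created", ...)` when it writes the data and `record("modified", ...)` on each change. An auditor can later pass the current bytes to `changed_without_record` to test whether they match the last recorded state.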

Auto-classification of metadata means truckloads of terabytes

Dave Vellante

Automatic data classification at the time of data set creation should have vendors salivating. This is because the amount of storage created will easily be twice the amount of core information captured (consider all the metadata associated with a bounced email). Perhaps more importantly, auto-classification will remove a barrier to projects related to ILM, tiered storage, electronic content management, email archiving, etc. Savvy buyers will not disrupt progress because of the added storage expense, but in order to capitalize, storage vendors must enable a new class of application that will exploit classification metadata. This means suppliers must re-tool technology portfolios to provide solutions that perform functions such as the following:

  • Organize and classify metadata
  • Enable auto-classification
  • Exploit metadata using file system, index and search functionality
  • Accommodate classification metadata tables in high-speed cache
  • Provide high-performance data movement
  • Enable asset discovery and metadata analysis
  • Secure and encrypt metadata

Action Item: Key strategic initiatives supporting corporate and regulatory mandates will not go unfunded. Vendors that address the growing problems presented by the lack of solutions to automatically create classification metadata will reap the greatest rewards. Developed solutions must be ecosystem friendly with published entries and exits into key technology components that entice and facilitate partnerships.

How much meta in the data

Peter Burris

As storage teams expand their use of data-driven storage technologies (e.g., virtualization, tiered storage), pressure to formulate comprehensive metadata strategies at the storage level increases dramatically. Moreover, metadata from technologies like storage virtualization, which intervene between applications and previously dedicated pools of storage, may even supersede many classes of application-level metadata. Traditionally, storage administrators have created modest amounts of metadata, usually dictated by device types or formats. However, piecemeal approaches to creating, manipulating and using metadata will not work where enormous volumes of data have to be logically integrated for storage automation to operate. The good news is that conventions, methods and tools for managing metadata are very mature. The bad news is that storage professionals typically know nothing more about them than how much storage they require.

Action Item: Storage professionals must begin formulating realistic storage metadata strategies that work in concert with enterprise metadata approaches, borrowing knowledge, tools and methods to prepare rapidly for the adoption of emerging data-driven automation technologies.

categories
Business compliance, Capacity management, Data classification, ECM, Email archiving, Managing storage
Contributors

Dvellante

Revision ID Author Timestamp Comment
15830 Dab4168 08 Jun 04 09:49:17 misc
15829 Dvellante 08 Jun 03 21:32:37 /* [[Data classification: Brains or brawn?]] */
14757 Dab4168 08 Mar 19 17:37:31 Added categories
8400 Dab4168 07 May 08 13:18:18 Corrected date
8321 Dab4168 07 May 02 20:08:42 Protected "[[Data Classification]]": Content on other pages [edit=sysop:move=sysop]
8320 Dab4168 07 May 02 20:07:43 misc
8318 Dab4168 07 May 02 19:57:09 Misc
8317 Dab4168 07 May 02 19:52:58 Created page

© Wikibon 2008