Hello SourceOne, Good Bye EmailXtender

To use a term from Robin Harris and Chuck Hollis, I’ve been ‘squinting’ through the EMC SourceOne announcement. Michael Brown’s Chalk Talk on the architecture is worth a look to see what’s new here.

It looks to me like this announcement includes lots of catch-up and plenty of vision, with the implied promise that SourceOne will deliver. EMC’s done some good work integrating multiple piece parts, but this still appears to be a shove-everything-into-a-central-archive approach. And as my colleagues and I have been saying on Wikibon, this won’t solve the problem of managing information risk, which is the main driver of email archiving. Let’s face it, legal is steering this bus right now, not IT. And while maybe you can take a centralized approach to solve email archiving problems, files and content distributed throughout the organization present more pressing challenges.

With this announcement, EMC is resurrecting its ILM vision, which is cool. I’ve always liked the concept but it’s never been actionable. I’m not sure SourceOne makes it so, but it’s a step in the right direction. What the SourceOne announcement seems to do is fix well-known problems with EmailXtender (by throwing it out) and introduce a new architecture that, from what I can tell, breaks up the task of archiving emails and parcels the work out to different resources so the system can perform better and scale. SourceOne also uses concepts like stubbing, .PST ingestion, single instancing, de-dupe and the like. All good things, but nothing really that radically new or exciting. So you’re left with a new architecture that coordinates work across multiple resource components but still appears to be a centralize-everything approach.

Symantec has predictably responded by soliciting EmailXtender customers to join the Enterprise Vault bandwagon. The open letter to EmailXtender users basically says: why shove everything into a new version 1.0 centralize-everything archive from EMC when you can shove it all into a mature centralize-everything architecture like EV from Symantec? My advice is that there are better approaches coming, so look before you leap. My colleague Gary MacFadden and I are pretty charged up about some new innovators like Digital Reef and Rational Retention that are taking what we see as more sensible approaches to unstructured ESI. Yes, they’re newbies and don’t have the baggage, but we think their auto-categorization, search and metadata approaches point the way of the future.

Missing in the SourceOne discussion are clear descriptions of things like auto-categorization and automated policy management and smokin’ enterprise search and the ability to manage unstructured content other than emails. And I’d like to see less talk about retention and more tools to get rid of stuff (GRS). I can’t tell if it’s in here. EMC certainly alludes to some of these capabilities but they don’t seem to be a centerpiece of this announcement.

Let’s look at it another way. For true ILM for Unstructured ESI you need four components with regard to content:

1.    Find it – search and categorize (in synch w/policies hopefully) – and create a structured metadata layer.
2.    Analyze it – i.e. leverage the content and metadata layer to gain insight.
3.    ‘Rule’ it – information policies applied to what can and can’t be done in an automated layer to direct an execution engine.
4.    Execute it – after finding and analyzing and understanding the policy you have to do something with it, like copy, delete, freeze, alert, etc.

This is by no means an architecture but it lays out the pieces that are important to ILM.
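
To make the four pieces a little more concrete, here is a minimal Python sketch of how they might chain together. Everything in it is an assumption for illustration only: the `Item` record, the keyword match standing in for auto-categorization, the 90-day threshold and the action names are all hypothetical, and none of it describes SourceOne or any other vendor's product.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Item:
    """One piece of unstructured content plus its metadata layer (hypothetical)."""
    path: str
    text: str
    modified: date
    metadata: dict = field(default_factory=dict)


def find(items):
    """Step 1: categorize each item and record the result as metadata.
    A keyword match stands in for real auto-categorization."""
    for item in items:
        lowered = item.text.lower()
        item.metadata["category"] = "legal" if "contract" in lowered else "general"
    return items


def analyze(items, today):
    """Step 2: derive insight from content and metadata (here, just age in days)."""
    for item in items:
        item.metadata["age_days"] = (today - item.modified).days
    return items


def rule(item):
    """Step 3: map metadata to an action. The categories and the 90-day
    threshold are made-up policy values, not recommendations."""
    if item.metadata["category"] == "legal":
        return "freeze"
    if item.metadata["age_days"] > 90:
        return "delete"
    return "keep"


def execute(items):
    """Step 4: act on each decision. This sketch only returns a plan
    (path -> action) rather than touching any real files."""
    return {item.path: rule(item) for item in items}


def run_pipeline(items, today):
    """Chain the four steps: find, analyze, rule, execute."""
    return execute(analyze(find(items), today))


# Example run over content scattered across a laptop, a desktop and a wiki.
today = date(2009, 4, 1)
items = [
    Item("laptop/contract.doc", "Signed contract with supplier", today - timedelta(days=400)),
    Item("desktop/lunch.txt", "Lunch order", today - timedelta(days=120)),
    Item("wiki/notes.txt", "Meeting notes", today - timedelta(days=5)),
]
plan = run_pipeline(items, today)
```

The point of the sketch is that nothing gets centralized: each step works from metadata about content wherever it lives, and only the final step decides whether anything is copied, frozen or deleted.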

Typically vendors have taken the approach of jamming as much info as possible into a central repository to control it. It’s easier this way. Maybe this is possible for email (although what about email attachments saved locally?), but how do you do this for files that are distributed across laptops, desktops, BlackBerrys and wikis? Centralizing those is not practical.

Even if you can control it, how do you automate ILM without auto-categorization? This is the only hope we have of catching up with volume growth. Auto-classification is the mainspring of scaling for the business.

Maybe I’m missing something obvious. If so, help me through the haze.

  • dfloyer

    This is another entry into a crowded email archiving space. EMC has set up a services group with significant legal experience, and will probably provide a good point solution for legal and IT departments with immediate pain. However, as you point out, the strategic challenge is to avoid the cost of legal review and the increased legal risk that runaway storage of data in general, and email in particular, is creating. One well-known retailer is deleting all email records after ninety days. Other companies are deleting all .pst files more than six months old, wherever they are in the organization. For many companies, policies like these could lower costs and risks.

  • versace

    We would all benefit from a debate on functionality — the functionality necessary to more effectively store, discover, and deliver electronic content (other functions, too), and then how this set of functionality might be referenced architecturally. An initial thought is that functionality can be grouped into services, or sets of services, including 1) policy, 2) configuration management, 3) content management, 4) security, 5) reporting/monitoring, and 6) physical infrastructure. And I suggest a focus on Unstructured Content, the fastest growing, uncontrolled form of content in the enterprise today.

    Today we have over-retention rates approaching 50%, and duplication rates at 20:1. On the policy side, some of the largest enterprises are deleting ALL email after 90 days, while others haven't deleted a single email for 10+ years. And of course, a quick sweep of the tech market revealed 100+ vendors with solution claims. It is a crowded space, with many lookalikes. A reference architecture may help sort through the madness.
