Users with substantial volumes of unstructured electronically stored information (ESI), some in the half-petabyte range, uniformly voice dissatisfaction with their current archiving, discovery and management solutions in one or more key areas, most notably auto-classification, indexing and scalability.
These first-generation solutions were not architected to handle the ever-growing volumes of unstructured data that enterprises create. The result is performance problems or, worse, corrupted indexes, data loss and an inability to find or categorize data. In addition, users expect much more from the next generation of ESI management and discovery solutions than the ability to store and manage a variety of ESI types or to provide mailbox and storage management.
While first-generation providers upgrade their functionality, acquire new products and capabilities, or seek partnerships and alliances to alleviate their systems’ shortcomings, a new crop of well-funded, innovative start-ups is coming to market to address these major problems.
One such firm is Digital Reef Inc., which this week announced general availability of its “massively scalable unstructured data management platform.” The company emerged from stealth mode after more than two years of development, and CEO and founder Steve Akers and his team can already point to customer successes with major corporations in several industries, including high tech, legal services, management consulting, publishing and government. Although the Digital Reef platform can improve the management and disposition of email, the product “discovers” all unstructured data and ESI as it builds its index, or “virtual” repository.
Approach
Digital Reef “crawls” any device attached to the network, exploding PSTs and collecting document “descriptions” and metadata. As Akers explained, customers came to Digital Reef after running into index-build and index-corruption problems that their existing archiving solutions could not resolve, especially with data stores larger than ten terabytes. To avoid these problems, Digital Reef’s index is actually a collection of federated indexes: rather than one monolithic index, the build produces smaller, completed pieces as it proceeds. The resulting index amounts to roughly 25% of the total raw data volume.
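To illustrate why a federated index sidesteps the build and corruption problems of a single monolithic index, here is a minimal Python sketch of a sharded inverted index. It is hypothetical: Digital Reef has not published its index design, and the class names, the fixed `max_docs_per_shard` rollover rule and the simple tokenizer are illustrative assumptions. Each shard is sealed once it is full, so a failure during the build can only affect the shard in progress, and a query fans out across all shards and merges the hits.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical sketch of a federated (sharded) index; not Digital Reef's actual design.
_TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return _TOKEN.findall(text.lower())


@dataclass
class Shard:
    """One small, self-contained inverted index: term -> set of document ids."""
    postings: defaultdict = field(default_factory=lambda: defaultdict(set))
    doc_count: int = 0
    sealed: bool = False  # a sealed shard is complete and never modified again

    def add(self, doc_id: str, text: str) -> None:
        if self.sealed:
            raise RuntimeError("cannot add to a sealed shard")
        for term in tokenize(text):
            self.postings[term].add(doc_id)
        self.doc_count += 1


class FederatedIndex:
    """A list of shards; the fixed per-shard document limit is an assumption."""

    def __init__(self, max_docs_per_shard: int = 1000) -> None:
        self.max_docs_per_shard = max_docs_per_shard
        self.shards: list[Shard] = [Shard()]

    def add(self, doc_id: str, text: str) -> None:
        current = self.shards[-1]
        if current.doc_count >= self.max_docs_per_shard:
            current.sealed = True  # freeze the completed piece before starting a new one
            current = Shard()
            self.shards.append(current)
        current.add(doc_id, text)

    def search(self, term: str) -> set[str]:
        """Fan the query out to every shard and merge the matching document ids."""
        hits: set[str] = set()
        for shard in self.shards:
            hits |= shard.postings.get(term.lower(), set())
        return hits


# Usage: five documents with two documents per shard yields three shards.
index = FederatedIndex(max_docs_per_shard=2)
texts = ["quarterly budget memo", "budget forecast", "merger term sheet",
         "holiday party", "budget approval"]
for i, text in enumerate(texts):
    index.add(f"doc{i}", text)
print(len(index.shards))       # 3
print(sorted(index.search("Budget")))  # ['doc0', 'doc1', 'doc4']
```

Because completed shards are never rewritten, each can be built, verified or rebuilt independently; the trade-off is that every query must visit every shard.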
While this federated approach creates additional storage requirements, the benefit is dramatically streamlined search and discovery. Like some other top unstructured data management solutions, Digital Reef does not need to store its metadata or index in SQL or another relational database, which substantially improves search times. The company also claims file ingestion speeds of up to 4 terabytes per day (raw) and 1–2 terabytes per day (auto-classified).
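The storage and throughput figures quoted for the platform can be turned into a rough sizing estimate. The sketch below assumes the claimed rates (an index of about 25% of raw volume, 4 TB/day raw ingestion, 1–2 TB/day auto-classification) hold for the whole job and scale linearly, and it treats half a petabyte as 500 TB; the function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizingEstimate:
    index_tb: float                     # additional storage consumed by the index
    raw_ingest_days: float              # days to ingest at the claimed raw rate
    classify_days: tuple[float, float]  # (fastest, slowest) auto-classification time


def estimate(raw_tb: float,
             index_ratio: float = 0.25,        # index ~25% of raw volume (claimed)
             raw_tb_per_day: float = 4.0,      # claimed raw ingestion rate
             classify_tb_per_day: tuple[float, float] = (1.0, 2.0)) -> SizingEstimate:
    """Back-of-the-envelope sizing; assumes the claimed rates are sustained and linear."""
    slow, fast = classify_tb_per_day
    return SizingEstimate(
        index_tb=raw_tb * index_ratio,
        raw_ingest_days=raw_tb / raw_tb_per_day,
        classify_days=(raw_tb / fast, raw_tb / slow),
    )


# Half a petabyte, taken here as 500 TB (decimal units).
print(estimate(500))
# SizingEstimate(index_tb=125.0, raw_ingest_days=125.0, classify_days=(250.0, 500.0))
```

Under these assumptions a half-petabyte store needs roughly 125 TB of extra index storage, about 125 days of raw ingestion and 250 to 500 days of auto-classification on a single deployment, which is why real throughput under load is worth verifying during evaluation.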
Infrastructure support includes a multi-tenant, role-based security model (LDAP is not yet supported) and a proprietary, multi-tiered, grid-like computing architecture. Importantly, the technology maintains chain of custody, email threads and parent-child relationships. Initial customers have chosen to crawl backup servers to avoid the complexity of investigating desktops, laptops and other decentralized devices. This approach assumes those distributed devices are backed up regularly and successfully.
Once the data is collected, auto-classification can begin with no need to develop taxonomies in advance and no limit on the number of categories. Digital Reef uses its own vectoring methodology to search the virtual archive quickly in order to analyze, classify and prepare data by content value or to prescribe retention and alerting policies. For now, users still need an existing solution to delete data, though presumably with much greater confidence that the data is not needed. Once ESI is classified, different departments or individuals can create their own views of the same data.
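Digital Reef’s vectoring methodology is proprietary, so the sketch below is not its algorithm. It is a generic vector-space illustration of how documents can be grouped without a predefined taxonomy, assuming raw term-count vectors, cosine similarity and single-pass “leader” clustering with a hypothetical similarity threshold of 0.3. Each document joins the most similar existing category, or opens a new one when nothing is similar enough, so the number of categories is not fixed in advance.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Raw term-count vector (an assumption; production systems usually weight terms)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def auto_classify(docs: dict[str, str], threshold: float = 0.3) -> dict[str, int]:
    """Single-pass leader clustering: no predefined taxonomy, no cap on categories.

    The 0.3 threshold is a hypothetical tuning value, not a Digital Reef parameter.
    """
    # Summed term counts per category; same direction as the mean, so cosine is unaffected.
    centroids: list[Counter] = []
    labels: dict[str, int] = {}
    for doc_id, text in docs.items():
        vec = vectorize(text)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < threshold:
            centroids.append(Counter(vec))  # nothing similar enough: open a new category
            best = len(centroids) - 1
        else:
            centroids[best].update(vec)     # fold the document into the chosen category
        labels[doc_id] = best
    return labels


docs = {
    "d1": "budget forecast for fiscal year",
    "d2": "fiscal year budget review",
    "d3": "merger agreement draft",
    "d4": "draft merger agreement terms",
    "d5": "budget forecast revised",
}
print(auto_classify(docs))
# {'d1': 0, 'd2': 0, 'd3': 1, 'd4': 1, 'd5': 0}
```

Single-pass clustering is cheap enough for very large collections, but its results depend on document order and on the threshold; a production system would typically refine categories in later passes.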
The benefits of this approach include filling gaps in the eDiscovery process, such as more precise culling and preparation of large data stores, as well as capturing corporate knowledge more accurately and improving data risk management.
Futures
While Digital Reef has many additional enhancements and partnerships in the works, Akers considers SharePoint a key component of the company’s technology road map, especially since growing volumes of unstructured ESI are populating SharePoint farms, not to mention other collaboration platforms. An upcoming release will also allow existing, or “supervised,” categories to be populated alongside dynamically created new ones.
Bottom line
Digital Reef has come to market with what appears to be a powerful, true auto-classification capability that complements existing data management and discovery solutions. If early indications hold, Digital Reef will provide a valuable, scalable tool for enterprises that must manage very large volumes of ESI.
Action Item: Enterprises that need to index and classify large volumes of ESI for compliance, data risk management, knowledge re-use, storage management and high-volume eDiscovery preparation should consider taking a deeper look at the Digital Reef platform to fill known data classification and manipulation gaps in their unstructured data management solutions.