For the moment, the world’s largest data repositories are measured in multiple petabytes, a scale that has already strained the limits of traditional IT architectures and infrastructures. The move to hyperscale computing, advanced by the likes of Amazon, Facebook and Google, promises to change the way large IT shops and cloud service providers manage and deliver their data in the near future, when exabytes of data will need to be served up to users or customers in a fraction of a second.
Need for Speed
The cost of commodity servers and storage is continually dropping. However, demand for additional compute power and, in particular, high-performance storage, along with rising operational expenses, is outpacing any savings. The bigger issue, however, is speed: time to first bid or first byte.
Amazon estimates that a one-second delay in page load can cause a 7% loss in customer conversions. By a separate Amazon estimate, every 100 milliseconds of latency cost the company 1% in sales. Google found that an extra 0.5 seconds in search page generation time dropped traffic by 20%. A broker could lose $4 million in revenues if its electronic trading platform is 5 milliseconds behind the competition.
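To make the 1%-per-100-milliseconds figure concrete, the arithmetic can be sketched as below. This is only a rough illustration: the linear extrapolation, the function name `estimated_sales_loss`, and the $50 million retailer are assumptions for the example, not figures from Amazon or Google.

```python
def estimated_sales_loss(annual_sales: float, added_latency_ms: float,
                         loss_per_100ms: float = 0.01) -> float:
    """Estimate sales lost to extra latency.

    Assumes (hypothetically) that the "1% of sales per 100 ms" figure
    scales linearly with latency and caps at 100% of sales.
    """
    if added_latency_ms < 0:
        raise ValueError("added_latency_ms must be non-negative")
    fraction_lost = min(1.0, loss_per_100ms * added_latency_ms / 100.0)
    return annual_sales * fraction_lost


# Hypothetical retailer: $50M in annual online sales, 250 ms of added latency.
loss = estimated_sales_loss(50_000_000, 250)
print(f"Estimated annual sales at risk: ${loss:,.0f}")
```

Real conversion curves are unlikely to be linear, so a sketch like this is best read as an order-of-magnitude check rather than a forecast.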
Meanwhile, backup windows are shrinking, or no longer exist, in a world that demands 24x7 access to information and services. On top of that, backing up, replicating or mirroring several petabytes of data is neither easy nor cost effective with traditional approaches.
Information Dispersal Approach
During Wikibon’s Peer Incite on Commercial Applications and Hyper-Scale Storage, Russ Kennedy, Vice President of Product Strategy, Marketing and Customer Solutions for Cleversafe, discussed the merits of the company’s approach to economically managing very large object stores.
Kennedy explained how Cleversafe’s software and appliances leverage existing storage assets whether in a single data center or geographically dispersed throughout the enterprise. “Storage utilizing Cleversafe technology is based on a simple or named object approach that efficiently stores billions of data objects in a single namespace and exposes the data through REST, a standard HTTP based protocol.”
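As a rough illustration of what exposing named objects through REST over HTTP can look like from a client, the sketch below builds PUT and GET requests with Python's standard library. The endpoint `objectstore.example.com`, the `vault1` path and the URL layout are hypothetical placeholders, not Cleversafe's documented API, and the requests are only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint; a real deployment would publish its own URL scheme.
BASE_URL = "https://objectstore.example.com/vault1"


def object_url(base_url: str, object_name: str) -> str:
    """Map an object name to a URL in a single flat namespace."""
    # Percent-encode everything, including "/", so the name stays one path segment.
    return f"{base_url.rstrip('/')}/{quote(object_name, safe='')}"


def build_put(base_url: str, object_name: str, data: bytes) -> Request:
    """Build (but do not send) an HTTP PUT that would store an object."""
    return Request(object_url(base_url, object_name), data=data, method="PUT",
                   headers={"Content-Type": "application/octet-stream"})


def build_get(base_url: str, object_name: str) -> Request:
    """Build (but do not send) an HTTP GET that would retrieve an object."""
    return Request(object_url(base_url, object_name), method="GET")


req = build_put(BASE_URL, "photos/2012 cat.jpg", b"...image bytes...")
print(req.get_method(), req.full_url)
```

Passing such a request to `urllib.request.urlopen` would perform the call against a real service; the point here is only that the client addresses data by name, not by volume, mount point or directory path.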
Traditional storage file systems such as NAS expose data via the NFS and CIFS protocols. This approach works well for provisioning space for individual users or calling up specific objects or documents, but it begins to run into performance bottlenecks when the store holds billions of objects. While file-system-based storage is closely tied to location, object-based storage overcomes this limitation by decoupling data from its physical placement in the storage system.
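The decoupling of data from physical placement can be sketched with a toy in-memory store in which callers only ever use an object ID and the store derives the location itself. The class `FlatObjectStore`, the node names and the hash-modulo placement rule are all assumptions for illustration; production systems typically use more elaborate placement schemes, and this is not a description of any vendor's implementation.

```python
import hashlib


class FlatObjectStore:
    """Toy object store: one flat namespace, placement hidden from callers."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("at least one storage node is required")
        self._order = list(node_names)
        # Each "node" is just a dict here; real nodes would be servers or disks.
        self._nodes = {name: {} for name in self._order}

    def _node_for(self, object_id: str) -> str:
        """Derive the node from a hash of the ID (simplified placement rule)."""
        digest = hashlib.sha256(object_id.encode("utf-8")).digest()
        return self._order[int.from_bytes(digest[:8], "big") % len(self._order)]

    def put(self, object_id: str, data: bytes) -> None:
        self._nodes[self._node_for(object_id)][object_id] = data

    def get(self, object_id: str) -> bytes:
        return self._nodes[self._node_for(object_id)][object_id]


store = FlatObjectStore(["node-a", "node-b", "node-c"])  # hypothetical nodes
store.put("invoice-000123", b"example payload")
print(store.get("invoice-000123"))
```

Because no path or mount point appears in the caller's code, the store is free to change where objects live without breaking clients, which is the property the paragraph above describes.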
Cleversafe also addresses a well-known big data reliability and scalability problem by pairing Hadoop MapReduce with its Dispersed Storage Network (dsNET) system on the same platform. In this design, Information Dispersal Algorithms replace the Hadoop Distributed File System (HDFS), which relies on 3 copies to protect data. In addition, according to Cleversafe CEO and President Chris Gladwin, “Current HDFS deployments utilize a single server for all metadata operations and the failure of a single node could render data inaccessible or permanently lost.”
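The contrast between keeping 3 full copies and dispersing slices can be illustrated with a deliberately simple stand-in. The sketch below uses plain XOR parity (a 2-of-3 scheme) rather than Cleversafe's actual Information Dispersal Algorithms, and the 10-of-16 configuration in the capacity comparison is a hypothetical example, not a vendor figure.

```python
from typing import List, Optional


def raw_capacity_needed(usable_tb: float, total_slices: int, threshold: int) -> float:
    """Raw capacity for a k-of-n scheme: usable * n / k.

    Three full copies correspond to n=3, k=1.
    """
    return usable_tb * total_slices / threshold


def disperse(data: bytes) -> List[bytes]:
    """Split data into two halves plus an XOR parity slice (2-of-3)."""
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\0")  # pad second half to equal length
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]


def reconstruct(slices: List[Optional[bytes]], original_length: int) -> bytes:
    """Rebuild the data from any two of three slices (None marks a lost slice)."""
    if sum(s is None for s in slices) > 1:
        raise ValueError("need at least two of the three slices")
    a, b, parity = slices
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return (a + b)[:original_length]  # drop any padding


data = b"hyperscale object"
slices = disperse(data)
slices[0] = None  # simulate losing one storage node
print(reconstruct(slices, len(data)))

# Raw capacity for 1,000 TB usable: 3 copies vs. a hypothetical 10-of-16 dispersal.
print(raw_capacity_needed(1000, 3, 1), raw_capacity_needed(1000, 16, 10))
```

The toy scheme tolerates the loss of any single slice while storing 1.5 times the data instead of 3 times, which is the general trade-off dispersal schemes aim for; wider k-of-n configurations tolerate more simultaneous failures.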
Kennedy also referenced a Shutterfly case study where Cleversafe technology is helping to manage over 70 petabytes of information – so far. “Our solution is architected to handle Exabyte-scale object stores securing data in motion or at rest, and we can drive up to 90% of the cost out of storage needed in a traditional solution all while data and objects are always online, available and utilizing existing storage assets.”
Advantages of Object-Based Storage
Object storage as defined by Cleversafe would appear to have several advantages over traditional file-based approaches including:
Object-based Storage Implications
Assuming the Cleversafe approach, as well as similar approaches for deploying object-based storage from OpenStack Swift, Scality and Caringo among others, becomes pervasive, the implications would potentially include:
- Reduction or elimination of RAID, replication and backup for petabyte-scale data stores
- Significant reduction in the cost of traditional storage and storage administration
- Increased pressure on large enterprises to move to software-led hyperscale solutions
- Increased pressure on IT to cut costs and improve efficiencies or move to cloud
- Paradigm shift for hyperscale hardware providers to simplify offerings and go “barebones,” obviating first-generation value-added, purpose-built hardware solutions
- Move to open compute architectures including Facebook OCP
- Increased service provider focus on industry-specific, cloud-based offerings. Examples: FinQloud and Bloomberg Vault
- Managing and serving up metadata in the cloud for specific industries becomes an even more pervasive business model
Conclusion
Software-led hyperscale computing has the potential to revolutionize multi-petabyte-scale storage architectures for applications with billions of objects and may very well impact how servers, memory and storage are delivered at scale for cloud service providers and large enterprise data centers in the near term.
On the other hand, it would appear that the impact on many traditional enterprise-class tier 1 applications, including high-performance trading systems, relational database solutions and other solutions that rely on sub-petabyte-scale storage, will be minimal for the foreseeable future. That is, until SSD flash memory and storage become so inexpensive and pervasive that an enterprise can cost-effectively deploy multiple petabytes.
Action Item: Vendors need to determine the verticals in which they will compete and build an ecosystem within those verticals. Applications and data access patterns will determine what to do with the metadata. Many of the target customers may be in the business of marking up data. Industries such as Finance, Healthcare, Insurance, and Pharmaceuticals are already well down the path of marking up hyperscale data repositories. Storage vendors need to understand application requirements well enough to help the customer differentiate beyond what can be done using a more generic storage solution.