Data & Application Consolidation
The trend toward consolidation of big data is inexorable. NAND flash technology is moving data up from shared arrays towards the server layer. Flash-only storage is causing the cost of IOs and bandwidth to plummet, and data reduction technology is making flash storage the low-cost solution for all active data, or will do so soon. The cost of moving data or accessing remote data is becoming proportionally much higher, making data consolidation strategies an imperative. Data consolidation will ensure that data processed together is physically close together. This will lead to different models of data consolidation, such as:
- Consolidation of data within an organization where possible;
- Consolidation of data in cloud services (e.g., log data shared between multiple organizations is significantly more valuable to mine than a single organization's log data);
- Consolidation by data aggregators that will buy data that “fits together” for different potential customers;
- Consolidation of data services in mega-datacenters, where organizations choose to co-locate close to other organizations with similar data requirements (for example, sharing common supply chains) or physically close to cloud service providers. The Switch mega-datacenter is the largest co-location facility emphasizing this strategy.
The bottom line is that data should be consolidated wherever possible, as creating metadata, processing it, and extracting value will be simpler and less expensive. The countervailing trend will be driven by applications that must provide very low latencies to their users or customers (e.g., stock exchanges supporting financial trading organizations).
Constraints of Traditional Storage & File Systems
Traditional IO is an expensive and slow commodity. Traditional block-and-file storage systems use SCSI protocols designed decades ago for the multi-millisecond response times of slow, cumbersome hard disks.
Modern file storage systems (e.g., DDN’s Exascale, EMC’s Isilon, IBM’s SONAS, HP’s IBRIX) have scaled out, hold the metadata centrally, and can support many millions of files. But as big data grows into billions and trillions of consolidated records, the overheads and internal data management become an inhibitor to scale.
Object Storage Systems
An object storage system is simply a collection of objects containing data and metadata. Individual objects in an object storage system are accessed by a unique global handle. The handle does not include information about the object. An object system is flat; there is no hierarchy, and all the objects are at the same level (e.g., objects cannot have other objects inside them). Objects can be different sizes and are accessed by simple verbs such as "get", "put", and "delete".
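As a rough illustration only (not any vendor's actual API), the sketch below models a flat object store in Python: a single namespace keyed by an opaque handle, objects carrying their own metadata, and access limited to put, get, and delete. The class and method names are assumptions made for the sketch.

```python
import uuid

class FlatObjectStore:
    """Minimal sketch of a flat object store: no hierarchy, opaque handles,
    objects of any size, accessed only by put/get/delete."""

    def __init__(self):
        # One flat namespace: handle -> (metadata, data). Objects cannot nest.
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        handle = uuid.uuid4().hex  # opaque handle; carries no information about the object
        self._objects[handle] = (dict(metadata), bytes(data))
        return handle

    def get(self, handle: str):
        return self._objects[handle]

    def delete(self, handle: str) -> None:
        del self._objects[handle]

# Usage: the caller must retain the handle; there are no paths or directories.
store = FlatObjectStore()
oid = store.put(b"sensor log ...", {"source": "plant-7"})
metadata, data = store.get(oid)
```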
Object storage systems are simple and independent of the storage hardware technology. EMC's Centera archive system was based on object technology. The first very large object storage system was Amazon’s S3 cloud service, which made storage extremely simple to purchase, administer, and manage. The largest object storage systems are in government and hold trillions of objects.
Object storage systems are highly scalable and very easy to manage, but their very simplicity usually means poor performance compared with block or file systems. This has meant that object storage systems have been limited to applications where data is written once and not accessed often, such as archive systems.
DDN’s WOS 2.5 Object Storage System
DDN’s WOS system has three main characteristics that allow much improved performance and resilience:
- The 232-bit object handle also determines the location where the object is stored. This allows the object to be located or written with a single seek (or the equivalent for flash or other storage types); see the placement sketch after this list.
- The metadata is stored in the object. This eliminates the need for a separate management system and allows the metadata to be copied to cache at write time if necessary.
- WOS 2.5 uses replicated erasure coding to provide synchronous or asynchronous replication at the software level. The erasure coding allows recovery entirely from local data in the case of (say) a local media failure of a disk, and fully automatic, instantaneous “failover” to the second copy in the case of a disaster (a toy parity example follows this list). The failover is in quotes because the object handle determines the location of both copies of the data, and the choice of which source to use depends on the WOS parameters (e.g., latency). If one copy is no longer accessible, the system automatically chooses the alternative. No storage hardware functionality is required to provide replication.
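To make the first and third characteristics concrete, here is a simplified Python sketch of handle-determined placement and latency-based copy selection. This is not WOS's actual placement algorithm; the zone names, latency figures, and function names are invented for illustration.

```python
import hashlib

# Hypothetical zones with illustrative round-trip latencies (milliseconds).
ZONES = ["zone-a", "zone-b", "zone-c", "zone-d"]
ZONE_LATENCY_MS = {"zone-a": 0.5, "zone-b": 0.7, "zone-c": 12.0, "zone-d": 15.0}

def placement(handle: str, copies: int = 2) -> list[str]:
    """Derive storage locations directly from the handle, so any node can
    compute where an object lives without consulting a central metadata service."""
    start = int(hashlib.sha256(handle.encode()).hexdigest(), 16) % len(ZONES)
    return [ZONES[(start + i) % len(ZONES)] for i in range(copies)]

def choose_read_source(handle: str, reachable: set[str]) -> str:
    """Pick the lowest-latency reachable copy; if one copy is lost,
    the read falls back to the alternative automatically."""
    candidates = [z for z in placement(handle) if z in reachable]
    if not candidates:
        raise RuntimeError("no reachable copy of object " + handle)
    return min(candidates, key=lambda z: ZONE_LATENCY_MS[z])

oid = "9f1c2b0d"  # an opaque object handle
print(placement(oid))                                            # locations of both copies
print(choose_read_source(oid, reachable={"zone-b", "zone-c"}))   # survives loss of one zone
```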
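The local-recovery property of erasure coding can be illustrated with a toy single-parity scheme (real erasure codes tolerate more simultaneous failures): a lost fragment is rebuilt purely from the surviving local fragments, without touching the remote copy. The functions below are illustrative assumptions, not DDN code.

```python
from functools import reduce

def encode_with_parity(data: bytes, k: int = 4) -> list[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)                      # ceiling division
    frags = [data[i*size:(i+1)*size].ljust(size, b"\0") for i in range(k)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), frags)
    return frags + [parity]

def recover_fragment(frags: list[bytes], lost: int) -> bytes:
    """Rebuild one lost fragment using only the surviving local fragments."""
    survivors = [f for i, f in enumerate(frags) if i != lost]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), survivors)

pieces = encode_with_parity(b"a block of object data to protect locally")
assert recover_fragment(pieces, lost=2) == pieces[2]   # fragment 2 lost; rebuilt locally
```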
The benefit of these features is that the WOS system has the potential to achieve far higher performance levels than previous object storage models.
Conclusions
DDN’s WOS 2.5 release gives object storage the scale and performance to be used both for traditional archiving applications and for Big Data Analytics. This is a significant step towards building very large-scale integrated operational and analytical systems. These integrated systems will have a profound effect on improving business productivity and shortening business cycle times.
Action Item: Object storage systems such as DDN's WOS 2.5 are potential enablers of new, lower-cost ways of writing, maintaining, and exploiting big data applications, and of creating software-led storage architectures. At the moment, WOS has unique features that enable object storage to deliver both scale and performance. CTOs and CEOs should take the time to understand the WOS technology and where it could be applied.