On January 22, 2013, the Peer Incite community gathered to discuss commercial applications and hyperscale storage. Joining the community was Russ Kennedy, Vice President at Cleversafe, winner of the Wikibon CTO award for the Best Storage Technology Innovations of 2009.
Hyperscale storage was discussed previously in the July 26, 2011 Peer Incite: Cloud Archiving Forever without Losing a Bit, during which Justin Stottlemyer, the Storage Architect Fellow at Shutterfly, reviewed the drivers behind the company’s selection of Cleversafe. As companies such as Facebook, YouTube, and Shutterfly know only too well, the growth in the quantity and the increase in the quality of digital image content, both still images and video, are major drivers behind the continued growth in demand for hyperscale storage. At Shutterfly, the storage requirements have doubled in only 18 months, reaching 80 petabytes.
Demand for hyperscale storage is not confined to Web 2.0 social-media and photo-sharing sites, however. Hyperscale storage requirements also arise within agencies that monitor weather conditions and forecast weather-related disasters, as well as in government security and defense applications such as video surveillance and satellite imaging. Unfortunately, due to security concerns and compliance requirements, these applications are rarely discussed in the public domain. Nonetheless, both social media and government security and defense applications are predictors of the coming demand for hyperscale storage in more traditional businesses.
Businesses have historically transmitted information in written form, such as emails and documents, but increasingly they combine video with text to improve the communication and retention of information. Video has become a core component of the educational system, particularly for online corporate education and rapidly growing online universities and trade schools. Even traditional campus-based universities are migrating in-person classes to video and making more of their courses available completely online through portals such as Coursera.
Multi-site collaboration has also become a core requirement for organizations with large product design and software development projects. Tight deadlines and limited availability of human capital make it necessary to collaborate across locations and time zones while working on large repositories of design content or application code. Cloud-based service providers have begun to deliver this infrastructure on demand to support these follow-the-sun development requirements.
Once storage requirements reach hyperscale, however, traditional RAID-based and replication-protected storage architectures become too expensive and too unmanageable to meet organizations' requirements. This has led Facebook to develop its own IT infrastructure stack, including storage, and to encourage others to join it in driving down the cost of supporting and managing hyperscale compute and storage through the Open Compute Project. Others have turned to companies such as Cleversafe to meet their requirements.
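The cost gap between replication and erasure-coded dispersal can be illustrated with a rough capacity calculation. The sketch below compares the raw capacity needed to protect an 80-petabyte archive under triple replication and under a hypothetical 10-of-16 erasure-coding scheme; the slice counts are illustrative assumptions, not any vendor's actual configuration.

```python
# Rough comparison of the raw capacity needed to protect 80 PB of usable data.
# The 10-of-16 erasure-coding parameters below are illustrative assumptions,
# not a specific product's defaults.

USABLE_PB = 80  # usable capacity to protect, in petabytes

# Triple replication: every byte is stored three times.
replication_copies = 3
replication_raw_pb = USABLE_PB * replication_copies

# Erasure coding / information dispersal: data is encoded into `width`
# slices, and any `threshold` of them are enough to rebuild the original.
width, threshold = 16, 10
erasure_raw_pb = USABLE_PB * width / threshold

print(f"Triple replication: {replication_raw_pb:.0f} PB raw "
      f"({replication_copies:.1f}x overhead)")
print(f"Erasure coding ({threshold} of {width}): {erasure_raw_pb:.0f} PB raw "
      f"({width / threshold:.1f}x overhead)")
```

Under those assumptions, the same 80 PB of usable data requires roughly 240 PB of raw capacity with triple replication versus about 128 PB with erasure coding, before rebuild and management costs are even considered.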
Forecasting the breaking point of traditional storage architectures may be more art than science, particularly when the requirements form a multidimensional moving target of growing capacity, changing data access density, increasing write loads, and limited IT budgets. That said, the time to evaluate and prepare for a new hyperscale storage architecture is well before the requirement arrives, and if your organization is already managing petabytes of data, the best time to evaluate new architectures may already be in your rear-view mirror.
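A simple growth projection helps frame how quickly that breaking point can arrive. The sketch below assumes capacity keeps doubling every 18 months, the rate cited for Shutterfly above; the 80 PB starting point and the 500 PB ceiling are illustrative planning inputs, not forecasts.

```python
# Back-of-the-envelope capacity projection, assuming exponential growth
# that doubles every 18 months (the rate cited for Shutterfly above).

from math import log2

current_pb = 80          # current capacity, in petabytes (illustrative)
doubling_months = 18     # observed doubling period

def projected_capacity(months_ahead: float) -> float:
    """Projected capacity in PB after `months_ahead` months of growth."""
    return current_pb * 2 ** (months_ahead / doubling_months)

for months in (18, 36, 54):
    print(f"In {months} months: ~{projected_capacity(months):.0f} PB")

# Months until a hypothetical architectural ceiling of 500 PB is reached.
ceiling_pb = 500
months_to_ceiling = doubling_months * log2(ceiling_pb / current_pb)
print(f"~{months_to_ceiling:.0f} months until {ceiling_pb} PB")
```

At that rate, 80 PB grows to roughly 640 PB in about four and a half years, well inside a typical storage refresh cycle.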
Hyperscale archival storage begets additional storage requirements with different performance demands. Particularly within the realm of digital image content, metadata is becoming increasingly important. From consumer-focused, event-specific photo journals, such as memory albums, to news-driven content retrieval requests and security and defense applications, second-order products are being built on top of massive archives. These applications increase both the frequency and the variability of requests to rapidly locate images and video. The need to search, sort, arrange, and aggregate content quickly will drive demand for a greater volume of metadata and may require separate architectures for managing data and metadata. For many hyperscale applications, a large, read-intensive image and video archive is perfectly acceptable when it delivers the first byte of data within a few hundredths of a second, but the metadata search may need to complete within a few milliseconds, or even microseconds in the ad-tech industry, for example.
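One way to picture that split is a small, low-latency metadata index that answers searches in memory and returns only the keys that must then be fetched from the much slower bulk archive. The index structure and archive interface below are hypothetical stand-ins used for illustration, not any particular product's API.

```python
# Minimal sketch of the data/metadata split: a fast in-memory inverted index
# serves searches, and only the matching keys touch the bulk object archive.
# Both classes are hypothetical stand-ins, not a specific product's API.

from collections import defaultdict

class MetadataIndex:
    """Inverted index from tag -> object keys, kept on fast media."""
    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, key: str, tags: list[str]) -> None:
        for tag in tags:
            self._by_tag[tag].add(key)

    def search(self, tag: str) -> set[str]:
        # Millisecond-class lookup: no archive access required.
        return self._by_tag.get(tag, set())

class BulkArchive:
    """Stand-in for the hyperscale object store holding images and video."""
    def __init__(self):
        self._objects = {}

    def put(self, key: str, blob: bytes) -> None:
        self._objects[key] = blob

    def get(self, key: str) -> bytes:
        # In a real archive this is the slower, first-byte path.
        return self._objects[key]

index, archive = MetadataIndex(), BulkArchive()
archive.put("img/2013/0122/001.jpg", b"...jpeg bytes...")
index.add("img/2013/0122/001.jpg", ["wedding", "2013", "chicago"])

# Search traffic hits only the index; only matching objects are retrieved.
for key in index.search("wedding"):
    blob = archive.get(key)
```

The design point is that the high-frequency, latency-sensitive search workload never touches the archive's slow path; only the handful of matching objects do.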
Action Item: There is little doubt that regardless of industry, modern enterprises will become increasingly dependent upon data-intensive applications for product development, customer acquisition, employee and customer education, and service delivery. Well before reaching hyperscale requirements, organizations should evaluate, test, and implement proof-of-concept installations to avoid crash-and-burn scenarios that result from waiting too long.