How big data analytics imposes huge challenges for storage professionals and the keys to preparing for the future
With contributions from David Floyer
The cumulative effect of decades of IT infrastructure investment around a diverse set of technologies and processes has stifled innovation at organizations around the globe. Layer upon layer of complexity to accommodate a staggering array of applications has created hardened processes that make changes to systems difficult and cumbersome.
The result has been an escalation of labor costs over the years to support this complexity. Ironically, computers are supposed to automate manual tasks, but the statistics show some alarming data that flies in the face of this industry promise. In particular, the percentage of spending on both internal and outsourced IT staff has exploded over the past 15 years. According to Wikibon estimates, of the $250B spent on server- and storage-related hardware and staffing costs last year, nearly 60% was spent on labor. IDC figures provide further evidence of this trend. The research firm’s forecasts are even more aggressive than Wikibon’s, with estimates that suggest labor costs will approach 70% by 2013 (see Figure 1 below).
The situation is untenable for most IT organizations and is compounded by the explosion of data. Marketers often cite Gartner’s three V’s of Big Data -- volume, velocity, and variety -- which refer respectively to data growth, the speed at which organizations are ingesting data, and the diversity in data texture (e.g., structured, unstructured, video, etc.). There is a fourth V that is often overlooked: Value.
WikiTrend: By 2015, the majority of IT organizations will come to the realization that big data analytics is tipping the scales and making information a source of competitive value that can be monetized, not just a liability that needs to be managed. Those organizations that cannot capitalize on data as an opportunity risk losing market share.
From an infrastructure standpoint, Wikibon sees five keys to achieving this vision:
- Simplifying IT infrastructure through tighter integration across the hardware stack;
- Creating end-to-end virtualization beyond servers into networks, storage, and applications;
- Exploiting flash and managing a changing hardware stack by intelligently matching data and media characteristics;
- Containing data growth by making storage optimization a fundamental capability of the system;
- Developing a service orientation by automating business and IT processes through infrastructure that can support applications across the portfolio, versus within a silo, and provide infrastructure-as-a-service that is “application aware.”
This research note is the latest in a series of efforts to aggregate the experiences of users within the Wikibon community and put forth a vision for the future of infrastructure management.
The IT Labor Problem
The trend toward IT consumerization, led by Web giants servicing millions of users, often with a single or very few applications, has ushered in a new sense of urgency for IT organizations. C-level and business line executives have far better experiences with Web apps from Google, Facebook, and Zynga than with their internal IT systems as these services have become the poster children of simplicity, rapid change, speed, and a great user experience.
In an effort to simplify IT and reduce costs, traditional IT organizations have aggressively adopted server virtualization and built private clouds. Yet relative to the Web leaders, most IT organizations are still far behind the Internet innovators. The reasons are quite obvious as large Web properties had the luxury of starting with a clean sheet of paper and have installed highly homogeneous infrastructure built for scale.
Both vendor and user communities are fond of citing statistics that 70% of IT spending is allocated to “Running the Business” while only 30% goes toward growth and innovation. Why is this? The answer can be found by observing IT labor costs over time.
Data derived from researcher IDC (see Figure 1) shows that in 1996 around $30B was spent on IT infrastructure labor, which at the time represented only about 30% of total infrastructure costs. By next year, the data suggests that more than $170B will be spent on managing infrastructure (i.e., labor), which will account for nearly 70% of total infrastructure costs (including capex and opex). This is a whopping 6X increase in labor costs, while overall spending has increased only 2.5X in those 15+ years.
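The arithmetic behind these figures can be checked directly. The short sketch below (Python) simply reuses the rounded dollar amounts cited in this paragraph to reproduce the roughly 6X and 2.5X growth rates; it introduces no new data.

```python
# Rough arithmetic check of the labor-cost trend cited above. The dollar
# figures are the approximate IDC/Wikibon estimates from the text, not new data.

labor_1996 = 30e9          # ~$30B spent on infrastructure labor in 1996
labor_share_1996 = 0.30    # ~30% of total infrastructure spend

labor_next = 170e9         # ~$170B projected infrastructure labor spend
labor_share_next = 0.70    # ~70% of total infrastructure spend

total_1996 = labor_1996 / labor_share_1996   # ~$100B total infrastructure spend
total_next = labor_next / labor_share_next   # ~$243B total infrastructure spend

print(f"Labor cost growth : {labor_next / labor_1996:.1f}x")   # ~5.7x (the "6X")
print(f"Total spend growth: {total_next / total_1996:.1f}x")   # ~2.4x (the "2.5X")
```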
What does this data tell us? It says we live in a labor-intensive IT economy and something has to change. The reality is that IT investments primarily go toward labor, and this labor intensity is slowing down innovation. This trend is a primary reason that IT is not keeping pace with business today; it simply doesn’t have the economic model to respond quickly at scale. In order for customers to go in new directions and break this gridlock, vendors must address the REAL cost of computing: people.
The answer is one part technology, one part people, and one part process. Virtualization/cloud is the dominant technology trend, and we live in a world where IT infrastructure, applications, and the security that protects data sources are increasingly viewed as virtual, not physical, entities. The other dominant technology themes reported by Wikibon community practitioners are:
- A move toward pre-engineered and integrated systems (aka converged infrastructure) that eliminate or at least reduce mundane tasks such as patch management;
- Much more aggressive adoption of virtualization beyond servers;
- A flash-oriented storage hierarchy that exploits automated operations and a reduction in the manual movement of data — i.e. “smarter systems” that are both automated and application aware -- meaning infrastructure can support applications across the portfolio and adjust based on quality of service requirements and policy;
- Products that are inherently efficient and make data reduction features like compression and de-duplication fundamental capabilities, not optional add-ons, along with new media such as flash and the ability to automate management of the storage infrastructure.
From a people standpoint, organizations are updating skills and training staff in emerging disciplines such as data science and DevOps (the intersection of application development and infrastructure operations), fields that will enable the monetization of data and deliver dramatic increases in productivity.
The goal is that the combination of improved technologies and people skills will lead to new processes that begin to reshape decades of complexity and deliver a much more streamlined set of services that are cloud-like and services-oriented.
The hard reality is that this is a difficult task for most organizations, and an intelligent mix of internal innovation with external sourcing will be required to meet these objectives and close the gap with the Web giants and emerging cloud service providers.
New Models of Infrastructure Management
IT infrastructure management is changing to keep pace as new models challenge existing management practices. Traditional approaches use purpose-built configurations that meet specific application performance, resilience, and space requirements. These are proving wasteful, as infrastructure is often over-provisioned and underutilized.
The transformative model is to build flexible, self-administered services from industry-standard components that can be shared and deployed on an as-needed basis, with usage levels adjusted up or down according to business need. These IT services building blocks can come as services from public cloud and SaaS providers, as services provided by the IT department (private clouds), or increasingly as hybrids between private and public infrastructure.
Efforts by most IT organizations to self-assemble this infrastructure have led to a repeat of current problems, namely that specifying and maintaining all the parts requires significant staff overhead. Increasingly, vendors are providing a complete stack of components, including compute, storage, networking, operating system, and infrastructure management software.
Creating and maintaining such a stack is not a trivial task. It will not be sufficient for vendors or systems integrators to create a marketing or sales bundle of component parts and then hand over the maintenance to the IT department; the savings from such a model are minimal over traditional approaches. The stack must be completely integrated, tested, and maintained by the supplier as a single SKU, or as a well-documented solution with codified best practices that can be applied for virtually any application. The resultant stack has to be simple enough that a single IT group can completely manage the system and resolve virtually any issue on its own.
Equally important, the cost of the stack must be reasonable and must scale out efficiently. Service providers are effectively using open-source software and focused specialist skills to decrease the cost of their services. Internal IT will not be able to compete with service providers if its software costs are out of line.
The risk to this integrated approach according to members of the Wikibon practitioner community is lock-in. Buyers are concerned that sellers will, over time, gain pricing power and return to the days of mainframe-like economics. This concern has merit. Sellers of converged systems today are providing large incentives to buyers in the form of aggressive pricing and white glove service in an effort to maintain account control and essentially lock customers into their specific offering. The best advice is as follows:
- Consider converged infrastructure in situations where cloud-like services provide clear strategic advantage, and the value offsets the risk of lock-in down the road.
- Design processes so that data doesn’t become siloed. In other words, make sure your data can be migrated easily to other infrastructure.
- Don’t sole source. Many providers of integrated infrastructure have realized they must provide choice of various components such as hypervisor, network, and server. Keep your options open with a dual-sourcing strategy.
WikiTrend: Despite the risk of lock-in, by 2017, more than 60% of infrastructure will be purchased as some type of integrated system, either as a single SKU or a pre-tested reference architecture.
The goal of installing integrated or converged infrastructure is to deliver a world without stovepipes where hardware and software can support applications across the portfolio. The tradeoff of this strategy is it lessens the benefits of tailor-made infrastructure that exactly meets the needs of an application. For the few applications that are critical to revenue generation, this will continue to be a viable model. However, Wikibon users indicate that 90% or more of the applications do not need a purpose-built approach, and Wikibon has used financial models to determine that a converged infrastructure environment will cut the operational costs by more than 50%.
The key to exploiting this model is tackling the 90% long tail of applications by aggregating common technology building blocks into a converged infrastructure. There are two major objectives in taking this approach:
- Drive down operational costs by using an integrated stack of hardware, operating systems, and middleware;
- Accelerate the deployment of applications.
Virtualization: Moving Beyond Servers
Volume servers that came from the consumer space could originally run only one application per server. The result was servers with very low utilization rates, usually well below 10%. Specialized servers that can run multiple applications achieve higher utilization rates, but at much higher system and software costs.
Hypervisors, such as VMware’s ESXi, Microsoft’s Hyper-V, Xen, and offerings from IBM and Oracle, have changed the equation. The hypervisor virtualizes the system resources and allows them to be shared among multiple operating systems. Each operating system thinks it has control of a complete hardware system, but the hypervisor is sharing those resources among them.
The result of this innovation is that volume servers can be driven to much higher utilization levels, three to four times that of stand-alone systems. This makes low-cost volume servers, derived directly from volume consumer products such as PCs, much more attractive as a foundation for processing and much cheaper than specialized servers and mainframes. There will still be a place for very high-performance specialized servers for some applications, such as certain performance-critical databases, but the volume will be much lower.
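As a rough illustration of the consolidation math, the sketch below uses hypothetical round numbers consistent with the utilization ranges discussed above; it is not measured data.

```python
# Illustrative consolidation arithmetic; the inputs are hypothetical round
# numbers in line with the utilization ranges discussed above.

standalone_utilization = 0.08   # volume servers often run well below 10% busy
consolidation_ratio = 4         # virtual machines packed per physical host

virtualized_utilization = standalone_utilization * consolidation_ratio   # 32%

servers_before = 100
servers_after = servers_before / consolidation_ratio                     # 25 hosts

print(f"Utilization per host: {standalone_utilization:.0%} -> {virtualized_utilization:.0%}")
print(f"Physical servers    : {servers_before} -> {servers_after:.0f}")
```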
The impact of server virtualization on storage is profound. The I/O path to a server now provides service to many different operating systems and applications. The result is that the access patterns seen by the storage devices are much less predictable and more random. The impact of higher server utilization (and of multi-core processors) is that IO volumes (IOPS, or IOs per second) will be much higher. Increasingly, fewer processor cycles will be available for housekeeping activities such as backup.
Server virtualization is changing the way that storage is allocated, monitored, and managed. Instead of defining LUNs and RAID levels, virtual systems define virtual disks and expect array information to reflect those virtual machines, virtual disks, and the applications they are running. Storage virtualization engines are enabling the pooling of multiple heterogeneous arrays, providing both investment protection and flexibility for IT organizations with diverse asset bases. In addition, virtualizing the storage layer dramatically simplifies storage provisioning and management, in much the same way that server virtualization attacked the problem of underutilized assets.
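A minimal sketch of this pooling idea follows. The class and method names are illustrative only and do not correspond to any vendor’s API; the point is that heterogeneous arrays sit behind a single provisioning interface, so a virtual disk can be requested for a VM without exposing LUN or RAID decisions.

```python
# Minimal sketch of the pooling idea behind storage virtualization:
# heterogeneous arrays are aggregated into one logical pool from which
# virtual disks for VMs are provisioned. Names are illustrative, not any
# vendor's API.

class Array:
    def __init__(self, name, capacity_gb):
        self.name = name
        self.capacity_gb = capacity_gb
        self.allocated_gb = 0

    def free_gb(self):
        return self.capacity_gb - self.allocated_gb

class VirtualizedPool:
    """Aggregates heterogeneous arrays behind one provisioning interface."""
    def __init__(self, arrays):
        self.arrays = arrays
        self.virtual_disks = {}

    def provision_vdisk(self, vm_name, size_gb):
        # Place the virtual disk on the array with the most free space; a real
        # engine would also weigh tier, performance, and policy.
        target = max(self.arrays, key=lambda a: a.free_gb())
        if target.free_gb() < size_gb:
            raise RuntimeError("pool exhausted")
        target.allocated_gb += size_gb
        self.virtual_disks[vm_name] = (target.name, size_gb)
        return target.name

pool = VirtualizedPool([Array("legacy-fc-array", 2000),
                        Array("new-hybrid-array", 8000)])
print(pool.provision_vdisk("erp-vm-01", 500))   # no LUN/RAID decisions exposed
```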
Conclusions for Storage
Storage arrays will have to serve much higher volumes of random read and write IOs from applications using multiple protocols. In addition, storage arrays will need to work across heterogeneous assets and virtualized systems and speak the language of virtualization administrators. Newer storage controllers (often implemented as virtual machines) are evolving that completely hide the complexities of traditional storage (e.g., LUNs and RAID structures), replacing them with automated, virtual machine (VM)-focused storage that provides the metrics virtual machine operators (e.g., VMware administrators) need to monitor performance, resource utilization, and service level agreements (SLAs) at the business application level.
Storage networks will have to adapt to provide a shared transport for the different protocols. Adapters and switches will increasingly use lossless Ethernet as the transport mechanism, with the different protocols running on top.
Backup processes will need to be re-architected and linked to the application rather than following a one-size-fits-all approach. Application-consistent snapshots and continuous backup processes are among the technologies that will become increasingly important over time.
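To make the changed-block idea behind space-efficient snapshots concrete, here is a toy sketch. The block size and hashing approach are assumptions for illustration; real products track changed blocks at the array or hypervisor layer rather than rehashing volumes.

```python
# Toy sketch of changed-block tracking: only blocks modified since the last
# snapshot are captured and shipped. Purely illustrative.

import hashlib

def block_hashes(volume, block_size=4096):
    return [hashlib.sha256(volume[i:i + block_size]).hexdigest()
            for i in range(0, len(volume), block_size)]

def changed_blocks(previous_hashes, current_volume, block_size=4096):
    current = block_hashes(current_volume, block_size)
    return [i for i, h in enumerate(current)
            if i >= len(previous_hashes) or h != previous_hashes[i]]

base = bytes(4096 * 100)                       # 100-block volume, all zeros
snap0 = block_hashes(base)                     # baseline snapshot

updated = bytearray(base)
updated[4096 * 3: 4096 * 4] = b"x" * 4096      # the application writes one block

print(changed_blocks(snap0, bytes(updated)))   # -> [3]; only block 3 is copied
```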
WikiTrend: Virtualization is moving beyond servers and will impact the entire infrastructure stack, including storage, backup, networks, infrastructure management, and security. Overall, the strong trend toward converged infrastructure, where storage function placement is more dynamic and can be staged optimally in arrays, in virtual machines, or in servers, will necessitate an end-to-end, more intelligent management paradigm.
Flash Storage: Implications to the Stack
Consumers are happy to pay a premium for flash memory over the price of disk because of the convenience of flash. For example, the early iPods had disk drives, but these were replaced by flash because the flash device required very little battery power and had no moving parts. The result was much smaller iPods that would work for days without recharging and would survive being dropped. This led to huge consumer volume shipments, and flash storage costs dropped dramatically.
In the data center, systems and operating system architectures have had to contend with the volatility of processors and high-speed RAM storage: if power is lost to the system, all data in flight is lost. The solutions were either to protect the processors and RAM with complicated and expensive battery backup systems or to write the data out to disk storage, which is non-volatile. The difference between the speed of disk drives (measured in milliseconds, 10⁻³ seconds) and processor speed (measured in nanoseconds, 10⁻⁹ seconds) is huge and is a major constraint on system speed. All systems wait for IO at the same speed. This is especially true for database systems.
Flash storage is much faster than disk drives (microseconds, 10⁻⁶ seconds) and is persistent: when the power is removed, the data is not lost. It can provide an additional memory level between disk drives and RAM. The impact of flash memory is also being seen in the "iPad effect." The flash is always on, and the response time for applications compared with traditional PC systems is amazing. Applications are being rewritten to take advantage of this capability, and operating systems are being changed to take advantage of this additional layer. iPads and similar devices are forecast to have a major impact on portable PCs, and the technology transfer will have a major impact within the data center, both at the infrastructure level and in the design of all software.
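The gap is easiest to see as simple arithmetic. The sketch below uses representative round-number latencies (assumptions, not measurements) to show where flash sits between RAM and disk, and what that means for a transaction that waits on a handful of random IOs.

```python
# Order-of-magnitude latencies from the text: DRAM in nanoseconds, flash in
# microseconds, disk in milliseconds. The specific values are representative
# round figures, not benchmark results.

latencies_s = {
    "DRAM":        100e-9,   # ~100 ns
    "Flash (SSD)": 100e-6,   # ~100 us
    "Disk (HDD)":   10e-3,   # ~10 ms
}

for name, seconds in latencies_s.items():
    ratio = seconds / latencies_s["DRAM"]
    print(f"{name:12s} {seconds * 1e6:10.1f} us  (~{ratio:,.0f}x DRAM)")

# A synchronous transaction that waits on 20 random IOs:
ios = 20
print(f"Waiting on disk : {ios * latencies_s['Disk (HDD)'] * 1e3:.0f} ms")
print(f"Waiting on flash: {ios * latencies_s['Flash (SSD)'] * 1e3:.0f} ms")
```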
IO Centric Processing: Big Data Goes Real-time
Wikibon has written extensively about the potential of flash to disrupt industries and about designing systems and infrastructure in the big data, IO-centric era. The model developed by Wikibon is shown in Figure 4.
The key to this capability is the ability to directly address the flash storage from the processor with lockable atomic writes, as explained in a previous Wikibon discussion on designing systems and infrastructure in the big data, IO-centric era. This technology has brought down the cost of IO-intensive systems by two orders of magnitude (100 times), whereas the cost of hard-disk-only solutions has remained constant. This trend will continue.
This technology removes the constraints of disk storage, allowing real-time parallel ingest of transactional, operational, and social media data streams. It also provides sufficient IO at low enough cost to process big data transactional systems in parallel while simultaneously performing the big data indexing and metadata processing that drive big data analytics.
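The "two orders of magnitude" claim can be illustrated with back-of-the-envelope cost-per-IOPS arithmetic. All prices and IOPS ratings below are hypothetical round numbers chosen for illustration, not vendor quotes.

```python
# Back-of-the-envelope cost-per-IOPS arithmetic. Prices and IOPS ratings are
# hypothetical round numbers for illustration, not vendor quotes.

required_iops = 200_000

hdd_iops_each,   hdd_cost_each   = 200, 400        # a 15K-rpm drive: ~200 random IOPS
flash_iops_each, flash_cost_each = 100_000, 2_000  # a PCIe flash card

hdd_units   = required_iops / hdd_iops_each        # 1,000 drives
flash_units = required_iops / flash_iops_each      # 2 cards

hdd_cost   = hdd_units * hdd_cost_each             # $400,000
flash_cost = flash_units * flash_cost_each         # $4,000

print(f"HDD-only build: {hdd_units:.0f} drives, ${hdd_cost:,.0f}")
print(f"Flash build   : {flash_units:.0f} cards,  ${flash_cost:,.0f}")
print(f"Cost ratio    : ~{hdd_cost / flash_cost:.0f}x")   # ~100x, i.e., two orders of magnitude
```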
WikiTrend: Flash will enable profound changes in system and application design. Transactional systems will evolve, as flash architectures remove locking constraints at the highest performance tier. Big Data analytics will be integrated with operational systems, and Big Data streams will become direct inputs to applications, people, devices, and machines. Metadata extraction, index data, and other summary data will become direct inputs to operational Big Data streams and enable more value to be derived, at lower cost, from archival and backup systems.
Conclusions for Storage
Flash will become a ubiquitous technology that will be used in processors as an additional memory level, in storage arrays as read/write "flash cache", and as a high-speed disk device. Systems management software will focus high-IO "hot spots" and low-latency IO on flash technology and allow high-density disk drives to store the less active data.
Overall within the data center, flash storage will pull storage closer to the processor. Because of power and heat density constraints, it is much easier to put low-power flash memory, rather than disk drives, very close to the processor.
The result of more storage being closer to the processor will be for some storage functionality to move away from storage arrays and filers and closer to the processor, a trend that is made easier by multi-core processors that have cycles to spare. The challenge for storage management will be to provide the ability to share a much more distributed storage resource between processors. Future storage management will have to contend with sharing storage that is within servers as well as traditional SANs and filers outside servers.
Storage Efficiency Technologies
Storage efficiency is the ability to reduce the amount of physical data on the disk drives required to store the logical copies of the data as seen by the file systems. Many of the technologies have become or are becoming mainstream capabilities. Key technologies include:
- Storage virtualization:
- Storage virtualization allows volumes to be logically broken into smaller pieces and mapped onto physical storage. This allows much greater efficiency in storing data, which previously had to be stored contiguously. This technology also allows dynamic migration of data within arrays that can also be used for dynamic tiering systems. Sophisticated tiering systems, which allow small chunks of data (sub-LUN) to be migrated to the best place in the storage hierarchy, have become a standard feature in most arrays.
- Thin provisioning:
- Thin provisioning is the ability to provision storage dynamically from a pool of storage that is shared between volumes. This capability has been extended to include techniques for detecting zeros (blanks) in file systems and using no physical space to store them. This again has become a standard feature expected in storage arrays.
- Snapshot technologies:
- Space-efficient snapshot technologies can be used to store just the changed blocks and therefore reduce the space required for copies. This provides the foundation of a new way of backing up systems using periodic space-efficient snapshots and replicating these copies remotely.
- Data de-duplication:
- Data de-duplication was initially introduced for backup systems, where many copies of the same or nearly the same data were being stored for recovery purposes. This technology is now being extended to inline production data and is set to become a standard feature on storage controllers.
- Data compression:
- Originally, data compression was an offline process used to reduce the amount of data held. Data compression, used in almost all tape systems, is now being extended to online production disk storage systems and is set to become a standard feature in many storage controllers. The standard compression algorithms used are based on LZ (Lempel-Ziv) and give a compression ratio between 2:1 and 3:1. Compression is not effective on files that have compression built in (e.g., JPEG image files and most audio-visual files). The trend is toward real-time compression where performance is not compromised. (A toy sketch combining de-duplication and LZ compression follows this list.)
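The sketch below is a toy illustration of two of the techniques above: block-level de-duplication (store each unique block once) and LZ-family compression (zlib, a Lempel-Ziv derivative). The block size, the synthetic data, and the use of zlib are assumptions for illustration; real controllers implement these techniques inline and far more efficiently, and the repetitive synthetic data exaggerates the savings well beyond the typical 2:1 to 3:1 compression seen on production data.

```python
# Toy illustration of block-level de-duplication plus LZ compression (zlib).
# Synthetic data exaggerates the savings; production data typically compresses
# at roughly 2:1 to 3:1 on top of any de-duplication gains.

import hashlib
import zlib

def store(blocks):
    unique = {}                            # content hash -> compressed block
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in unique:           # de-duplicate identical blocks
            unique[digest] = zlib.compress(block)
    logical = sum(len(b) for b in blocks)
    physical = sum(len(c) for c in unique.values())
    return logical, physical

# 1,000 logical 4KB blocks, but only 10 distinct patterns (think cloned VM images)
patterns = [(f"pattern-{i} " * 410).encode()[:4096] for i in range(10)]
blocks = [patterns[i % 10] for i in range(1000)]

logical, physical = store(blocks)
print(f"Logical data : {logical / 1e6:.2f} MB")
print(f"Physical data: {physical / 1e3:.2f} KB ({logical / physical:.0f}:1 reduction)")
```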
WikiTrend: Storage efficiency technologies will significantly reduce the amount of physical storage required. However, they will not reduce the number of IOs or the bandwidth required to transfer them. Storage efficiency techniques will be applied to the most appropriate part of the infrastructure and become increasingly embedded into systems and storage design.
Milestones for Next Generation Infrastructure Exploitation
Some key milestones are required to exploit new infrastructure directions in general and storage infrastructure in particular:
- Sell the vision to senior business managers.
- Create a Next Generation Infrastructure Team, including cloud infrastructure.
- Set aggressive targets for infrastructure implementation and cost savings, in line with external IT service offerings.
- Select a stack for each set of application suites.
- Choose a single vendor infrastructure stack from a large vendor that can supply and maintain the hardware and software as a single stack. The advantage of this approach is that the cost of maintenance within the IT department can be dramatically reduced if the software and hardware firmware are treated as a single SKU. The disadvantage is lack of choice for components of the stack, and a higher degree of lock-in.
- Limit lock-in with a sourcing strategy. Choose an Ecosystem Infrastructure Stack of software and hardware components that can be intermixed. The advantage of this approach is greater choice and less lock-in, at the expense of significantly increased costs of internal IT maintenance.
- Reorganize and flatten IT support by stack(s), and move away from an organization supporting stovepipes. Give application development and support groups the responsibility to determine the service levels required and the next generation infrastructure team the responsibility to provide the infrastructure services to meet the SLA. Included in this initiative should be a move to DevOps, where application development and infrastructure operation teams are cross-trained with the goal of achieving hyper productivity.
- Create a self-service IT environment with a service catalogue and integrate charge-back or show-back controls.
From a strategic point of view, it will be important for IT to compete with external IT infrastructure suppliers where data proximity or privacy requirements dictate the use of private clouds, and to use complementary external cloud services where internal clouds are not economic.
Overall Storage Directions and Conclusions
Storage infrastructure will change significantly with the implementation of a new generation of infrastructure across the portfolio. A small percentage of application suites will require a siloed stack and large scale-up monolithic arrays, but the long tail (90% of application suites) will require standard storage services that are inherently efficient and automated. These storage services will be more distributed within the stack, with increasing amounts of flash devices, and distributed within private and public cloud services. Storage software functionality will become more elastic and will reside in or migrate to the part of the stack that makes the most practical sense, either in the array, in the server, or in a combination of the two.
The IO connections between storage and servers will become virtualized, with a combination of virtualized network adapters and other virtual IO mechanisms. This approach will save space, drastically reduce cabling, and allow dynamic reconfiguration of resources. The transport fabrics will be lossless Ethernet with some use of InfiniBand or other high-speed interconnects for inter-processor communication. Storage will become protocol agnostic. Where possible, storage will follow a scale-out model, with meta-data management a key component.
The storage infrastructure will allow dynamic transport of data across the network when required, for instance to support business continuity, and with some balancing of workloads. However, data volumes and bandwidth are growing at approximately the same rate, and large-scale movement of data between sites will not be a viable strategy. Instead applications (especially business intelligence and analytics applications) will often be moved to where the data is (the Hadoop model) rather than pushing data to the code. This will be especially true of big data environments, where vast amounts of semi-structured data will be available within the private and public clouds.
The criteria for selecting storage vendors will change in the future. Storage vendors will have significant opportunities for innovation within the stack. They will have to take a systems approach to storage and be able to move storage software functionality to the optimal place within the stack in an automated and intelligent manner. Distributed storage management functionality will be a critical component of this strategy, together with seamless integration into backup, recovery, and business continuance. Storage vendors will need to forge close links with the stack providers, so that there is a single support system (e.g., remote support), a single update mechanism for maintenance, and a single stack management system.
Action Item: Next generation storage infrastructure is coming to a theater near you. The bottom line is that, in order to scale and "compete" with cloud service providers, internal IT organizations must spend less time on labor-intensive infrastructure management and more effort on automation and on providing efficient storage services at scale. The path to this vision will go through integration in the form of converged infrastructure across the stack, with intelligent management of new types of storage (e.g., flash) and the integration of big data analytics with operational systems to extract new value from information sources.