Introduction and Request for Input
In 2008 and 2009, Wikibon presented CTO awards for the best storage technology innovations. A list of the winners is given in the footnotes below.
In 2010 Wikibon broadened its coverage to include the enterprise infrastructure in general and Infrastructure 2.0 in particular. This includes server, networking, storage, hypervisors, security, database, data management, and archiving, with a focus on the infrastructure implications. In line with this expansion of coverage, Wikibon has refocused its Wikibon CTO award on the best enterprise infrastructure technology innovations of the year.
In 2010 Wikibon received over 150 vendor briefings from over 80 vendors, in addition to many project-related briefings and other conversations. A list of these organizations is given in the footnotes below. From these interactions with vendors, we are able to distill some important trends within the industry as well as some outstanding technology contributions. These are listed and described below.
Wikibon is asking its members to participate in the process of selecting the Wikibon CTO award in 2010 by:
- Telling Wikibon what additional vendor technologies that were shipped in 2010 you would pick as outstanding in the Enterprise IT Space.
- Telling Wikibon what vendor technology(ies) you would pick as the best, and which would be your honorable mention(s).
Wikibon will take this input and publish the name of the winner(s) later this month.
You can write suggestions in the comments section of this post (preferred, you will need to register if you haven’t), or you can write to David Floyer at David.Floyer@wikibon.org.
Thanks in advance for your contributions!
Key Infrastructure Trends in 2010
2010 saw many technology and product introductions and announcements, and the breakneck speed of the IT industry has not abated. Wikibon identified some important trends from this maelstrom of innovation. These were “big data”, cloud services, simplicity, virtualization, NAND flash, and data efficiency. These trends and the key technologies that are highlighted are heavily influenced by the Wikibon focus on enterprise IT infrastructure and the vendors who have briefed us.
2010 is the year that “big data” became big time. IDC issued an EMC-sponsored report in May 2010 called The IDC Digital Universe - Are you Ready?, predicting that data volumes would grow by 45 times by 2020. The largest percentage of predicted storage growth by far is in unstructured and semi-structured data. A large portion of this is the digitization of analogue information and retention of data that previously would have been discarded. Big data is often also geographically distributed, and a large proportion of this data will reside within the cloud at some stage of its existence.
The result of this massive data growth is both a potential problem for organizations (because of its potential use in lawsuits and the cost of managing data) and a source of huge potential value. The early users of big data analytics have been financial organizations and telephone companies, who are using it to understand the user experience with their products, including identifying what factors help to retain customers, or irritate and lose them. Every organization has data that resides as part of its own processes (e.g., manufacturing process systems, warehousing systems, and warranty claim systems), data that resides with partners, suppliers, and customers, and data that is available on the Web. Tapping into that data can potentially improve quality, efficiency, marketing focus, investment returns and many other areas within enterprises that previously relied on gut-feel and institutional knowledge to estimate.
Apache Hadoop™ became the leading open-source tool, the one most associated with and most used for enterprise big data analysis. Hadoop was originally developed by Doug Cutting at Yahoo!, who named it after his son’s stuffed elephant. Yahoo! and Facebook contributed significant open-source code to the project, and Yahoo! distributed an as-is version of Hadoop. Google and IBM collaborated in 2009 to promote and ship Hadoop as a research project with universities. In 2010, Cloudera distributed the first enterprise packaging of Hadoop, using the Red Hat/Linux model of a single throat to choke. Doug Cutting moved to Cloudera in August 2010. The HadoopWorld conference sponsored by Cloudera attracted more than 1,000 attendees who a year earlier would not have known the difference between the database system and the toy elephant.
The impact of big data on enterprise IT and the IT infrastructure will be profound. Traditional databases that take a centralized approach cannot cope with the cost of transferring data across networks and the elapsed time to load data. In addition, the traditional licensing costs are prohibitive.
The key innovation of Hadoop and other associated tools is that analysis code is distributed to the data and run on the data node. This approach provides more efficient scaling and drastically reduces the amount of data that has to be transported across networks. New entrants into this database space such as Greenplum, Netezza, and Vertica have specific capabilities (such as very fast load times) that make them potentially more suitable, although without the track record of traditional vendors such as IBM DB2, Oracle, and Teradata.
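The idea of moving code to the data can be illustrated with a minimal Python sketch. It is not Hadoop code: the node names, the sample records, and the helper functions (`local_count`, `run_on_nodes`, `merge`) are hypothetical stand-ins, and the "nodes" are simply in-memory lists. The point it tries to show is that each node runs the analysis on its own records and only a small summary crosses the network.

```python
from collections import Counter

# Hypothetical stand-ins for data nodes; each one holds its own records locally.
NODES = {
    "node-a": ["retain", "churn", "retain"],
    "node-b": ["churn", "retain"],
    "node-c": ["retain", "retain", "churn", "churn"],
}

def local_count(records):
    """Analysis code that runs where the data lives (the map step)."""
    return Counter(records)

def run_on_nodes(nodes, analysis):
    """Send the analysis function to every node and collect only the summaries."""
    return [analysis(records) for records in nodes.values()]

def merge(partials):
    """Combine the per-node summaries into one global result (the reduce step)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

partials = run_on_nodes(NODES, local_count)
totals = merge(partials)
print(dict(totals))
```

In this sketch nine records stay on their nodes and only three small counters are merged centrally; on a real cluster the same ratio is what reduces network traffic.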
Early results of the use of big data are very encouraging. Early users are A9.com, AOL, Booz Allen Hamilton, eHarmony, eBay, Facebook, Fox Interactive Media, Freebase, IBM, ImageShack, ISI, Joost, Last.fm, LinkedIn, Meebo, Metaweb, The New York Times, Ning, Powerset, Rackspace, StumbleUpon, Twitter, Veoh and Zoosk. A mobile telecommunication company has used the big-data approach to understand user experience in much greater detail by analyzing billions of records, identifying changes to the customer processes that have radically improved the acceptance of automation by customers (8% to 80%), and radically reduced the number of times that customers switch services.
The IDC report referenced above predicted that by 2020 almost all data would spend at least part of its life in the cloud, either as the primary storage (e.g., for movies, surveillance records, shared photographs and movies), for tier 3 primary storage for dispersed file systems (e.g., Cleversafe Dispersed Storage™, DDN’s WOS), or as backup and archiving (e.g., Iron Mountain, Asigra and many others). 2010 has been the year of cloud service enablement in general; infrastructure products such as self-service capabilities for both service providers and internal cloud systems are being put in place through companies such as newScale, and the ability to provide "virtual private arrays" within a single storage array to provide secure administrative segregation of users, hosts, and application data is being provided by products such as 3PAR's (now HP) Virtual Domains Software. The variety of cloud services is increasing dramatically in some areas. Google Docs (which requires no client software outside the browser) and Microsoft Office Live (which not surprisingly requires Microsoft Office on the desktop) have emerged from nowhere in 2009 to ubiquity in 2010.
For CEOs, cloud services present an opportunity to reduce cost by outsourcing functions to cloud service providers and a benchmark cudgel to measure and improve internal IT services.
One of the underlying themes of 2010 was the drive to simplify systems. The design team of Jonathan Ive & Steve Jobs has set the standard with consumer products such as the iPod, iPhone, MacBook Air, and iPad. These products succeeded not because they had the most features but because of the integration of the features and services into a seamless customer experience. This experience includes automatic updating of software levels, the ease of finding and downloading applications, the availability of content, the avoidance of malware through tight Apple control, and the impeccable and intuitive ease-of-use touch interface.
In the infrastructure space there have been two major simplicity trends that have continued in 2010:
- Simplified storage systems that Wikibon has called Tier 1.5 systems, started by startup companies such as 3PAR, Compellent, and XIV earlier this decade (all now purchased by full system vendors). These systems and others such as IBM’s SVC system are fully virtualized and have very low user “click” requirements.
- The announcement of simplified total stacks, such as the EMC/Cisco/VMware Virtual Desktop Stack and Oracle’s Exadata and Exalogic stacks. The major characteristic is that the total hardware and software is delivered as a single SKU, and the hardware microcode and system software are updated as a single file. The result of this is significant savings in support and improvements in availability and security.
Virtualization of servers with VMware, Hyper-V or Xen is moving beyond test and development. Many organizations are strongly pursuing virtualization, with a few organizations approaching 90% in test and development and Web services. In addition, production applications are also seeing heavy adoption of virtualization in many organizations. The one holdout against virtualization is in the area of databases in general and Oracle in particular. Where the size of the database server is equal to or greater than a single physical server, there is almost no take-up. Oracle has thawed its support stance with VMware by announcing formal support for Oracle 11g RAC in November 2010.
Storage virtualization has continued to gain traction, with almost all vendors providing support for some degree of virtualization in storage arrays. HP with its Virtual Connect technology, and NextIO, Virtensys and Xsigo Systems, have continued to push virtual I/O as a way to reduce cabling and improve the utilization of server NIC and storage cards.
EMC announced flash drives from STEC in its DMX in 2008 and earned a Wikibon CTO award for this innovation. Since then, all storage vendors have introduced SSDs sourced from a number of different suppliers.
However, the most profound IT system innovation with NAND flash in 2011 and beyond will be Fusion-io-like implementations that plug directly into a PCIe slot, and that provide a seamless extension to traditional RAM storage at a much lower price-point. Fusion-io announced its VSL architecture, which allows the data to be written to persistent storage in a single pass. This approach eliminates the protocol overheads of addressing external disk drives and array-based SSDs.
While flash devices in storage arrays clearly have their place, their adoption is limited by the fact that SSDs are more expensive than traditional hard drives, and the adoption will have some friction. The use of flash technology as an extension of RAM will lead to very large memory systems at a lower price-point than traditional RAM can provide. There is a requirement for software within the OS and key file-system and database software to enable this approach. There is likely to be some enabling software for this approach from Linux systems in 2011, as well as from Oracle. IBM and Intel may also offer new frameworks, perhaps late in 2011. Software support will virtually eliminate friction for adoption of this technology and lead to an entirely new generation of applications that will provide much higher volumes of data to end-users to improve both functionality and productivity.
2010 saw a significant trend towards unified storage and data efficiency services such as compression, de-duplication and space-efficient virtual copies that can be applied to file- and block-based primary storage. IBM bought Storwize for its in-line Real-time Compression technology, and Dell bought Ocarina. Permabit announced its Albireo technology for in-line de-duplication on an OEM basis.
Wikibon expects that these technologies will become standard offerings within storage arrays, and will be fully integrated.
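For readers less familiar with these data efficiency services, the following minimal Python sketch shows the general idea of block-level de-duplication combined with compression. It is an illustration only and does not represent any vendor's implementation: the fixed 4 KB block size, the SHA-256 fingerprint, the use of zlib, and the function names are all assumptions made for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only

def dedupe_and_compress(data: bytes, block_size: int = BLOCK_SIZE):
    """Store each unique block once (compressed) and keep a recipe to rebuild the data."""
    store = {}   # fingerprint -> compressed unique block
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = zlib.compress(block)
        recipe.append(fingerprint)
    return store, recipe

def restore(store, recipe) -> bytes:
    """Rebuild the original data from the recipe and the unique-block store."""
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)

# Three identical blocks followed by one different block.
sample = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store, recipe = dedupe_and_compress(sample)
stored_bytes = sum(len(block) for block in store.values())
print(len(recipe), len(store), stored_bytes < len(sample))
```

The sample contains four logical blocks but only two unique ones, so only two compressed blocks are kept; in-line products apply the same principle as data is written rather than after the fact.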
A Review of Outstanding Innovations in 2010
Below are some of the outstanding technologies that were introduced into the marketplace by vendors in 2010. These reflect Wikibon’s bias in focusing on enterprise IT infrastructure, and the bias introduced by the specific companies and individuals that have briefed Wikibon.
Apple – for introducing the iPad and the MacBook Air, both of which are important to the enterprise data center because they represent new breakthroughs in ease-of-use (with a multi-touch screen for the iPad), and the use of flash technology to improve response time and to provide always-on capabilities for both devices. These devices are bringing in a revolution both in the clients that enterprises have to support and in the richness of function, quantity of data, and ease-of-use that users will rapidly grow to expect from all applications. Categories: NAND Flash, Simplicity
Asigra – for a new model of backup and recovery which is user-friendly both for clients and VARs. Categories: Cloud, Simplicity
Avere – for combining all the layers of storage (RAM, Flash, FC disk and SATA disk) both locally and remotely to create a very efficient global file system. Categories: Data Efficiency, NAND Flash, Simplicity
ClickFox – for developing a comprehensive set of big-data software and services to process the billions of data elements required to piece together the total customer experience for an organization from all the touch points with customers and enabling action to be taken on a quarter-by-quarter cycle instead of a yearly cycle. Categories: Big Data
Cloudera – for the first enterprise packaging and distribution of Hadoop™, an open-source product used to store and process “big data” (complex, large-scale data with petabytes of information), often distributed across thousands of servers. Hadoop is in production use at most of the world’s largest Web companies, including Facebook, Google, and Yahoo!. A fundamental innovation behind Hadoop™ and other components such as MapReduce, Hive, and Pig is that the application code is moved to the data rather than the data moved to the application code, making the technology highly suitable for distributed data sources. Categories: Big data, Cloud
Clustrix – for the first enterprise SQL database appliance with a completely different approach to achieving scale-out database architecture designed to meet the needs of fast-growing internet companies like PhotoBox. The appliance has identical nodes including software, processors, InfiniBand, battery-backed NVRAM and storage. The data is spread in hashed slices across the nodes, and the code is pushed to the node to ensure that almost all locks are local. Clustrix has shown in TPC and other benchmarks that the database scales extremely well for "big-data" transactional systems that need the ACID SQL properties, but don't need the high costs of traditional databases or the high cost of bespoke "sharding" that is currently used by almost all social media and other transactional sites. Categories: Big data, Cloud, Simplicity
Compellent – for continuous innovation in Tier 1.5 storage subsystems, the first to implement both automated tiered storage (data progression) and live volume, as well as a pragmatic, innovative solution to jump-start remote replication with portable volumes. Categories: NAND Flash, Data efficiency, Simplicity, Virtualization
EMC – for introducing a single stack solution with VCE hardware and storage and VMware hypervisor and VDI software. Categories: Simplicity, Virtualization
FalconStor – for introducing the first fully functional read-write flash/cache implementation in a storage controller using technology from Violin, giving the same performance as tiered storage using one third of the flash storage and operating in real-time. Categories: Data Efficiency, NAND Flash, Simplicity, Virtualization
Fusion-io – for continuous innovation to enable flash as an extension of the processor's main memory rather than as a storage disk drive. Its VSL architecture allows applications and operating system functions to execute a persistent write to flash in a single pass, thus improving performance and latency by a factor of 10 over disk solutions and heralding levels of performance, richness of data, and ease-of-use that are impossible in disk-based systems. Categories: NAND Flash, Simplicity
NetApp – for introducing in-line compression for primary storage as a complementary free function to its previous de-duplication functionality as an integral part of its storage controller ONTAP operating system, and making the efficiency functions available both to block-based and file-based storage in a very high function unified storage array. Categories: Data efficiency, Simplicity, Virtualization
NextIO – for simplifying the implementation and cost of top-of-rack virtual I/O and allowing virtual graphics cards and network cards that can be shared by an entire rack of servers using native drivers. Categories: Simplicity, Virtualization
Oracle – for introducing a complete stack of server, storage, operating system, hypervisor and database as a single SKU, with a single update version. Categories: NAND Flash, Simplicity
Permabit – for the introduction of Albireo, a blazingly fast in-line de-duplication for primary storage, which has scored the highest rating on the Wikibon CORE methodology. Categories: Data efficiency, Simplicity
QLogic – for implementing by far the richest set of CNA functions and architecture in the industry, and jump-starting the general acceptance of CNAs. Categories: Simplicity, Virtualization
XIOtech – for introducing an order of magnitude improvement in the growing problem of data corruption on disk drives with the introduction of its ISE (Intelligent Storage Element) “brick”. Categories: Simplicity, Virtualization
Remember the invitation to nominate any additional vendor technologies that have been introduced in 2010, and nominate your pick as winner(s) and honorable mentions.
Action Item: What additional vendor technologies would you pick as outstanding in the Enterprise IT Space? What vendor technology(ies) would you pick as the best, and which would be your honorable mention(s)?
Footnotes: Formal Vendor Briefings were received in 2010 from the following organizations: 3PAR, Actifio, Aprigo, Aprius, Arista Networks, Asigra, Atrato, Attivio, Autonomy, Avere, Bocada, Broadcom, Caringo, Cirtas, ClickFox, Cloudera, Clustrix, Cisco, Cloud.com, Cofio, CommVault, Compellent, DataDirect Networks, Dell, Digital Reef, Dot Hill, Egenera, Emulex, eSilo, Exagrid, FalconStor, Fusion-io, Greenplum, IceWeb, Imprivata, Index Engines, Interlock, Isilon, Kroll Ontrack, Mellanox, Nasuni, NetApp, newScale, Nexsan, NextIO, Nimble Storage, Nirvanix, Oracle, Overland Storage, Permabit, QLogic, Sonian, StoredIQ, Storwize, Syncsort, VCE, Veridity Software, Virsto Software, Virtual Instruments, VM6 Software, VMTurbo, XIOtech, Zetta.
In 2008 and 2009, Wikibon focused mainly on the enterprise storage space. The CTO awards were given as follows:
- Wikibon CTO award for the best storage technology innovations of 2008
  - EMC’s 2008 introduction of flash storage drives
  - Axxana’s 2008 introduction of enterprise data recording (EDR)
- Wikibon CTO award for the best storage technology innovations of 2009
  - Cleversafe’s 2009 introduction of Dispersed Storage
  - Storwize’s (now IBM) 2009 introduction of inline Real-time Compression™ of file-based data
  - Unisys’s 2009 introduction of Stealth for securing cloud networks and data