Posts Tagged Hadoop
The Wikibon / SiliconANGLE team is excited to be broadcasting live coverage of this year’s Strata Conference in Santa Clara, California, February 28th through March 1st. Co-hosts John Furrier and Dave Vellante will be bringing coverage live on #theCUBE, the flagship telecast from SiliconANGLE.tv, with original content and analysis developed exclusively by the SiliconANGLE and Wikibon teams.
With a full schedule of guests and in-depth coverage of the key moments at this year’s Strata Conference, the Wikibon / SiliconANGLE team will be covering all of the angles.
EMC announced today it has integrated the Isilon scale-out network attached storage (NAS) platform with an Apache-based Greenplum Hadoop distribution to make the Big Data framework more palatable to enterprises with strict SLA requirements.
The move is designed to address a number of Hadoop’s enterprise-level shortcomings by applying Isilon’s backup and recovery capabilities and more efficient storage to the open source Big Data framework, according to EMC.
EMC will make Isilon an optional module of its Greenplum Data Computing Appliance, which also includes an Apache-based Hadoop distribution called Greenplum HD, the standard or high-capacity Greenplum database, and the Greenplum Data Integration Accelerator.
Oracle added a twist to this morning’s announcement regarding the general availability of its Big Data Appliance and related Big Data connectors. Rather than shipping the appliance with its own Hadoop distribution or the vanilla Apache distribution, Oracle has partnered with Cloudera to include Cloudera’s Hadoop distribution (CDH) and management software instead.
Originally announced at Open World in October, the Oracle Big Data Appliance is a preconfigured hardware-software bundle running Oracle Linux. It is available in a full rack configuration of 18 Oracle Sun servers and includes the community edition of Oracle’s NoSQL database, an open source distribution of R, and Oracle HotSpot Java Virtual Machine for running MapReduce jobs, in addition to CDH and Cloudera Manager.
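For readers new to the framework, the “MapReduce jobs” the appliance runs follow a simple three-step pattern: map input records to key–value pairs, shuffle the pairs into groups by key, then reduce each group to a result. The following is a minimal plain-Python sketch of that pattern applied to word counting; it illustrates the concept only and is not Oracle’s or Hadoop’s actual API.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts emitted for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big deal", "data is big"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])  # 3
```

In a real cluster the map and reduce steps run in parallel across many machines and the shuffle moves data over the network; the structure of the computation, however, is exactly this.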
- 2012 Will Be the Year of Big Data Applications. Thanks to the intense competition between The Big Three distribution vendors, Hadoop developed rapidly in 2011 and is, by most accounts, enterprise-ready (there are always areas for improvement, of course, notably around Hadoop’s single point of failure issue). This, along with readily available capital, will result in significant innovation from both existing and new start-up Big Data Application vendors now confident that Hadoop is for real. Expect to see new vertical Hadoop-based Big Data Applications for healthcare, retail, financial services and manufacturing in the year ahead, as well as horizontal applications focused on human capital management and enterprise resource planning. Adoption will start slowly, but for traditional enterprises, Big Data Applications are the key to realizing impactful business value from Hadoop. 2012 should be a good year on this front.
At the start of 2011, there was only one commercial Hadoop distribution vendor on the market, virtually no Big Data application vendors with products ready for primetime, and Data Scientists were considered little more than propeller-heads working on some wacky experiments.
Well, a lot can happen in a year. As of December 2011, there are three viable commercial Hadoop distribution vendors doing battle for market supremacy, a slew of start-ups as well as stalwart software vendors getting into the Big Data application game, and Data Scientists are the new rock stars of the IT world.
Hadoop World 2011 was bursting at the seams last week. As Cloudera CEO Mike Olson put it, the Sheraton in New York City was “fire marshal full.” The official count was 1,400 attendees, but I suspect that number was even higher. Word is Cloudera had to turn away hundreds who just showed up at the door for the conference.
That’s a good sign for Cloudera as a company and Hadoop as a whole, which leads me to the first of my five key takeaways from Hadoop World.
One application that impacts the design of Data Center Ethernet Fabrics is Big Data. Hadoop runs on a shared-nothing architecture, defined as a collection of independent, possibly virtual, machines, each with local disk and local main memory, connected together on a high-speed network. This means that storage is DAS, not SAN (even from EMC’s Greenplum solutions – as discussed towards the end of this video). Even with storage out of the mix, there are special networking architectural considerations for big data environments like Hadoop. The Wikibon and SiliconAngle teams had full coverage of Hadoop World 2011, including discussions of networking with Cisco and Arista Networks.
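The shared-nothing idea described above can be made concrete with a toy model: data is partitioned across nodes so that each record lives on exactly one node’s local disk, each node computes on its local slice only, and just the small per-node results cross the network to be merged. This is a hypothetical sketch for illustration, not any vendor’s implementation.

```python
NUM_NODES = 4

def partition(records, num_nodes):
    # Hash-partition: each record lands on exactly one node's local
    # storage (DAS) -- no shared SAN volume is involved
    nodes = [[] for _ in range(num_nodes)]
    for record in records:
        nodes[hash(record) % num_nodes].append(record)
    return nodes

def local_count(local_records):
    # Runs entirely against one node's local disk and memory
    return len(local_records)

records = [f"event-{i}" for i in range(1000)]
nodes = partition(records, NUM_NODES)

# The merge step is the only point where data crosses the network,
# which is why shuffle-heavy jobs put real pressure on the fabric
total = sum(local_count(node) for node in nodes)
print(total)  # 1000
```

The sketch also hints at the networking point in the paragraph above: local computation is cheap, but any step that must combine results across nodes (a shuffle, a join, a global aggregate) turns into east–west traffic, which is what drives those Ethernet fabric design considerations.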
I’ve noticed more than a few Tweets lamenting the fact that Hadoop World 2011 is sold out. Luckily for those Tweeters and anyone else who couldn’t land a ticket, we’re bringing theCUBE to New York City next week for two days (Nov. 8 and 9) of live, continuous coverage of the show.
This is our second year at Hadoop World and as you can see from the above infographic we have a great line-up of speakers and guests ready to take their spot inside theCUBE. In addition to those mentioned above, also joining us:
The Big Data vendor landscape is developing rapidly. A number of vendors have developed their own Hadoop distributions, most based on the Apache open source distribution but with various levels of proprietary customization. The clear market leader in terms of distribution is Cloudera, a Silicon Valley start-up with an all-star line-up of Big Data experts including Hadoop creator Doug Cutting and former Facebook Data Scientist Jeff Hammerbacher. A new entrant to the market is Hortonworks, which was spun out of Yahoo in June 2011 and released a completely open source Hadoop distribution of its own in November 2011.
Traditionally, data processing for analytic purposes follows a fairly static blueprint. Namely, enterprises create mainly structured data with stable data models via enterprise applications like CRM, ERP and financial systems. Data integration tools extract, transform and load the data from enterprise applications and transactional databases to a staging area where data quality and data normalization (hopefully) occur and the data is modeled into neat rows and tables. The modeled, cleansed data is then loaded into an enterprise data warehouse. This routine occurs on a scheduled basis, typically daily or weekly, sometimes more frequently.
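The extract-transform-load routine above can be sketched in a few lines. This is a deliberately simplified illustration with made-up CRM rows; the field names and quality checks are hypothetical, standing in for what a real data integration tool would do in the staging area.

```python
import datetime

# Extract: raw rows as they might arrive from a hypothetical CRM export
raw_rows = [
    {"customer": "  Acme Corp ", "revenue": "1200.50", "date": "2011-12-01"},
    {"customer": "Acme Corp",    "revenue": "n/a",     "date": "2011-12-02"},
]

def transform(row):
    # Transform: cleanse and normalize into the warehouse's fixed schema;
    # rows that fail basic data-quality checks are excluded from the load
    try:
        return {
            "customer": row["customer"].strip(),
            "revenue": float(row["revenue"]),
            "date": datetime.date.fromisoformat(row["date"]),
        }
    except ValueError:
        return None  # e.g. non-numeric revenue

# Load: only clean, conformed rows reach the warehouse table
warehouse = [t for t in (transform(r) for r in raw_rows) if t is not None]
print(len(warehouse))  # 1 -- the "n/a" revenue row was rejected
```

The rigidity that makes this blueprint reliable (a fixed schema, scheduled batch loads, rows rejected on quality failures) is exactly what Hadoop-style processing relaxes, which is why the two approaches end up complementing rather than replacing each other.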