Posts Tagged Big Data
A Massachusetts company called Prelert released a new application yesterday that combines machine learning and predictive analytics to detect and report anomalous behavior emanating from IT infrastructure. If that sounds a lot like what Splunk does, you’re right.
As data is continuously collected and created, companies struggle just to store it and miss any opportunity to leverage the information. The wave of big data has the potential to turn the burden of data management into an opportunity for new value creation. Yesterday’s solutions don’t accomplish this today and will be even less effective tomorrow.
While the volume of data has grown exponentially over the last few decades, the underlying technology on which we store data hasn’t kept up. Sure, we’ve had improvements in densities (to store more data) and connectivity (to provide better access to data), but the pace of data growth has overwhelmed the benefits of these technological advancements.
We all know there’s lots of excitement and buzz surrounding Hadoop, but talk to some CIOs in “non-web” industries about moving mission-critical apps to the open source Big Data framework and you’re bound to hear a little fear in their voices.
They’re worried that Hadoop is not ready for prime time because it has a single point of failure (SPOF). That is, if the NameNode in a cluster goes down, the entire cluster goes down. Spinning clusters back up into working order following a NameNode failure takes time and, by definition, mission-critical applications can’t go down … ever. Until the SPOF is solved, more than a handful of Fortune 500 companies will continue paying Oracle through the nose rather than risk a disruption to critical apps.
EMC has been touting its “Cloud Meets Big Data” messaging for nearly two years now, and today it took a major step in transforming that message into reality.
EMC announced that it is forming a new “virtual organization” focused on Big Data and application development in the cloud. EMC is calling the new organization the Pivotal Initiative, and it will include 800 employees from EMC’s Greenplum and Pivotal Labs divisions and 600 employees from VMware’s vFabric, Cloud Foundry, GemFire, SpringSource and Cetas organizations. EMC owns over 80% of VMware, which former EMC COO Pat Gelsinger joined as CEO earlier this fall.
Former Republican congressman-turned-TV pundit Joe Scarborough doesn’t buy Nate Silver’s numbers. For Scarborough, they just don’t add up.
Speaking on Morning Joe on Oct. 29, when Silver’s FiveThirtyEight blog put Obama’s chances of reelection somewhere around 75%, Scarborough declared: “Both sides understand [the presidential election] is close, it could go either way, and anybody that thinks this race is anything but a tossup right now is such an ideologue they should be kept away from typewriters, computers, laptops and microphones for the next ten days because they’re jokes.”
The fear (or is it disdain?) is sometimes justified. No developer wants to get locked into a platform that dictates which tools she can use, which data sources she can integrate and which hardware she must deploy, or one that makes switching to a competing platform too costly to justify.
Next week theCUBE is back in action, this time covering two Big Data conferences in one. Strata Conference + Hadoop World on theCUBE kicks off live Wednesday (10/24) morning at 10 am ET on SiliconANGLE.tv. We’re broadcasting all day Wednesday and all day Thursday (10/25) from New York City with virtually non-stop live interviews with the smartest nodes at the conference.
We’ve identified the most compelling news and trends that will be developing at the show and programmed our coverage to flesh them out in great detail. Among other trending topics, you’ll get full coverage and analysis of the emerging Big Data application development market, the state of real-time analytics in Hadoop environments, and new ecosystem partnerships, as well as some great advice for Big Data practitioners from Big Data practitioners.
It’s Oracle OpenWorld this week, and that means more colorful, if factually questionable, statements from everybody’s favorite egomaniacal billionaire CEO. And, not surprisingly, Larry Ellison’s target was archrival SAP.
“SAP has an in-memory machine, you know, that’s a little bit smaller than what we offer,” Ellison said at OpenWorld yesterday, referring to SAP HANA and Oracle’s own all in-memory database Exadata X3, which debuted this week. “We have 26 terabytes of memory; [SAP offers] 0.5 terabytes of memory.”
In case you missed his point, Ellison put it as succinctly as he could: “The HANA in-memory machine is, like, really small.” (Hat Tip to eWeek)
If you’re interested in Big Data and you find yourself in Boston tomorrow, you owe it to yourself to head over to the Hyatt Regency in the Financial District. That’s where The IE Group is putting on its Big Data Innovation Summit, and the list of speakers and sessions is impressive.
Here’s just a small sampling of the speakers:
- Facebook’s Mohammad Sabah, a Data Scientist with previous experience at Netflix, will discuss “applying scalable machine learning algorithms for applications ranging from ranking to search to matching.” Sabah, who joined us on theCUBE at Hadoop Summit, will further explore why Hadoop is particularly effective on large and otherwise inaccessible data sets.
If you’ve been unable to keep up with all the competing NoSQL databases that have hit the market over the last several years, you’re not alone. To name just a few, there’s HBase, Cassandra, MongoDB, Riak, CouchDB, Redis, and Neo4j.
To that list you can add Accumulo, an open source database originally developed at the National Security Agency. You may be wondering why the world needs yet another database to handle large volumes of multi-structured data. The answer is, of course, that none of these NoSQL databases has yet checked all the feature and functionality boxes that most enterprises require before deploying a new technology.