Ed Note: The following is a near-transcription of an interview with Richard Snee of Greenplum (now the Data Computing Division of EMC) with David Vellante, co-founder of Wikibon.org, and John Furrier, founder of SiliconAngle, on Siliconangle.TV. I did this for my notes for articles for Wikibon.org and CloudAngle.com and am posting this at David Vellante's request for the use of others with an interest in this area. Note that the speakers are identified by initials. I apologize for any typos.
DV: You guys had a big announcement today with CloudEra.
JF: SiliconAngle's office is actually inside CloudEra's, and I am very impressed with what they are doing. These guys have all seen the future around big data. They all worked at Google, Yahoo, or Facebook, and they've seen the future. They're now in the present, & they are leading that journey. It is a whole new datatype. The traditional enterprise is not used to that; they are used to transactional data. The enterprise will soon be like Facebook. They will have huge amounts of data.
I am really pleased to have Richard Snee, VP of Marketing for the startup Greenplum, which is now part of EMC. So the first question is: what is it like to work at EMC? Your startup gets acquired. Pat Gelsinger was very high on this acquisition; he loved Greenplum. So tell us a little about the fit at EMC, and why Pat is so excited about you guys.
RS: It's been amazing. We hit the ground running. It is now 60 days since the acquisition closed. We just haven't stopped. The integration has gone very smoothly. The evidence is the announcement today with CloudEra; we aren't stopping for anything. We have full support & all the resources that EMC brings to bear for what used to be “little old Greenplum” – or some people considered it that – but now with the support of EMC the possibilities are endless for us to realize the complete Greenplum vision around large scale data & analytics.
JF: Talk about what CloudEra and this big Hadoop movement is all about.
RS: The Hadoop movement is in some ways the wave of the future. At Greenplum we look very carefully at trends based on feedback from our customers. In our core customers we have seen more and more use of Hadoop and CloudEra's distribution of Hadoop. So it was a natural transition for us to work closely with CloudEra in support of what our customers are doing. It really is the marriage of both worlds – unstructured data which is Hadoop and structured data, which is Greenplum. The two of those working together will provide some powerful results for our customers.
JF: Mike ???, the CEO of CloudEra, talks about the tsunami and says that customers are awash with new data. How do you see that evolving? How do you talk with customers who recognize that the tsunami is coming and with those that don't?
RS: I think Mike is right that the tsunami is here, & it's only getting bigger. I like to tell an anecdote from one of our very large customers: when they talk in terms of their traditional EDW, they really talk about that being only 10% of the data that is useful to them. The other 90% outside that EDW represents business innovation to them. So companies that are not looking at their data in that manner are going to lose traction and not have the competitive edge they need in today's business world.
DV: So is that what you see going forward? People will have what some people call that 360° view? That's really nirvana for the big data world isn't it?
RS: We believe now at EMC that we are building the data system of the future. And that is about all of your data, all of your tools, & all of your people. So whether it is a solution provider or technology company if you are not providing your customers with the ability to leverage all of that – all of their data & all of their tools – that is a mistake.
DV: Dave Floyer & I were meeting with Wikibon clients recently & talking about big data, & he said “It's like I'm a snake swallowing a basketball.” How do you solve that problem? Obviously you guys have massively parallel architecture, & now you have the largest storage company in the world. Where do you see that going? Is it an appliance, is it a bunch of flash, is it architecture, is it a new process? How does it all come about?
RS: It starts with relationships like CloudEra & EMC. This is stepping back and looking at the entire world, the entire challenge & the list of opportunities associated with that, and then making thoughtful, incremental steps in providing solutions to your customers. That's how we look at it. When we step way back & think about the data center of the future, there is no doubt that Greenplum and EMC are completely aligned in the journey to the private cloud. We think in the data warehousing & analytics space there is great value in pursuing that new architecture in terms of what we call the Enterprise Data Cloud. Within that you will see the opportunity for integration between things like CloudEra's distribution of Hadoop and an analytic database like the Greenplum database. I think you will see a diverse set of tools & form factors within that enterprise cloud environment, whether it is software that runs on commodity hardware or an appliance that fits within that environment.
JF: What is the new operating system? We saw the notion of cloud-in-a-box floating around Oracle. No one can really do that out there. So when you say data system of the future, these are operating system components. What is the operating system that is now out there that will be the platform?
RS: Think in terms of the data system, or even what our division is called – the Data Computing Division. That is a far cry from data warehousing, which has the implication of being static. Data computing has the implication of taking action, of speed. The fundamental principles of data computing are bringing the processing closer to the data & closer to the people. The technology for bringing the processing closer to the data is MPP architecture and in-database analytics. Now bringing processing closer to the people – that's where virtualization and collaboration come in. Those are all of the things we are working on.
Specific to your question about operating system – your guess is as good as mine. But the writing's on the wall.
JF: It's not owned by any one person. Microsoft had that system software PC-centric ownership. Mac made some mistakes, obviously.
DV: What do you mean by in-database? Can you elaborate?
RS: There's a new wave of parallel processing of analytics within the database. This is at Greenplum's core. There are other analytic RDBMSs that are, some would say, light years, & others would say incremental steps, beyond traditional OLTP systems.
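Ed Note: The idea RS describes, bringing the processing closer to the data through MPP architecture and in-database analytics, can be illustrated with a small sketch. This is not Greenplum code. The segment layout, the function names `partial_sums` and `merge`, and the sample sales figures are all hypothetical, chosen only to show the pattern: each segment aggregates the rows it stores, and only the small partial results travel to a coordinator.

```python
from collections import defaultdict

def partial_sums(rows):
    """Runs on one segment: aggregate only the rows stored locally."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)

def merge(partials):
    """Runs on the coordinator: combine the small per-segment results."""
    totals = defaultdict(float)
    for part in partials:
        for region, subtotal in part.items():
            totals[region] += subtotal
    return dict(totals)

# Hypothetical data, already partitioned across three segments.
segments = [
    [("east", 10.0), ("west", 5.0)],
    [("east", 2.5)],
    [("west", 7.5), ("north", 1.0)],
]

# In a real MPP system the partial_sums calls would run in parallel,
# one per segment; here they run one after another for simplicity.
result = merge(partial_sums(rows) for rows in segments)
print(result)
```

The point of the design is that raw rows never leave their segment. Only one subtotal per region per segment is shipped, which is why this approach tends to scale as data volumes grow.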
DV: So having come from Oracle OpenWorld, Oracle knows a little about database. So is Oracle friend or foe?
RS: What's the word I hear over and over? Coopetition.
DV: You've been hanging around Joe Tucci.
RS: So one of the interesting things about being an employee of EMC & meeting all my new colleagues is seeing the amazing work that's done with Oracle & the amount of cooperation & large amount of business that's done with EMC & Oracle. That's very exciting.
At the same time in the world I live in specifically – data warehousing & analytics & now data computing – we look very carefully when we hear about the new Exadata machine and so forth. Fundamentally we just have a different view of the way the world is going than Oracle does. We think there is this new approach that is not the view of other companies.
DV: We've talked a lot about this, haven't we, John? Oracle really represents the old way of doing business. It is trying to make some changes, but ...
JF: Oracle is like the IBM in the mainframe days or the telco. They have a ton of cash, great business performance, they're everywhere, but they really aren't innovating. So what we've been trying to tease out is where is the innovation from Oracle? They have Java – JavaOne's going on. They're massive. Oracle's a monster. Oracle and EMC have a lot of joint customers. As do Oracle and HP for that matter. So you have to deal with Oracle. So we'll see how they handle the partnering going forward. There is the commercialization. But the data is the critical linchpin in my mind in the cloud.
RS: I'm focused on this one area. You guys have a much broader view. But this is the challenge for any company, with all the consolidation that's going on and all the friend-or-foe questions. These are issues that are not specific to Oracle or EMC. It's everybody's challenge these days.
JF: We've heard from several folks. Oracle wants to be #1 or #2 in the areas where they're focusing. Everything else is coopetition and partnering. One thing we are bullish on at SiliconAngle is this new movement with people like Facebook building their own systems; you see people building their own solutions with Open Source. There are not a lot of proven solutions out there. So we are seeing a lot of proof of concept. So can you share some of those things you've seen that are proof-of-concept and big proof-of-concept?
RS: I just go back to our statement before. Our development is inspired by our customers and the proof points that we see in the real world. That has driven the relationship between CloudEra and EMC. One thing I thought was remarkable in customers is how they used Hadoop as a data staging environment. You can almost think of it as the equivalent of the ELT component or tool of data integration, to get that data into the relational database management system. So now you have this situation where there are these massive amounts of data and Hadoop is used to prepare and stage that data for import into an RDBMS, in this case Greenplum, to run queries on it in a massively parallel fashion. So that's one example.
I will also say we have been amazed at the adoption inside the federal government. I can't talk about that specifically, but we've been inspired there.
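Ed Note: The staging pattern RS describes above, with Hadoop preparing raw data before it is imported into an RDBMS for querying, can be sketched in miniature. This is an illustration under loud assumptions: a plain Python generator stands in for the Hadoop job, the standard-library `sqlite3` module stands in for Greenplum, and the log format, the `hits` table, and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical raw, semi-structured input (date, user, path, HTTP status).
RAW_LOG = [
    "2010-10-01 alice /home 200",
    "garbage line",
    "2010-10-01 bob /cart 500",
    "2010-10-02 alice /cart 200",
]

def stage(lines):
    """Staging step (the role Hadoop plays in the interview):
    parse raw text and drop records that do not fit the schema."""
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # skip malformed records
        day, username, path, status = parts
        yield day, username, path, int(status)

def load_and_query(rows):
    """Load step plus a query (sqlite3 is only a stand-in for an analytic RDBMS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits (day TEXT, username TEXT, path TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT username, COUNT(*) FROM hits "
        "WHERE status = 200 GROUP BY username ORDER BY username"
    ).fetchall()
    conn.close()
    return result

hits_per_user = load_and_query(stage(RAW_LOG))
print(hits_per_user)
```

The division of labor is the point: the messy parsing happens before the database sees anything, so the relational side only ever handles clean, typed rows that it can query with ordinary SQL.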
DF: Another emerging area – because you are in this new ground you are seeing all these new trends. One of these is applications. We all know Apple, the App Store, but in the old world you have a few applications – a finite amount. Now you have a long-tail situation where there could be thousands of apps out there. So talk about the application world, because Hadoop really came out of Yahoo!, which has tons of apps. So these little apps require a lot of data interaction. And a lot of the innovations around these new apps require access to structural data, behavioral data, little data chunks that by themselves don't make sense but can be a real value add in the aggregate. So can you talk about the app trend and how data plays a role in innovation?
RS: First we like to say data is the killer app. People say killer app for what, and we say killer app for everything. For private cloud computing, cloud computing, enterprise computing for that matter. But specifically, what you made me think about is our current work on collaboration, where there is this connectedness in all these different processes or applications. This is back to my comment about bringing the processing closer to the data and closer to the people. We have analyzed the whole workflow inside this process of analysis of data, & we have been driven by this new community of hard core data scientists – data rock stars – that are intense about doing deep & powerful analytics, where it's no longer a singular task. And we have a responsibility to provide this framework, whether it is for applications or for people, where there is this connectedness and the ability to leverage one another and one another's work.
DV: It is kind of a new model. We talk to customers who are doing DI and what we call data warehousing initiatives. There's a lot of legacy infrastructure built up. Is that how Greenplum and EMC extend Greenplum's initial work, with this new vision? It couldn't have been easy to crack that nut as Greenplum, & it still won't be easy as part of EMC. You have to put forth a new vision, don't you? And is that the new vision?
RS: We do, and it is. And it comes back to our vision of the enterprise data cloud. With Greenplum alone, that was a massive vision to take in, and you could be skeptical about it. But now that it's part of EMC, it's going to become a reality. That's the path we are moving down to be sure.
The other thing: I think companies & individuals sometimes just behave in a manner where they don't want to take a step forward. What they have is comfortable, or they have to do it, or what have you. But our position currently is that while we have this large, powerful vision of the new way to do data warehousing and analytics, we also have what we call low-risk entry points into that space. It comes back to the core EDW. We love that, and whatever our customers have, great. We can replace that if you want, but fundamentally the next big step for companies going into the data cloud is to look at a net new and complementary analytics infrastructure that sits side-by-side with the existing infrastructure. And more often than not that is how we get started with our customers.
DV: That vision of real time is key, because most of the time that EDW is not real time. Most of the time they are looking in the rearview mirror, looking at patterns and trying to project forward. But that notion of real-time analytics tied to the business is critical.
JF: If data is the new development kit – and I would agree 100% that data is everything – how does someone become a rock star in data? What's out there for developers who want to be data rock stars? Are there development platforms that are more prone to working on these problems and creating value?
RS: It's something we look at very carefully. And our take on it is there are so many tools & diverse recipes in how individuals approach this opportunity or problem. We decided that we need to nurture and invest in the community – not just focus on the data rock stars, not just focus on the universe of SAS analysts – but to look at the entire community and learn from them what kind of resources, toolkits, support do they need to find their way to this new vision?
JF: Is there a roadmap there?
RS: One thing we did, and this is another example of the powerful and unified support we received at EMC, is offer an absolutely free and fully functional version of the Greenplum database called the Single-Node Edition. That is something we position and consider a power tool for power users. We launched that exactly one year ago – and we will bring out a new version soon as part of EMC – and we found that this is an opportunity for everyone in the community to download and use one of the premier analytic platforms in the industry. So that is just one example.
JF: Big news today about partnering with CloudEra, the brain trust around Hadoop.