Notes from Dave Cahill at the Hadoop Summit
NetApp is taking a very pragmatic approach to Hadoop. Bercovici is knowledgeable and speaks with credibility.
Hadoop & NetApp
Val Bercovici, Cloud CTO
- Creative tension within NetApp: over the past 4-5 years it has been trying to establish itself as a big-iron storage vendor, yet big iron is the antithesis of Hadoop and the open source community
- NetApp is very much an engineering-driven company
- NetApp supports open source with equipment, money and code: FreeBSD, Linux, NFS
- Still think they have the leading NFS server implementation in the industry, but needed a client as well. OEMs had no motivation to develop a robust NFS client, so NetApp hired Linux developers with NFS expertise and earned the right to be lead maintainers of the NFS subsystem in the Linux kernel
- Plan to do exactly the same with the Apache Hadoop community around all things storage and file systems
- Not about NFS vs HDFS; it is all about HDFS as the next-generation network file system, handling big data in ways that the 30-40 year old POSIX file system was never intended to
- Big data
- All about analytics (gain insight), bandwidth (go fast) and content (lose nothing)
- How can storage literally and figuratively not be a bottleneck
- Beyond app layer, at workload layer you can dissect big data into analytics, bandwidth and content
- Gain insight = real time analytics for extremely large data sets
- Go fast = critical data and computational workloads
- Lose nothing = boundless secure scalable storage
- There will always be a superset of data that you will never want to lose
- The Engenio deal has changed NetApp's ideology significantly; Engenio is the exclusive storage line behind every single high-performance line
- Cited Teradata and Aster Data
- E-Series from Engenio = storage platform ideally suited for big data workloads, specifically Hadoop
- NetApp essentially bought the leading market share in the structured analytics space and fully intends to grow its lead there
- Hortonworks partnership
- Really impressed with their vision, strong, transparent and powerful
- Perfectly aligns with NetApp's vision
- NetApp will contribute to Apache, will help Hadoop build its lead in unstructured data, and will also enhance HBase and other Apache projects
- NetApp Product
- Very simple and modular, not ONTAP
- HDFS-optimized modular storage appliance
- One benefit of the hardware offload is that data protection is separated from job or query completion rather than handled monolithically. This gives flexibility to set the replication count based on load; you might want a high or a low count
- Rapid deployment and simple scale
- As Facebook open storage becomes more public “it will have a striking resemblance to this architecture”
- Up to 160% increase in usable capacity
- Hardware RAID offloads the compute servers
- Reduced cluster network congestion
- Strategy is to be open, very open, with a vibrant and diverse community
- Removing configuration uncertainty
- Declustering the top-of-rack bottleneck by separating storage from query completion
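
The points above about hardware RAID and a flexible replication count can be made concrete with some back-of-the-envelope arithmetic. The sketch below is not NetApp's sizing method; the function names, the RAID geometry (4 data disks plus 1 parity disk) and the 300 TB raw figure are all hypothetical, chosen only to show how usable capacity moves when protection shifts from HDFS replicas to RAID parity.

```python
def usable_capacity_tb(raw_tb, replication, raid_data_disks=None, raid_parity_disks=0):
    """Estimate usable capacity after RAID parity overhead and HDFS replication.

    Hypothetical helper for illustration only. With raid_data_disks=None the
    disks are treated as plain JBOD (no parity overhead).
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    raid_efficiency = 1.0
    if raid_data_disks is not None:
        # Fraction of each RAID group that holds data rather than parity.
        raid_efficiency = raid_data_disks / (raid_data_disks + raid_parity_disks)
    return raw_tb * raid_efficiency / replication


def percent_increase(baseline, candidate):
    """Relative gain of candidate over baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


# Assumed scenario: 300 TB raw, JBOD with 3 HDFS replicas
# versus RAID groups of 4 data + 1 parity disks with 2 HDFS replicas.
jbod = usable_capacity_tb(300, replication=3)
raid = usable_capacity_tb(300, replication=2, raid_data_disks=4, raid_parity_disks=1)
gain = percent_increase(jbod, raid)
print(f"JBOD: {jbod:.1f} TB, RAID: {raid:.1f} TB, gain: {gain:.1f}%")
```

The actual gain depends entirely on the RAID geometry and replication count chosen, which is why the notes stress being able to tune the replication count to the workload.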