This is a set of detailed notes on a briefing given by DataDirect Networks to Wikibon focused on their geo-distributed hyperscale object storage solution for very large volumes of immutable data. The notes cover extensive discussion of issues between Wikibon CTO David Floyer and DDN’s Jeff Denworth and Bob Murphy. Wikibon’s Nick Allen and I also attended the presentation.
JD: What we've compiled is a brief deck that incorporates three things. One is a corporate update.
NA: <chuckling> I'm sorry, 50 slides is not brief.
JD: Fair enough. But we'll get through it briefly because slides 6-19 we won't be covering, they are simply a reference.
Last week we announced a new array product, the SFA10KX. It can deliver 15 GB/sec from two appliances & 850,000 IOPS. In practice it's the fastest midrange storage system available today. But we won't be talking about that today. I'll just do a quick corporate update & then move to slide 20. I will turn that over to Bob Murphy, who recently joined us from Oracle and is product manager for our WOS (Web Object Scaler) scale-out distributed object store. He was Open Storage Product Marketing Manager at Oracle.
Slide 2: I just have some quick details. We are now annualized at over $200 Million in revenue. We have been & continue to be very profitable. The company is growing at about 30% annually. We have 400 people worldwide with requisitions to hire another hundred, so the company is embarking on a very substantial growth plan.
We continue to focus on the same markets – HPC, life sciences, cloud (managed service providers, scaled-up Web content organizations), rich media, which is an indication of our focus on post-production broadcast. We have a big practice in the federal market around geospatial & signal intelligence, & finally video surveillance. You see a common thread across all of these of very large data sets & often large files & a performance & scalability requirement that tips the scale in our favor vs NAS players or other players in the marketplace.
The real news is the announcement we will make Oct. 11 of a dramatic increase in capabilities to our WOS geo-distributed object store product designed for hyperscalability. In 2010 we doubled down on our investment in object storage. We saw that as a key bridge to the future of both private and public data storage services. WOS 2.0 is the first step in a long journey we have planned and the first of a series of announcements over the next 12 months.
The 4 largest pure-play storage companies are EMC, NetApp, Hitachi, & DDN. The landscape for privately held storage companies over $100M is populated only by DDN, which is a point of pride. Two years ago we were talking about being the 11th largest. We've grown 83% since 2008.
BM: WOS 2.0.
Slide 20: Big data, unstructured data in particular, is doubling every 18 months. Big file data, images, all sorts of things in domains we know well at DDN from the scientific and HPC field and in cloud storage, both from the service provider POV and social networking. These are the types of data & data growth patterns that our object storage architecture is designed to address.
Slide 21: We have experience in both HPC and cloud (social media, photo sites, etc.). We have worked with customers in both domains, analyzed their needs for storage going forward & designed WOS around that. It has specific characteristics and features around data protection, particularly performance, & its ability to geographically distribute & access data that specifically address the needs of both our HPC customers and cloud service provider customers.
Slide 22: Requirements of cloud storage: hyperscale, file access by many people from anywhere on the Internet, very high resiliency, easy to manage and scale, and has to be affordable. Those last two are particularly high priority requirements with the customers. Data growth correlates directly to storage cost unless technologies are used to manage that growth.
Slide 23: Existing architectures are very expensive & problematic to manage at these huge multi-petabyte scales. That is why we believe object storage is becoming the de facto standard for these environments with immutable and unstructured data. What is your take on object vs other kinds of storage for these very large environments?
DF: I believe so, but I'm looking for more traditional systems to advance in this area. I wouldn't be surprised at either direction. I believe that encapsulating data and metadata will be a prerequisite, but I am not completely sure that object data will be the path. If it is going to change it will take a long while. And I would say the comparison was the Wankel engine versus the traditional piston engine. A superior design, and if it had started earlier it almost certainly would have taken over. But it couldn't get traction within the ecosystem to become a general-purpose solution.
BM: Slide 24: A few disadvantages of traditional file systems. There is a lot of overhead in them that encumbers scalability, whereas object storage is so simple there is virtually no overhead. Traditional systems are optimized for small IOs rather than linear, efficient scalability. The way we do data protection makes assumptions about the type of data we are protecting, which also simplifies the approach and makes it more scalable. When you try to build out the traditional file system designs to increase scalability you're always running into the complexity of the file-system approach. Because they were designed for many different types of data there is more complexity in file locking, etc. They obviously have their place for heavily accessed databases. But when dealing with largely immutable data, that complexity to scale out doesn't have to be there for a lot of types of data.
NA: What about state?
BM: Exactly. If you assume that the data doesn't change much, you can remove that decision point in the process & simplify. That's what allows object storage to be deployed in these large environments at lower cost.
Slide 25: A primer on file systems & objects. The file system was designed to run on a computer, and NAS and unified storage to share files among a limited number of users. File locking & a number of amenities to handle quickly changing data are important in that environment. The right side shows how object storage handles data: it is simply stored in a container with metadata describing what's inside the data, where the data is stored, and how it can be recalled by the various applications that request that data.
Slide 26: Our object storage approach: Work with both HPC and Web storage domain customers to understand their data usage models. Focused on high scale, easy to manage, but also on a collaborative environment where immutable data is shared. Our domains create huge amounts of data & then share that data in a workflow in a geographically distributed environment. Labs around the world create data and then comment on that data. Just like in the 1990s when NAS file systems came out, one of their biggest values for customers was their ability to have various people in a workflow share that data. In the past that has been within an office, building, or campus. Now we can bring that same value on a global scale. And these types of organizations work on a global scale. It is a very simplified data access system, which is the trade-off. It gives us huge advantages in performance, cost, & management. We eliminate all the conflicts in a file system and reduce the instruction set to put, get, and delete. The big advantage we have is the concept of locality for lower-latency access of data around the world.
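To make that reduced instruction set concrete, here is a minimal sketch (in Python) of what a put/get/delete object interface looks like once file-system semantics are stripped away. The class and method names are purely illustrative assumptions for these notes; this is not DDN's actual WOS API.

```python
# Minimal sketch of a put/get/delete object store. Illustrative only; not the WOS API.
import uuid


class SimpleObjectStore:
    """Minimal object semantics: no locking, no rename, no partial update."""

    def __init__(self):
        self._objects = {}  # object_id -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        """Store an immutable object and return its generated object ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (data, metadata)
        return object_id

    def get(self, object_id: str):
        """Retrieve (data, metadata) by object ID."""
        return self._objects[object_id]

    def delete(self, object_id: str) -> None:
        """Remove the object; there is no in-place modify."""
        del self._objects[object_id]


store = SimpleObjectStore()
oid = store.put(b"frame data", {"camera": "lobby-01", "codec": "h264"})
data, meta = store.get(oid)
store.delete(oid)
```

The point of the sketch is what is absent: no open/seek/rename/lock semantics. An object is written once, retrieved by ID, and eventually deleted, which is what lets the system dispense with file locking and most metadata management.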
Slide 27: A diagram of our Web Object Scaler. On the left is how the product is packaged: A plug & play appliance. Each 4U chassis holds 60 disk drives. The total rack holds 2 PB of data. On the bottom, the picture of the globe shows how each of these modular nodes can be added at a point in the organization and receive data from places within that specific location. Then we have policies that replicate that data around the world. This provides two advantages: Data protection – as data is stored on a node it is also stored remotely.
In addition to DR – if you go back to the current NAS approach with a primary system & a standby system that data is replicated to, we use that replicated data productively. So when data is distributed around the world, it's then available locally for faster access. So customers in various domains – media/entertainment, HPC, organizations that share data & create big data around the world – can distribute that data around the world, have various parts of the workflow access it locally at high speed, and modify the data in WOS 2.0 to provide this global workflow that is so interesting to our customers.
DF: A lot of work is going on in erasure coding and divvying things up across the network and having enough nodes that you can reconstruct from that with a lot less overhead than traditional ways. How would you compare that approach to yours?
JD: I think what's happening in the marketplace is these dispersal-code-based data protection mechanisms are effectively a response to organizations with a multi-data-center strategy that also have a high level of data dormancy, or low frequency of access, in their strategy. So in that environment what they're really targeting is tape – organizations that would have to pull data from tape, which has about the same response times as dispersal codes. If you look at what Cleversafe is advertising, their latency is a second for object retrieval, or the first instantiation of an object retrieval, versus WOS with ObjectAssure, which is in the millisecond range. We can come up with exotic architectures that can save money across the Web. I don't think we will ever want to have to pull pieces of a file from five different data centers to do an assembly, because that isn't what our customers are about.
DF: You're putting yourself firmly in the millisecond response time category as opposed to the lower-cost, infrequently accessed data.
JD: I think you're dealing with fractional cost optimizations going from 8 plus 2 to say 20 plus 6 or something like that.
DF: It never ends up as just two, does it.
JD: With WOS it absolutely does. We don't even have any metadata within the system. So our system capacity efficiency, if you are just doing data replication, is nearly 100% per copy with two copies of the data. But if you use ObjectAssure it's very brute force – 80% to start, but that's a soft limitation that we can adjust to meet customer requirements. But we are not NetApp-style RAID 6 where you are burning 40% just to get out of the gate.
DF: Could you talk more about that? What is the equivalent system you have for protection locally or in one instance?
JD: It would be a 9900 platform locally which is also built on erasure coding. We've been shipping erasure coding for a dozen years now.
DF: I agree with your position. I expected that answer, and I'm comfortable with it. There are different marketplaces out there, and I agree with you completely that for example Cleversafe is very much an archiving environment. And that's not your heritage.
JD: We started with active archiving and extend up to being the fastest storage system in the world, which just happens to be an object store. We do have customers who use the product for archiving. There's incremental cost gain from moving away from erasure coding plus possible dispersal coding. If you look in the Web space you have Mellanox right now burning up the Internet. These data centers are IO hungry, so they are all measured on response times, & I don't think waiting a second for each HTML....
DF: Peace. I'm over that. You talked about the efficiency of your data protection. There are 2 levels of efficiency: one is having copies in two places, which we were discussing before, and the various ways of doing that, and the second is the efficiency within the object itself. You said it is close to 100%. I just wanted to understand a little more about what your data protection philosophy is within one of your instances.
JD: There are 2 dimensions to that question. One is the file system overhead & the other is our data protection overhead. File system overhead is about 1%. That's basically space for things like internal system policies and some internal system metadata. But that is not inclusive of user-defined metadata, which could be – we have customers with Kbyte files with multi-Mbytes of metadata associated with them. We have customers doing all sorts of goofy things. If you move to ObjectAssure the first instantiation is 8+2, so it is 20% minus 1%. But that's just where we started. It's a big leap up from RAID 6 style data protection, because it is clustered and it is an object-based system, so we just have to rehydrate the data rather than do full-disk rebuilds.
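As a rough sanity check on the capacity figures quoted in this exchange, the arithmetic below compares two-copy replication, 8+2 ObjectAssure, and the "burning 40%" RAID 6 characterization. These are the speaker's numbers taken at face value, not verified specifications, and the ~1% internal system overhead is ignored for simplicity.

```python
# Rough usable-capacity arithmetic using the figures quoted in the discussion.
raw_tb = 100.0  # assume 100 TB of raw capacity for illustration

# Two-copy replication: each copy is nearly 100% efficient, but there are 2 copies.
usable_replication = raw_tb / 2                 # -> 50.0 TB usable

# ObjectAssure 8+2 erasure coding: 8 data strips out of every 10 written.
usable_objectassure = raw_tb * 8 / 10           # -> 80.0 TB usable

# "RAID 6 burning 40% just to get out of the gate," as characterized above.
usable_raid6_claim = raw_tb * (1 - 0.40)        # -> 60.0 TB usable

print(usable_replication, usable_objectassure, usable_raid6_claim)
```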
DF: So you have a mechanism for spreading the data within each object and recovering – an XIV type of thing I presume.
JD: It's actually similar to that architecture. So if we lose a disk, we don't do a full disk rebuild. We just redistribute everything around the system.
BM: Slide 33: This is actually covered in one of the things we are introducing at the launch. This is our ObjectAssure single-copy data protection based on erasure coding, which we're introducing for the first time on WOS. It is the first erasure-code data protection for hyperscale, high-performance storage. The trade-off we made is the erasure coding is within the node for performance. Our customers drove this requirement. It enables a single-copy environment to reduce the cost of usable storage and improve the usable-storage-to-raw-storage ratio.
Slide 34: Some of the advantages: We only rebuild data, not whole disks, and we rehydrate data to all available resources. It's locally available, speeding access. That's what our customers wanted. That performance is the main differentiator vs. other approaches. The incredibly nice feature is you can mix-and-match policies. For instance you can replicate data that requires fast access and erasure-code data that requires less frequent access. That's all policy-driven within WOS. So it's tremendous flexibility, best of all worlds.
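A hypothetical illustration of what mix-and-match protection policies could look like in practice. The policy names, fields, and selection logic below are invented for these notes; the briefing does not show WOS's actual policy language.

```python
# Hypothetical policy-driven protection choices: replicate hot data for local access,
# erasure-code cold data in a single copy. Names and fields are invented for illustration.
policies = {
    "hot-media": {                       # frequently accessed: replicate for local reads
        "protection": "replicate",
        "copies": {"new-york": 1, "london": 1, "tokyo": 1},
    },
    "cold-archive": {                    # infrequently accessed: single-copy erasure coding
        "protection": "erasure_code",
        "scheme": "8+2",
        "copies": {"new-york": 1},
    },
}


def select_policy(metadata: dict) -> str:
    """Pick a protection policy from (hypothetical) object metadata."""
    return "hot-media" if metadata.get("access") == "frequent" else "cold-archive"


print(select_policy({"access": "frequent"}))   # -> hot-media
print(select_policy({"access": "rare"}))       # -> cold-archive
```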
DF: So the erasure coding – is that within an object or across objects? How is that split over several disks?
BM: Slide 35: That's actually the next slide. With ObjectShareWorks you put an object, and that object is spread across multiple drives within a node. So the various parts of an object go off to different drives. Then if you have a problem on one of the drives, that part of the object is recreated on another drive based on erasure coding, and you then retrieve the object based on the reconstituted data.
DF: To repair it do you have to read it all?
BM: As soon as a problem is detected, it's repaired. So you don't have to read to create the repair process.
Slide 29: These are the key WOS 2.0 announcements we're making. We talked about the erasure coding already. On the top is the flexible cloud storage platform. We're providing S3 interfaces in addition to our WOS-native API, which allows us to provide a complete platform for service providers & Web content companies to create their own service for their own end-users. That service can be differentiated from other things out there by the capabilities of WOS – geographical distribution, higher performance and other features that allow these targeted service providers to provide a better service, better SLAs, etc. We provide multitenancy and billing support and iPad and iPhone client access for drop-box-style files.
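Since WOS 2.0 adds an S3-compatible interface, a standard S3 client library could in principle be pointed at a WOS endpoint. The sketch below uses Python's boto3; the endpoint URL, bucket name, and credentials are placeholders, and the exact subset of the S3 API that WOS supports is not detailed in this briefing.

```python
# Hypothetical access to an S3-compatible object interface via boto3.
# Endpoint, bucket, and credentials are placeholders, not real WOS values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://wos.example.com",   # placeholder WOS S3 endpoint
    aws_access_key_id="TENANT_KEY",           # placeholder tenant credentials
    aws_secret_access_key="TENANT_SECRET",
)

# Store an immutable object, then read it back.
s3.put_object(Bucket="media-archive", Key="frame-000001.dpx", Body=b"...")
obj = s3.get_object(Bucket="media-archive", Key="frame-000001.dpx")
print(obj["Body"].read()[:16])
```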
In addition we provide a NAS interface. Previously WOS has been an API-accessible product. The NAS interface allows instant integration of existing applications and greatly expands the market applicability of this product. We wrote it ourselves.
Then finally, in addition to erasure coding we are introducing async replication. Previously, we would write to a local WOS node, and then for that object to be compliant with the data protection policies it would have to be copied to a remote WOS node. We call that synchronous replication. That took time to copy and to respond back to the application saying this object is now in two places & therefore is compliant. What we introduce here is the ability to do that within a local node, creating a compliant local copy, which increases performance. We then do the remote copy in the background and eliminate the extra local copy.
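A minimal sketch of the asynchronous replication flow as described: acknowledge the write once enough local copies satisfy the policy, then copy to the remote site in the background and drop the extra local copy. All class and function names are hypothetical; this is not DDN's implementation.

```python
# Illustrative async-replication flow. Names are hypothetical, not WOS code.
import threading
import uuid


class Node:
    """Trivial in-memory stand-in for a WOS node."""
    def __init__(self, name):
        self.name = name
        self.objects = {}

    def store(self, data):
        oid = str(uuid.uuid4())
        self.objects[oid] = data
        return oid

    def remove(self, oid):
        self.objects.pop(oid, None)


def put_async_replicated(data, local, remote, local_copies=2):
    """Acknowledge once the local policy is met; replicate remotely in the background."""
    copies = [local.store(data) for _ in range(local_copies)]

    def replicate():
        remote.store(data)            # background copy to the remote site
        local.remove(copies.pop())    # drop the extra local copy afterwards

    threading.Thread(target=replicate, daemon=True).start()
    return copies[0]                  # fast, policy-compliant acknowledgement


oid = put_async_replicated(b"big file", Node("tokyo"), Node("virginia"))
```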
Slide 30: Overview of the cloud storage platform. Again it provides our differentiated service based on better performance, cost, robustness, and SLAs. It has multitenancy support, it is compatible with S3 interfaces and Web APIs, it has full CDMI compliance, it integrates with billing & provisioning systems, it takes advantage of WOS's geographic distribution capabilities, and it supports smart client access.
Slide 31: In addition to service providers there is great interest among enterprises in providing a ubiquitous storage platform. So all the drop-box applications, etc., that people use even in these 100,000+ employee companies with limited Exchange file-size controls. This can all be brought in-house now using the WOS platform to provide a storage cloud for ubiquitous access, again using the geographic capabilities and efficient management attributes.
DF: Most of the claims I've seen to provide a drop-box-type capability miss the mark by a long way. Can you talk more about the usability of this type of system?
JD: There are a few things. The client support is not as rounded out as we would like. There's no Android support, so basically it's iPhone/iPad. Second, in this first instantiation there's no sync. But if you read into that statement, we are trying to address that very quickly.
DF: Those are the kind of things that will drive people crazy if you try to move them from where they are to a ?? system.
JD: The nice thing about this is we do white label it, so if you are Joe's Hosting Company your customers would never see mention of DDN or WOS in your app store.
BM: Slide 32: Overview of our access capabilities. On the right, the WOS Data Object Interface is what we had previously. That is our native API-accessible interface. Now we provide the S3 Web API in addition to all the other cloud storage management features listed there, and finally on the left we provide a WOS NAS interface with the NFS protocol. So we have a wide choice of access points into WOS now that we expect to greatly increase the applicability of this product and the ease of bringing applications onboard.
DF: What would dictate a user's choice between them?
JD: Our experience is that with the WOS native object interface, that type of user is developing a newer application that requires all the performance and features in WOS. He will write the application to the interface. That type of customer typically provides an integrated solution – for example in defense & intelligence & surveillance, where they bundle WOS with satellite analysis and other applications. So a highly differentiated product system with major requirements for performance and scalability.
On the far left is another example: existing applications – NAS systems, etc. – that are tapped out. For instance, the healthcare space has a lot of inertia in design & technology. If they are tapped out on their platform, they just want to use that NFS interface and port everything to WOS. Then in the middle is the WOS cloud interface, where people are developing new business models & revenue streams – the whole service provider space in particular – based on S3 to provide a service to a third party. In addition to that, enterprises are looking at building private clouds. And we're engaging with all three. It's really exciting.
BM: Slide 36: The one area we haven't talked about yet is asynchronous replication. We talked about how WOS uses multiple copies distributed distantly to provide DR. That has a hit on usable storage. And for a lot of customers synchronous replication is a performance issue. What asynchronous replication lets us do is write two copies locally to provide the data protection policies we are looking for and then replicate a remote copy in the background. This gives us much faster performance and allows us to provide better response time to these applications, especially with big files.
Slide 37: The performance characteristics of WOS are based on 2 areas: performance of reading & writing to a disk, which is much more efficient than SCSI systems. The entire system is an end-to-end object solution, so for example we only have one disk seek operation versus 10 or more seeks on a SCSI-based system. And we had that from day one.
JD: This is a ground-up decision we made in response to requirements from social networking sites who said, “We're throwing disks at a file system problem. Give us something better.” Out of that came an object-based model.
We need to be clear about object protocols vs. object disk architectures. Those are two layers of value proposition when you're talking about objects.
DF: Nobody sees the object here as far as you're concerned. It's just a more efficient way of doing it.
JD: Right. We lay everything down contiguously on disk up to 1 MB. Because we aren't using Linux file systems or SCSI or anything like that, we can do in one disk stroke what, let's say, OpenStack would take as many as 10 to do.
DF: So what does the NFS interface do to that? How does that work?
JD: The NFS stores the files as objects or chunks of objects. There's a database that sits basically across the NFS cluster that has its own style and method of updating. Over time I would guess we will have to accelerate that with SSDs, but for now we aren't pitching NFS as the fastest of the interfaces.
DF: What is the benefit of it then?
JD: Well, take retail cloud service provider X. They know this is the direction they're going in, they know they need to start scaling up new applications with an OO interface, but they have legacy codes they want to bring along for the ride. And they want to play on one single, scalable, autonomously manageable infrastructure.
DF: So it's a legacy and a way to get from A to B.
JD: It's a legacy that's pretty tough to shake. One use case we're seeing – and we haven't yet committed to this – but imagine a Hadoop load via NFS, where you then run MapReduce natively.
BM: Slide 39: We're running low on time, so I'm going to skip forward to Slide 39. This is how the ?? is packaged and delivered. On the left is the WOS 6000, our high density package. That provides up to 2 Pbytes of storage in a rack. On the right is the WOS 1600, a smaller, more flexible system, allowing us to put different types of media in the package for different types of applications including SSDs.
Slide 40: Summary of WOS
Slide 41: Then I just wanted to use the last minutes we have to talk about some of the use cases. Slide 41 talks about geospatial intelligence. People are using the WOS API natively to develop systems for end-users. Another point about WOS is that an ecosystem of integrators & partners is being built up. Some of those are shown. The other thing about this space is it is a typical hyperbolic data growth chart. And if you look at the far right, that's actually a logarithmic chart. That's a tremendous amount of data to acquire & manage. We won some deals in this space competing against traditional architectures where we provided two orders of magnitude better performance & reduced response latency by 5X, with much lower costs. In this type of system some WOS nodes are in Asia where data is being acquired, and that data is replicated to the U.S. for mission planning & exploitation. Then that is sent back to Asia for action.
Slide 43: Another interesting opportunity. Here we have WOS in Asia, & the bandwidth of a fully-loaded T17? is taken advantage of here.
JD: Something important to see is we have a bunch of nodes here that are Ethernet connected vs a metro storage architecture that is all InfiniBand, using a parallel file system. The benchmarking of the system defied all conceptions that the government had. They figured InfiniBand is faster, it's lower latency, etc. At the end of the day the object disk architecture we have resulted in a much lower latency of request for files because we weren't dealing with Linux file systems. So that was an interesting one where we really surprised the government with basically a different deployment paradigm.
DF: So the SCSI overhead was really holding everything back.
JD: That plus metadata management for a system that's largely persistent storage overwhelms a brute-force file system.
DF: What about encryption?
JD: Eruces is "secure" spelled backwards. They are an encryption company we partnered with. They are basically the flavor du jour of some of these three-letter agencies. It's object-level encryption, not block-level or something like that.
BM: Just to wrap up, a few more areas where we have experience:
Slide 44: IP surveillance on the cloud. As everyone knows, cameras are everywhere now. But the way they are deployed is a management-intensive process using SCSI disk. We've been deploying systems with partners – this one is March Networks, another example of the WOS ecosystem. By using WOS in these thousands-of-camera deployments in cities & countries, we're able to reduce the cost of the solution & total cost of ownership by upwards of 50%.
Finally in the media & entertainment space, where we have a lot of experience, we have another partner that puts together content distribution networks to get content out to various locations around the world, both for production and services that need to pull stock footage and incorporate it in a package. And finally distributing content through various Netflix-like distribution networks.
So that's WOS. The last slide is a summary.
DF: So what is your traction in this space? Is it still hard work? Any large customers you can talk about?
JD: DoD is one we've announced. I'm chasing down four target customers we'd like to add to the release, and we'll see who signs up.
Aside from that, the product is very interesting for us. It's transactionally not as high-running as some of our RAID products are, as you might expect. But where we hit, the projects are very large. The discussions are, "Okay, I'll try a Pbyte today, but I really need 10 to 50 to 100 Pbytes on the floor."
So this is a discernible portion of our revenue now, firing across all three business units we have in place, and it's doing okay. It's basically tracking to where we would expect it to be.
DF: Very interesting. It might actually turn out to be real.
JD: A discernible portion of DDN's revenue is not a discernible portion of the overall storage industry trend. But it's getting real for us. The slide we had earlier about HPC combining with cloud, we have some real proposals into the government right now where they could decide to scrap everything and start from the ground up with an entirely new object storage paradigm. This is for programs where they're buying a half billion to one billion dollars in computers annually.
DF: That's very interesting indeed.
JD: So this notion started off with "POSIX is broken," and we realized we didn't need it. So WOS allows us to go where those organizations want us to be. We think we have a pretty good path.
The story that's most under-appreciated with WOS is why we've done it. We hate being lumped together with EMC or Cleversafe. What we position WOS to be is an application-oriented distributed hyperscale object store that just happens to have interfaces appropriate for meta-service providers. The front part of that story is one that nobody really grasps.
For organizations that have a notion of data persistence or immutability, I want to be crafting the discussion at the application level, not just be lumped into cloud storage or object storage. Ultimately we expose this very enabling, very efficient and self-managed product through a number of ways. One of them is appropriate for cloud storage.
DF: And the others are for large-scale applications where the overheads of existing architectures just get in the way?
JD: Exactly. For example, a mass-market Web site that is into mass marketing and was deployed on Isilon for the longest time, and the preponderance of their objects were 4K (?) in size. And like some of the other Web services I mentioned, they're basically hitting the wall on performance effectiveness – not in aggregate but in a single object retrieval – because they are doing things like sub-block storage management when the customer just wants an object as quickly as they can possibly get it. And they don't ever want to change it. So that was an opportunity for us to say, “Who cares if we call this cloud or whatever. You have an application problem, and we have the solution.”
DF: Immutable big data is your sweet spot basically.
JD: I think so. Persistent big data.
DF: And in that space, because you have streamlined it so much, you are able to provide it at much lower cost per access.
JD: That's a good metric actually. Cost per object retrieval or object placement.
DF: What's the lowest entry point here? Why doesn't it work for say 50 Tbytes?
JD: It does. The lowest entry point is basically one 16-drive node. There's a lot of labs, universities and stuff, that drop them in locally and then they're part of a consortium like the University of California system. Each of those campuses has a lab that purchased one, and they hook them up and have a shared access pool. It maps to the local organization's budgeting – I have to buy this amount of storage but the consortium can share all that storage. So it's both a technical & economic fit.
DF: Is there an analogy from that to departments in a large organization?
JD: Absolutely.
BM: So the individual departments would buy the nodes out of their budgets. When you are talking about a group of departments, getting one cat to move in a particular direction is tough. Getting a herd of cats to move in one direction is really tough.
DF: That was my fear. Most things start small and grow. It is hard, historically, to go the other way around. That's why I was asking if you have a business case for going in small.
JD: The small use case for us is big data storage in most of the markets we are targeting, with the exception of the enterprise. The meltdown in drop-box security basically has opened a market opportunity as CIOs reconsider how they protect corporate data that sits outside their firewall. CIOs are looking for a way to pull that in, and what we are offering basically is a personal cloud storage appliance that you can put on your network that is totally Active Directory secure while providing relatively flexible ease of storage and retrieval.
In some ways it's our first enterprise product, made for a part of the enterprise that's just now really starting to think about what to do about this problem. Before we put this into our own system we had people using several different drop-box services. So we can assume that every technology company has this problem. The competition is providers looking to shore up their security and befriend CIOs. That space is just starting to heat up.