We know this is a long piece, so if you’d prefer to watch the recorded webcast of this article, please go to the Reducing NAS Costs with Real-time Data Compression webcast. Thank you.
DAVID: Hello and welcome to the IBM Real-time Compression Webcast. “Maximizing your storage investment with real-time compression.” My name is David Gitner and I will be your host for today’s event.
Our speakers for today’s webcast are Steve Kenniston, Global Storage Efficiency Evangelist at IBM Real-time Compression, and John Power with the Systems Technology Group at IBM.
Steve and John will be discussing storage technology directions today and new technologies heading into the future.
And now I’d like to turn it over to Steven and John who will take us through today’s presentation.
Steve?
STEVE: Thanks David, and I’m excited to be here with John Power from IBM. We want to talk to you today, as David mentioned, about maximizing your storage investment with real-time compression.
So I realize that most storage vendors put up a number of different charts in front of you. They show a number of different things related to storage growth, capacity, cost and energy, all going up and to the right. It’s no secret to us that all these things are costing us a lot of money.
But what I find interesting about this chart is the figure in the centre of the graph. If we think about the amount of savings data deduplication has given us over the past three or four years in the backup space, and we think about how large the primary storage space is, it’s pretty clear that newer technology such as compression can have a dramatic effect inside of your environment: reducing overall cost and infrastructure, managing growth and energy, and keeping more data online for information mobility. That can help you address some of the tough challenges you have had over the course of the past few years.
And John I’d like to maybe turn it over to you. How does IBM feel about this?
JOHN: Thanks Steve, you’ve clearly highlighted that businesses are facing the challenge of hyper data growth that’s fueled by a multitude of things: new applications, the spread of these data generators more and more today, and new data consumers. I’m sure everyone in our conference today has one, two, maybe three smart-phone-like devices on their belt or in their purse. Those place real demands upon IT and upon storage, on top of the retention requirements that we’ve heard about for the last ten years or more.
And so all these things are fueling the data growth, which as you look at this chart is the blue upper line. I think we’ll all agree, based on the facts that you had in your previous slide and our own experiences in our own data centers and applications, that data is growing at a compound rate of well over 62%, and in some cases 100%, a year.
And as IT professionals managing and administering these solutions, we’re challenged to minimize the effects of Capex and Opex.
So let’s take a look at that on this chart here. The capital acquisition budgets given to an IT manager are forever being asked to be held flat or reduced. I doubt any of us today are given a larger IT budget to go out and manage the new application requirements that are thrown at us.
So for Capex across the board, I think we can generally agree that it’s probably holding flat, with some pressure to move up slightly.
Opex, though, is really where the differences can be made, and where the larger costs are incurred. The operating expenses, as we know, include more expensive functions today such as the power -- the physical power to bring the device on and the power to cool the device -- as well as the management and the maintenance and the software licensing on all these infrastructure solutions. So we know data’s growing, Opex is growing at about the same rate as the data’s growing, and Capex we think is holding reasonably flat.
A large number of IT analysts have pointed to data reduction as a way to manage this conundrum that we’re in, and we’ve seen in multiple vendor technology shipments, ways to manage that data reduction, through data deduplication, through thin provisioning, and for today’s topic - through compression.
Let’s take a look at what’s going on in the industry and the change in what I call the areal density, and this is industry-wide. We’re now looking at the suppliers of the disk drive itself. So it’s not the NAS filers or the storage arrays; it’s the physical drive components themselves.
Historically, from the mid-1950s when disk was invented, the vendors were able to increase data density by 30% to 35% a year. This was great news. It gave the vendors a means for profit that could be re-invested into functions on the platforms while driving down costs to pass that value on to their customers.
So in the 1990s and into the beginning of the new decade, we had great improvements in the ways in which we could compact data onto the small physical disk drive, and part of that was simply going to the smaller form factor. So we went through a number of transitions, and we’re going to go through those. Those allowed us to improve the rate at which we could compact data, thus giving you a way to reduce what looked like your storage capacity footprint. It gave you a way to keep up with that data growth, and we could keep costs down as we moved forward.
That's all changed. In the decade we are entering, we are now on a new curve that has flattened, so that improvement is actually slower than at the beginning of this industry almost 50 years ago. The areal density, the increase in the amount of data that we can pack onto a physical disk drive today, is actually dropping. It’s down to as low as 25% on a year-to-year basis: the rate at which we can improve the compaction of the data.
So looking forward, the vendors must find a new means to deliver value to reduce that acquisition cost and improve the operational cost of managing that data.
So let’s take a look, a little bit further, a little bit deeper into what are some of the challenges that all of us in the industry are facing at the filer and the disk level.
And so here we see a maybe slightly confusing chart; I’ll walk you through it. We’re going to take a look at drive access time, and what I want you to take away is that drive access times are improving, but they’re improving slowly, only at a rate of about 5% to 8% a year. And I would forecast, in my opinion, that this holds for at least the next 10 years. You’re not going to see a radical change in what we can do with the technology or with the manufacturing specifications.
We’re going to leave one thing outside of this for a minute. Someone, I’m sure, is scratching their head and thinking about solid-state disk. Good question: solid-state disks will become a new tier in the storage hierarchy that we will be able to exploit, but as you’re well aware it has a cost associated with it. And in the storage world it’s always a balance of price, performance and cost.
So back again to the HDD, the Hard Disk Drive. What we’re faced with is: can we improve the access time? Can we improve the seek time? Can we improve the RPM? And using those three levers, are we able to get back to that more traditional curve of reducing the cost of storage and packing more data onto the disk? The answer is no, as we look at the trends for access time, RPM and seek time (seek time being how quickly we can move the arms to the right track).
Access time is, think of it as, the logic and the time it takes us to get through the logic, the buffers and the software overhead stack managing the disk. RPM is probably the one component that most of us think of. On the next chart, the performance-capacity chart, you’ll see RPM highlighted as the means by which all of us in the industry have actually moved forward. As we tried to, and successfully did, put more data onto the platter, we were then faced with the challenge of maintaining performance, either writing the data on or getting the data back off at a reasonable rate. And so increasing the amount of data that flies under the head was an easy means, let’s say, of achieving that goal.
And so, if you think back now about twelve years or so, the industry started out with small disk form factors spinning at about 5400 rotations per minute, and went from generation to generation: 7200, 10K, and 15K. I would forecast for you that you probably won’t see in the next ten years a 20K or 25K RPM drive. Not that it couldn’t be manufactured by disk drive manufacturers, but the general market place is not going to demand it. The enterprise array disk vendors and filers would love that, but we need to ride on the curve of the disk drive technology that’s made available to the general market for cost reasons. So in my view, higher RPM drives won’t be produced for the market in the near future.
So we are out of ideas in the industry to radically improve the seek time, to radically improve the rotation, the RPM, or to radically improve the logic to allow for access. We’ve run through the cost take-downs of moving from Fibre Channel to SATA and from one drive interface generation to the next.
We’ve used thin provisioning as a means of reducing the amount of capacity we need to allocate, and we need to look for something new at this point.
And so, essentially, we are losing ground on the performance-capacity curve, on the dollars per gigabyte of value that we can extract out of the storage sub-systems. As we look forward, one key area that you might say has been around for quite some time, compression, comes to the forefront. The trick with compression has been, and will continue to be: can you compress data reliably and deliver performance that does not degrade from what you would expect of a primary storage subsystem? IBM Real-time Compression has brought this to the market and has figured that out. IBM Real-time Compression has a way, with real-time data compression, to compress our data on the way to a primary storage device. So Steve, why don’t you help me understand and take us through your company and how you’ve done that?
STEVE: John thank you very much and thanks for enlightening us. I think a lot of folks are so close to the solution that sometimes they don’t step back and look at some of the individual components and the picture that you painted with regard to the individual drives really highlights some interesting areas within the storage industry. So I thank you for that picture.
John also points out there are two primary characteristics why end-users buy their storage platforms today: availability and performance. What IBM Real-time Compression's core values are really all about is how do we optimize your storage environment without any compromise to the existing characteristics for which you buy that storage. So I want to take you through that today.
Our company was founded in 2004, our headquarters are based in Marlborough, Massachusetts, we have offices around the world, and our technology, our thirty-five patents that are either pending or issued, are all around our Random Access Compression Engine, or RACE Engine. Interestingly enough, our technology, our intellectual property, is not wrapped around compression in and of itself. We leverage industry-standard LZ compression technology that’s been around for years. Our secret is how we take that technology and, to John's point, make it both real-time and random access, such that we don’t sacrifice any of the characteristics for which you require your existing storage today.
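To make the LZ point concrete, here is a minimal Python sketch using zlib (a DEFLATE/LZ77-based library) as a stand-in for the industry-standard LZ compression Steve mentions. It is purely illustrative and is not IBM's RACE engine; the sample data and the ratios it prints are assumptions for demonstration only.

```python
# Illustrative only: standard LZ-family compression via Python's zlib
# (DEFLATE, built on LZ77). NOT IBM's RACE engine; it simply shows the
# kind of general-purpose compression the appliance is described as building on.
import os
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original_size / compressed_size for a blob of data."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)

# Highly repetitive data (logs, home-directory documents) compresses well...
repetitive = b"customer_record;status=OK;region=EMEA\n" * 10_000
# ...while already-random data (encrypted or pre-compressed files) does not.
random_blob = os.urandom(len(repetitive))

print(f"repetitive text : {compression_ratio(repetitive):.1f}x")
print(f"random bytes    : {compression_ratio(random_blob):.2f}x")
```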
So if you take a look at our core values and what an environment would look like with IBM Real-time Compression: if we’re going to do compression, we want to maximize that compression, and we can do that anywhere between 50% and 90%, depending on the data type. We want to ensure that there is no performance degradation. We want to ensure that your environment, with us involved, is still fully transparent. The hardest thing to change in IT is not technology, it’s process. So we want to make sure not only that there’s no degradation in performance, but that you don’t have to change anything in your applications, your servers, your networks or your storage, and even more importantly, no change to your existing back end processes.
So how do you plug in seamlessly to your environment? And finally we’ll talk about high availability, our dedicated appliance, and the fact that we run today in a CIFS and NFS environment. The big question that folks always ask is, “Well, if you’re an appliance that sits before my storage, you must be slow. There must be some performance impact.” What we’ll try to demonstrate here is that there is no performance impact.
The IBM Real-time Compression appliance sits in front of your storage array and compresses the data before it reaches your actual device, and because we’re able to do that, there are a number of key benefits you get from doing compression before the actual array.
So as John pointed out, one of the slowest moving parts in your existing storage array today is the actuator arm that moves to go get data and do the reads and writes. Because we’re compressing the data before it actually gets to your storage device, the amount of movement from that disk arm is smaller and your I/O is smaller, which frees up CPU cycles, giving your CPU more availability to do work. And whatever compression ratio you receive on disk also multiplies your cache: let’s say in standard Microsoft or home directory environments you get 4 to 1 compression, you’re going to get a 4X larger effective storage cache. So any of the work that is done in the IBM Real-time Compression appliance to perform the compression is more than made up for by all the downstream benefits of compressing your data before it actually gets to the storage.
Again, larger cache, less I/O, less movement in your disk arm allowing you to have more CPU cycles in order to do more work on your data.
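As a rough illustration of that arithmetic, here is a small back-of-the-envelope model of what a given compression ratio does to the bytes the array actually sees and to the effective size of its cache. The function and the numbers are illustrative assumptions of my own, not an IBM sizing tool.

```python
# Back-of-the-envelope model (a simplification, not an IBM sizing tool) of
# what compressing data *before* it reaches the array does downstream.
def downstream_effects(logical_gb_written: float, compression_ratio: float,
                       array_cache_gb: float):
    physical_gb = logical_gb_written / compression_ratio     # bytes the disks actually see
    effective_cache_gb = array_cache_gb * compression_ratio  # cache holds compressed blocks
    return physical_gb, effective_cache_gb

# Example from the talk: a home-directory workload at roughly 4:1 compression.
physical, cache = downstream_effects(logical_gb_written=1000,
                                     compression_ratio=4.0,
                                     array_cache_gb=16)
print(f"Disks write ~{physical:.0f} GB instead of 1000 GB")
print(f"16 GB of array cache behaves like ~{cache:.0f} GB of logical data")
```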
And we’ve actually validated this with IBM in the lab and John is going to talk a bit more about the tests that we performed that validate the fact that we were able to maintain performance while compressing data in real time.
So let’s take a quick look at the implementation and installation of an IBM Real-time Compression appliance. In most enterprise environments your storage is configured in a high availability configuration, so what would happen during an implementation of the IBM Real-time Compression technology is we would flip over the wire and install the IBM Real-time Compression appliance behind your storage switch, in front of your existing storage. We would put the wire back, passing through the IBM Real-time Compression appliance, do the same thing for the other side of your storage, and then flip the wire back again.
So now IBM Real-time Compression is installed between your storage switch and your existing storage arrays and before we actually perform any compression techniques, we actually have customers continue to write data directly to their storage to ensure that they understand that we really are seamless and transparent within the environment.
Once we’ve done that and performed a series of failover tests to ensure that failover works exactly as it would if we were not in the wire, we then begin to perform compression, and we achieve anywhere between 50% and 90% compression with our technology, again depending on the different data types.
In addition to compressing data in real time and on the fly, we also provide a technology called Compression Accelerator that allows you to reach into your existing storage and compress the data that is already there. It's one thing to be installed in line and start compressing data on the fly, but where you end up getting the real return on your investment, the maximum bang for your buck once we’re installed, is the ability to go in and compress the data that already exists, giving you a lot more capacity to use going forward and perhaps letting you defer your next storage purchase.
We also talked about how IBM Real-time Compression saves you in your Capex as well as your Opex costs throughout the life cycle of that data. Not only do we perform data compression on your primary storage and your replicated disaster recovery or remote office storage, by up to 10X, but wherever that data is copied, again through replication, to your secondary storage or to your archive storage, we’re additionally saving you in data transmission costs, as well as allowing you to keep up to 10X more information online in your active archive. This really hits home with folks in regulatory environments where they need to be able to get at data very quickly.
In addition, because we perform our data compression using random access technology, we actually have the ability to add benefits to downstream data deduplication capabilities, on the order of 2X to 10X.
So we fully understand that we don’t want to impact any of your downstream processes, and because of the random access nature in which we do compression, we are actually complementary to downstream processes in the backup environment, especially data deduplication, where a lot of money has been spent over the last few years. You can continue to leverage that technology alongside the IBM Real-time Compression technology.
So what I’d like to do is pass it back to John, because as we talked about, performance is always king in the storage world. He’ll cover a little bit about the validation report that we did and how we don’t impact performance. John?
JOHN: Thanks Steve. Truly innovative technology, real-time compression in front of primary storage. I know it can deliver tremendous value; I’ve learned that from speaking with many, many of your customers around the world. But being inquisitive as an IBM’er, I wanted to look under the covers and see how you did that, so the two of us, IBM Real-time Compression and IBM, went into the laboratory to do just that.
So this chart is a little bit of a template to let you know what the next ten or so will look like. There’ll be some diagrams and some supporting bullets for you. What I want to do is paint a picture for you, in each case, of the lab configuration that was laid out, and you’ll see the specifics. I may not go through all that detail with you, but you’ll know the exact release level of the software, you’ll know the switches, the storage infrastructure, the code level of the IBM Real-time Compression STN. So we’ll pass on to you the details, but you’ll get a high level picture of the configuration and then the key salient points that we want to make.
And so we’re going to go through a discussion of a sampling of the topics that are outlined in the Whitepaper that Steve’s going to talk about a little bit later. Steve has already covered for you how the IBM Real-time Compression STN can go in and be fully transparent, and that there are tools, graphical user interfaces and command lines, to manage this. So ease of installation, high availability and ease of management Steve has mostly covered for you.
I’m going to jump to the compression rates and these bold claims that Steve has laid out, and determine whether those were realized in the lab as we saw in customer environments, and what performance impacts may have been observed: the impact on the application itself and the impact within the storage sub-systems. Specifically, we only have time today to take a look at a number of the tests we ran, but I assure you all of the tests are outlined in the Whitepaper that Steve can point you to later. The performance environments we’re going to look at are the copy test and the TPC-C database-like environments.
Let’s jump forward and get started with the claims and the description of what we want to take a look at here. IBM and IBM Real-time Compression needed to go out and validate for you that these compression ratios of 50-90% can be achieved across multiple workload types.
For today's discussion we'll take a look at database environments, OLTP-like database environments. We’ll take a look at server virtualized environments. We’ll take a look at what exists in everyone’s environment: your back office data. And we’ll take a look at, in this case, one specific industry sub-sector, some CAD/CAM files. I assure you that we’ve also looked at many, many other data types, and customer references and performance benchmark information are available on them as well. These looked like good general categories that I think will encompass the audience we have here today, Steve.
So let’s take a look then at how we’re going to verify this as we jump forward. For the compression rate, essentially I’m going to give you the answer first and then prove to you that we actually saw this in the laboratory. In the database environment, this was an Oracle database running the TPC-C Benchmark, an industry-standard benchmark against an industry-recognized, robust database. With the STN real-time compression appliance in front of the filer we achieved 89% compression: an outstanding compression ratio. Let me translate that for a couple of you on the phone.
89% compression is 10X compression. You could acquire ten times less storage or if you compressed your existing database you would realize ten times more storage for growth of your application data. What a tremendous value if you would be the one to implement that in your architecture in your IT data centre.
In the VMware virtualized server environment, on VMDK files, depending on the workload we achieved 73% to 93% compression. Again, 73%, let's roughly translate that to a four times improvement in the amount of capacity, the amount of data that you could write to the same existing storage array, and 90%, again, a ten X improvement.
For all that back office data we achieved a 64% compression ratio, so we’re well over 2X; we're at about a 2½X value there. And then in the specific case of the CAD/CAM files, for the specific file data set that we had, using copy commands, we achieved 46% compression. Looking at general industry files of that nature, we see another benchmark result getting up to 60% compression.
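For anyone converting the percentages above into the "X times" figures John uses, the relationship is a simple identity; here is a tiny sketch. The rounding is noted in the comments, since 89% works out to roughly 9X, which the talk rounds to 10X.

```python
# How "% compression" translates into an "X times" reduction factor
# (a simple identity, not lab data): if p% of the data is removed,
# the reduction factor is 1 / (1 - p).
def reduction_factor(percent_compression: float) -> float:
    return 1.0 / (1.0 - percent_compression / 100.0)

for pct in (89, 73, 64, 46):
    print(f"{pct}% compression -> ~{reduction_factor(pct):.1f}x less capacity")
# 89% -> ~9.1x (rounded to "10x" in the talk), 73% -> ~3.7x,
# 64% -> ~2.8x, 46% -> ~1.9x
```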
So let’s take a look then and prove to you how we got to these. First, in the VMware environment with the VMDK files, we wanted to validate that we didn’t have to change the configuration, that we could actually hold or improve the performance of the application writing data and reading data to the primary storage device, and that we could do this with reduced effect upon the storage array. In other words, would we in any way encumber the storage sub-system, or could we show that we actually improve the utilization of the existing assets on the floor? And you’ll see we’ve done that in all three cases here.
So go ahead and jump to the next slide for the viewing audience here, and let me just walk you through the picture. We set up a VMware environment, and for all these tests, essentially, we’re going to run them once without the STN real-time compression appliance in the path and then run the routine again, this time with the STN real-time compression appliance in the path. It’s a standard “copy file” command that we’re going to issue simultaneously across ten clients, and we’re going to record the utilization each time we do that.
And so let’s see what happens when we take a look at that. What we realized was approximately a halving of the copy time. So as we take a look at the VMware environment and the…what I would call the wall clock time, we can improve the copy time from a hundred and four seconds down to fifty four seconds. So we can execute the operation more quickly.
Even more important, if you take a look at the two charts below, we can execute the operation more efficiently in the storage sub-systems, because we compressed the data before it got there, in the STN real-time compression appliance. So the bottom left, the filer CPU utilization graph: if you take a look at that one for a moment, you’ll see on the Y-axis, the vertical axis there, the CPU utilization percentage, and you’ll see highlighted in the red line the run without the compression appliance; then, when we insert the STN real-time compression into the data path, you’ll see the blue line that’s been plotted.
So what this is telling us is that we have demonstrated efficiency in the filer, in the utilization of the filer CPU. We’re using fewer CPU cycles because we have essentially less data coming in to compute upon. When we look at the disk utilization we see the same thing: without compression, the red line, we are actually driving higher filer-to-disk utilization and it’s taking us longer, longer being the X-axis (the X-axis down there on the bottom).
So we can execute the write functions more efficiently and more quickly because we compressed the data before it got to the filer. The net here was, we could actually execute the copy operation more quickly and we could do it more efficiently, showing better CPU and better filer utilization. So that’s a VMware example of delivering value not only to you in the time it took to conduct the operation and in the amount of space required, but in actually getting more out of the existing storage sub-system.
So let’s jump forward again here and take a look at the next environment, this being the online transaction processing environment: the industry standard TPC-C Benchmark against an Oracle database.
Many of you probably run these Transaction Processing Performance Council benchmarks in your environment, so I want to assure you it is the industry vanilla benchmark that was brought down from the website and executed. There is no unique configuration or sleight of hand that says we’ve got a benchmark that makes things look good. It's the industry standard benchmark executed against the configuration that you see there below. So let’s take a look at how well we did in this environment. Can we deliver compressed data, and can we actually do it with equal or better performance, and at the same time hold efficiency in the back end?
And so jumping forward, we’ll take a look at the results of this benchmark. What we’ll see here for the Oracle TPC-C, on the left chart, on the Y-axis there, you will see the…trying to read my own chart here…the response time in seconds. And in this case, as you read this performance graph, small is good news. The IBM Real-time Compression STN real-time compression is again in blue, and without compression has been noted in red.
So for each one of the TPC benchmark-like workloads, you’ll see that when we run the routine without compressed data you get the red bars, indicating that we are getting worse response time for each one of those transaction types. When we run the benchmark with the STN real-time compression in the data path, we achieve better, significantly better, response times.
Taking a look just at the first, Stock-Level, transaction of that benchmark, you’ll see we go from a little over ten seconds down to just about two seconds. The ratios change slightly on the different workloads as we ripple through that TPC-C benchmark.
Another metric that we can derive from this benchmark is the transactions per minute; so we’ve shown in the left chart that we can improve response time by compressing the data. Looking at the bottom right hand chart what I would like to show you is that we can improve the number of transactions, the throughput that we can drive through the filer by driving compressed data into it and so here we’ve seen improvement.
In this case, okay, these are performance charts so you always have to think: which way does the curve go? Is the histogram supposed to be bigger or smaller? In this case bigger is better. Bigger means I’ve got more work done in a shorter amount of time. So I was able to achieve, what's that, about 970 tpmC transactions for those seven hundred users in the same amount of time when the data was compressed, versus if I was not compressing I could do fewer transactions: the red histogram on the right, down at 920. So I got better response time and more work done by compressing the data.
What a statement of value to the end user, and a statement of value for the IT assets that you already have in place.
We can jump forward and take a look at whether we were actually doing that efficiently in the storage sub-systems, and we can see again that that is the case. The blue line, being the STN real-time compression, in both cases, when we look at filer CPU utilization and filer disk utilization (down is good here), is actually showing that we were using the processor more efficiently in the back-end write and read functions within the filer. So we can demonstrate value to the end-user, more work done in a shorter amount of time, and we can demonstrate the value of efficiency in the storage hierarchy, utilizing the assets more efficiently with compressed data.
Steve, those are the two I have time to take a look at. I know that in the Whitepaper we describe the many other tests that we did together in the laboratory, and I’m going to let you talk about how we achieve these, what part of the portfolio family applies to these, and how people can learn more about how to find this value.
STEVE: Sure thing John and first let me just say, we really value our partnership with IBM, we see this test performance in the field pretty much on a daily basis in trying to drive value to our customers. But it means a lot to us that IBM was able to come in and help us substantiate some of these and as John pointed out you can find this and more information in our Whitepaper which I believe at the end of this call or this Webinar, Dave Gitner will explain exactly how to get your hands on it.
But if you want to get your hands on a product to test, this slide reflects how you would do that. So let me kind of explain it. On the far left we have our STN, and the STN is the IBM Real-time Compression product appliance that we deliver in the field. Each one of our appliances is an IBM 2U server with Nehalem processors, and the interesting thing about our technology is that we do not charge on a per-terabyte basis today.
What we do is, we believe that customers have already architected their existing storage environment or storage solution based on the throughput that they’re looking to achieve today. So what happens is, when you take an IBM Real-time Compression appliance and put it in front of your storage, you unplug the connections going into your storage, plug into the IBM Real-time Compression appliance, come back out of the IBM Real-time Compression appliance and plug back into your storage.
Now in theory, and given John’s tests and our tests in the lab, you should actually be able to see a performance increase without having to change your existing configuration, because you’re going through an IBM Real-time Compression appliance. We deploy in high availability pairs, and as you can see, our STN 6000 line offers what we like to call our small, medium and large variations; it's really, again, reflective of the throughput you desire in your existing environment. We've seen such a performance impact at the higher end of our technology that we took a step back and probed a number of our customers.
We asked our customers, you know, “Why do you really buy our technology?” and the number one answer from all of our customers really came down to the immediacy of the ROI they were able to achieve with the IBM Real-time Compression technology.
So in a quick customer example, on the left hand side you see the text that outlines what you see on the right hand side. On the right hand side you see the ROI calculator, which is free for anyone: go to the IBM Real-time Compression website, click on the capacity or predictive modeling tools for an ROI for your environment, just plug in some basic numbers and see what you would save.
But for this particular example we had a customer who was at a hundred terabytes of capacity. They were approximately 90% full, and you can see the different data types that they were storing in that particular environment. At an average price per terabyte of about five thousand dollars, this customer, with an immediate need of thirty terabytes, was looking at spending about one hundred and fifty thousand dollars right away for additional storage.
So I came in and had a couple of conversations with them about their existing storage environment and storage needs, and because they were in a CIFS and NFS environment, and that was predominantly where they needed to add capacity, we talked to them about what their throughput requirements were and selected a particular device. For a pair of those devices it was approximately sixty-two thousand dollars.
So for a sixty-two thousand dollar investment, if you look back over to the right, you can see we were able to compress their initial storage on day one by more than fifty percent. And over the course of the next three years we could defer them having to buy new storage, because they already had the existing capacity. If you look at the red line, the red line shows what they would have had to purchase over the course of the next three years with a 30% storage growth rate. So you can see the immediate savings and then the savings over the course of three years, in this case almost half a million dollars for our customer.
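The arithmetic behind this example can be roughly reproduced with a simplified model. The helper below is my own illustrative assumption of how the deferral works (the real calculator on the website asks more questions), using the figures quoted: 100 TB capacity, 90% full, $5,000 per terabyte, 30% annual growth, a better-than-50% (modeled as 2:1) compression ratio, and a $62,000 appliance pair.

```python
# A rough re-creation of the ROI arithmetic in this customer example
# (a simplified model, not the actual website calculator).
def extra_storage_cost(capacity_tb, used_tb, growth_rate, years,
                       price_per_tb, compression=1.0):
    """Dollars of new capacity needed over `years`, assuming data grows at
    `growth_rate` per year and what is stored is reduced by `compression`
    (e.g. 2.0 means 2:1)."""
    stored = used_tb / compression
    for _ in range(years):
        stored *= (1 + growth_rate)
    shortfall_tb = max(0.0, stored - capacity_tb)
    return shortfall_tb * price_per_tb

baseline  = extra_storage_cost(100, 90, 0.30, 3, 5_000)        # no compression
with_race = extra_storage_cost(100, 90, 0.30, 3, 5_000, 2.0)   # 2:1 compression
appliance_pair = 62_000
# The gross deferred spend (~$489K here) lands near the "almost half a
# million dollars" quoted in the talk.
print(f"Projected new-storage spend, uncompressed : ${baseline:,.0f}")
print(f"Projected new-storage spend, compressed   : ${with_race:,.0f}")
print(f"Approximate 3-year net savings            : ${baseline - with_race - appliance_pair:,.0f}")
```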
So let’s get started, right? Where would you get started? As I mentioned before, you would want to go to the predictive modeling tool on the website. Get a feel and an understanding of what we might save you; when you jump into that tool you can see it is fairly simple. We ask you some basic questions about your different capacities and the average compression that you would probably get for those particular environments, and then at the bottom an overall average weighting to decide what the value would be for you.
In addition, if you would like to see more information that might relate to your particular environment, go to the resource center on our website. We have a number of Whitepapers, customer case studies, and analyst reviews from folks who’ve tested our product in the lab in addition to IBM, as well as our solutions section.
So with that I think what I want to do is turn it back over to David Gitner who can tell you a little bit more about how to get more information.
DAVID: Thank you Steve and thank you John very much for that terrific discussion. I’d like to say two things. Number one there are a number of places that you can go for more information. Starting with the very top of that slide is Steve Kenniston's email address for any additional questions you may have.
But in addition to that, there is a lot of information and resources available at IBM Real-time Compression. Steve mentioned a few of them there: the URL for the TCO calculator that you can try out and put in specifics about your environment. There’s a lot of product information in the Resource Center, and all of the results from the testing that John discussed in today’s presentation are also available; that’s the URL where you can find them.
What I’m going to do is leave this slide up while we take questions from you the audience. So again just to remind you to ask a question please feel free to type in your question on the lower right hand side on the Q&A box and hit the send button. So I’ll give you a few seconds to type in your questions, I’ll leave this slide up for the moment and look we have a question already.
The first question, Steve and John, is about the Oracle database reviewed during the testing. The questioner wants to know: was it file system based or was it live disk based, in terms of the testing?
STEVE: I believe it was file system based. I’m going back to a couple months ago when we ran it. But file system based.
JOHN: I believe that is true. It was a file system based test.
DAVID: Another question. This questioner wants to know if they need the IBM Real-time Compression appliance to access all of their data. How does that work, Steve? Do you need the IBM Real-time Compression appliance in place to always access your data?
STEVE: That’s a good question. Sure, what’s interesting about the transparency of the appliance is that the appliance sits between the application and the storage. So what the application actually sees looks like a regular file that was stored on disk.
So if you had a one gigabyte PowerPoint presentation, from the PowerPoint application level you would see a one gigabyte presentation on disk, but stored on disk you might only be utilizing 500 megabytes of space.
So in order to see or read the data that’s compressed on disk, you do need to have an IBM Real-time Compression appliance sitting in front of the storage that’s been compressed with the IBM Real-time Compression technology. This is why we deploy in high availability pairs: should anything happen to the network or the given storage system, we have the ability to fail over the wire and still have access to your data.
In the event that you had some type of catastrophic failure in your environment and you pulled your storage out and wanted to read that data in another location we actually have a tool on our website called “Revert” that allows you to be able to pull data out of the existing storage from compressed format in order to read it again. That’s a valid question.
DAVID: Okay great. Thank you very much for that answer Steve. Again if you have questions please feel free to type them in, in the lower right hand side of your screen and we will get to them.
The questioner wants to know about the term “real time compression”: how is that different from what we’ve thought of as compression up to this point? How is real time compression different?
STEVE: Another great question. So when I think of real time compression, a lot of folks will ask, “What’s the converse of real time compression?” And the converse of real time compression is what I call post-process compression. The easiest way to think about this is with a simple tool called WinZip. If I zip up a Word document on my desktop and I point Microsoft Word at that zip file, I can’t read that file. In order to use that file, I need to uncompress it first. That means I actually, physically, need the disk space there in order to have access to the data.
So with traditional technologies that might compress data, in order to use that data I have to compress and uncompress. That’s why you find that a lot of the large vendors do this by policy: once data hits a particular age, the assumption is it’s stale or old, so compress that data. Then if someone needs it, you need to get IT involved in order to rehydrate or use that data.
With IBM Real-time Compression, what we do is compress that data before it actually gets to the disk. We have no write cache in our device, so we preserve the high availability given by the storage vendor’s product. Once the data hits the device it’s compressed, and there’s no need to uncompress it on disk.
So if you think about an IT person's thought process, they always want to plan for the worst case scenario: I might store the data on disk and then compress it, but if I have to keep the capacity there to uncompress that data, I'm not really saving any disk space. Because IBM Real-time Compression compresses on the fly, in our device, before the data hits the storage, or on the way out of the storage back to the application, you actually don't need that extra physical disk space, so there's quite a savings on physical disk. That's a big difference between real time compression and traditional compression.
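A toy sketch may help illustrate the inline model Steve describes, with zlib standing in for the appliance's compression and the class structure entirely my own invention for illustration. The point is simply that data is compressed before it lands on "disk" and decompressed on the way back to the application, so the uncompressed capacity is never needed on disk.

```python
# Conceptual illustration only: an "inline" (real-time) compression path,
# with zlib standing in for the appliance. Only compressed bytes ever land
# on "disk"; a post-process approach would store the full file first and
# need the full uncompressed size again to use it.
import zlib

class RealTimeStore:
    def __init__(self):
        self.disk = {}                                # filename -> compressed bytes
    def write(self, name: str, data: bytes):
        self.disk[name] = zlib.compress(data)         # compressed before it hits "disk"
    def read(self, name: str) -> bytes:
        return zlib.decompress(self.disk[name])       # application sees the original
    def bytes_on_disk(self) -> int:
        return sum(len(v) for v in self.disk.values())

store = RealTimeStore()
doc = b"quarterly report, quarterly report, quarterly report\n" * 2000
store.write("report.ppt", doc)
assert store.read("report.ppt") == doc                # transparent to the application
print(f"logical size : {len(doc)} bytes")
print(f"on disk      : {store.bytes_on_disk()} bytes")
```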
DAVID: Okay that’s a great answer. It clarified it for me that’s for sure. I think we have time for one more question.
STEVE: Actually David, I see there’s a question: does the dedupe on the N series work with compressed data? I hear that a lot, so I’d like to take that if it’s possible.
DAVID: Okay. Please feel free.
STEVE: Sure. So one of the things that we see with some of our customers is that we really feel deduplication has its biggest bang for the buck in backup environments. And the reason is because you continuously send data through the backup appliance, from the backup appliance to your dedupe disk platform, and it sees a lot of repetition day over day.
The interesting thing is we have done a number of tests, and you can get some of these test results, not for the N series specifically, but for Data Domain as an example. Because of the way IBM Real-time Compression does its compression, through what we call Random Access Compression, we actually preserve the compression ratios for deduplication.
So if I have to read or write or edit a file, I actually only need to pull out the segment of the file that needs to be updated, this is the random access nature of the technology, and update only that segment. Much like deduplication, which really only has to store the changed block in deduplicated format. That’s exactly how IBM Real-time Compression does its compression, so we are very complementary to deduplication, both on the N series and with other backup technologies.
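To illustrate that segment-oriented idea (not the actual RACE format; the chunk size and structure here are assumptions of my own), here is a small sketch in which a file is compressed in fixed segments and an edit recompresses only the segment it touches, leaving every other compressed segment byte-for-byte unchanged.

```python
# Toy sketch of "random access" (segment-by-segment) compression, showing
# why an edit only rewrites one compressed chunk. Chunk size and layout are
# illustrative assumptions, not the RACE engine's actual format.
import zlib

CHUNK = 64 * 1024  # compress in fixed 64 KiB logical segments (assumed size)

def compress_chunks(data: bytes) -> list[bytes]:
    return [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def update_chunk(chunks: list[bytes], offset: int, new_bytes: bytes) -> None:
    """Apply an in-place edit by recompressing only the affected segment."""
    idx = offset // CHUNK
    segment = bytearray(zlib.decompress(chunks[idx]))
    start = offset % CHUNK
    segment[start:start + len(new_bytes)] = new_bytes
    chunks[idx] = zlib.compress(bytes(segment))       # all other chunks untouched

file_data = b"record " * 100_000
chunks = compress_chunks(file_data)
update_chunk(chunks, offset=70_000, new_bytes=b"EDITED!")
print(f"{len(chunks)} compressed segments, only segment {70_000 // CHUNK} was rewritten")
```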
DAVID: Okay great. Again if you have any other questions, feel free to submit them at this time.
I think we are out of questions and basically even time.
I want to thank John Power and Steve Kenniston for their participation in this Webcast, and again, for more information, there are a lot of URLs on the screen at this time where you can find more data about IBM Real-time Compression products as well as the testing procedures that show the performance benefits can be realized.
I want to thank everyone for their participation for today’s event and have a great day!
Thanks.
Thanks Steve!
Audio Length: 00:48:53
Word Count: 7801