Ed. Note: This is a transcription of an interview of Records Management Consultant and Lawyer Randolph Kahn of Kahn Consulting Inc. in the Cube with Wikibon Co-founder David Vellante.
DV: Information, is it an asset or a liability? That depends on who you are talking to in an organization. With today's trends in Big Data, most organizations are looking at Big Data as an opportunity. At the same time most IT organizations are struggling with data growth & flat budgets. So how do you reconcile the opportunity of Big Data with the challenges of data growth and the implications to the organization in terms of risk management? We're here today with Randolph Kahn, who is the principal of Kahn Consulting. Randy is an expert on legal, compliance, and policy issues related to business information, electronic records, & information technology. Randy, welcome to the Cube.
RK: Thanks so much, Dave.
DV: So we're here today to talk about a new book that Randy is in the process of writing – should be out in a couple of months – called Chucking Daisies: How Companies Deal with Big Data. So first congratulations on almost having the book done. And this is, I think you said, your sixth book, which is fantastic. So tell us about the book. What's the premise behind it?
RK: Just the way you started the conversation, this idea that information is an asset. Well, that's true. But if organizations cannot actually find the stuff, it's no longer an asset. It's a liability, it's a cost, it's a pile of risk, it's a pile of inconvenience, it's a pile of inefficiency. So Chucking Daisies is a bible for IT professionals with simple rules that say you IT professional shall do the following things to manage information and right size your information footprint, keep the stuff that you need to keep, get rid of the crud. It's really a book to help IT professionals to walk through that problem. I have this big pile of stuff, what do I keep, what do I get rid of in a legally defensible way? That's really what Chucking Daisies is all about.
DV: So where'd you get the name?
RK: If you think about a flower, right. We're going into fall, the end of the growing life-cycle. These beautiful leaves, that's just a couple of weeks away from looking like death. Take a look at the life-cycle of information, right? That daisy in spring looks beautiful. It's grabbing sun from the sky & it's crisp, and the crimson leaves are brilliant. In the course of its short little life the value of that thing declines.
I don't think most IT professionals think of information as having a life-cycle. In fact if you look at most organizations today, most folks are keeping everything without regard for what it is. It's costing them huge amounts of dough, right?
So this idea of the daisy, this idea of chucking the daisy at the end of its useful life, when its value has declined, when it's no longer attractive. That's the same with information. At the end of the information life-cycle, the idea of parking it somewhere in a repository, a shared drive, it's an expense. It's a liability. And I think IT professionals need to understand what's in that life-cycle.
DV: You also use Big Data in the title, and as I was saying upfront a lot of people are looking at Big Data as an opportunity, and many organizations don't want to throw away that data because there might be some diamond in the rough that they can find years down the road. Are you suggesting that that's the wrong strategy, & can you give us some insight in that regard?
RK: Absolutely. So the idea of Big Data is that you have this pool of information that you're able to harness to abstract the trends to understand your business in a deeper, more significant kind of way so that you can be more efficient in business & you can also plan the future.
The problem is that for the last 2 or 3 decades organizations have been applying information technology in systems to all kinds of business processes. Today everything is electronic, & for every system there's electronic information. The pool grows & grows & grows & grows, & at some point big companies have hundreds of Tbytes of extraneous stuff, or even Pbytes of extraneous stuff. So Big Data's this idea that I can take tools & I can understand my business and be a more efficient business if I can harness that stuff. Well, actually most businesses today have so much stuff, and don't have the tools that they need, that that pile of data is wholly underutilized.
They can't find the records they need. Organizations today regularly have to recreate information because they can't find the stuff that they have. Litigation response tells us over & over & over again that companies don't have their act together. So I would tell you that harnessing their data in a Big Data context is aspirational for most organizations. I would say ubiquitous mismanagement is much more the theme of the day.
DV: So, Randy, when the Federal Rules of Civil Procedure in 2006 came to the fore, the general counsel in most organizations had a lot of power to essentially dictate the policies of organizations in terms of records retention and deleting data & the like. Have they not been successful in your view, & is the marketing pendulum swinging back to Big Data becoming this opportunity? And what risks does that pose to organizations?
RK: So you asked a whole bunch of stuff. Let me start with the general counsel, & let me also address records management. So the records managers are not going to want to hear this, & if you've heard me speak before you've heard me say it before. But this is the reality. Records management for most organizations is a total, utter failure. The belief about records management is: I will create rules that will tell me how long I am going to keep stuff. And then at the end of its useful life that stuff will go away. Well I've got to tell you something. For most big organizations today that stuff is not going away. Those rules, if they exist, are not being applied to the vast majority of electronic content. The rules are way too complex. Policies are way too complex. So to the extent that there was a huge push for records management, & I have a consulting business that is nothing but records management day in & day out. And I can tell you for most organizations, even big ones, it's not been particularly successful.
DV: So what qualifies you to talk about this topic, & what can an IT executive learn from a lawyer who writes books with interesting names?
RK: I've spent about 20 years in the information management space, helping businesses & governmental agencies get their information management act together. Last year I started a business called Delve, & what Delve does is help big organizations operationalize that chucking daisies thing. We have hundreds of Tbytes of stuff, we don't know what to do with it, it grows unfettered, & at some point we need to get rid of that stuff. So what Delve is doing as a new business actually quite successfully is going into big businesses & helping them in a legally defensible way clean their storage bins of the old information that doesn't have any legal reason to be around, doesn't have any business utility to be around, it's just digital junk, right. So we're going to help them get rid of it.
Beyond that, I've been thinking about this problem for many years. I started my career as a litigator. If you look at most businesses today – and I'd say the litigation response problem is a symptom of this – most large organizations have failed to manage information. Most of them are not following retention schedules for the vast majority of electronic stuff, & when a lawsuit happens or an investigation happens, the idea of finding everything and anything that's potentially relevant, not only is it a financial burden & expense & a gargantuan pain in the butt, but in real life the thought that they could grab anything & everything that's potentially relevant – if you think about the LANs, the WANs, the SANs, all the places that are potentially where information can be parked today – the idea that they can be successful, the idea of finding a record when you need it for business purposes, and being successful in that.... To the heart of your question, I spent my life helping organizations figure this problem out, and I just think about it slightly more than the next guy.
DV: Randy, talk about why it is so hard to chuck daisies.
RK: That's a really interesting question. A couple of things are driving organizations to the wrong place. The first is this fallacious belief that storage is cheap. Oh, don't worry about it. Storage is cheap. And I hear this with clients every single day. “Why should I worry about this, storage is cheap. We'll just keep dumping this in the storage parking lot, & who cares.”
Well, here's the deal. In most organizations information is growing at 20%-50% per year. The actual storage cost is going down slightly every year. In real dollars what they are spending to store stuff is so much more this year than last year. The fact is storage is not cheap. Storage is a huge cost. I'll give you an example. We're working on a project right now for a large insurance company. We're helping them chuck the daisies, right-size their information footprint in a legally defensible way. Doing that project they saved millions & millions of dollars a year just on storage savings on a net basis. So the ROI makes sense, Delve makes really rapid sense.
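Ed. Note: The arithmetic behind this point can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the starting volume, the price per Tbyte, the 5% yearly price decline, and the function name are all hypothetical, and the 30% growth rate is simply a value inside the 20%-50% range mentioned above.

```python
def yearly_storage_spend(start_tb, cost_per_tb, growth_rate, cost_decline, years):
    """Return total storage spend for each year, starting with the current year."""
    spend = []
    volume, unit_cost = start_tb, cost_per_tb
    for _ in range(years):
        spend.append(volume * unit_cost)
        volume *= 1 + growth_rate      # data volume grows every year
        unit_cost *= 1 - cost_decline  # price per Tbyte falls slightly every year
    return spend


# Hypothetical inputs: 500 Tbytes today at $1,000 per Tbyte per year,
# 30% yearly data growth, 5% yearly decline in unit cost.
spend = yearly_storage_spend(500, 1000.0, 0.30, 0.05, 5)
for year, dollars in enumerate(spend):
    print(f"Year {year}: ${dollars:,.0f}")
```

With these assumptions total spend rises every year, because growth in volume outpaces the decline in unit cost. Spend only falls if the price drops faster than the data grows.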
So the first thing you have to understand is storage is not cheap. The second thing is nobody owns the information any more. I can say, “It's my repository, it's my box.” It's not my information. I don't know what to do with the stuff inside there. The lawyers say, “We have licenses, we have investigations. Be very hesitant about what you do.”
All of a sudden between the storage is cheap and it's not my problem what's in that system, and be very hesitant say the lawyers, it becomes, “Okay I'm not going to touch it.”
Now there's a fear. And the information grows & grows, & from a legal defensibility perspective the only way you're going to clean house and not worry about being nailed for destruction of evidence or spoliation is to have some methodology that says I withdrew that stuff, I know that stuff is not a record, I know that stuff is not of use for an investigation, it must be digital junk because we have no business utility with that stuff any more. Let's get rid of it. But there's got to be some diligence around that. Otherwise you run substantial risk in getting rid of it.
DV: You talked about storage not being cheap. Lawyers aren't cheap either. So I wanted to ask, have you found with your clients that not only have you saved them storage costs, but what about legal discovery costs? Discovery's a volume-driven activity; if you've got less storage you're paying less to discover, aren't you?
RK: No question. As it relates to the way in which Delve makes the business case for our clients, we never go to the issue of risk mitigation or litigation cost and response avoidance. They're absolutely real, but they're soft costs. Some can be quantified. A chucking daisies project, a defensible disposition project for big IT departments makes sense purely on storage.
Now having said that, is the cost of a lawyer a significant cost in terms of information review in the context of information investigation in litigation? Absolutely. Lawyers are incredibly expensive. They love the big piles of information. In fact I would say there's no question that defensibly disposing of stuff makes you a more efficient business without question.
When you talk about Big Data, it's Big Data that's actionable information that you need as opposed to trying to find that information nugget or the needle in that information haystack if you will. As it relates to the information response and inconvenience cost there's no question that that's a gargantuan cost, & no question that you will find substantial benefits by getting rid of the crud.
DV: So you've used this term “defensible disposition” a couple of times. Can you talk a bit more about what that is – define that for our audience?
RK: Think about it this way: The issue is I have a shared drive, and that shared drive has hundreds of Tbytes of information on it. I don't even know what that stuff is any more. It sits, and somebody babysits the system or not, & somebody may be asking for some of that stuff or not, & what you find is it just sort of sits. Well for an organization to go in & look at hundreds of millions of files today – we have clients with billions of files. There's such a substantial volume there's no way you're going to have your employees do it. First, they're really bad at classification. But even if they weren't bad at it, compared to technology, it's an incredibly bad use of their time.
So really to chuck the daisies or defensibly dispose I need to have a methodology that says, “I know that the stuff I'm looking at is not a record. It's not needed any longer for business purposes. Also I know that it's not otherwise used for audit, litigation, or investigation.” And I have to find a way to do that in different systems with different kinds of content, in different kinds of business arenas or environments, so that I can say when we dispose of content that that content wasn't otherwise needed. That is what defensibility means. I dispose of it without having my people looking at it. I use technology. One thing Delve does very efficiently is use these technology tools, the machine learning tools, to do the heavy lifting. It is very efficient. But in the end I need to do that in a way that is going to make their lawyers comfortable, that will make their compliance people comfortable, their business people comfortable. Otherwise at the end of that process they're not going to want to pull the trigger to get rid of that content.
That defensibility & that technology allows me to evaluate content in a way that at the end of that analysis I can in a legally defensible way dispose of that information and not cause the lawyers heartburn.
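Ed. Note: The three-part test described in this answer (not a record, no remaining business use, not needed for audit, litigation, or investigation) can be expressed as a simple check. The Python below is a minimal sketch, not Delve's actual method: the `Item` fields, the function name, and the sample paths are hypothetical, and in practice each flag would come from classification technology and legal review rather than being set by hand.

```python
from dataclasses import dataclass


@dataclass
class Item:
    path: str
    is_record: bool         # still covered by a retention rule
    has_business_use: bool  # the business still needs it
    on_legal_hold: bool     # needed for audit, litigation, or investigation


def disposition_decision(item):
    """Return (disposable, reason) so that every decision can be logged."""
    if item.is_record:
        return False, "keep: record under retention schedule"
    if item.on_legal_hold:
        return False, "keep: needed for audit, litigation, or investigation"
    if item.has_business_use:
        return False, "keep: still has business utility"
    return True, "dispose: not a record, no hold, no business utility"


# Hypothetical sample items.
items = [
    Item("share/finance/ledger.xls", True, True, False),
    Item("share/tmp/old_draft.doc", False, False, False),
    Item("share/hr/complaint.msg", False, False, True),
]

# The audit trail is what makes the disposal defensible after the fact.
audit_log = [(item.path, *disposition_decision(item)) for item in items]
for path, disposable, reason in audit_log:
    print(path, disposable, reason)
```

Recording the reason alongside each decision is the point of the sketch: the organization can later show why a given item was kept or disposed of.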
DV: It sounds like this is records management 2.0. Is it?
RK: Yes, it's funny that you say that, because that's exactly how I think of it. If you think about it this way, records management fails because the rules are too complex, they're too voluminous, there's nobody there to apply them, technology can't take a thousand rules and apply them to anything, it just fails.
Really what Chucking Daisies talks about, what Delve does day in and day out, is take that full retention schedule, simplify it, rationalize it, and apply it to technology. That's it in a nutshell.
DV: So let me come back to defensible disposition. You've talked about essentially technology's gotten us into the problem, you say you & your clients use technology to help us get out of this problem through classification, & it sounds like you're helping them automate that classification. Talk about that a little more.
RK: Sure. So when Delve goes into a client, you can be sure of two things. They are going to have structured content & they are going to have unstructured content. The structured content, the stuff in databases, sits there & typically it's the kind of content that business users don't interact with on a regular basis. Someone needs to go in and determine what that stuff is & determine what business rules or retention rules apply to it so it can go away. There are clients that Delve has who have huge volumes of structured content where they've never applied any archiving technology. So simple compression, simple ways to manage that content irrespective of the end-of-life disposition rules, they need to come in with tools & technology to understand what's out there & what can be done with this gargantuan storage footprint just for the structured stuff.
On the other side of the equation is the unstructured content. Unstructured content sits on all kinds of systems. There are certain kinds of content that auto-classification technologies are pretty good at discerning what that is. There are some file types because of the uniqueness of the technology or the file type, that makes it much more challenging. That said there are technologies today that are able to discern what something is, able to apply a business rule to that, and those kinds of technologies need to be harnessed more often by more kinds of companies because volumes are so great people simply can't do that any more.
DV: So can you actually go back & classify with machines an existing corpus of data or are you suggesting moving forward from day zero let's auto-classify? How do you deal with that?
RK: So when Delve goes into a client, two things are happening. The direction you're going in is correct. The first thing they need to do is ask, “How do I know what something is?” What Delve will do is take their retention schedule & simplify it, take their content, & teach that content to a computer, so that when it crawls through hundreds of Tbytes of stuff it actually has learned what that content is & can apply the business rules. Whether it's an accounting record or a JAR (?) document or a contract, whatever it is, we're able to actually go in and teach it what your records are, so that instead of people doing it, at night when the system is not being utilized or whenever, this system can crawl through Tbytes or Pbytes of documents and make business judgments, business decisions in classification against your rules and the content that you brought in. It cleans up the past.
On the other side of the equation, again auto-classification just to clean up the past, if you're a small company and do not have a lot of data, you're not going to do it. Too complicated, too much money. To take it on, I want to minimize the upfront exercise, actually train the software on what your content is & your rules. It's not an undertaking without cost or without inconvenience and expense. But if you have a lot of content it delivers a lot of business value. The business value and idea at the end of the exercise is we can, in a legally defensible way, get rid of very vast quantities of stuff. In a macro sense it reduces that storage footprint, which equates to millions of dollars a year.
Once these rules are built, then of course the question is why don't we use that as a new information paradigm going forward. So we clean up the past, & then once you've built your rules you might as well use them on a going forward basis. You will be a much more efficient business and actually apply retention rules in a totally different way than you have before, and doing it in real time seamlessly.
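Ed. Note: To make the idea of applying a simplified retention schedule to classified content concrete, here is a minimal Python sketch. The categories and retention periods are invented for illustration and are not legal guidance, and the function names are hypothetical. It assumes each item has already been assigned a category by some classification step.

```python
from datetime import date

# Hypothetical simplified schedule: category -> years to retain.
RETENTION_YEARS = {"accounting": 7, "contract": 10, "junk": 0}


def add_years(d, years):
    """Shift a date forward by whole years, mapping Feb 29 to Feb 28 if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 does not exist in the target year
        return d.replace(year=d.year + years, day=28)


def eligible_for_disposal(category, created, today):
    """True once the retention period for the item's category has run out."""
    years = RETENTION_YEARS.get(category)
    if years is None:
        return False  # unknown category: keep until someone reviews it
    return today >= add_years(created, years)


print(eligible_for_disposal("accounting", date(2010, 1, 1), date(2017, 1, 1)))
print(eligible_for_disposal("contract", date(2010, 1, 1), date(2017, 1, 1)))
```

The same function can run against old content to clean up the past and against new content as it is created. A real deployment would also have to check for legal holds before anything is deleted.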
DV: So could you talk a little about the impact of mobile and BYOD. The risk is becoming very decentralized by its very nature. How do IT organizations ensure that when they think that something is deleted it actually gets deleted?
RK: Let me deal with both parts of your question, because they are both really interesting. One of the things that Chucking Daisies does is lay out a series of rules. One of the rules is that for every new technology there is a chunk of informational output, and you need that policy upfront & a way to manage that output before you implement the system. What that tends to do is force the business question: Do I need this content as a business record? And if I do is there a system where we can keep it and store it and access it and does it make economic sense to do it this way? And if we can't actually retain this stuff and access this stuff, how are we going to meet our legal requirements or how are we going to meet our business needs. Forcing that policy discussion up front forces you to deal with the business issues.
At the end of the life cycle the issue is: What am I going to do about all this content that exists that I otherwise have to get rid of? From my perspective organizations have to build in a policy for new technologies up front that says at the end of its useful life who is going to own the disposition. Who is actually going to effectuate the disposal of this content? How are we going to do that in a legally defensible way? And if organizations & in particular IT groups build that into the process up front, you'd have a great deal less inconvenience and expense, inefficiency, litigation response bloodshed happening. And that's what IT organizations, and in particular IT executives, need to start wrapping their heads around.
DV: So talk about who in the organization cares or should care about defensible disposition.
RK: In our Delve experience the people who really care are IT executives. When the mid-level storage guy says, “Ya, this seems to make sense”, when you take it all the way up to the top of the food chain and say, “We can save you $20 or $30 or $40 million per year, are you interested?” It's a real easy sell. And by the way, your lawyers are going to like it because litigation response is going to be a lot easier, and it's going to be a lot cheaper. And your privacy guy's going to like it because you're going to reduce your PII information footprint. And your business execs like it because they spend 10% to 25% of their time looking for stuff. That will be reduced, so they are more efficient. And my customers who can get answers from me much more readily. Selling defensible disposition is incredibly easy on storage alone. There are some other major benefits, but the people who find the greatest value and understand it immediately are the CIOs. Because I can hand them a whole chunk of money that they can use, especially in this economy, to buy new technology, hire new people, find other efficiencies. Why not chuck the daisies and let some of that go?
DV: So the book is Chucking Daisies: How Companies Deal with Big Data by Randolph Kahn. So tell us more about the book. How did you organize it? What can people expect to see when it hits the stands?
RK: The book is structured as simple rules. There's 18, 19 simple rules that help organizations, and again primarily IT professionals, understand. So a rule might be, for example, “Never implement a technology unless you have a policy first.” We talked about that. Now that might seem like a no brainer, but the number of organizations that technologies sneak into and all of a sudden – social networking's a perfect example. All of a sudden you wake up one day and you have a big insurance company and you realize that all your sales people are using Facebook to sell policies. Wonderful for business. But we have compliance requirements, we have retention requirements. What if there's litigation? We have privacy concerns. This is their Facebook account. Every single organization that looks at social networking and says, “Hey, there's some value here”, needs to stop first and build that policy construct that we talked about.
So the rules are really very straightforward, very pragmatic, very easy to understand, and we use real life to actually make it come alive.
And I should probably point out to you that I have a co-author. Galina Datskovsky is my co-author. So I'm not doing it by myself.
DV: And the book will be out roughly when?
RK: In the next couple of months.
DV: Randy, really appreciate having you on the Cube. Love having you share your best practices with the community. Love to have you back. Hope to see you at IOD. The Cube will be at IOD next week on Monday and Tuesday in Las Vegas to talk about these and other issues.