Introduction
“The Cloud.” It sounds like a magical place where CIOs can place their workloads and then go home for the day and turn off the smartphone. After all, once you move to the cloud, all your worries disappear, since everything just works.
Except when it doesn’t. There’s that.
Build for failure
The future is really all about failure. It’s not about constantly suffering failure but, rather, accommodating failure as an inextricable part of the IT equation. As one looks at future trends and architectures, this becomes clear. For example, software-defined storage includes software-based constructs that are designed to accommodate, and work around, failure when it occurs. Notice that I use the word “when” in that sentence. This is not an “if” situation, and I think this kind of thinking is really important. There are far too many examples of data centers built on sheer hope, whose design forces IT staff to be reactive in the event of a failure, because the organization simply hopes that failure doesn’t occur.
So, when failure occurs, it must be handled just like any other situation. The system must recognize the failure and be able to take appropriate action. Perhaps that means automatically migrating workloads from a failed node to an operational one, or automatically taking one node in a storage cluster offline so it can be rebuilt after the loss of a physical disk.
My goal here is not to sound depressing: “Yes… everything we do is going to fail.” Instead, my goal is to be realistic: “Yes… these systems will fail, but they are built to survive as many failure types as we can think of.” In other words, “Sure, some pieces might fail, but we’ve built this thing in a way that the failure of a single component doesn’t bring the whole thing down.”
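To make the idea of migrating workloads off a failed node a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Node` class, the `evacuate_failed_nodes` function, and the choice to place each workload on the least-loaded surviving node are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical cluster node; 'healthy' would come from real monitoring."""
    name: str
    healthy: bool = True
    workloads: List[str] = field(default_factory=list)


def evacuate_failed_nodes(nodes: List[Node]) -> Dict[str, str]:
    """Move every workload off unhealthy nodes.

    Returns a mapping of workload name to the node that now hosts it.
    Placement policy (an assumption): pick the survivor with the fewest
    workloads at the moment of each move.
    """
    survivors = [n for n in nodes if n.healthy]
    if not survivors:
        raise RuntimeError("no healthy node left to receive workloads")

    moves: Dict[str, str] = {}
    for failed in (n for n in nodes if not n.healthy):
        while failed.workloads:
            workload = failed.workloads.pop(0)
            target = min(survivors, key=lambda n: len(n.workloads))
            target.workloads.append(workload)
            moves[workload] = target.name
    return moves
```

The point of the sketch is that the failure path is ordinary code that runs without a human in the loop. A real system would also need to detect the failure reliably and confirm that the target node has capacity, which this example leaves out.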
Some things are simply unavoidable
Architecting for failure should be a key driver in any new environment, but some circumstances will still force CIOs to react. For example, suppose you’ve built a state-of-the-art hybrid data center all based on software and your cloud partner ceases operation.
It’s already happened.
Cloud provider crashes to earth
Nirvanix customers were given two weeks' notice that their cloud provider was shutting its doors and that they needed to move their data to a new provider. To its credit, Nirvanix appears to have partnered with IBM so its customers have a way to get this done. But how many of those customers do you think budgeted for a migration that could be a pretty big expense? This is a case of needing to plan for failure, but not in the traditional sense of working around failed hardware or software components.
A different provider gives away a user’s account
In a separate case, which is not necessarily an enterprise use case, provider Box.com handed a user’s entire account over to a complete stranger, who then proceeded to delete the entire account. In the meantime, that new user had complete and total control over this person’s account. To Box.com’s credit, the company was able to figure out exactly what happened and has implemented new procedures to prevent it from happening again, but this is certainly a case of human error, a mistake that created a potentially dangerous situation for the user in question.
Don’t avoid the cloud, but...
I don’t share these stories to scare people away from the cloud. In fact, I’m a huge proponent of turning to cloud providers where it makes sense. However, too often, what appears to be a really easy decision that can reduce costs and operational complexity actually introduces significant risk to the organization if approached without considering all possibilities.
Going back to the need to plan for failure in everything: when adopting the services of a cloud provider, make sure that those services can fail over to a secondary provider on short notice. While individual cloud providers can often provide availability by spanning data centers or geographic zones, using just a single provider for mission-critical services leaves a lot of risk. If a dual stack is cost-prohibitive, at the very least have a hot standby.
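As a rough illustration of failing over to a secondary provider, here is a short Python sketch. It assumes each provider can be wrapped in a function that takes a key and returns bytes; the `fetch_with_failover` name and the `(name, fetch_function)` pairing are hypothetical choices for this example, not any provider's actual API.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, fetch_function) pair. The fetch function is a
# stand-in for whatever client call a real provider would require.
Provider = Tuple[str, Callable[[str], bytes]]


def fetch_with_failover(key: str, providers: Sequence[Provider]) -> Tuple[str, bytes]:
    """Try each provider in order; return (provider_name, data) from the
    first one that answers. Raise RuntimeError if every provider fails."""
    errors: List[str] = []
    for name, fetch in providers:
        try:
            return name, fetch(key)
        except Exception as exc:  # any failure triggers the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

This only works if the secondary already holds a current copy of the data, so the real effort is in replication and in rehearsing the switch, not in the few lines that choose which provider to call.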
Action Item: So, you’re going to the cloud, eh? Have a safe journey! Make sure you watch your back, double-check your contracts, and keep an eye on the health of your providers. But, more importantly, make sure that you plan for failure. It’s going to happen, and your worst-case scenario planning will certainly save your company both time and money when the inevitable happens. Know exactly what steps you will take when your cloud provider suffers a failure or just decides to shut its doors. Make a plan, check it twice, and make sure everyone on your staff knows how to execute.