Storage Peer Incite: Notes from Wikibon’s September 4, 2012 Research Meeting
A forklift upgrade of an entire data center is a very rare luxury in IT. A combination of high growth and the need to move to 24x7x365 operations to support an online service-based business model, however, drove The Rezolve Group to do exactly that, according to CIO Robert Reeder during the September 4, 2012 Peer Incite. Rezolve provides financial planning and associated services, including help filling out the Free Application for Federal Student Aid (FAFSA), to families of students applying for college. It provides these services with a combination of online self-service and call-center operations.
Reeder said that one primary goal of the complete data center replacement was to maximize server virtualization. Today Rezolve's data center is 80% virtualized, and it is moving toward 100% virtualization by the end of October.
Because downtime is a major concern and backup is becoming an increasingly difficult issue, a second goal is to move from a single active data center on the U.S. West Coast with a passive backup site on the East Coast to an active-active architecture. This will allow an automatic cut-over if the main California site goes down or loses connectivity. Eventually, the Rezolve Group would like to load-balance between sites to improve QoS.
The challenge is creating a near-real-time link between two data centers 3,000 miles apart. Accomplishing this requires a combination of high bandwidth and advanced technologies, but it can be done, he says, within a reasonable budget. Rezolve is testing its implementation now and hopes to go live within the year.
The articles below examine the key strategies and lessons that Reeder learned from the experience that other CIOs can apply to their operations. Bert Latamore, Editor
On September 4, 2012, Robert Reeder, CIO of The Rezolve Group, joined the Peer Incite community to discuss designing infrastructure for continuous business uptime. The Rezolve Group provides a variety of services to families of college-age students, including applications and services that help families complete their FAFSA - Free Application for Federal Student Aid. The company also provides a variety of services for colleges and universities. For The Rezolve Group, Reeder had evaluated both the cost of downtime and the cost of slow response time. He found that, at a minimum, they resulted in deferred revenue and a damaged reputation. In some cases, downtime led to lost revenue.
Three elements characterize the demands on the IT infrastructure at The Rezolve Group:
- The number of clients is rapidly growing, which drives increased transaction load on the systems.
- The number of software applications and services continues to expand.
- Six times per year, the company experiences transaction peak-load spikes of 10X typical days.
Many companies with growing and unpredictable demands view occasional slow response and downtime as unavoidable conditions. Reeder determined that both were unacceptable and avoidable. And, while downtime has not yet been completely prevented, The Rezolve Group is well down the path.
Getting to its nearly-always-on current state required moving applications from physical to virtual servers, enabling dynamic movement of applications from one virtual server to another, implementing a storage area network, virtualizing the storage, and going through all applications to eliminate specific IP and server calls, replacing these with DNS lookups.
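The last step in that list, finding hard-coded IP references and replacing them with DNS names, can be partly automated. The following is a minimal sketch of one way to do it, not a description of the tooling Rezolve actually used. The function names, the sample source line, and the hostname `db.example.internal` are all hypothetical, and the pattern only covers IPv4 literals.

```python
import re

# Matches dotted-quad IPv4 literals such as 10.0.0.12.
# Illustrative only: it will also flag four-part version strings like 1.2.3.4.
IPV4_LITERAL = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def find_hardcoded_ips(source_text):
    """Return (line_number, address) pairs for every IPv4 literal in the text."""
    hits = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        for match in IPV4_LITERAL.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


def replace_with_hostnames(source_text, ip_to_hostname):
    """Swap each known IP literal for its DNS name; unknown addresses are left alone."""
    def substitute(match):
        address = match.group(0)
        return ip_to_hostname.get(address, address)
    return IPV4_LITERAL.sub(substitute, source_text)


if __name__ == "__main__":
    # Hypothetical legacy snippet with one hard-coded address.
    legacy = "primary = connect('10.0.0.12')\n"
    print(find_hardcoded_ips(legacy))
    print(replace_with_hostnames(legacy, {"10.0.0.12": "db.example.internal"}))
```

Once code refers to names rather than addresses, a workload can move to another virtual server or site by updating a DNS record instead of editing and redeploying the application.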
The next step in the journey was to move Rezolve's two data centers from an active-passive to an active-active architecture. This required that the staff re-architect the method for transferring data from one location to another. After evaluating a variety of methods, Reeder selected the Infineta Data Mobility Switch, since it could accelerate communication between the data centers without requiring application-specific modifications. By implementing Infineta, Reeder can minimize the degree to which data in the two data centers are out of synch. In addition, the Infineta DMS can scale dynamically to support continued transaction volume growth and variability. Without WAN acceleration, The Rezolve Group would at best have a warm disaster recovery site, and more likely a cold disaster recovery site, as the data would be too much out of synch.
When asked about future directions, Reeder discussed the possibility of geographic load balancing between the two data centers, but at this time, he is content with his active-active sites with rapid disaster failover capabilities. And for The Rezolve Group, the benefit has been enormous, as Reeder can continue to support his dynamic, growing environment with an extremely lean team.
Action item: Organizations should listen to the valuable lesson that Robert Reeder learned from his coach in high school: "On game day, you play like you practiced." Organizations that start with an assumption that they will have downtime are guaranteed to have downtime. Organizations that start with a culture that says, "All applications must be available all the time," have a true shot at delivering continuous business uptime.
On September 4, 2012, Robert Reeder, CIO of The Rezolve Group, joined the Peer Incite community to discuss designing infrastructure for continuous business uptime. Reeder’s company serves consumers of higher education services: it helps students understand the total cost of a higher education and helps families complete their FAFSA - Free Application for Federal Student Aid.
As the CIO of a company experiencing significant growth, but with a relatively small IT staff, Reeder began to look at the company’s IT infrastructure and services in an effort to streamline IT operations and provide significantly improved business outcomes, including less downtime, an environment that can scale to meet continued growth needs, and the ability to continue operations in the event of a disaster.
The factors that hold us back
As CIOs, what often holds us back from being able to make the changes that are necessary in order to meet all of the goals we’d like to meet?
- Perhaps you’re being held back by legacy hardware on which single, mission critical applications reside. That makes it difficult to build an IT infrastructure that’s as flexible as necessary to meet new business requirements.
- Perhaps IT staff members are invested in existing development processes that require significant downtime for software updates. That makes it difficult to deploy new software functionality in a timely manner.
- Perhaps the organization tolerates a development environment that is not as robust as production. When it comes to workload productivity, an unreliable development environment should be unacceptable. Lost time in development translates directly to slowdowns in time-to-market for new products.
- Perhaps existing network services are slow and unreliable. This makes true disaster recovery and site redundancy very difficult, if not impossible.
One step back equals four steps forward
There are many different ways to apply "bandages" to the myriad of legacy problems to get to an acceptable place. But the one surefire way to make sure that established goals get met is to throw it all out and start over. For Reeder, this was a road he was willing to travel, and it’s one that I heartily recommend if it makes sense and it’s possible. There are, however, some challenges that must be addressed before this can happen:
- Have a vision: Although most organizations replace IT hardware on a cycle, major fork-lift upgrades require a clear, articulated understanding of the desired end-state.
- Document and understand every infrastructure component: Just because something will ultimately go away doesn’t mean that it’s not important. The function of everything needs to be understood in order to make a determination about that service’s place in a new environment.
- Change the organization's mindset to one of continuous improvement with no downtime: Reeder was successful in modifying the IT culture to one of what he calls a “fast cut” methodology whereby new code is deployed very quickly with little or no downtime. This has allowed his organization to deploy features and updates more quickly and with less overall impact. The “cut” has simply become routine as opposed to being an event. To be fair, his staff was already pretty progressive, so this wasn’t a battle.
- Analyze legacy code: Reeder said his team had to analyze old code to remove hardcoded references to systems, so that changes could be made without damaging services. Without this step, his efforts would have had a negative impact on the business.
- Emphasize all aspects of the business: Even the development environment needs to be robust so that developers can continually test and build new code designed to improve the business. Don’t just focus on production.
Support is key
Reeder was extremely fortunate to have the support he needed to take a “throw it all away” approach in order to build the IT environment necessary to meet modern business demands. He took his organization from one steeped in old physical servers and legacy infrastructure to one that will be 100% virtual by the end of October 2012 and that is nimble and allows his company to take a continuous improvement approach to business.
Action item: CIOs need to continually adapt and improve their environments in order to meet evolving business demands. Sometimes, this might mean throwing in the towel on the current environment and building a new environment that meets current demands and that is flexible and nimble enough to meet new business demands that arise. CIOs should not shy away from such challenges and should strive to have a deep enough understanding of the entire environment in order to be able to drive a smooth change from legacy to modern with as little disruption as possible.
On Wikibon’s September 4, 2012 Peer Incite, Rezolve Group’s CIO Robert Reeder shared how his company is looking to use new generations of technology to tackle the challenges of managing two primary locations on opposite coasts of the United States. In a typical configuration, Rezolve Group has its active location on the West Coast and a passive disaster site on the East Coast. In this configuration, Reeder expected recovery to take up to 24 hours.
Like many companies, Rezolve Group is finding that its 30%-50% annual growth makes keeping up with backup processes difficult. The environment is already 80% virtualized (all VMware) and expected to be 100% by the end of the year. Reeder is currently testing a solution with Infineta, is seeing greater utilization of resources, and is considering moving to an active-active deployment between its two locations.
The challenge of connecting over long distances is much more than a bandwidth issue. Rezolve Group had a 1Gb pipe between sites, but the IT group found that its transfer rate was limited by the response-time latency of the long link. Adding bandwidth would not solve this, as transfer rates topped out at about 130Mb for Rezolve Group’s configuration across 3,000 miles. To overcome this issue, Reeder evaluated application-specific software solutions for WAN optimization but found that these required application owners to change processes and introduced the added complexity of maintaining a different communications process for each application.
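The reason extra bandwidth does not help on a long link can be shown with a short calculation. A single TCP stream can have at most one window of unacknowledged data in flight per round trip, so its throughput is bounded by window size divided by round-trip time, whatever the link speed. The sketch below works through that bound. The 64 ms round-trip time and 1 MiB window are assumptions chosen purely for illustration. The source does not state Rezolve's actual values, and the result should not be read as a reconstruction of its configuration.

```python
def max_tcp_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound for one TCP stream: one window of data per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


def window_needed_bytes(link_mbps, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_mbps * 1_000_000 * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.064          # assumed coast-to-coast round trip, in seconds
    window = 1024 ** 2   # assumed 1 MiB effective window, in bytes

    # The bound does not depend on link speed, so a faster pipe changes nothing.
    print(f"Per-stream ceiling: {max_tcp_throughput_mbps(window, rtt):.1f} Mb/s")

    # Window a single stream would need to fill a 1 Gb/s link at this latency.
    print(f"Window to fill 1 Gb/s: {window_needed_bytes(1000, rtt):,.0f} bytes")
```

Under these assumed values the ceiling is on the order of the 130Mb figure reported above, and filling the 1Gb link would take roughly eight times as much data in flight. In general, a longer round trip needs either a larger window or a technique that hides the latency, which is the gap WAN optimization products aim to close.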
Reeder was looking for a solution that adhered to Occam’s razor - the simplest approach is the best approach. An appliance-based WAN optimization solution, like Infineta, would work across all applications while being transparent to application owners and users. Additionally, software solutions and many older WAN optimization appliances were not designed for today’s high-bandwidth applications. Infineta’s solution was designed for virtualization and Big Data applications that require scalability of 1Gb and up. Rezolve Group can add bandwidth to its WAN optimization deployment (going from 1Gb today to at least 8Gb in the future) without having to add any additional boxes.
One test that Reeder ran was a SQL Anywhere database replication of 100,000 records. The remote site was ready to use in five seconds. He said that with the size of his environment, growth of data, and distance between his sites, backing up the environment would soon become untenable without Infineta. Rezolve Group is taking a slow, deliberate path towards adoption of this technology and is learning a lot about how to improve utilization and adjust resources with the new capabilities available using Infineta’s WAN optimization solution.
Action item: CIOs have long struggled with the challenges of maximizing resource utilization between dispersed locations. Virtualization and Big Data increase the volume and change the patterns of traffic between locations. Solutions that are designed specifically to address these challenges, such as Infineta WAN optimization, should be on the short list of technologies to consider.
In 1992, a startup company called Network Appliance (now known as NetApp) created a new category of technology product. The "appliance" was a specialized product that did one thing (in this case storing and serving files) better than general purpose machines (e.g. x86 systems that could process, store, and route data). The analogy co-founder David Hitz used with VCs when raising money was that you didn't want to make toast in an oven. The startup spawned the creation of a new class of specialized technology products that performed specific tasks such as traffic routing, security, backup, data warehousing, etc.
In general terms, this point in the history of the IT business coincided with a prolonged and fundamental movement toward best-of-breed solutions winning out over fully integrated suites of products. Competition occurred along very narrow lines (e.g. microprocessors, disk drives, PCs, Unix servers, disk systems, databases, applications, etc.), a phenomenon that created a tail wind for so-called appliances.
With the trend toward cloud computing and converged infrastructure, many are asking if the pendulum is swinging back toward more fully integrated and multi-function systems. At the September 4, 2012 Wikibon Peer Incite Research Meeting, Robert Reeder, CIO of The Rezolve Group, talked about the attributes of products that are enticing to his company. Specifically, he cited:
- A complete solution that is integrated,
- A product that can scale with his peak requirements,
- Solutions that support TRUE non-disruptive upgrades and live migration.
In particular, Reeder said that if he can do more than one task with one device it’s better, because it’s simpler to manage. Indeed, the main complaint about appliances has been that they are another point of management for IT professionals and add to the complexity of IT operations.
Reeder cited two examples of solutions he deployed that do more than one thing:
- Infineta’s Data Mobility Switch, which his firm is using not only to speed up inter-data center workflows, but also manage block-level replication traffic, database replication traffic, and VMware Site Recovery Manager (SRM) based traffic;
- Actifio, which offers an integrated, in-band storage virtualization solution based on IBM’s SAN Volume Controller, which allows Reeder to avoid having to buy multiple other technologies for snapshot, backup, and data protection.
Specialized appliances as a product category are not disappearing. Rather they are evolving to do more to support cloud computing that is less specialized by silo and more integrated across the application portfolio. Like multifunction printers that not only print but copy and scan, more integrated enterprise solutions are hitting the market that provide value over a wider range of the enterprise solutions stack.
Action item: The trends toward cloud computing and converged infrastructure are evolving product solutions. Increasingly, buyers will adopt products that trade doing one thing at the very best level of performance for solutions that can offer greater integration and perform multiple functions well. Enterprise vendors, particularly startups, must remember that high quality support remains table stakes. Suppliers to the enterprise must demonstrate that trained technical support staff are readily available to support these more robust products, with professionals who have skill sets across more disciplines. Technology can help win the deal, but support keeps it.
Maintaining continuous, 24x7 business uptime involves all parts of IT, not solely the production environment. Specifically, said Rezolve Group CIO Robert Reeder in the September 4, 2012 Peer Incite Meeting, the development group, which in many businesses is seen as non-critical, needs to work under the same business uptime requirement as production. This means that investments in infrastructure for the dev group are just as important as those for the operational side of the house.
Regarding it any other way is a serious error, he argues. The little that a company saves by running its development on older, sub-optimal systems is more than lost in the cost of the productivity lost when those systems run slowly or crash. And treating dev as a lower tier also sends a message to the development staff that can only result in lower morale and productivity, even when those systems are working optimally. It also means that in some cases the dev staff has a different experience than production and may produce code that does not run optimally on production systems.
This is taken to its extreme in the dev/ops approach used by the Internet giants, the ultimate 24x7x365 companies. There, the development staff is part of operations and runs on the same systems. Reeder has not moved to a dev/ops model, at least not yet. But he does treat his development staff as equal in importance to operations and says that pays dividends.
Action item: Reeder says his high school coach taught that “you will play the way you practice, so practice the way you want to play.” Applying that life-lesson to IT requires that all parts of the organization, and specifically development, be prioritized for 24x7 uptime. CIOs should give development equal priority to mission-critical operations in both treatment and equipment allocation. Anything less will have a negative impact on productivity, which will cost the organization much more than it will save on equipment costs. Giving development equal priority also instills a constant uptime mindset throughout the organization.
On September 4, 2012, Robert Reeder, CIO of The Rezolve Group, joined the Peer Incite community to discuss designing infrastructure for continuous business uptime. Reeder’s company serves consumers of higher education services: it helps students understand the total cost of a higher education and helps families complete their FAFSA - Free Application for Federal Student Aid. As a former higher education CIO with great interest in the complex financial aid process, I was particularly interested in this discussion.
September 4 Peer Incite: Long-Distance Data Replication for Continuous Business Uptime. Watch the whole Peer Incite discussion to learn about all of the specific initiatives undertaken by the Rezolve Group to modernize its information technology architecture.
Reeder relayed his experience in modernizing the company’s IT infrastructure, describing an eventual epiphany that the company needed not just an evolutionary upgrade but a full revolution in the data center and beyond to achieve its goals.
Many companies have undertaken some of the same upgrades as Rezolve, but Rezolve has jumped in with both feet and is planning to take its environment to the limit. Today, rather than managing a bunch of aging single-application physical servers, Rezolve’s IT staff is managing a modern, almost fully virtualized environment, which Reeder indicates will be 100% virtualized by the end of October 2012. Even for those who believe strongly in virtualization, a 100% penetration rate is a lofty goal, and pursuing it has enabled Rezolve to be more efficient and think about the rest of its infrastructure in new ways.
When it comes to having the ability to simplify an environment and get rid of the old, nothing beats virtualization, especially with a penetration rate of 100%. Through these efforts, organizations can dramatically reduce the amount of hardware they manage and move from a server-centric to an application-centric mindset, which was another of Reeder’s goals. Now, IT staff focuses on the needs of each application rather than the needs of each server. A mindset change that jettisons old ideas in favor of new ones can be a powerful thing. After all, “getting rid of stuff” doesn’t just have to mean reducing the amount of equipment one has; it can also mean ridding the organization of paralyzing mental baggage.
At the same time, Reeder realized that the shared nature of the SAN and the virtual environment, while it allowed the group to eliminate some legacy services, also allowed it to address ongoing DR concerns through the implementation of two active data center sites that remain in active communication using newer technologies such as Actifio and high bandwidth. Rezolve depends on its two data centers for operations with the assumption that one will remain on if the other becomes unavailable.
Reeder prefers to take a simple approach to his IT infrastructure, a direction that all CIOs should consider. Today, building complexity into the IT environment is not nearly as necessary as it once was. Through the implementation of a modern infrastructure, Reeder and his relatively small IT staff can more easily manage the environment without having to worry about attempting to implement new services while trying to support a slew of legacy ones.
Action item: Although it can be easier said than done, CIOs need to expel complexity from their environments and embrace simplicity as much as possible while, at the same time, consolidating and reducing legacy support needs in order to support future endeavors. When eliminating the legacy hardware and software from an environment, CIOs also need to find ways to eliminate legacy thinking so that the company as a whole can embrace the future.