Introduction
Leumi Bank has grown to be one of the largest commercial banks in Israel and is growing fast internationally. Itzik Reuven is the storage administrator and manages the data center storage for over 12,000 employees in 21 countries. He managed a broad smile when Wikibon spoke to him about his XIV experiences in Montpellier, France.
The strategic problem Itzik faced nearly three years ago was reducing the cost of storage in a high-growth, complex commercial banking environment. In 2005 the Leumi data was held on tier 1 storage supplied by EMC and IBM, and on tier 2 storage from a number of vendors. Each array was managed with its own management tools and procedures.
Storage Requirements
The key requirements for storage arrays for the Leumi commercial banking applications were defined as follows:
- Good I/O performance for the majority of applications. A few large applications require very good I/O latency performance. A handful of applications require high throughput (bandwidth) performance.
- High storage availability for nearly all applications. The recovery point objective (RPO) is aggressive for all applications in the bank (losing a decimal point is not an option for a banker).
- Rapid recovery from any data incident for most applications. The recovery time objective (RTO) for the bank is also aggressive. The majority of the data problems are caused by software, and fast recovery from any data problem (software or hardware) is a very important business requirement.
- Rapid recovery for business-critical applications with large Oracle databases. These applications are particularly challenging: recovering from an application or system failure means going back to an earlier clean version of the data, re-running the logs, and restoring production very quickly.
- Storage provides no impediments to business growth. Leumi is growing fast and is implementing new applications and upgrading existing ones to adapt to a fast-changing international marketplace. Storage change has to be implemented in hours, not months.
- Lower total cost of storage (including acquisition cost, commissioning costs, software costs, maintenance, decommissioning costs, operational costs and energy costs). The cost of tier 1 storage and storage maintenance software was viewed by IT and business management as prohibitively high.
Storage Array Recommendation
Itzik Reuven was responsible for recommending a storage strategy that would meet these objectives, and reviewed most storage platforms from the major vendors. The Nextra XIV was a new array with a completely different approach to storage, built from commodity Intel processors, gigabit Ethernet, and trays of commodity SATA disk drives. Itzik recommended the XIV, and the IT executives agreed that it offered Leumi Bank the best business solution to its most pressing storage problems:
- Performance tuning is completely automatic. No IT staff or management software is required to manage performance. No effort is required to manage individual application performance, as the data and I/O are spread out evenly across the drives. The only management required is to monitor overall I/O rates, latency, and bandwidth. If these metrics start to go out of line, the management action is to stop adding new application data to that XIV array.
- High availability is achieved with a RAID 1 implementation. Every 1MB page is duplicated on a different XIV storage module. Leumi applications with very demanding RPO requirements use synchronous mirroring to provide additional resiliency.
- Rapid recovery from hard drive failure. Recovery from a 1 terabyte SATA disk failure takes about 30 minutes, as the recovery I/O and controller processing are shared across all the disks and controllers. This contrasts with many hours on traditional RAID implementations.
- Rapid recovery time from system or application failure. This could be enhanced dramatically by the extensive use of writable snapshot storage, which allows a consistent copy to be made within a consistency group very quickly (well under one second) and with little additional storage. Itzik implemented 28 snapshot versions of volumes for critical applications: a new copy is made every six hours and kept for one week. In the case of data corruption problems from systems or applications, this allows application maintenance to go back to an earlier version of the data instantly (data does not have to be recovered from tape or VTL), rerun the logs, and restore the application.
Itzik had earlier tried to implement a slimmed-down version of this rapid recovery procedure on the tier 1 arrays, which had similar functionality. However, the array controller overheads were unsustainably high for effective production use. The only alternative available was making a complete clone of the data, which would have been prohibitively expensive and time-consuming.
- Simplest storage management environment possible. The effort required to thinly provision a volume and connect it to an application is minimal. The ease of use is much better than that of any of the arrays from the leading vendors. In Wikibon’s opinion, only 3PAR has a platform that approaches XIV on ease of use, and that was not available in Israel. System administration requires ½ person to manage 700 terabytes of storage.
- Dynamic change at any time. The virtualization architecture supports an exceptionally easy-to-use GUI storage management interface, through which changes can be made dynamically at any time.
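The two figures quoted in the list above (28 snapshot versions, and a 30-minute rebuild of a 1 terabyte drive) can be checked with simple arithmetic. The following is a minimal sketch, not anything taken from the XIV tooling: the function names are hypothetical, the rebuild calculation assumes the worst case in which the full 1 terabyte has to be re-copied, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
from datetime import timedelta


def snapshots_retained(interval: timedelta, retention: timedelta) -> int:
    """Snapshot versions on hand when one is taken every `interval`
    and each is kept for `retention` (hypothetical helper)."""
    return retention // interval


def implied_rebuild_rate_mb_s(capacity_tb: float, rebuild_minutes: float) -> float:
    """Aggregate rebuild rate in MB/s implied by re-copying `capacity_tb`
    in `rebuild_minutes`. Assumes a full drive and 1 TB = 1,000,000 MB."""
    return capacity_tb * 1_000_000 / (rebuild_minutes * 60)


# One snapshot every six hours, kept for one week.
versions = snapshots_retained(timedelta(hours=6), timedelta(weeks=1))

# A 1 TB SATA drive recovered in about 30 minutes.
rate = implied_rebuild_rate_mb_s(capacity_tb=1.0, rebuild_minutes=30)

print(versions)     # 28
print(round(rate))  # 556
```

The rate of roughly 556 MB/s is an aggregate across the array. It is well beyond what a single SATA drive of that era could sustain, which is consistent with the text's point that recovery work is shared across all the disks and controllers.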
Strategic Risk
Itzik Reuven readily agrees that the initial purchase of the first system was a strategic risk, as XIV was a small storage start-up in Israel. Leumi took the risk for the potential gains that could be accrued if the system performed as advertised. The risk was mitigated by the quality of the development team at Leumi, Leumi's confidence in Moshe Yanai as a visionary leader, and the closeness of the development team to Leumi's data center.
Itzik reduced the risk as much as possible by repeatedly testing the failure of disks, disk modules and other components, and by extensive performance testing. As a result of the growing confidence in the XIV, Leumi now has eight XIV systems installed with over 700 terabytes of storage. It is managed by less than ½ FTE.
Strategic Fit
Itzik Reuven would be the first to recognize that no single array is perfect for every workload. He has kept the IBM DS8000 for applications that genuinely require low latency, and for applications where data access is completely random and bandwidth is the constraint. Archiving applications whose data has a very low probability of being accessed repeatedly can benefit from power-saving spin-down, but they are not suitable for the XIV architecture: data is spread over all devices, making spin-down impractical.
However, at Leumi there is a 'fat middle' of tier-1b and tier-2 applications that are a perfect fit for this type of storage array, and Leumi has chosen the XIV as its strategic platform and the default storage for all applications.
Business Case
Table 1 illustrates the thinking that went into the development of the business case by Leumi.
Future Plans
Itzik Reuven is looking forward to the availability of fat-to-thin functionality in release 2.0 of the XIV microcode. This is similar to the technology released by 3PAR in its T-series, and will allow free space on the disk to be identified and eliminated when volumes are copied onto the XIV. He is also looking forward to the clustering of multiple XIV frames for some applications where very large consistency groups are required, but will probably keep the majority of XIV frames free-standing. He is intrigued by the potential of SVC to move application data dynamically to the XIV, but is unlikely to keep the data virtualized by the SVC; he believes that using the raw management tools directly on the array is a safer and more effective approach to storage management.
Conclusions
Leumi's risk in evaluating a completely new array architecture from a start-up was minimized by the physical proximity of the design team. The IT executive management and storage professionals did an outstanding job of understanding the potential of the new XIV architecture to provide business value to their IT operations. In Wikibon’s opinion, the choice of the XIV storage as the default standard for tier-1 and tier-2 storage at Leumi Bank is sound. Equally pragmatic is the understanding that there are specific workloads that will require other arrays on an exception basis.
The bottom line is that the XIV provides “good enough” storage for the majority of tier-1 and tier-2 storage at Leumi Bank. The elimination of storage tiers, the exceptional simplicity of storage management, and the potential for aggressive pricing from IBM because of the use of commodity components all make XIV the best strategic fit for Leumi Bank. Leumi will benefit from setting the standard that all applications (whether developed internally or bought as packages) should run on XIV hardware. Exceptions should require additional justification and review by senior IT management.
Action Item:
Footnotes: Legal: © Wikibon 2009. This document is copyright protected by Wikibon and does not fall under the GNU general license terms for Wikibon.org. Links to this article from external sources are allowed, however any other re-distribution of this content for commercial purposes is strictly prohibited. Please contact Wikibon for more information.
The cases cited herein are real. Wikibon case studies are developed independently and their development is not initiated for or funded by any single company. Wikibon reports actual customer experiences and results with no attempt to emphasize any one vendor’s strengths or weaknesses.