Originating Author: David Floyer
The origin of NAS file systems was to support user computing. As a result, each workgroup usually has one or more NAS filers supporting the F through Z drives of the users. Whereas the traditional block-based applications have been largely consolidated in most enterprises, NAS systems have remained largely distributed. The storage management cost of these systems has been user time, and IT has not had much incentive to be seen “meddling” with user data.
A number of business drivers are changing this scene:
- In an increasing number of organizations, IT has been given a mandate to ensure compliance of all data in the enterprise, as part of Sarbanes-Oxley and other compliance initiatives. In particular, many historical email records are found on NAS filers.
- Many new applications are naturally file based, and require very large files and high performance. Examples are:
- Entertainment systems
- Electronic discovery or e-discovery applications
- Archiving systems
- Life sciences
- Seismic & oil exploration applications
- Internet services applications
- There is an increasing requirement that file based applications be shared across the enterprise
- The very large increase in the number of servers, particularly virtual servers, that have to be connected to the file systems
The result is that IT is being required to rationalize file based systems. As in SAN block-based storage, consolidation is a logical way to go. The key technology challenges that this type of consolidation presents are:
- Scalability – to be able to consolidate and integrate very large numbers (100s) of filers and types of files
- Availability – the requirement for file-based applications to have much higher levels of availability and recoverability
- Manageability – the requirement to provide common processes and procedures for managing the provisioning, allocation, backup, archiving, cost allocation, and budgeting for file-based systems.
- Compatibility – it is essential that the users' environment be maintained as is. Requiring user departments to change the application set-up to accommodate an IT desire to consolidate is a bridge too far.
This article investigates the capabilities of NAS clustering to provide solutions to these business problems.
NAS Clustering capability
The availability of NAS clustering technologies has reduced the cost of large-scale and high-performance file systems, and has significantly increased the size of file systems and files that can be managed. Many of these file clustering systems have been built around low cost standard components (Intel/AMD processor technologies and standard storage arrays). Others have used very large memory systems to reduce the overheads of file systems and radically increase the I/O throughput.
File virtualization for file based systems is very different than block-based virtualization. Effective file virtualization technologies have to deliver two important capabilities:
- Back-end Virtualization – the ability to provide backend virtualization of the storage resources, so that a single large pool can be created with a single logical directory. The provision of logical volumes in the backend enables techniques such as thin provisioning, in which physical storage is consumed only as data is actually written, rather than being set aside for the full amount that has been allocated. Dramatic storage savings are possible in many NAS installations.
- Front-end Virtualization – the ability to create multiple virtual file servers mapped on to one very large consolidated file pool, so that the users can see exactly the same file structures as they see at the moment. This protects users during any migration from having to change and retest the procedures they use to access their files.
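The two capabilities above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any vendor's implementation: the `ThinPool` class, the `resolve` function, the server and share names, and all capacities are hypothetical. The first part shows that a thinly provisioned pool can promise more capacity than it physically holds; the second shows a lookup that lets an unchanged legacy path land in the consolidated pool.

```python
from dataclasses import dataclass, field


@dataclass
class ThinPool:
    """Hypothetical back-end pool: volumes promise capacity, but only
    data that has actually been written consumes physical space."""
    physical_tb: float
    provisioned_tb: dict = field(default_factory=dict)  # volume -> promised TB
    written_tb: dict = field(default_factory=dict)      # volume -> TB written

    def create_volume(self, name: str, size_tb: float) -> None:
        # Provisioning only records a promise; no physical space is used yet.
        self.provisioned_tb[name] = size_tb
        self.written_tb[name] = 0.0

    def write(self, name: str, tb: float) -> None:
        if self.written_tb[name] + tb > self.provisioned_tb[name]:
            raise ValueError(f"{name}: write exceeds provisioned size")
        if self.used_tb() + tb > self.physical_tb:
            raise RuntimeError("pool is out of physical space; add storage")
        self.written_tb[name] += tb

    def used_tb(self) -> float:
        return sum(self.written_tb.values())

    def promised_tb(self) -> float:
        return sum(self.provisioned_tb.values())


def resolve(virtual_servers: dict, legacy_path: str) -> str:
    """Hypothetical front-end lookup: translate an unchanged user path of
    the form \\\\server\\share\\rest into a location in the pool."""
    parts = legacy_path.lstrip("\\").split("\\")
    server, share, rest = parts[0], parts[1], parts[2:]
    base = virtual_servers[(server.lower(), share.lower())]
    return "/".join([base, *rest])


if __name__ == "__main__":
    pool = ThinPool(physical_tb=60)
    pool.create_volume("eng", 50)
    pool.create_volume("finance", 50)
    pool.write("eng", 20)
    pool.write("finance", 10)
    print("promised TB:", pool.promised_tb(), "used TB:", pool.used_tb())

    servers = {("eng-filer1", "projects"): "/pool/eng/projects"}
    print(resolve(servers, r"\\eng-filer1\projects\cad\part.dwg"))
```

In this sketch the two departments are promised more capacity than the pool physically contains, which is where the storage savings come from, while the user-visible path stays exactly as it was before consolidation.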
The traditional NAS vendors (NetApp and EMC at the high-end) have continued to improve the filers in traditional ways, but have not yet provided integrated clustering & virtualization solutions. The clustered solutions have been provided by newer companies like Isilon, BlueArc and to some extent HP PolyServe. In the near future, NetApp and EMC will be forced to integrate clustering into their NAS offerings. If they execute this well and cost effectively, users will have a growth path, and NetApp and EMC will retain market share. If they stumble, new companies will have the opportunity to set new standards for consolidated and virtualized file-based storage.
Specific operational goals of implementing NAS Clustering
The likely investment required when implementing a NAS Clustering & consolidation initiative is between $0.5 million and $2.0 million, with an elapsed time of 9 to 12 months. [Note: These figures assume a standard Wikibon business model organization with $1B in revenue, 4,000 employees, and an IT budget of $40M per year. The scenario assumes that 300 terabytes are installed, of which 100 terabytes is NAS storage.] A successful NAS clustering implementation will:
- Reduce the cost of NAS storage by 10-30% over a three year period
- Reduce the IT cost of managing NAS storage by 30-50%
- Improve the productivity of the users by improved performance and availability
- Improve the productivity of the users by reduction in time to implement new applications or new versions of file-based applications
- Improve the productivity of users through higher availability
- Improve the productivity of NAS users by simplifying the management and backup of filers, and making it easier for users to share files across the enterprise.
- Take advantage of the change to look at some of the applications that are the greatest consumers (directly or indirectly) of NAS storage. One of the biggest is email systems, particularly Microsoft Outlook. Up to 50% of NAS storage can be consumed with .pst files as users are “directed” by email storage administrators to make copies of their own emails to ensure that they have a record of them.
Notwithstanding that a large percentage of the saving will come from end-user productivity improvements, an analysis of the need for NAS Clustering is likely to find that a business case will be good (ROI and IRR>200%) for the right types of organizations.
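The arithmetic behind such a business case can be sketched as follows. Only the percentage ranges and the investment range come from the goals above; the baseline three-year costs and the user productivity benefit in the example are hypothetical inputs, and ROI is assumed here to mean net benefit divided by investment, which may differ from the definition used in a given organization.

```python
def three_year_business_case(investment, storage_cost, mgmt_cost,
                             storage_saving, mgmt_saving,
                             user_benefit=0.0):
    """Return three-year savings, net benefit, and a simple ROI.

    All money arguments are three-year totals in dollars; the *_saving
    arguments are fractions (0.10 means 10%).
    """
    it_savings = storage_cost * storage_saving + mgmt_cost * mgmt_saving
    total = it_savings + user_benefit
    net = total - investment
    return {
        "it_savings": it_savings,
        "user_benefit": user_benefit,
        "net_benefit": net,
        # Assumed definition: net benefit divided by investment.
        "roi": net / investment,
        # Fraction of the total saving that comes from end users.
        "user_share": user_benefit / total if total else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical inputs; only the percentage ranges come from the text.
    case = three_year_business_case(
        investment=1_000_000,
        storage_cost=3_000_000, mgmt_cost=2_000_000,
        storage_saving=0.20, mgmt_saving=0.40,
        user_benefit=2_000_000,
    )
    for key, value in case.items():
        print(f"{key}: {value:,.2f}")
```

Running the function at both ends of the saving ranges, and with `user_benefit` set to zero, shows how much of the case rests on end-user productivity rather than on IT savings alone.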
Risks of implementing NAS Clustering systems
The major risks to a NAS Clustering initiative are:
- Not understanding the current NAS file systems, how they are managed, and how the applications interact with the NAS filers. The more technically capable the user population, the higher the risk of finding problems.
- Not providing the users with a way to use the new consolidated filers in the same way as they do at the moment. Demanding changes will lead to users just putting in their own filers “under the table” to preserve compatibility.
- Not having good representatives of the major user groups as an integrated part of the design and implementation team. They will need to have the confidence of both user management (being frugal) and the technical users (to be able to sell technical change when required).
- Not having a very clear understanding of any complex database applications that use NAS. These need additional design and testing time, and should not be the first to be ported over to the new systems, unless there is a clear performance requirement to do so.
- Not having expertise available from the email system team that can investigate and advise on email use of NAS storage.
NAS Clustering initiative
The NAS Clustering strategy will be implemented when the NAS cluster system has been designed, built, tested, implemented and successfully handed over to operations so that it can run as specified without external support.
Expectations (out-of-scope)
The following factor is not within the scope of the NAS clustering initiative, but it is very important. If this factor is not in place or addressed, the probability of a successful NAS clustering outcome will be significantly lower:
- An effective budgetary system is in place which would allow allocation of costs for centralized NAS storage services out to user departments
Analyze phase
Acceptance test considerations
The analyze phase will be completed when the initial business case has been accepted by the sponsor, and agreement has been reached to proceed to the design phase or kill the project.
Key analysis milestones
This phase should take about 4-8 weeks and 20-40 person days of effort.
- An effective sponsor of the initiative is identified
- It is important that the sponsor can resolve any organizational issues, can represent both the user and technical communities, and has a familiarity with file based systems
- Data collected
- Determine all the key NAS filers that are currently in use, the key applications that use each filer, the protocols used, and the registered users for each filer.
- Determine the costs of managing the current environment, including IT people cost and the amount of time end-users take to manage file-based applications
- Agree which NAS filers to exclude from NAS Clustering (e.g., special applications with unique protocols, user populations too technical, etc)
- Project forward the expected costs of running the current system
- Determine the costs of the different alternative clustering technologies, including equipment and software costs, additional data center costs, network costs, and implementation costs
- Determine the benefits of the different alternative clustering technologies, including reduction in business risk from better data control, improvements in availability, improvement in performance, improvement in speed of application change/new application deployment, etc.
- Business case constructed:
- Analyst constructs business case / cost benefit analysis detail of alternative scenarios against the baseline of continuing to run the current system
- Recommend the best alternative to the business
- Initial Design and business case accepted by sponsor and any other stakeholders necessary
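Part of the data collection step can be automated. The sketch below is one possible approach, assuming the filer share is mounted at a local path that the analyst can read: it walks the tree, totals bytes by file extension, and reports the share taken by a given type such as .pst. The function names and the mount path in the usage note are hypothetical.

```python
import os
from collections import Counter


def bytes_by_extension(root: str) -> Counter:
    """Total file sizes under root, grouped by lower-cased extension."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += size
    return totals


def share_of(totals: Counter, ext: str) -> float:
    """Fraction of all scanned bytes held in files with this extension."""
    total = sum(totals.values())
    return totals[ext] / total if total else 0.0
```

For example, `share_of(bytes_by_extension("/mnt/filer1"), ".pst")` would give the fraction of that share consumed by .pst files, which can be compared across filers when deciding what to consolidate first.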
Design phase
Acceptance test considerations
The design phase will be completed when the design has been accepted by the sponsor and agreed to by the key stakeholders, the RFP has been issued and key hardware, software and network vendors selected, funding for the project has been agreed, and agreement has been reached to proceed to the deploy phase or kill the initiative.
Key design milestones
This phase should take about 8-12 weeks and about 30 person days of effort.
- Primary vendor decided
- Decide on vendor hardware and software technologies available and issue RFP/solicit bids
- Isilon, BlueArc and HP PolyServe would be important clustering vendors to consider, as well as future clustering storage from NetApp and EMC
- Determine network requirements and issue RFP/solicit bids
- NAS procedures designed
- Design procedures around hardware and software decided and design integrated with current procedures
- Pay particular attention to the instructions for end-users, together with help-desk support.
- Determine training requirements for operations
- Design test procedures and scripts
Deploy phase
Acceptance test considerations
The deploy phase will be completed when:
- The NAS Clustering and virtualization system is built, tested, and brought into service
- The operations group is fully responsible for all aspects of the installation
Key deployment milestones
This phase should take about 4-6 months and cost between $0.5 million & $1.5 million.
- NAS Clustering topology built
- Installation of clustered NAS storage hardware and storage management functionality
- Installation of any network facilities required
- Installation of front-end and back-end virtualization capabilities
- Update and creation of new processes and procedures, with full documentation
- NAS Clustering tested
- Testing of equipment, software, and procedures on historical data
- Testing of recovery on some non-mission critical live applications
- Testing of procedures for migration, backup, recovery, and allocation
- Migration & Cut-over to NAS Clustering completed
- Phased migration cut-over to NAS Clustering
- Extensive monitoring of performance, reliability, & network performance
- NAS Clustering initiative wrapped up
- Procedures set up for monitoring performance, availability, and flexibility
- Procedures set up for adding additional storage, storage functionality, and network bandwidth
- Final review of documentation
- All project staff released and full hand-over to storage operations
NAS clustering initiative summary
NAS Clustering together with file virtualization are viable technologies. New vendors such as Isilon and BlueArc do not yet have the range of traditional NAS functions that NetApp and EMC offer. However, they do provide an overall solution to bringing file-based storage costs (IT and hidden user costs) under control, especially in areas where there are high performance and/or availability requirements, and there is extensive requirement to share files across users and departments.
Users with mainly NetApp or EMC filers installed should probably choose to wait and see what clustering and virtualization solutions are available from their current vendor later in 2007. It also makes sense to take a piece of the file-based storage infrastructure and implement another clustering solution; this will build experience of clustering and put processes in place that will accelerate future consolidation of NAS in 2008 on whatever emerges as the best technology. In addition, it will significantly improve future negotiations with the current vendor.
For enterprises that have a number of different NAS storage vendors, it will probably make sense now to consolidate major portions of that on NAS clustering and virtualization solutions.