This whitepaper is aimed at helping the audience write a Request For Proposal (RFP) for building a complete storage infrastructure.
It offers a framework that I expect the community to build upon.
The Main Requirements
The RFP is written with the following prerequisites in mind:
- There is a central site (think of it as your company's HQ)
- There are two separate datacenters or computing rooms in the central site campus
- There are 4 small satellite sites (think of them as your remote offices, maybe in another city or another continent)
- There is a DR site (this could be one of the satellite sites or a colocation facility)
- The name of the company is NewCo (although I recommend doing more than just a search and replace)
- The RFP is trying to stay vendor-neutral in order to give most vendors a fair chance at bidding against it
And the RFP is requesting the following items:
- A fast central storage solution (two units that are synchronously replicated with each other) to support high-demand applications
- A bulk storage solution to support high-volume workloads
- A backup solution (disk-to-disk, with tape as an option)
- A midrange solution for the satellite sites, asynchronously replicated to the central site
- An asynchronously replicated DR site
- A server, desktop and laptop backup solution
The RFP does not address the following issues:
- No performance requirements are made. This is intentional, since performance requirements are hard to define and environment-specific. You could ask vendors to quote their http://www.storageperformance.org numbers, but EMC will not be able to comply.
- The RFP is kept fairly generic
There are some other assumptions in the contents of the actual document; you will need to read it and customize it according to your needs.
The RFP
- 1. The central storage system will need to support fault tolerant storage architecture for the two datacenters on the NewCo campus.
- 1.1. Fault Tolerant should be understood as RPO=0, RTO=0. Failure of either of the two sites should leave the data available in the other one with no interruption.
- 1.1.1. Solution should be based on synchronous replication or similar technologies
- 1.1.2. The system should allow high configuration granularity where some subsets of data are replicated in a Fault Tolerant manner and some are not. Granularity should be at least at LUN level.
- 1.1.3. Interconnect between datacenters is assumed to be dark fiber
- 1.1.4. Distance between datacenters is assumed to be 10km.
- 1.2. The systems should allow expansion beyond 1 PB of usable capacity
- 1.2.1. For the purpose of this exercise the central storage should have 120 TB available
- 1.2.2. For the purpose of this exercise the storage solution should assume 5%/45%/50% disk speed tiers
- 1.2.3. The growth rate for data is assumed to be 50% per year (a worked sizing sketch follows at the end of this section)
- 1.3. The central storage system needs to support at least (32) 8 Gb/s Fibre Channel ports in each location
- 1.3.1. The central storage system needs to support at least (8) iSCSI ports in each location
- 1.3.1.1. iSCSI ports can be provided through a separate gateway
- 1.3.1.2. The iSCSI ports should be 10Gbps
- 1.3.2. The central storage solution needs to support network file access protocols (NFS, CIFS, etc.)
- 1.3.2.1. The file system protocols can be provided through a NAS gateway
- 1.4. The central storage system needs to support de-duplication of active data
- 1.5. The central storage system needs to support thin provisioning
- 1.6. The central storage systems need to support the use of SSD modules.
- 1.7. The central storage systems need to support integration with Citrix XenServer, Microsoft Hyper-V and VMware
- 1.8. The central storage systems need to support snapshots
- 1.9. The required availability of the central storage solution needs to be 99.999%
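To sanity-check the capacity and availability figures above, here is a minimal Python sketch. The 120 TB starting point, the 5%/45%/50% tier split, the 50% annual growth, the 10 km distance and the 99.999% target come from the RFP; the ~5 µs/km fibre latency figure is a common rule of thumb rather than an RFP requirement, and 1 PB is taken as 1,000 TB.

```python
# Back-of-the-envelope checks for the central storage requirements (section 1).
# Figures taken from the RFP text; the 5 us/km fibre latency is a rule-of-thumb assumption.

START_TB = 120          # 1.2.1 initial usable capacity
GROWTH = 0.50           # 1.2.3 annual growth rate
CEILING_TB = 1000       # 1.2 expansion target (1 PB usable, taken as 1,000 TB)
TIERS = {"tier 0 (fastest)": 0.05, "tier 1": 0.45, "tier 2": 0.50}   # 1.2.2

# Tier breakdown of the initial 120 TB
for name, share in TIERS.items():
    print(f"{name}: {START_TB * share:.1f} TB")

# Years of headroom before the 1 PB ceiling at 50% growth per year
capacity, years = START_TB, 0
while capacity < CEILING_TB:
    capacity *= 1 + GROWTH
    years += 1
print(f"~{years} years until usable capacity exceeds {CEILING_TB} TB ({capacity:.0f} TB)")

# Synchronous replication penalty over 10 km dark fibre (1.1.3 / 1.1.4):
# light in fibre travels roughly 5 microseconds per km, so every write pays
# at least one round trip before it can be acknowledged.
distance_km = 10
rtt_us = distance_km * 5 * 2
print(f"Added write latency from synchronous replication: >= {rtt_us} microseconds")

# 99.999% availability (1.9) expressed as allowed downtime per year
minutes_per_year = 365.25 * 24 * 60
print(f"Five nines allows ~{minutes_per_year * (1 - 0.99999):.1f} minutes of downtime per year")
```

At 50% annual growth the 1 PB ceiling is reached in roughly six years, which is worth keeping in mind when evaluating the expansion options vendors propose.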
- 2. The storage solution will allow for any number of satellite sites
- 2.1. A satellite site is defined as a secondary site with a local SAN solution that is connected to the central storage system
- 2.2. The satellite site will allow partial or total replication of data to the central site asynchronously
- 2.3. For the purpose of this exercise the number of satellite sites is assumed to be 4
- 2.4. Satellite sites are assumed to be connected by either LAN or WAN Ethernet links
- 2.5. Satellite sites should support expansion up to 50 TB of usable capacity
- 2.5.1. For the purpose of this exercise each satellite storage should have 15TB available
- 2.5.2. The growth rate for data is assumed to be 20% per year
- 2.6. The satellite storage system needs to support at least (2) 4 Gb/s Fibre Channel ports in each location
- 2.7. The satellite storage system needs to support at least (2) iSCSI ports in each location
- 2.8. The satellite storage systems need to support integration with Citrix XenServer, Microsoft Hyper-V and VMware
- 2.9. For the purpose of this exercise a satellite site is assumed to have 20 physical servers
- 2.10. Replication between satellite sites and the central site should be performed in a bandwidth-efficient manner (using compression, source de-duplication or other methods); a rough bandwidth estimate follows at the end of this section
- 2.11. The required availability of the satellite storage solution needs to be 99.999%
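The same kind of arithmetic applies to the satellite sites. The 15 TB starting capacity, 20% growth rate and 50 TB ceiling come from the RFP; the 2% daily change rate and 3:1 data-reduction ratio are purely illustrative assumptions, used only to show how requirement 2.10 can be quantified for your own environment.

```python
# Satellite site headroom and replication bandwidth estimate (section 2).
# 15 TB initial capacity, 20% growth and the 50 TB ceiling come from the RFP;
# the 2% daily change rate and 3:1 reduction ratio are illustrative assumptions only.

START_TB, GROWTH, CEILING_TB = 15, 0.20, 50      # 2.5, 2.5.1, 2.5.2

capacity, years = START_TB, 0
while capacity < CEILING_TB:
    capacity *= 1 + GROWTH
    years += 1
print(f"~{years} years until a satellite site outgrows {CEILING_TB} TB")

# How much WAN bandwidth does asynchronous replication to the central site need?
daily_change_rate = 0.02       # assumed fraction of data changed per day
reduction_ratio = 3.0          # assumed effect of compression / source dedup (2.10)
changed_tb_per_day = START_TB * daily_change_rate / reduction_ratio
mbit_per_s = changed_tb_per_day * 1e12 * 8 / 86400 / 1e6
print(f"Replication traffic after reduction: ~{changed_tb_per_day * 1000:.0f} GB/day "
      f"(~{mbit_per_s:.1f} Mbit/s sustained)")
```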
- 3. The storage solution will need to support at least one DR site for data replication.
- 3.1. Interconnect between main site and DR site is assumed to be redundant WAN links
- 3.2. Solution should be based on asynchronous replication of data or similar technologies (an illustrative sizing sketch follows at the end of this section)
- 3.3. DR site is assumed to be 120km away
- 3.4. The system should allow high configuration granularity where some subsets of data are replicated to the DR site and some are not. Granularity should be at least at LUN level.
- 3.5. The systems should allow high configuration granularity where different subsets of data are replicated with different frequency. Granularity should be at least at LUN level.
- 3.6. DR site must be able to replicate and store all information from the central site
- 3.7. Operation from DR site can tolerate a reasonable amount of performance degradation
- 3.8. The DR storage system needs to support at least (16) 4 Gb/s Fibre Channel ports in each location
- 3.9. The DR storage system needs to support at least (4) iSCSI ports in each location
- 3.9.1. iSCSI ports can be provided through a separate gateway
- 3.10. The required availability of the DR storage solution needs to be 99.999%
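A rough sizing sketch for the asynchronous DR replication in this section. Only the 120 TB of protected capacity comes from the RFP; the change rate, WAN bandwidth and outage window are illustrative assumptions, and the point is that bidders should be asked to show the equivalent calculation for their proposed solution.

```python
# Rough RPO / journal sizing for asynchronous DR replication (section 3).
# The 120 TB protected capacity follows from 3.6 and 1.2.1; everything else here
# (change rate, WAN bandwidth, outage window) is an illustrative assumption.

protected_tb = 120
daily_change_rate = 0.02                  # assumed fraction changed per day
wan_mbit_s = 300                          # assumed usable replication bandwidth
outage_hours = 4                          # assumed worst-case WAN outage to absorb

changed_gb_per_day = protected_tb * 1000 * daily_change_rate
wan_gb_per_day = wan_mbit_s * 1e6 / 8 * 86400 / 1e9

# If the link can drain a day's worth of changes, the steady-state RPO is roughly
# the time it takes to ship one replication cycle's worth of changed data.
print(f"Changes generated: ~{changed_gb_per_day:.0f} GB/day, "
      f"link capacity: ~{wan_gb_per_day:.0f} GB/day")

# Journal/buffer space needed to ride out a WAN outage without breaking replication
journal_gb = changed_gb_per_day / 24 * outage_hours
print(f"Journal space to absorb a {outage_hours} h outage: ~{journal_gb:.0f} GB")
```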
- 4. The storage system will include a backup and archival system
- 4.1. The purpose of the backup system is to preserve application consistent point in time copies of the data from the central and satellite sites
- 4.1.1. The backup system should allow high configuration granularity where different subsets of data are backed up with different RPOs. Granularity should be at least at LUN level (an illustrative policy sketch follows at the end of this section).
- 4.1.1.1. The backup system should be tape or disk based
- 4.1.1.2. The aim of the backup system is to operate with minimum or no manual intervention
- 4.1.1.3. The backup system needs to support the option of copying the full set or subsets of the backup data to tape using a manual and/or automated procedure
- 4.1.1.4. The backup and archival solution needs to provide the necessary functionality to back up other devices such as
- 4.1.1.4.1. Server systems with DAS
- 4.1.1.4.2. Desktop devices
- 4.1.1.4.3. Laptop devices in both LAN and remote access setups
- 4.1.1.5. The backed-up data needs to be stored de-duplicated
- 4.1.1.6. The system must provide the ability for end users to access backed-up information in an easy-to-use and automated manner
- 4.2. The aim of the archival system is to store historical application data that is hosted on the central site in a cost-effective manner and serve as a lower-tier storage device
- 4.2.1. The access to the archived data needs to be fully automated
- 4.2.1.1. The archival system must provide the ability for end users to access archived information in an easy and automated manner
- 4.2.2. The migration of the data from the central storage to the archival system and back needs to be automatic and/or manual
- 4.2.3. Data migration strategies should be fully configurable and have the ability of being schedule bound
- 4.2.4. The system should have the ability of identifying potential candidates for archival at a sub-LUN level
- 4.2.5. The archival system can be either a separate component of the infrastructure or a subcomponent of the central storage
- 4.2.6. The required availability of the backup and archival storage solution needs to be 99.999%
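One way to make the per-LUN granularity asked for in 4.1.1 concrete for bidders is to sketch the kind of policy you expect to be able to express. The class and field names below are hypothetical and vendor-neutral, not tied to any particular product.

```python
# Hypothetical per-LUN backup policy model illustrating the granularity asked
# for in 4.1.1: each LUN carries its own RPO, retention and tape-copy setting.
from dataclasses import dataclass

@dataclass
class BackupPolicy:
    lun_id: str
    rpo_hours: int            # how much data loss is acceptable for this LUN
    retention_days: int       # how long disk-based copies are kept
    copy_to_tape: bool        # 4.1.1.3: optional tape copy of this subset
    deduplicate: bool = True  # 4.1.1.5: backed-up data stored de-duplicated

policies = [
    BackupPolicy("lun-erp-prod",   rpo_hours=1,  retention_days=90, copy_to_tape=True),
    BackupPolicy("lun-file-share", rpo_hours=24, retention_days=30, copy_to_tape=False),
]

for p in policies:
    print(f"{p.lun_id}: RPO {p.rpo_hours} h, keep {p.retention_days} d, "
          f"tape={p.copy_to_tape}, dedup={p.deduplicate}")
```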
- 5. Other components
- 5.1. The infrastructure design should include all necessary active components to achieve the desired functionality
- 5.2. Components such as SAN switches and the necessary software, transmission modules and SFPs need to be included
- 5.2.1. For the purpose of this exercise we assume that up to 250 physical machines will be connected to the central storage (a port-count sketch follows at the end of this section).
- 5.3. Assumptions with regard to the passive components such as optical fiber, Ethernet cables and racks need to be documented.
- 5.4. Assumptions with regard to the power requirements need to be documented (including power and cooling requirements of each component of the described solution).
- 5.5. Design should exclude any single points of failure in the connection path up to the server. All solutions need to support multipathing for Windows, Linux and virtualized environments.
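Requirement 5.2.1 (up to 250 physical machines) combined with the no-single-point-of-failure rule in 5.5 largely determines the SAN switch port count. A minimal sketch of that arithmetic follows; the two HBA ports per host and the 15% spare allowance are assumptions you should adjust to your own environment.

```python
# SAN fabric port estimate for section 5. The 250 hosts and the dual-fabric
# requirement come from 5.2.1 and 5.5; two HBA ports per host and a ~15% spare
# allowance are assumptions.

hosts = 250
hba_ports_per_host = 2          # one port into each of two redundant fabrics (5.5)
storage_ports = 32              # 1.3: FC ports on the central storage, per location
spare_fraction = 0.15           # assumed growth/spare allowance

host_ports = hosts * hba_ports_per_host
total_ports = host_ports + storage_ports
with_spares = total_ports * (1 + spare_fraction)

print(f"Host-facing ports: {host_ports} (split across two fabrics)")
print(f"Total fabric ports incl. storage: {total_ports}, "
      f"with ~15% spare: {with_spares:.0f}")
```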
- 6. Software and licensing
- 6.1. All software, firmware and licenses necessary to achieve full functionality need to be included
- 6.2. All software, firmware and licenses needed to configure this functionality need to be included
- 6.3. All software update services need to be included in the BOQ, whether free of charge (FOC) or delivered through a service contract.
- 6.3.1. This includes product specific firmware updates
- 6.3.2. This includes other software related to the presented solution
- 6.4. All assumptions with regard to the existing storage infrastructure need to be documented
- 6.5. Unified management software for the entire storage solution, capable of managing all aspects of the central, satellite, backup and archival, and DR site systems, needs to be provided
- 6.6. The management software must be able to present reports on all aspects of the storage solution
- 6.7. The management software must allow secure delegation of rights to manage components of the storage solution to various users
- 6.8. Authentication for management software should be integrated with Active Directory
- 6.9. If software requires special modules or licenses in order to interface with third party software, the modules or licenses must be included
- 6.9.1. NewCo uses Microsoft Server System, Oracle databases, MS SQL databases, Linux and Linux-based OSs, Citrix XenServer and Microsoft Hyper-V
- 7. Support
- 7.1. The storage architecture will require 7-days-per-week remote support with a 4-hour response time
- 7.2. On-site support is required with a 12h response time
- 7.3. Spare parts for critical components that have the potential of disrupting availability or significantly impacting the overall performance of the system need to be available on site
- 7.4. Replacement parts for other failed components need to be delivered to site within 24h
- 7.5. The vendor is expected to provide the option of receiving automated alerts either from the storage components themselves or from NewCo’s ServiceDesk in order to ensure proactive detection of potential failures and shipment of replacement parts
- 7.6. The storage vendor commits to training 4 full time employees in all aspects of configuring and managing the storage infrastructure in order to bring them to a proficient operational level.
- 8. Other technical requirements
- You could request here the use of specific components, brands, etc.
- You might also request integration with some of your current applications.
- You might want to request the solution to be certified with certain applications or other hardware infrastructure.
- Any power requirements, space or physical size limitations could also be mentioned here.
- 9. Performance
You should quote your own numbers here if you have any specific requirements.
- 10. Other requirements
- 10.1. LUN Masking (LUN: Logical Unit Number); an illustrative zoning and masking sketch follows at the end of this section
- 10.2. SAN Zoning (Soft Zoning, Hard Zoning, Broadcast Zoning)
- 10.3. WWN zoning: World Wide Names
- 10.4. FCAP: Fibre Channel Authentication Protocol
- 10.5. FCPAP: Fibre Channel Password Authentication Protocol
- 10.6. SLAP: Switch Link Authentication Protocol
- 10.7. FC-SP: Fibre Channel Security Protocol
- 10.8. ESP over Fibre Channel
- 10.9. DH-CHAP: Diffie-Hellman Challenge Handshake Authentication Protocol
- 10.10. Support for multiple anti-virus engines, including but not limited to Sophos, Kaspersky and Symantec
- 10.11. File Integrity Monitoring
- 10.12. Support for forensics tools, namely EnCase
- 10.13. Support for data encryption (provide a list of supported encryption protocols)
- 10.14. Data classification and labeling
- 10.15. Security functionality for the following:
- 10.15.1. Integrity
- 10.15.2. Availability
- 10.15.3. Confidentiality
- 10.15.4. Authentication
- 10.15.5. Authorization
- 10.15.6. Accounting
- 10.16. Switch security features
- 10.16.1. Management Policy Set:
- 10.16.2. WWN
- 10.16.3. Management Access Control
- 10.16.4. Device Connection Control
- 10.16.5. Switch Connection Control
- 10.16.6. Zoning Configuration
- 10.16.7. Password
- 10.16.8. Community Strings
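Finally, to make the intent behind 10.1–10.3 explicit to bidders, here is a vendor-neutral sketch of the expected access-control model: WWN-based zoning on the fabric plus LUN masking on the array. All WWPNs and LUN names below are made up purely for illustration.

```python
# Hypothetical, vendor-neutral sketch of WWN zoning (10.3) and LUN masking (10.1):
# a zone groups initiator and target WWPNs on the fabric; a masking entry then
# restricts which LUNs a given initiator may see on the array.

zones = {
    # zone name        : [initiator WWPN (host HBA), target WWPN (array port)]
    "z_erp01_array_a": ["10:00:00:00:c9:aa:bb:01", "50:06:01:60:3b:20:11:aa"],
    "z_erp01_array_b": ["10:00:00:00:c9:aa:bb:02", "50:06:01:68:3b:20:11:bb"],
}

lun_masking = {
    # initiator WWPN           : LUN ids this host is allowed to access
    "10:00:00:00:c9:aa:bb:01": ["lun-erp-prod", "lun-erp-logs"],
    "10:00:00:00:c9:aa:bb:02": ["lun-erp-prod", "lun-erp-logs"],
}

for zone, members in zones.items():
    print(f"{zone}: {' <-> '.join(members)}")
for wwpn, luns in lun_masking.items():
    print(f"{wwpn} may access: {', '.join(luns)}")
```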