There is a rapidly growing segment of data that needs to be available 24x7 to all the stakeholders of an organization: customers, partners, suppliers, and the community. This data needs to be optimally placed, balancing user response time and bandwidth usage against the cost of holding multiple distributed copies.
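The placement trade-off described above can be made concrete with a toy cost model. The sketch below is only an illustration: the `PlacementOption` type, the `monthly_cost` function, the idea of pricing latency as a per-millisecond penalty, and every number in it are assumptions made for the example, not figures from any vendor or study.

```python
from dataclasses import dataclass


@dataclass
class PlacementOption:
    """Hypothetical description of one way to place a dataset."""
    name: str
    copies: int                 # number of distributed copies held
    avg_latency_ms: float       # average user response time
    monthly_transfer_gb: float  # long-haul bandwidth consumed per month


def monthly_cost(option, dataset_gb, storage_cost_per_gb,
                 transfer_cost_per_gb, latency_penalty_per_ms):
    """Combine storage, bandwidth, and response time into one figure.

    The latency penalty is an assumed business cost per millisecond of
    average response time; a real organization would have to estimate it.
    """
    storage = option.copies * dataset_gb * storage_cost_per_gb
    transfer = option.monthly_transfer_gb * transfer_cost_per_gb
    latency = option.avg_latency_ms * latency_penalty_per_ms
    return storage + transfer + latency


# Illustrative numbers only: more copies cost more to hold, but they
# shorten response times and reduce long-haul traffic.
options = [
    PlacementOption("single site", 1, 200.0, 5000.0),
    PlacementOption("three regions", 3, 60.0, 1500.0),
    PlacementOption("ten edge sites", 10, 20.0, 500.0),
]

costs = {
    o.name: monthly_cost(o, dataset_gb=1000, storage_cost_per_gb=0.10,
                         transfer_cost_per_gb=0.05, latency_penalty_per_ms=2.0)
    for o in options
}
best = min(costs, key=costs.get)
```

With these made-up inputs the middle option comes out cheapest, which is the general shape of the problem: neither a single copy nor the maximum number of copies is automatically the best placement.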
Distributed storage systems such as Amazon S3 and the Google File System (GFS) spread data across the network over multiple servers, storage devices, and data centers, using commodity hardware and software. These systems are designed to expect failures in both the hardware and the software, and to recover from any such failure.
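The design principle above, keeping several copies and treating failure as routine, can be sketched in a few lines. This is a minimal in-memory illustration, not a description of how Amazon S3 or GFS is actually implemented; the `ReplicatedStore` class, its method names, and the default of three copies are all assumptions made for the example.

```python
import random


class ReplicatedStore:
    """Toy store that keeps several copies of each item on different nodes."""

    def __init__(self, node_names, replication_factor=3, seed=0):
        # Each live node maps keys to data; a failed node is simply removed.
        self.nodes = {name: {} for name in node_names}
        self.replication_factor = replication_factor
        self.placement = {}  # key -> set of live nodes holding a copy
        self._rng = random.Random(seed)

    def put(self, key, data):
        # Write the item to several distinct live nodes chosen at random.
        count = min(self.replication_factor, len(self.nodes))
        targets = self._rng.sample(sorted(self.nodes), count)
        for name in targets:
            self.nodes[name][key] = data
        self.placement[key] = set(targets)

    def get(self, key):
        # Any surviving copy can serve the read.
        for name in self.placement[key]:
            return self.nodes[name][key]
        raise KeyError(f"all copies of {key!r} have been lost")

    def fail_node(self, name):
        # Simulate a hardware or software failure: the node and its copies vanish.
        self.nodes.pop(name, None)
        for holders in self.placement.values():
            holders.discard(name)

    def repair(self):
        # Re-create missing copies on spare live nodes from a surviving copy.
        for key, holders in self.placement.items():
            if not holders:
                continue  # nothing left to copy from
            data = self.nodes[next(iter(holders))][key]
            spares = [n for n in sorted(self.nodes) if n not in holders]
            needed = self.replication_factor - len(holders)
            for name in spares[:needed]:
                self.nodes[name][key] = data
                holders.add(name)
```

In use, an item written with `put` remains readable through `get` after `fail_node` removes some of its holders, and `repair` then restores the target number of copies from a survivor, provided at least one copy remains and enough live nodes are available.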
If this technology proves reliable, does it obviate the need for this class of data to be backed up? Time will tell whether tape is dead for this class of data, but if it is, the potential saving in storage costs and complexity is significant. What is clear is that many new storage topologies will become possible with different combinations of storage and network technologies.
Action Item: It will take some years before the market decides which combinations of storage topologies are optimal, as vendors and customers try different approaches to melding file system, disk storage, tape storage, and network technologies. Organizations should ensure that they gain experience with multiple storage approaches, including emerging clustered designs for so-called cloud computing. These technologies should be tested at all stages, from application design to operational implementation. Storage needs to be organized to encourage different approaches, including outsourcing storage to third parties.