Capturing electronic satellite and telescope imagery creates a raw data cache of billions of small files occupying over a petabyte. Once that data is processed, it is moved into a public archive library, also of petabyte scale. The archive library is the mission-critical part of the infrastructure: if it is lost, months or even years of work are lost.
When Caltech does a hardware refresh, new equipment goes first to the more mission-critical archive library, and the older storage is pushed down to the raw data cache in a cascade, or waterfall, scheme. This strategy generates more work, but it provides better reliability at the high end of the infrastructure.
For managing the raw data cache, a sandbox-like approach is used. Caltech's sandboxes were built with Nexsan ATABeasts, each with about 400 gigabytes of drive capacity, and are now more than five years old. The strategy is to never retire hardware until it dies: in Caltech's experience, controllers and chassis don't go bad; only disk drives do, and those can easily be replaced. Caltech uses a spare-parts approach. When equipment comes off maintenance, Caltech takes on the risk and keeps older arrays in inventory as a source of spares.
Action Item: Understand the data flow and refresh the infrastructure intelligently, using a waterfall methodology to cascade older gear to the less mission-critical parts of the application and point newer gear at the most important parts. The downside is that this approach is more work, but with commonality across the board, everything is interchangeable.
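The waterfall refresh described above can be sketched as a simple model. This is an illustrative assumption, not Caltech's actual tooling: tiers are ordered from most to least mission-critical, new gear enters at the top, each displaced array drops one tier, and the bottom tier keeps hardware until it dies.

```python
def waterfall_refresh(tiers, new_gear):
    """Cascade a hardware refresh through storage tiers.

    tiers: list of (name, arrays) pairs, ordered most- to
    least-critical; each arrays list is ordered newest-first.
    Hypothetical sketch of the waterfall scheme in the text.
    """
    incoming = new_gear
    for i, (name, arrays) in enumerate(tiers):
        arrays.insert(0, incoming)      # new/displaced gear joins this tier
        if i == len(tiers) - 1:
            break                       # bottom tier keeps gear until it dies
        incoming = arrays.pop()         # oldest array cascades down a tier

# Example: a two-tier setup matching the article's archive/cache split.
tiers = [("archive library", ["gen2"]), ("raw data cache", ["gen1"])]
waterfall_refresh(tiers, "gen3")
# archive library now holds gen3; the cache holds gen2 and the old gen1
```

The key property the sketch captures is that nothing is discarded: gear only moves downward in criticality, mirroring the spare-parts philosophy described above.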