Capturing electronic satellite and telescope imagery creates a raw data cache of billions of small files occupying over a petabyte. Once that data is processed, it is moved into a public archive library, also of petabyte scale. The archive library is the mission-critical part of the infrastructure: if it is lost, months or even years of work are lost.
When Caltech does a hardware refresh, new equipment goes first to the more mission-critical archive library, and the older storage is pushed down to the raw data cache in a cascade, or waterfall, scheme. This strategy generates more work, but it provides better reliability at the high end of the infrastructure.
For managing the raw data cache, a sandbox-like approach is used. Caltech's sandboxes were built with Nexsan ATABeasts, each with about 400 gigabytes of drive capacity, and are now more than five years old. The strategy is to never retire hardware until it dies: in Caltech's experience, controllers and chassis don't go bad; only disk drives do, and those can easily be replaced. Caltech uses a spare-parts approach. When equipment comes off maintenance, Caltech takes on the risk and keeps older arrays in inventory as a source of spares.
Action Item: Understand the data flow and refresh the infrastructure intelligently, using a waterfall methodology to cascade older gear to the less mission-critical parts of the application and point newer gear at the most important parts. The downside is that this approach is more work, but with commonality across the board, everything is interchangeable.
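The waterfall refresh described above can be sketched as a simple model. This is an illustrative assumption, not Caltech's actual tooling: tiers are ordered from most to least mission-critical, new gear enters at the top, each displaced array drops one tier, and the bottom tier keeps hardware until it dies.

```python
def waterfall_refresh(tiers, new_gear):
    """Cascade a hardware refresh through storage tiers.

    tiers: list of (name, arrays) pairs, ordered most- to
    least-critical; each arrays list is ordered newest-first.
    Hypothetical sketch of the waterfall scheme in the text.
    """
    incoming = new_gear
    for i, (name, arrays) in enumerate(tiers):
        arrays.insert(0, incoming)      # new/displaced gear joins this tier
        if i == len(tiers) - 1:
            break                       # bottom tier keeps gear until it dies
        incoming = arrays.pop()         # oldest array cascades down a tier

# Example: a two-tier setup matching the article's archive/cache split.
tiers = [("archive library", ["gen2"]), ("raw data cache", ["gen1"])]
waterfall_refresh(tiers, "gen3")
# archive library now holds gen3; the cache holds gen2 and the old gen1
```

The key property the sketch captures is that nothing is discarded: gear only moves downward in criticality, mirroring the spare-parts philosophy described above.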