Next week is EMC World and we’re going to hear a lot more about Federated Storage. In March, EMC put Pat Gelsinger in front of the analyst community and he unveiled the vision of a virtualized, global, federated, cache coherent storage infrastructure…for the cloud. Wow – that’s a mouthful.
I hit the Twitter crowd to see what they thought was meant by Federated Storage. Here’s some of what came back:
@gregoryjoconnor said: Looks a lot like I hybrid cloud but for storage…Hybrid storage [haaaa]
@karenmcp weighed in with: When you can look across various repositories to Know What You Have, and Where – and access it. Federated storage
@ianht says: Now along wanders ‘Federation’ as the latest word to be put through the hype & definition mangler
@stuiesav said: it would be of great help to the industry if some firm definition of terms were made
Okay…here’s my bid at a definition.
Federated storage is the collection of autonomous storage resources governed by a common management system that provides rules about how data is stored, managed, and migrated throughout the storage network. In this definition, storage resources include disk capacity managed by controllers or appliances controlling multiple arrays.
Practitioners should think of individual resources (e.g. arrays) as nodes within the federation. By enabling a loosely coupled set of storage resource nodes to act autonomously and still be managed centrally, organizations can, over some distance, create networks of virtually limitless capacity, move data and applications globally, eliminate disruptive migrations and dramatically improve recovery.
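To make the definition a little more concrete, here is a minimal sketch of autonomous nodes under one common manager. It is purely illustrative and says nothing about how EMC or anyone else builds this: the names StorageNode, Federation, place, locate and migrate are hypothetical, and the placement rule (most free capacity, optionally restricted to a site) is an assumption chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StorageNode:
    """One autonomous resource (e.g. an array) in the federation. Hypothetical."""
    name: str
    site: str
    capacity_gb: int
    volumes: Dict[str, int] = field(default_factory=dict)  # volume name -> size in GB

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.volumes.values())


class Federation:
    """Hypothetical common manager that applies placement rules across nodes."""

    def __init__(self, nodes: List[StorageNode]):
        self.nodes = {n.name: n for n in nodes}

    def place(self, volume: str, size_gb: int, site: Optional[str] = None) -> str:
        # Assumed rule: pick the eligible node with the most free capacity.
        candidates = [
            n for n in self.nodes.values()
            if n.free_gb() >= size_gb and (site is None or n.site == site)
        ]
        if not candidates:
            raise RuntimeError("no node satisfies the placement rule")
        target = max(candidates, key=lambda n: n.free_gb())
        target.volumes[volume] = size_gb
        return target.name

    def locate(self, volume: str) -> str:
        # "Know what you have, and where": find the node holding a volume.
        for n in self.nodes.values():
            if volume in n.volumes:
                return n.name
        raise KeyError(volume)

    def migrate(self, volume: str, dest: str) -> None:
        src = self.nodes[self.locate(volume)]
        dst = self.nodes[dest]
        if src is dst:
            return
        size = src.volumes[volume]
        if dst.free_gb() < size:
            raise RuntimeError("destination node lacks capacity")
        dst.volumes[volume] = size  # copy to the destination first
        del src.volumes[volume]     # then release the source copy


fed = Federation([
    StorageNode("array-a", "boston", 100),
    StorageNode("array-b", "london", 200),
])
fed.place("vol1", 50)                 # lands on the node with most free space
fed.place("vol2", 80, site="boston")  # rule restricts placement to one site
fed.migrate("vol2", "array-b")        # the caller never addresses a node directly
```

The point of the sketch is the split of responsibilities: each node only tracks its own capacity, while the rules about where data lives and when it moves sit in one place.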
Technologies are coming onto the market to exploit such architectures, particularly in file and object environments (e.g. scale-out NAS, Google File System, EMC’s Atmos, Cleversafe). In block-based worlds, as is often the case, functional and robust software lags hardware. This is what makes EMC’s expected announcement so important, as it appears to be coming out of the block-based Symmetrix group.
What Problems Does Federated Storage Solve?
If it works as advertised…only some of the world’s biggest – data migration without disruption, more facile workload and data movement, higher availability at lower cost, faster recovery at distance…
What – no world hunger?

#1 by vingomootime on May 6, 2010 - 12:52 pm
WTF is right. It's like wow dude.
#2 by Sunder on May 6, 2010 - 1:45 pm
Given the direction we are going in, EMC is doing what the world is demanding … Hope it works, as it has the potential to spark more ideas built on it.
#3 by evanplaice on May 12, 2010 - 5:56 am
I smell hype.
Any massive data store (such as a search engine) already has to support a similar architecture to be able to scale to the massive amounts of data they store.
So, they've abstracted out the message handling (queue) to allow the data backend to be split and de-centralized.
How is that such a big deal? I've already proven this on a smaller scale in one of the applications I wrote.
Essentially, the front-end db has a table or tables that act as the message request queue. A second table acts as a reference lookup table for all of the data entries in the backend databases (eventually this is the part that may grow in size and need to be split into smaller parts, because it contains the references to all of the back-end data stores). The third table is a cache that only contains values that are currently in use (this is what the user interacts with on the receiving side of the interaction).
The back-end databases are just normal databases split logically in some way that makes sense to the lookup table in the front-end database.
The front end scales based on how many variables are currently stored in the cache, not how many the database stores.
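The layout described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the commenter's actual application: plain dicts stand in for the databases, the class and method names (FrontEnd, put, request, process, read, release) are hypothetical, and the byte-sum rule used to pick a back end for new keys is an arbitrary placeholder.

```python
from collections import deque


class FrontEnd:
    """Hypothetical front end: request queue, lookup table and in-use cache."""

    def __init__(self, backends):
        self.backends = backends  # back-end name -> dict standing in for a database
        self.queue = deque()      # table 1: pending read requests
        self.lookup = {}          # table 2: key -> name of the back end holding it
        self.cache = {}           # table 3: only the values currently in use

    def put(self, key, value):
        name = self.lookup.get(key)
        if name is None:
            # Assumed split rule: deterministic byte sum modulo back-end count.
            names = sorted(self.backends)
            name = names[sum(key.encode()) % len(names)]
        self.backends[name][key] = value
        self.lookup[key] = name
        self.cache.pop(key, None)  # drop any stale cached copy

    def request(self, key):
        self.queue.append(key)

    def process(self):
        # Drain the queue, pulling each requested value into the cache.
        while self.queue:
            key = self.queue.popleft()
            if key not in self.cache:
                self.cache[key] = self.backends[self.lookup[key]][key]

    def read(self, key):
        return self.cache[key]

    def release(self, key):
        self.cache.pop(key, None)


fe = FrontEnd({"db0": {}, "db1": {}})
fe.put("user:1", "alice")
fe.put("user:2", "bob")
fe.request("user:1")
fe.process()
value = fe.read("user:1")  # served from the cache, not the back end
```

Here the cache holds one entry while the lookup table holds two, which is the scaling property the comment claims: the active layer grows with what is in use, and only the lookup table grows with the total data stored.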
Honestly, if there's that much hype behind this, hire me to architect it. I'll have a working prototype in no time.
It's really not that complex. It's just a data access abstraction layer using an active database (constantly changing queue and cache) riding on top of a larger passive layer (containing a larger body of indexed data).
Any data systems architect with some database experience worth half his/her salt could tackle this. AFAIK, the large data storage organizations (Goog) tackled this years ago.
#4 by Techwatch on May 12, 2010 - 3:24 pm
It's more than a little worrying to have all that info floating around for someone to grab and use.