We've been getting some direct emails and DMs on Twitter about the FAST piece Nick Allen posted. This article captures them so that we can analyze them more comprehensively.
Questions on the Questions
- Do you really believe the answer given to question 1? If so, you must explain how a ‘FAST’ I/O on arbitrated loop does not block other devices waiting to win arbitration on that loop.
By managing WHEN reads/writes are executed, it is possible to schedule them so that they do not have a meaningful impact. And though you seem to doubt my earlier point, it is indeed rare for channel arbitration to be a bottleneck, especially on Symmetrix, where you have 16 active/active back-end channels in the smallest system (128 in the largest).
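To make that scheduling point concrete, here is a minimal sketch (our illustration, not EMC's actual code) of a two-class dispatcher: host I/Os always outrank background relocation I/Os, so a data move only consumes a back-end channel when no foreground request is waiting to win arbitration.

import heapq

HOST, RELOCATE = 0, 1  # lower value = higher priority

class ChannelScheduler:
    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker preserves FIFO order within a class

    def submit(self, priority, io):
        heapq.heappush(self._queue, (priority, self._seq, io))
        self._seq += 1

    def next_io(self):
        # Dispatch the highest-priority (lowest-value) request first.
        return heapq.heappop(self._queue)[2] if self._queue else None

sched = ChannelScheduler()
sched.submit(RELOCATE, "move extent 42 up to Flash")
sched.submit(HOST, "read LBA 4096")
print(sched.next_io())  # the host read goes first
print(sched.next_io())  # the relocation runs in the gaps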
- Question 3: how is there ‘no impact’ to performance if a block is demoted? (Wikibon comment: this seems to be a 'nit.' Sure, there will be an overall performance impact, but the user shouldn't see objectionable degradation in application performance when less frequently accessed data are demoted.)
The primary objective is to move stuff "down" that is rarely accessed as a cache miss, and to concentrate as many cache-miss I/Os on Flash as possible. Even if this isn't 100% perfectly laid out, 90% "flash hits" at less than 1ms and 10% SATA hits at 20ms will yield an AVERAGE response time of 2.9ms, significantly faster than a 15K rpm drive.
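That weighted average is easy to verify for yourself:

# 90% of cache misses land on Flash at ~1ms, 10% on SATA at ~20ms
flash_ratio, flash_ms = 0.90, 1.0
sata_ratio, sata_ms = 0.10, 20.0
avg_ms = flash_ratio * flash_ms + sata_ratio * sata_ms
print(f"average cache-miss response time: {avg_ms:.1f} ms")  # 2.9 ms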
- In question 4, you correctly state “It's not designed to move data up and down the hierarchy heuristically based on usage patterns over a period of time to optimize performance in an anticipatory fashion.” But this is the very premise on which EMC sells the ‘feature’! (Wikibon comment: certainly not true of technical people at EMC). Can you explain the contradiction?
Not sure who you're accusing of mis-positioning FAST v1 as dynamic enough to react to rapid changes in workloads. While FAST v1 can conceivably make changes more than once a day, it is likely to be more practical to enable only one move per LUN per day, or perhaps even per week.
- Lastly, in your ‘action item’, you ask: “Is FAST v1 worth it for customers? Yes, if you can save money by moving things down the hierarchy onto cheaper disk or avoid using under-provisioned FC drives.” How exactly does one save _money_ by doing this? After all, you must buy not only more expensive disk but the same amount (or more) of cheaper disk in order for the feature to work. You end up buying at least _twice_ the disk you would normally, under a business-value calculation of the importance of the data and therefore the tier on which it should reside. I would argue that you save space, not money. There is a subtle but very important difference. This is a zero-sum game, after all. (Wikibon comment: Interesting point and one that requires further consideration. Over time, as experience increases, users should see savings, but there's a learning curve there).
Unlike approaches using Flash-as-cache, there is no need to duplicate any of the storage used with FAST; there merely needs to be enough unused space to handle a "move". Symmetrix FAST also supports "swap", which requires zero extra space. More importantly, customers are ALREADY saving money with Flash drives, avoiding the purchase of numerous short-stroked 15K FC drives by consolidating critical components of their applications/databases on Flash. FAST v1 automates the process of identifying promotion/demotion candidates, allowing customers to simply install a small amount of Flash and then let the system figure out what should go there.
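To illustrate the capacity distinction being drawn here, a back-of-the-envelope comparison (illustrative numbers only, not vendor figures):

# Relocating a hypothetical 10 GB hot extent under each approach.
extent_gb = 10

cache_copy_gb = extent_gb    # Flash-as-cache: the cached copy duplicates
                             # the backing copy for as long as it is cached
move_scratch_gb = extent_gb  # FAST "move": transient free space on the
                             # target tier; the source extent frees afterwards
swap_extra_gb = 0            # FAST "swap": two extents trade places in-place

print(f"cache copy held long-term: {cache_copy_gb} GB")
print(f"move, transient scratch:   {move_scratch_gb} GB")
print(f"swap, extra space needed:  {swap_extra_gb} GB")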
Block movement within LUNs is a non-feature, truth be told. It does not save money whatsoever; in fact, it is explicitly designed for vendors to make more money by selling more disk to customers in anticipation of the blocks moving around. Rather, it is a space-time tradeoff: save space on tier N in trade for time, i.e. worse response on tier N+1. Block movement within LUNs will also always be a non-feature until file systems understand they have different tiers of storage available for their blocks and intelligently manipulate the metadata accordingly. The belief that last-access-time, which is the only information used to determine a block movement, translates to business value is illusory.
It is a provably inferior technique for databases. In fact, for many databases, the technique is exactly the opposite of what one wants. If you hit a block in a random-access database, chances are you won't hit that block again for quite some time. So the simplistic LRU algorithm employed by EMC and CML is dead wrong; MRU is in fact optimal. You should demote the block just hit and get it out of the way, much like write caching.
I'm not sure where you got the notion that EMC uses LRU for FAST v2. Nor do I think MRU is the right answer either, especially in a large-cached array like Symm (where the MRU stuff tends to live in DRAM cache most of the time anyway).
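For readers who want to see the LRU-versus-MRU argument concretely, here is a toy cache simulation (ours alone; neither vendor's algorithm). It uses the classic worst case for LRU, a scan that loops over more blocks than the cache can hold, where evicting the block you just touched really is the winning strategy:

from collections import OrderedDict

def simulate(policy, accesses, cache_size):
    # Count cache hits under LRU or MRU eviction.
    cache = OrderedDict()  # ordered oldest -> most recently used
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) >= cache_size:
                # last=True pops the newest entry (MRU eviction),
                # last=False pops the oldest (LRU eviction)
                cache.popitem(last=(policy == "MRU"))
            cache[block] = True
    return hits

trace = list(range(1000)) * 50  # a 1,000-block scan repeated 50 times
for policy in ("LRU", "MRU"):
    print(policy, "hits:", simulate(policy, trace, cache_size=100))
# LRU thrashes to zero hits; MRU retains roughly 99 blocks across passes.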
Also, consider this. If users back up their files once a week, the feature will NEVER kick in. If you doubt this, read the CML documentation carefully.
Both Symm and CLARiiON already today recognize the I/O patterns of backups (which look very different from the random I/O characteristics of OLTP), and both adjust their resource allocations (prefetch, cache flushing, etc.) to accommodate backups, without sacrificing performance for the "normal" workloads. You can expect that knowledge and behaviour will have a significant influence on FAST algorithms.
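As a rough illustration of that kind of pattern recognition (a generic sketch, not Symmetrix or CLARiiON internals), a backup sweep is easy to tell apart from OLTP by how often one request's address directly follows the previous one:

def looks_sequential(lbas, threshold=0.8):
    # Flag a stream as backup-like if most requests are contiguous.
    if len(lbas) < 2:
        return False
    runs = sum(1 for a, b in zip(lbas, lbas[1:]) if b == a + 1)
    return runs / (len(lbas) - 1) >= threshold

backup_stream = list(range(5000, 5064))         # full-volume sweep
oltp_stream = [7, 90312, 411, 55002, 13, 8764]  # scattered point reads

print(looks_sequential(backup_stream))  # True  -> e.g. ramp up prefetch
print(looks_sequential(oltp_stream))    # False -> keep OLTP tuning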
I am somewhat surprised you guys have been layering on the praise for EMC & CML and this non-feature. It’s great for CML – they sell more disk – but it’s not great for users. Many CML users end up disabling the feature because it is so intrusive on normal I/O.
Check out the history. (Wikibon comment: Worth a read here. Most users we've talked to from Compellent love the feature, but none are near the scale that FAST users will be at. This user seems to be much more sensitive to the issue).
The moral of that story is this: the user felt the need to buy more disks in order to mitigate the performance degradation invoked by the ‘feature’. That, of course, is the entire raison d’être of the feature: sell more disk to the unsuspecting user, under the guise of ‘saving money’. Complete smoke and mirrors. (Wikibon comment: The fact is FAST-like features do reduce the need to manually move stuff around, which does add business value).
If indeed Compellent's implementation suffers from the reported issues, that doesn't necessarily mean that ALL implementations will suffer the same. Again, EMC FAST doesn't require duplicated storage, and it's being built by the company that has led the way in large-cached external storage for nearly two decades.
What is really needed in the industry is an analytic engine that operates at the file system level to determine file and volume movements within tiers, not block movement within LUNs. There are some examples, none of them very sophisticated (e.g. Abrevity, Commvault). (Wikibon comment: This warrants some investigation).
This is PRECISELY what Celerra FAST does (file-level FAST across multiple tiers).
Further Criticisms
As for question 2, it’s an interesting point. Does low access mean low value, or low importance to the business? I would argue there is little correlation, perhaps even a negative one, between last-access-time and business value. Yet the entire operation of the ‘feature’ is built around last-access-time. (Wikibon comment: this does keep things simple and enable automation, which is a good thing).
Consider the very real and common case of month-end processing. If you allow FAST to move data down tiers, when the month-end processing kicks in you have blocks on slow storage. It takes, as you correctly say, a batch job to move it back up. Your database is dog-slow until that happens, and it doesn’t happen until at least the next day, or perhaps longer. This is yet another application of the Murphy principle, or perhaps the Law of Unintended Consequences. (Wikibon comment: This was one of our big questions coming into the announcement and it's clear FAST is not designed to deal with this).
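A toy model makes the trap plain (the 7-day idle threshold, latencies, and once-a-day promotion batch are hypothetical, not FAST's actual parameters):

FAST_MS, SLOW_MS = 1.0, 20.0
DEMOTE_AFTER_IDLE_DAYS = 7

tier_ms, idle_days = FAST_MS, 0
for day in range(1, 91):  # three months
    if idle_days >= DEMOTE_AFTER_IDLE_DAYS:
        tier_ms = SLOW_MS  # demoted while nobody was looking
    if day % 30 == 0:      # month-end batch run hits the data
        print(f"day {day}: month-end reads served at {tier_ms} ms")
        tier_ms = FAST_MS  # promoted back, a day too late
        idle_days = 0
    else:
        idle_days += 1
# Every month-end lands on the slow tier: 20.0 ms on days 30, 60, and 90.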
In other words, FAST guarantees that your data is in the worst possible place at the most important time: when you need it. Just because data is inactive does not mean it is not valuable and does not need fast response times. (Wikibon comment: Users will clearly have to be sensitive to this and there's little doubt EMC will work hard to make sure FAST is not degrading performance.)
Really, if you want to move files around based on last access time, do something like this:
“find /vol1 -depth -atime +7 -exec mv -f {} /vol2 \;”
The exact syntax may vary by platform, but this is just a Unix command to move files not accessed in the last week (find's -atime counts in 24-hour periods, hence +7 rather than +168 hours) from /vol1 to a different volume, /vol2. (Wikibon comment: Sounds like a simple solution; simple is good). You could be even more clever and leave behind a symbolic link (stub) so the user is unaware that his/her file moved around. It’s FREE, and it works on every Linux/Unix system on the planet. An equivalent exists for all Windows systems as well.
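For the stub variant, a minimal sketch (hypothetical paths; the one-week threshold is just an example):

import os, shutil, time

def archive_cold_files(src_dir, dst_dir, max_idle_secs=7 * 24 * 3600):
    # Move files not accessed within max_idle_secs to dst_dir,
    # leaving a symlink behind so existing paths keep working.
    cutoff = time.time() - max_idle_secs
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        if os.path.isfile(path) and not os.path.islink(path):
            if os.stat(path).st_atime < cutoff:
                target = os.path.join(dst_dir, name)
                shutil.move(path, target)  # demote to the cheaper volume
                os.symlink(target, path)   # stub preserves the old path

# archive_cold_files("/vol1", "/vol2")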
Over time, if you can really save space, sure, you can save money by deferring purchases of incremental space of a given tier. No argument. But with techniques like FAST, you are not saving space so much as _trading_ space. There is no free lunch – you must store the block somewhere.
If the user really wanted to save space, they would space-reduce their data, i.e. compression, dedupe, incrementalization. The problem is that FAST trades response time for space; it doesn't _purely_ save space. I’d much rather have a compressed block on tier 1 than a raw block on tier 2. I save _both_ money and time that way. (Wikibon comment: Compression is coming to primary disk. Storwize is a good example, but it's block only, at least today. With today's processing power, there's great potential for compression on primary).
Ten years ago compression was a bad tradeoff: the CPUs in servers were way too slow, and compression led to poor performance. Today, you can’t even notice the difference in all but the most performance-starved applications (PSAs) when using compressed versus uncompressed files. Try it for yourself.
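Anyone can run that experiment in a few lines; here is one way using Python's zlib (the sample data is highly repetitive, so real-world ratios will be far lower):

import time, zlib

block = b"customer_record,2009-12-08,ACTIVE,0000," * 8192  # ~320 KB

start = time.perf_counter()
packed = zlib.compress(block, 6)
assert zlib.decompress(packed) == block
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{len(block) / len(packed):.0f}:1 ratio in {elapsed_ms:.2f} ms")
# Compare that CPU cost against the ~19 ms gap between the Flash and SATA
# tiers cited earlier.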
I know Symm customers have to move data around manually, and it’s a pain for sure. I’d rather see EMC move away from BIN files, which are the root cause of the problem here, instead of putting yet another layer of software and management on top. Still, consider this: why doesn’t IT engage the business units to _really_ determine the business value of data, and make tiering choices appropriately? (Wikibon comment: it will never happen; too cumbersome, too non-automated. Chalk talk with data classification never works, it has to be automated).
Plus, again, files, not blocks, are the object of concern here. Sure, to your point, Symm customers are ‘excited’ about the feature because they think it saves them administration time. Well, in one sense it does, because they don’t have to sit over a hot control-center stove, slaving away moving the metas and redefining the hypers. But does it help the business achieve its goals? All it does is sell more disk, which is the byproduct of this feature that users should watch out for.
You see, if I have a near-endless pool of _all_ tiers, I look great and can sit back and let the machine have fun juggling the blocks. No worries. I don’t save time, space or money, but what do I care – that’s the business unit’s problem, not mine as an IT guy. As a Symm admin, I’m happy, sure.
Over-resourced is not optimal in my book, though. Efficiency is optimal. Plus, I could buy into the feature more if the machine had fabric on the back end, but arbitrated loop? Please.
Sorry, but not all Compellent customers love the feature. They love the concept of the feature, but in the practical sense, once they see how it works, many don’t like it at all, if they are really interested in saving money, that is. The smoke and mirrors look cool, but the substance underneath is lacking. There are great alternatives available to them that really do save money, space, and time.
I have yet to see an EMC or Compellent user actually rip out tier 1 disks once purchased, return them, and get their money back because they are saving so much space. Am I wrong? (Wikibon comment: Over time the user can exploit that free space. The problem is that the asset has been depreciating over time).
Do you have an authoritative study that spells out, in unbiased terms, the TCO of using this ‘feature’? Has any user on the planet actually taken the time to see what they’ve actually spent for disk, given all the reserved space that Compellent forces you to acquire and spin, but not use? (Wikibon comment: No...but thanks for the suggestion).
If you tell me that they ‘progressed’ data down to tier 3 and are happy, I will tell you the data never belonged on tier 1 in the first place. I could have saved them more money with a little analysis. (Wikibon comment: The analysis doesn't happen unless it's automated).
You’d think that we, as an industry, would have moved past that by now.