How do you run capacity planning for Big Data? Capacity planning should go beyond percentage-based estimates and gut experience.
It should be a mathematical calculation that accounts for every byte entering the system from each data source. How about designing a predictive model that projects data growth with reasonable accuracy up to 10 years out? How about involving the business to confirm the data-growth drivers and the feasibility of data sources that do not exist yet? Why not factor compression and purging into the calculation to reclaim space for data growth? And why do we consider only disk utilization, with no thought for other hardware resources such as memory, processors, and cache? After all, it is a data processing environment. This list of considerations can still grow. Explore the complete write-up at: http://datumengineering.wordpress.com/2013/02/15/how-do-you-run-capacity-planning/
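To make the idea concrete, here is a minimal sketch of the kind of byte-level projection the post argues for, combining a business-confirmed growth rate with compression, purging (retention), and replication over a 10-year horizon. All figures and parameter names (daily ingest, growth rate, compression ratio, retention window, replication factor) are hypothetical assumptions for illustration, not numbers from the original write-up.

```python
def yearly_capacity_plan(daily_ingest_gb, annual_growth, compression_ratio,
                         retention_days, replication_factor, years):
    """Estimate disk (GB) actually consumed at the end of each year:
    only the retained window of compressed, replicated data stays on disk."""
    plan = []
    daily = daily_ingest_gb
    for year in range(1, years + 1):
        retained_raw = daily * retention_days              # purge everything older
        on_disk = retained_raw / compression_ratio * replication_factor
        plan.append((year, on_disk))
        daily *= 1 + annual_growth                         # compound the growth driver
    return plan


if __name__ == "__main__":
    # Hypothetical inputs: 500 GB/day ingest, 40% yearly growth, 3x compression,
    # 2-year retention, 3x replication, 10-year planning horizon.
    for year, gb in yearly_capacity_plan(500, 0.40, 3.0, 730, 3, 10):
        print(f"Year {year}: ~{gb:,.0f} GB on disk")
```

A fuller model along the same lines could add columns for memory, CPU, and cache sized against the processing workload rather than the stored bytes, which is the broader point the post is making.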