In this white paper, I will go over storage technology's past and present, and what might lie ahead in the near future. I will cover how the different pieces of the storage jigsaw (the HW and SW layers) fit together to produce a total solution, leading to the future of global virtual storage.
The story starts with SCSI-1, with 8 parallel data lines running at 5 MB/s. It was conceived in the early 1980s and was followed by SCSI-2, with double the performance, then SCSI-3 and so on. The rise in performance was tremendous at the time and, of course, slow by today's standards.
In the early 2000s the need for network storage rose, made possible by higher network bandwidth. As corporations adopted and came to trust storage solutions, new requirements emerged: larger organizations demanded faster, more manageable storage and more storage space. Data was scattered across detached storage devices with no networking between them. As more storage space was added to data centers (DCs), maintaining data became costly for IT departments. To lower the cost of ownership and investment (i.e. IT resources and newer HW) and to make networked storage easier to maintain, new network storage protocols were devised. For reliability and availability, higher network bandwidth allowed storage pools to mirror themselves. For instance, in the event of an earthquake on the western seaboard, data would remain available from the mirrored system on the eastern seaboard.
Network-attached storage (NAS) and the storage area network (SAN) became prominent storage solutions in the industry. Together with newer enterprise SW, they allowed central maintenance and monitoring of scattered storage, making it more manageable. NAS solutions relied on a variety of underlying I/O protocols such as SCSI and FC, along with a variety of network file systems and protocols. Folks on the SCSI side had to come up with a competitive network solution, and hence iSCSI was born, which resulted in IP-based SAN. Riding on top of higher network bandwidth, SAN competed with NAS.
Both were reasonable and feasible solutions. SAN basically did the same thing as NAS, but it provided access to data over the network at the block level, as against the file level on the NAS side. SAN presented storage on the client side (e.g. as drive E: or F:), whereas with NAS the new storage became available on the server side. Many companies started delivering these solutions. Lower-layer storage technologies such as iSCSI, NAS, RAID, FCoE and AoE became more mature and supported the upper-layer enterprise storage SW applications.
In general, storage infrastructure is divided into two distinct portions: the Upper-layer and the Lower-layer. The first portion (Upper-layer apps) consists of the SW tools and applications that serve as solutions to a variety of market requirements. These include thin provisioning, dedup, VTL, cloud storage, Hadoop (a kind of grid), ZFS, NAS, Lustre and others. The second portion (Lower-layer) is the underlying and often hidden part that serves the Upper-layer. Reducing latency has continually been the primary motivation for developing newer interfaces that enable solutions to the market's demands. This was a must, as the Upper-layer applications were not feasible without a faster Lower-layer. To address some of the latency problems, parallel storage technologies eventually gave way to point-to-point serial storage technologies such as SAS, SATA and FC. The point-to-point architecture, together with other higher-bandwidth technologies such as PCIe Gen3, 10/40/100GbE, multi-core CPUs, faster memory and the best ROC-based RAID, served the Upper-layer storage applications and helped achieve faster storage response times. These two layers have matured to work hand in hand as high-performance, scalable, reliable and highly available real-time storage solutions that satisfy the ever-growing market appetite for faster and safer storage.
Today, SAS-3 working with PCIe Gen3 can deliver a theoretical 6.4 GB/s (i.e. an 8-lane PCIe Gen3 link at 6.4 GB/s acts as the bottleneck for the SAS-3 performance of 9.6 GB/s). The upcoming PCIe Gen4 at 12 GB/s (currently targeted for late 2016) will reverse the order of the bottleneck. Other storage solutions such as NVMe, which performs as fast as the PCIe bus, are currently being released (Sep 2014, by Dell and Supermicro). NVMe, however, is limited in storage features and is not as mature as SAS. It is worth noting that SAS builds on many years of SCSI and is therefore very reliable.
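The bottleneck arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 6.4 GB/s and 12 GB/s PCIe figures are taken from the text as given (both assumed here to be 8-lane links), the SAS-3 figure is derived from its 12 Gb/s line rate with 8b/10b encoding across 8 lanes, and all function and variable names are hypothetical.

```python
from __future__ import annotations

# Rough bottleneck arithmetic for a SAS-3 controller behind a PCIe link.
# All figures are theoretical; names are hypothetical, for illustration only.


def sas_aggregate_mb_s(lanes: int, line_rate_mbit: int = 12_000) -> int:
    """Aggregate SAS payload bandwidth in MB/s.

    SAS-3 signals at 12 Gb/s per lane and uses 8b/10b encoding, so only
    8 of every 10 bits on the wire carry data.
    """
    payload_mbit = line_rate_mbit * 8 // 10  # strip the 8b/10b overhead
    return lanes * payload_mbit // 8         # convert bits to bytes


def bottleneck(links: dict[str, int]) -> tuple[str, int]:
    """Return the (name, MB/s) of the slowest link in the data path."""
    name = min(links, key=links.get)
    return name, links[name]


# PCIe figures are the numbers quoted in the text, not derived here.
PCIE_GEN3_X8_MB_S = 6_400
PCIE_GEN4_X8_MB_S = 12_000  # assumed to refer to an 8-lane link

sas3_x8 = sas_aggregate_mb_s(lanes=8)

today = bottleneck({"PCIe Gen3 x8": PCIE_GEN3_X8_MB_S, "SAS-3 x8": sas3_x8})
future = bottleneck({"PCIe Gen4 x8": PCIE_GEN4_X8_MB_S, "SAS-3 x8": sas3_x8})

print(today)   # ('PCIe Gen3 x8', 6400)
print(future)  # ('SAS-3 x8', 9600)
```

With Gen3 the PCIe link is the slowest element in the path; with the quoted Gen4 figure the SAS-3 side becomes the limit instead.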
As newer solutions become available, backward compatibility is a continuous challenge. These underlying technologies have been introduced to the market so quickly that the end-user's investment has become a bit shaky, as newer buses may have compatibility issues with older ones. Customers might have difficulty making their older storage systems work with the newer ones. For example, although all the recent SAS designs are backward compatible, you may still have to make sure that the FW in newer SAS SSDs and HDDs works with older FW: timing differences and new features must work seamlessly across generations. It may make sense for DCs to segregate the newer and older storage systems but blanket both with the higher-level enterprise apps. This would still raise average performance, albeit with mixed feature capabilities.
Some non-mechanical SAS-3 storage devices, with much higher performance than their mechanical counterparts, can operate at close to 900 MB/s. This level of performance is almost required in order to realize the potential of the SAS-3 infrastructure. Still, NVMe SSDs can achieve 2.9 GB/s, and tests have shown 2.3 GB/s of I/O throughput per 4 PCIe lanes.
Many storage companies add value by providing proprietary solutions with specialized features. Examples are EMC's dedup, Dell's and Overland Storage's intuitive iSCSI (already available in both Linux and Windows as target and client), FalconStor's VTL, Google's and Apple's storage clouds, NetApp's NAS and 3PAR's thin provisioning. Free and open-source equivalents such as FreeNAS and Openfiler are also available. Storage warehousing, where your data is saved on central network-based storage (the start of the storage cloud), has revolutionized storage for corporations by making it intuitive. This has led technology providers to find newer ways to earn revenue from big corporations.
Corporations in turn realized more revenue thanks to a lower cost of ownership. NAS and SAN were among the technologies that enabled providers to do just that. Again, however, central and easily maintained storage warehousing required underlying networking and storage protocols with even higher throughput. Today cloud storage is as intuitive and common for average folks as direct-attached internal disks were prior to 2000. Cloud storage's virtual space has replaced the old direct-attached physical storage. This eliminated native HDDs, which reduced power usage. Cheaper and abundant memory modules, together with the elimination of large HDDs and power units, therefore resulted in smaller computing devices such as iPads. This gave individual users faster and more readily available access to data from smaller devices.
In a few of my articles since 2001, and based on over 30 years of R&D in all disciplines of storage technology, I have envisioned future storage concepts that were yet to be realized, and thus far I have been on the right track. For example, back in 2000 I envisioned centralized network storage, accessible from anywhere at any time from storage warehouses, by 2006. I also envisioned a mature version of warehousing for the masses (i.e. cloud storage) by 2012. Today we are starting to see intuitive access to network storage pools through techniques such as cloud storage, which helps individuals maintain and archive their data anywhere at any time. In my second article, in 2009, I envisioned convergence: true network storage centralization through virtualization, resulting in centralized virtual storage built on converging technologies (see http://wikibon.org/wiki/v/Convergence_to_Green_Computing or http://www.icc-usa.com/insights/network-storage-past-present-and-future ).
Now I envision a PWCloud (Planet Wide Cloud, patent filed) infrastructure with earth-wide connectivity for all of us as singular entities by 2020. This is yet to be done.
This would be the next generation of storage technology, one that can benefit from over 40 years of matured storage protocols. The next generation will simply be virtual planet-wide storage. Remember that all complexities begin with simplicity. I will discuss a few of these below. The latest enterprise storage SW solutions can scale to thousands of inexpensive but powerful server nodes, which have replaced the older, very expensive vertical solutions built on single mainframe servers. The older solutions were inefficient and power hungry, with many points of failure. Today's solutions have reduced the cost of ownership through increased scalability, higher reliability and lower maintenance costs. This would be the backbone of the PWCloud. Yes, I am an old-timer and can appreciate the leap from the old mainframe to today's tiny storage farms that are glued together as a single virtual provider of storage space. Even the solutions back in 2000, when NAS and SAN served corporations as end-users, needed lots of optimization; although they were much better than what came before, they were still not mature. Today's solutions are mature enough that they have moved beyond corporate entities as end-users and now provide storage space to single individuals (e.g. iCloud, Google's cloud storage, etc.).
The PWCloud could be defined as the next generation of internet connectivity, in which a huge, real-time, planet-wide virtual storage repository is accessible to everyone. I emphasize real-time, or near real-time. PWCloud's two main features will be world-wide accessibility and real-time access. End-users' data will reside on physical devices anywhere, but can be monitored as a single virtual storage space. Data will be accessible and available instantaneously. In a nutshell, each end-user will be a client node and PWCloud will be the server, serving end-users with the data of their daily lives. For end-users, intuitive, real-time access to data through these services must be as easy as breathing.
Let's now briefly dive into PWCloud storage. Achieving PWCloud storage is indeed challenging. It requires solving complicated problems by developing new protocols to monitor and pass data between a variety of layers of a massively complicated, intricate storage cloud that spans the planet, the air and space, and disseminates data to every individual through a variety of devices such as today's iPad and iPhone, wearables or biological devices. Tracking of data can be done using numerous satellites and/or high-altitude balloons as 1st- and 2nd-level FCSCs (Floating Cloud Storage Caches, a world-wide cache extension, patent pending). Briefly, FCSCs will simply act as giant floating data cache repositories for PWCloud storage and will help give end-users the look and feel of real-time access to data. Back in the 90s we used a swap partition in Unix-based OSes, where additional space could be made available from a segment of the hard drive. A similar kind of cache provisioning can be found in products such as CacheCade by LSI (Avago) or maxCache by Adaptec (PMC), which extend the existing cache with new cache space.
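The idea of extending an existing cache with a second, larger tier can be illustrated with a small sketch. This is a generic two-tier LRU read cache written for illustration only; it is not the algorithm of any product named above, and the class name, method names and sizes are all hypothetical.

```python
from collections import OrderedDict


class ExtendedCache:
    """A small primary cache backed by a larger extension tier.

    Entries evicted from the primary tier are demoted to the extension
    tier instead of being dropped, so a later read can still avoid the
    slow backing store.
    """

    def __init__(self, primary_size, extension_size, backing_store):
        self.primary = OrderedDict()    # fastest, smallest tier
        self.extension = OrderedDict()  # slower, larger tier
        self.primary_size = primary_size
        self.extension_size = extension_size
        self.backing_store = backing_store  # any mapping of key -> data
        self.last_source = None             # where the last read was served from

    def read(self, key):
        if key in self.primary:
            self.primary.move_to_end(key)   # mark as most recently used
            self.last_source = "primary"
            return self.primary[key]
        if key in self.extension:
            value = self.extension.pop(key)  # promote back to primary
            self.last_source = "extension"
        else:
            value = self.backing_store[key]  # slowest path
            self.last_source = "backing"
        self._admit(key, value)
        return value

    def _admit(self, key, value):
        self.primary[key] = value
        if len(self.primary) > self.primary_size:
            # Demote the least recently used entry to the extension tier.
            old_key, old_value = self.primary.popitem(last=False)
            self.extension[old_key] = old_value
            if len(self.extension) > self.extension_size:
                self.extension.popitem(last=False)  # drop the oldest entry


store = {"a": 1, "b": 2, "c": 3}
cache = ExtendedCache(primary_size=1, extension_size=1, backing_store=store)

sources = []
for key in ["a", "b", "a", "a"]:
    cache.read(key)
    sources.append(cache.last_source)

print(sources)  # ['backing', 'backing', 'extension', 'primary']
```

In PWCloud terms, the extension tier plays the role of the added cache space and the backing store stands in for permanent storage; how many tiers exist and how they are sized is left open here.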
In PWCloud storage, metadata and possibly data could reside on FCSC nodes. The FCSCs will complement the existing permanent earth-based storage cloud, where the remaining data will be held, by extending the fastest storage space using technologies similar to those mentioned above. Depending on the location of the end-user, more space might be allocated on any of the data points (earth-based cloud, 1st- or 2nd-level FCSC). To optimize real-time access to data, data in all layers and protocols would also need tuning and monitoring fields that allow worldwide adjustment of performance and storage availability. Of course, optimized versions of the existing, mature technologies that provide today's scalability, reliability and availability will help with the construction of PWCloud storage. A world-wide cache extension will be a requirement for PWCloud storage.
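Location-dependent allocation across the data points could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `DataPoint` structure, the region names, the free-space figures and the latency values are placeholders, not measurements or part of any PWCloud design.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    name: str
    free_gb: int
    latency_ms: dict  # region -> estimated latency (placeholder values)


def allocate(points, region, size_gb):
    """Pick the lowest-latency data point for `region` with enough free space."""
    candidates = [
        p for p in points
        if p.free_gb >= size_gb and region in p.latency_ms
    ]
    if not candidates:
        raise RuntimeError("no data point can hold the request")
    best = min(candidates, key=lambda p: p.latency_ms[region])
    best.free_gb -= size_gb  # reserve the space on the chosen data point
    return best.name


# Placeholder data points: one earth-based cloud and two FCSC levels.
points = [
    DataPoint("earth-cloud", 1000, {"west": 80, "east": 20}),
    DataPoint("fcsc-level-1", 10, {"west": 30, "east": 30}),
    DataPoint("fcsc-level-2", 50, {"west": 50, "east": 50}),
]

first = allocate(points, "west", 8)   # closest data point with room
second = allocate(points, "west", 8)  # level 1 is now nearly full
third = allocate(points, "east", 8)   # the earth cloud is closest here

print(first, second, third)  # fcsc-level-1 fcsc-level-2 earth-cloud
```

A real system would feed the latency and free-space values from the tuning and monitoring fields mentioned above rather than from fixed numbers.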
I know some of you may have privacy concerns, but we as engineers are problem solvers and can come up with ways to prevent misuse. For instance, one method could be to handle privacy through individual encryption schemes that are unique and chosen strictly per end-user. Such a scheme could be deleted and repurchased by each individual periodically. As the people on our planet grow more digitally connected, we will need intuitive, real-time access to data for commerce and simple connectivity. Of course, being connected would be optional, just like the internet today.
In conclusion, I believe that if we continue the amazing trend of exponential boosts in the performance of the underlying layers, we should be able to achieve PWCloud storage, or internet storage, by 2020 or even earlier. Laws will be needed to ensure data privacy and prevent virus injection. The monetary issues would be minimal due to the large number of users (i.e. over 8 billion people by 2020?). The cost of global connectivity devices such as satellites and/or balloons could be covered by charging end-users a minimal amount. Underlying layers and SW suited to the PWCloud's global architecture will work in space, in the atmosphere and on earth in many warehouses to provide intuitive and fast access to private storage space. Technology providers and corporations will realize even larger revenues due to the higher number of end-users.
So ultimately all data should be controlled and constantly tuned by a single master node, using some sort of AI logic to provide continuous, real-time access. Maybe fuzzy logic?