Information dispersal is an emerging technique for data protection and security, particularly for multi-tenant environments. What is information dispersal, how does it compare with traditional data security techniques like encryption, what are the risks, benefits, and disadvantages of using it, and how should the CISO think about using these technologies to satisfy the requirements of confidentiality, integrity, and availability of information?
This research examines these questions by defining information dispersal algorithm (IDA) technologies, comparing them to standalone encryption and key management techniques, and discussing the impact on how chief information security officers (CISOs) have traditionally established security over digital information. Although the theory behind information dispersal has been around for many years, commercial implementations are only just coming to market, driven primarily by improved processing performance, innovative applications, and the potential security and operational value when applied to multi-tenant or otherwise shared resources (e.g., virtualized infrastructures, public cloud services, private clouds).
This note uses the Cleversafe implementation of an information dispersal algorithm (IDA) as the basis for analysis. An overview of a Unisys implementation is included for reference.
What is Information Dispersal?
Information dispersal is a data-handling technique that extends forward error-correction schemes. A forward error-correction scheme is an algorithm that computes redundant information and appends it to a block of data, allowing the data to be recovered if part of it is compromised (e.g., erased, lost, or otherwise not available to an application). Common examples of forward error correction include parity bits and Reed-Solomon codes. Reed-Solomon techniques have been known since 1960; however, computing systems at that time lacked the power to handle the computational overhead of the techniques effectively.
A Reed-Solomon code could be used, for instance, to compute five additional symbols of redundancy for every 10 input symbols. If 10 bytes go into this forward error-correction scheme, 15 bytes come out. The properties of Reed-Solomon allow the original 10 bytes to be recovered even if up to 5 of the 15 bytes are lost, provided the positions of the lost bytes are known. When used as part of an IDA, the output symbols generated by a forward error-correction scheme are split and dispersed to different locations (e.g., drives, network nodes, or storage locations). Forward error-correction schemes like this have seen widespread use in the digital media industry; they are what enables compact disc content to be read even when the disc is scratched.
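The 10-in, 15-out example can be sketched in a few lines of Python. This is an illustrative Reed-Solomon-style code built on polynomial interpolation, not the code used by any particular product. As a simplifying assumption it works modulo the prime 257 rather than in GF(2^8), so a parity symbol may take the value 256 and not fit in a single byte. The function names are hypothetical.

```python
# Illustrative k-of-n erasure code in the Reed-Solomon style.
# Assumption: arithmetic is modulo the prime 257 for readability;
# production codes normally use GF(2^8) so every symbol fits in one byte.
P = 257

def interpolate_at(points, x):
    """Evaluate at x the unique polynomial (mod P) through the (xi, yi) points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Map k input symbols to n output symbols; the first k equal the input."""
    k = len(data)
    points = list(enumerate(data, start=1))      # data sits at x = 1..k
    parity = [interpolate_at(points, x) for x in range(k + 1, n + 1)]
    return list(data) + parity

def recover(surviving, k):
    """Rebuild the k input symbols from any k surviving (position, symbol) pairs."""
    points = surviving[:k]
    return [interpolate_at(points, x) for x in range(1, k + 1)]

data = list(b"dispersal!")                       # 10 input symbols
symbols = encode(data, 15)                       # 15 output symbols
lost = {2, 5, 9, 11, 14}                         # any five positions (1-based)
surviving = [(x, s) for x, s in enumerate(symbols, start=1) if x not in lost]
restored = bytes(recover(surviving, 10))
```

Any 10 of the 15 symbols determine the same degree-9 polynomial, which is why the choice of lost positions does not matter as long as no more than five are missing.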
Basic Information Security Model for Multi-tenant Environments
How does IDA apply to a traditional security model for data protection? CISOs think about security as layers of protection. In essence, these layers define a model for designing and implementing security controls.
The model above has nine major security layers applicable to multi-tenant/cloud computing environments. Each layer references a specific set of controls and security measures. Information dispersal, as a set of technology and data protection capabilities, fits within the data assurance layer.
The data assurance layer has a primary responsibility for the confidentiality, integrity, and availability of data, and can offset requirements and costs of more traditional security controls including the four components of identity and access management (credential management, provisioning, authorization, and authentication), and environmental security (backup and recovery). The potential cost offset in backup and recovery is unique to IDA implementations and is one of the primary advantages of IDA over traditional encryption systems, as described below.
How is Information Dispersal Different From Encryption?
IDAs are different from encryption. By design, IDAs split data into a number of pieces, but do not encrypt, and allow data to be recovered from some threshold number of pieces. In a typical implementation, an installation might create a 10-of-15 IDA environment. This requires that 10 data streams, nodes, or storage locations out of a total of 15 be accessed to recover data. Data recovered from fewer than 10 streams, nodes, or storage locations is unrecognizable and has little to no value to an application, an end user, or an attacker who gains access through a side channel (e.g., root access to a server, malware).
In cryptography, encryption is the process of transforming information (referred to as plaintext) using an algorithm (called a cipher) to make it unreadable to anyone except those possessing special knowledge, otherwise known as the encryption key. The result of the process is encrypted information (or ciphertext). A user, application, or process must have knowledge of the key to decrypt data and return it to its original form. Encryption systems are divided into two basic forms: symmetric and asymmetric. In a symmetric system, the same key is used to encrypt and decrypt data. In asymmetric systems, different but related keys are used in encryption operations (public keys encrypt, private keys decrypt). In both systems, the encryption algorithms can be public knowledge, but the keys, specifically symmetric keys and private keys, require high levels of security and must be kept secret.
Security Advantages/Disadvantages
By design, an IDA system is more resilient to device failure and data loss than a standalone encryption scheme, which translates into an availability benefit. On the other hand, encryption is considered more resilient to brute-force attacks. With encryption, if either the keys or the ciphertext are lost, the data is unrecoverable. With a 10-of-15 IDA configuration, up to five systems could be lost and the data would still be recoverable. Achieving the same failure tolerance with encryption would require making six copies of the key and the data. The downside of copying data is that each copy creates another target for attack.
However, most IDA implementations provide no confidentiality or integrity guarantees, and where they do, the guarantees may not be on par with traditional encryption systems. Many IDA implementations are optimized for computational efficiency in a way that leaves some fraction of the original data in plaintext at each location.
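The comparison between a 10-of-15 configuration and six full copies comes down to simple arithmetic, sketched below. The sketch assumes equal-sized slices, each holding 1/k of the data, and ignores metadata overhead; the function names are hypothetical.

```python
from fractions import Fraction

def dispersal_profile(k, n):
    """Storage expansion and loss tolerance for a k-of-n dispersal."""
    return {"expansion": Fraction(n, k), "losses_tolerated": n - k}

def replication_profile(copies):
    """Storage expansion and loss tolerance when keeping full copies."""
    return {"expansion": Fraction(copies), "losses_tolerated": copies - 1}

ida = dispersal_profile(10, 15)      # expansion 3/2, tolerates 5 losses
copies = replication_profile(6)      # expansion 6,   tolerates 5 losses
```

Both approaches survive the loss of five locations, but under these assumptions dispersal stores 1.5 times the original data where replication stores 6 times.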
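The plaintext exposure is easy to see with the simplest possible systematic code: split the data into k chunks and add one XOR parity chunk. This is a deliberately minimal k-of-(k+1) sketch, not any vendor's scheme, and the record contents and function names are made up for illustration.

```python
from functools import reduce

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def systematic_slices(data, k):
    """Split data into k equal chunks and append one XOR parity chunk."""
    size = -(-len(data) // k)                     # ceiling division
    padded = data.ljust(size * k, b"\0")          # zero-pad the last chunk
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]

record = b"owner=Example Customer;balance=1024;"  # hypothetical record
slices = systematic_slices(record, 4)

# Availability: if slice 0 is lost, XOR-ing the other slices rebuilds it.
rebuilt = reduce(xor_bytes, slices[1:])

# Confidentiality: slice 0 is simply the first bytes of the record in the clear.
exposed = slices[0]
```

Anyone who reads a single data slice here sees a readable fragment of the record, which is the gap that an additional cryptographic step is meant to close.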
To overcome this limitation, the Cleversafe implementation combines two algorithms, the All-or-Nothing Transform (AONT, introduced by Ron Rivest in his 1997 paper "All-or-Nothing Encryption and the Package Transform") with an Information Dispersal Algorithm. The AONT adds the property that the data cannot be recovered without possession of the entire package. Cleversafe pre-processes data through the AONT algorithm, and then through the IDA. This ensures that a threshold number of slices is required to recover the data. Therefore confidentiality, integrity, and availability properties may be achieved without making copies of the data or maintaining separate keys.
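The idea of packaging data so that nothing is readable without the whole package can be sketched as follows. This is a simplified variant of the package idea, not Rivest's exact construction and not Cleversafe's code. As labeled assumptions, a SHA-256 counter-mode keystream stands in for a block cipher, and plain striping stands in for the IDA step; all function names are hypothetical.

```python
import hashlib
import secrets

def keystream(key, length):
    """SHA-256 in counter mode; a stand-in for a block cipher (assumption)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_bytes(a, b):
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def aont_package(data):
    """Mask data under a random key, then hide the key behind a hash of the masked data."""
    key = secrets.token_bytes(32)
    masked = xor_bytes(data, keystream(key, len(data)))
    tail = xor_bytes(key, hashlib.sha256(masked).digest())
    return masked + tail

def aont_unpackage(package):
    """Recover the key from the hash of all masked bytes, then unmask the data."""
    masked, tail = package[:-32], package[-32:]
    key = xor_bytes(tail, hashlib.sha256(masked).digest())
    return xor_bytes(masked, keystream(key, len(masked)))

def split(package, n):
    """Plain striping into n slices; a real system would apply a k-of-n IDA here."""
    size = -(-len(package) // n)                  # ceiling division
    return [package[i * size:(i + 1) * size] for i in range(n)]

message = b"quarterly results: confidential"
package = aont_package(message)
slices = split(package, 4)
restored = aont_unpackage(b"".join(slices))
```

The random key is never stored on its own. It can only be recomputed by hashing every masked byte, so a holder of a partial package cannot unmask anything. With striping, all slices are needed; when a k-of-n IDA replaces the striping step, any threshold number of slices is enough to rebuild the full package first.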
Secret Sharing vs. Key Management
The combination of these techniques results in a "Secret Sharing Scheme", which securely splits information into some number of shares, of which a threshold number are required to reconstruct that information.
On the other hand, widely deployed, traditional encryption algorithms including AES securely scramble data in place and are considered infeasible to break with current technology when implemented and keyed correctly. So why do things any differently? The answer is that keeping data protected using encryption is a matter of secure key management. How is the key guarded against attack or loss? How often is the key stored, where is it stored, who has knowledge of where the key is stored, who has access to the key, and what would it take for an attacker to get access to it? Answers to these and other questions about key management protocols define the strength of the encryption scheme.
Using a secret sharing scheme for storage obviates the need for a separate key storage system. There are no keys to be managed, just shares of data across a configurable number of nodes. Each node, and the data residing on it, can be protected with strong authentication and configured to have no knowledge of the other nodes or their data.
Effective key management is addressed by information dispersal. Using the AONT in conjunction with an IDA, data is as securely protected as if it had been encrypted but key management risks are reduced. Here’s the argument:
- A central domain of control requires only one disgruntled employee to break. With a key management system, an attacker doesn't need to coordinate a complex, simultaneous, cross-country attack; the attacker just needs to steal one box.
- Most key management systems run only a single operating system. Dispersal can work over a heterogeneous dispersed storage network. Under such a configuration, any compromise would require attacking multiple operating systems, greatly reducing exposure to malware attacks, for instance. In some implementations, an attack that spans several platforms could be considered a less common scenario than a single attack against the one platform used by a key management system.
- To achieve security according to the security model presented above, the data must not only be kept from attackers but kept available to authorized parties. For a system using keys, this means backing up the keys to multiple locations, creating more attack vectors. With dispersal, one can survive and recover from multiple compromises without any data having been exposed. In addition, old slices which were compromised may be made obsolete to an attacker by "refreshing" the data (re-writing the same data out again, making the old slices useless).
Potential Impact on Regulations and Laws
The advent of IDA solutions may in the future impact regulatory schemes that point to traditional means of protecting data. For example, the payment card industry maintains requirements for protecting personal information related to payment transactions. Traditional encryption schemes are commonly used to satisfy these requirements. The protection afforded by IDA schemes could be at least as strong, and may even be considered stronger, since with encryption schemes, the key exists in its entirety in at least one location. With dispersal the key does not live anywhere until a threshold number of slices are retrieved and brought together.
Another example relates to the jurisdictional requirements for data defined by European Union and other data protection laws. An argument could be made that IDA-based storage networks by design satisfy data protection laws, given that an individual slice, or a set of slices short of the threshold, does not constitute the data. Therefore keeping slices outside one particular country's borders may not count as keeping data outside that country's borders, so long as the data is only restored (by bringing the slices together) within that country.
On a related note, dispersal allows some interesting possibilities related to jurisdictions. For instance, a configuration could be deployed across different jurisdictions so that the data remains private even if one of those jurisdictions seizes all the storage servers within its borders. This could allow individuals and organizations to store data in locations where they would never trust a full copy of their data to reside.
Benefits and Risks
Simplifying key management and reducing key management costs are key benefits of IDA solutions over traditional key-managed encryption systems. Additionally, IDA may provide equivalent storage capacity at lower floor space, power, cooling, and array costs compared to existing file copy-based solutions. Furthermore, when combined with cryptographic functions such as the AONT, it can provide confidentiality and integrity of the data.
The primary risks associated with IDA technologies are their immaturity in the market, the current lack of commercial implementations, and a lack of exposure to the security and regulatory community. These risks, however, are likely to be overcome with time, experience, third-party vetting, and enterprise metrics for determining the value and cost of IDA when compared to traditional data protection systems.
Bottom Line
Information dispersal is a new weapon in the battle to protect information in the growing digital universe through inherent properties supporting confidentiality, integrity, and availability needs of end-users. These technologies are of particular interest in multi-tenant environments and distributed computing and storage architectures with significant content storage and distribution requirements. IDA technologies also underpin a trust fabric in storage-as-a-service. And the trust is not only about security, but being able to respect privacy and never lose or expose data accidentally or intentionally.
Just as CTOs, CIOs, and storage and information security experts must keep their eyes on emerging threats to data protection, particularly in multi-tenant and distributed systems, they must also engage in the development, validation, and deployment of IDA solutions as emerging data protection solutions.
For more discussion of information dispersal and its security implications, check in with ISSA Connect.