Originating Author: Fred Moore
As the amount of digital data grows, so does the exposure to data loss. It is difficult to find a day when there hasn’t been a high-profile data security incident. The risk has reached such a level that data encryption is being implemented for stored data and mobile data, in addition to the traditional use of encrypting data in transit via the network.
Data encryption is defined as the process of scrambling transmitted or stored information, making it unintelligible until it is unscrambled by the intended recipient. In computing, data encryption has historically been used primarily to protect mission-critical data, government records and military secrets from foreign governments.
Encryption has been increasingly used over the past 10 years by the financial industry to protect money transfers, by businesses to protect credit-card information, for electronic commerce, and by corporations to secure sensitive network transmissions of proprietary information.
Prior to 2000, most of the encryption focus had been on data transmission, but the events of Sept. 11th, 2001, the rise of compliance requirements, and the tremendous amount of data being stored on mobile personal appliances are moving the encryption of stored data much higher on the priority list of leading-edge data protection strategies. The enciphering and deciphering of messages in secret code or cipher falls under cryptology, which has now become a topic of serious interest to the storage industry.
DES – the first standard
In 1977 the Data Encryption Standard (DES, later extended as Triple DES) was adopted in the United States as the first federal encryption standard. DES applies a 56-bit key to each 64-bit block of data. DES is now considered insecure for many applications, chiefly because its 56-bit key is too small: DES keys have been broken by brute force in less than 24 hours, and such attacks only get faster as processor speeds increase. Because of growing concern over the viability of the DES algorithm, NIST (National Institute of Standards and Technology) indicated that DES would not be recertified as a standard and accepted submissions for its replacement. Other encryption technologies, which are protocols and tools rather than standalone algorithms, have been in use for years; they include Secure Sockets Layer (SSL) for Internet transactions, Pretty Good Privacy (PGP), and Secure Hypertext Transfer Protocol (S-HTTP).
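To put the 56-bit key in perspective, here is plain arithmetic (not a model of any particular cracking hardware): the size of the DES key space and the worst-case rate an exhaustive search would need to finish within 24 hours. On average an attacker finds the key after searching about half the space. The function name is an illustrative choice.

```python
# Back-of-the-envelope: how fast must a brute-force search run to exhaust
# the DES key space within a given time window?
DES_KEY_BITS = 56                # effective key length (8 of the 64 key bits are parity)
SECONDS_PER_DAY = 24 * 60 * 60


def keys_per_second_needed(key_bits: int, seconds: float) -> float:
    """Rate needed to try every possible key within `seconds` (worst case)."""
    return 2 ** key_bits / seconds


keyspace = 2 ** DES_KEY_BITS
rate = keys_per_second_needed(DES_KEY_BITS, SECONDS_PER_DAY)
print(f"DES key space: {keyspace:,} keys")                     # 72,057,594,037,927,936 keys
print(f"Rate to search it all in 24 hours: {rate:.2e} keys/s")  # 8.34e+11 keys/s
```

Every added key bit doubles both numbers, which is why the move to longer keys matters more than faster hardware does.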
AES – the second standard
The second encryption standard to be adopted was the Advanced Encryption Standard (AES). AES, like DES, is a symmetric (secret-key) block cipher; it operates on 128-bit blocks and is based on the Rijndael algorithm developed by Belgian cryptographers Joan Daemen and Vincent Rijmen. Symmetric standards require that the sender and the receiver share the same key and keep it secret from everyone else. After a long standardization process, NIST selected the algorithm in October 2000 to replace DES, and on December 6, 2001, the Secretary of Commerce officially approved AES as FIPS (Federal Information Processing Standard) 197. It was expected to be used extensively worldwide, as its predecessor DES had been. AES is more secure than DES because it offers larger key sizes, and the only known practical approach to decrypting a message without the key is for an intruder to try every possible key. AES supports three key lengths: 128, 192 and 256 bits. AES was initially deployed on a selective basis; it is not backward compatible with DES, so DES-encrypted data must be decrypted and re-encrypted to migrate to AES. Top Secret information normally requires either the 192- or 256-bit key length. AES implementations intended to protect US national security systems and classified information must be reviewed and certified by the NSA (National Security Agency) before they are acquired and used. As of 2006, no successful practical attacks against AES had been recognized.
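FIPS 197 ties the key length to the algorithm's internal structure: the block is always 128 bits (Nb = 4 words of 32 bits), while the key length in words (Nk) and the number of rounds (Nr) grow together. The sketch below only tabulates those parameters; it does not encrypt anything, and `aes_parameters` is an illustrative name, not a library function.

```python
# AES parameters as defined in FIPS 197. The block size is fixed at
# 128 bits (Nb = 4 32-bit words); only the key length varies.
AES_BLOCK_BITS = 128


def aes_parameters(key_bits: int) -> dict:
    """Return the FIPS 197 parameters for a given AES key length."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES supports only 128-, 192- or 256-bit keys")
    nk = key_bits // 32   # key length in 32-bit words: 4, 6 or 8
    nr = nk + 6           # number of rounds: 10, 12 or 14
    return {"key_bits": key_bits, "Nk": nk, "Nb": AES_BLOCK_BITS // 32, "Nr": nr}


for bits in (128, 192, 256):
    p = aes_parameters(bits)
    print(f"AES-{bits}: Nk={p['Nk']}, Nb={p['Nb']}, Nr={p['Nr']}, key space = 2^{bits}")
```

For actual encryption, a vetted cryptographic library should be used rather than a hand-written implementation.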
From Symmetric to Asymmetric Encryption – public and private keys
Asymmetric encryption differs from symmetric encryption in that it uses two mathematically related keys: a public key known to everyone and a private (secret) key known only to the recipient of the message. This lessens the risk of key exposure, because the key used for decryption never has to be shared. When users want to send a secure message to another user, they use the recipient's public key to encrypt the message; the recipient then uses the corresponding private key to decrypt it. An important element of the public key system is that the keys are related in such a way that a message encrypted with the public key can be decrypted only with the corresponding private key. Moreover, it is computationally infeasible to determine the private key from the public key.
There are a number of asymmetric key encryption systems, but the best known and most widely used is RSA, a public key algorithm named for its three co-inventors, Rivest, Shamir and Adleman. The Secure Sockets Layer used for secure communications on the Internet uses RSA (the popular HTTPS protocol is simply HTTP over SSL). Asymmetric algorithms are complex and carry significant performance overhead, which makes them unsuitable for encrypting very large amounts of data or response-time-sensitive data; this is why protocols such as SSL typically use RSA only to exchange a symmetric session key and then encrypt the bulk data symmetrically. Asymmetric encryption is often considered more secure than symmetric encryption because the decryption key can be kept private rather than shared. However, public key encryption is more computationally intensive and requires a longer key than a symmetric algorithm to achieve the same level of security.
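The public/private key relationship can be seen in "textbook" RSA with deliberately tiny primes (the standard classroom numbers p = 61, q = 53). This is an illustration of the arithmetic only: real RSA keys use primes hundreds of digits long plus a padding scheme, and `encrypt`/`decrypt` here are illustrative helpers, not a library API.

```python
# Textbook RSA with tiny primes, for illustration only.
from math import gcd

p, q = 61, 53                  # two (toy) secret primes
n = p * q                      # modulus, part of both keys: 3233
phi = (p - 1) * (q - 1)        # Euler's totient of n: 3120
e = 17                         # public exponent, must be coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent: modular inverse of e (Python 3.8+)

public_key = (n, e)            # can be published to everyone
private_key = (n, d)           # known only to the recipient


def encrypt(m: int, key: tuple[int, int]) -> int:
    modulus, exponent = key
    return pow(m, exponent, modulus)   # c = m^e mod n


def decrypt(c: int, key: tuple[int, int]) -> int:
    modulus, exponent = key
    return pow(c, exponent, modulus)   # m = c^d mod n


message = 65                   # plaintext must be an integer smaller than n
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
print(n, d, ciphertext, recovered)     # prints: 3233 2753 2790 65
```

Anyone holding `public_key` can produce `ciphertext`, but recovering 65 requires `d`, and computing `d` requires the factors of `n`, which is easy for 3233 and infeasible for a real 2048-bit modulus.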
Keys are the Key – for successful encryption
The basic idea of key-based encryption is that a block, file or other unit of data is scrambled by an encryption algorithm so that the original information is hidden. The scrambled data is called ciphertext. A unique key must be generated for each data element, device, LUN or other entity that needs to be encrypted. Keys must be stored and maintained for the life of the data, which can mean over 100 years for some compliance and archival data applications. In theory, only the person or machine doing the scrambling and the recipient of the ciphertext know how to decrypt or unscramble the data, since it will have been encrypted using an agreed-upon set of keys.
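A minimal sketch of per-entity key bookkeeping, assuming an in-memory dictionary stands in for a real key store (which would need secure, durable, backed-up storage). `KeyRegistry`, `KeyRecord` and the entity IDs are hypothetical names. The sketch also shows a safe re-keying order and how deleting a key leaves its data unreadable.

```python
import secrets
from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import date


@dataclass
class KeyRecord:
    key: bytes
    created: date
    retain_until: date  # the key must be kept at least as long as the data it protects


@dataclass
class KeyRegistry:
    """Hypothetical in-memory registry holding one unique key per protected entity."""
    records: dict[str, KeyRecord] = field(default_factory=dict)

    def create(self, entity_id: str, retain_until: date, key_bits: int = 256) -> bytes:
        if entity_id in self.records:
            raise KeyError(f"{entity_id!r} already has a key")
        key = secrets.token_bytes(key_bits // 8)  # cryptographically strong random bytes
        self.records[entity_id] = KeyRecord(key, date.today(), retain_until)
        return key

    def get(self, entity_id: str) -> bytes:
        return self.records[entity_id].key

    def rekey(self, entity_id: str, reencrypt: Callable[[bytes, bytes], None]) -> bytes:
        old = self.records[entity_id]
        new_key = secrets.token_bytes(len(old.key))
        # The caller decrypts with the old key and re-encrypts with the new one;
        # the old key is replaced only after that call has returned successfully.
        reencrypt(old.key, new_key)
        self.records[entity_id] = KeyRecord(new_key, date.today(), old.retain_until)
        return new_key

    def destroy(self, entity_id: str) -> None:
        # Deleting the only copy of a key leaves the data it protected unreadable.
        del self.records[entity_id]


registry = KeyRegistry()
lun_key = registry.create("lun-0042", retain_until=date(2107, 12, 31))
tape_key = registry.create("tape-VOL001", retain_until=date(2030, 1, 1))
```

If `reencrypt` raises, the old key stays in place, so the data is never left without a usable key.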
The difficulty of cracking an encrypted message by brute force is a function of the key length, because the length of the key determines how many possible keys an attacker must try. For example, an 8-bit key allows for only 256 possible keys (2^8) and can be cracked almost instantly, whereas a 256-bit key (2^256 possible keys) is far beyond the reach of any brute-force search using foreseeable computing power. The same computer power that yields strong encryption can be used to break weak encryption schemes. Strong encryption makes data private, but not necessarily secure. To be secure, the recipient of the data, often just a server, must be positively identified as the approved party. This is usually accomplished online using digital signatures or certificates. Data can be re-keyed by decrypting it with the old key, re-encrypting it with a newly created key, and only then deleting the old key. This is done, for example, when data on an older tape is migrated to a newer tape.
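To make the 8-bit case concrete, the sketch below uses a deliberately weak toy cipher (a single-byte XOR, which is not real encryption) and recovers its key by trying all 256 possibilities, assuming the attacker knows how the plaintext begins. The closing loop prints how quickly the key space grows with key length. All names are illustrative.

```python
# Toy illustration of why key length matters. Single-byte XOR is NOT
# real encryption; it is used here only because its key has just 8 bits.
def xor_cipher(data: bytes, key: int) -> bytes:
    """Encrypt or decrypt (XOR is its own inverse) with a one-byte key."""
    return bytes(b ^ key for b in data)


def brute_force(ciphertext: bytes, known_prefix: bytes) -> list[int]:
    """Try all 2**8 = 256 keys; keep those whose output starts as expected."""
    return [k for k in range(256)
            if xor_cipher(ciphertext, k).startswith(known_prefix)]


secret_key = 0x5A
ciphertext = xor_cipher(b"ACCOUNT RECORD 0001", secret_key)
print(brute_force(ciphertext, b"ACCOUNT"))   # prints: [90]  (0x5A)

# Key-space sizes for comparison: each extra bit doubles the search.
for bits in (8, 56, 128, 256):
    print(f"{bits:>3}-bit key: {2 ** bits:.3e} possible keys")
```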
The increasing mobility of digital data means that the keys need to follow the data. Encryption keys and passwords should be carefully stored in a secure, near bullet-proof storage environment or even in escrow with a secure third party. It is critical to establish an effective key management plan. The encrypted data is useless if the key is lost. Key management is the key to the successful use of encryption!
Hashing
A third form of cryptology is hashing (one-way encryption). A hash is a cryptographic algorithm that takes input data of any length and produces an output of fixed length. The hash output is called a message digest (sometimes loosely called a signature or fingerprint) and is used to verify data integrity. Some hash algorithms, such as MD5 (Message Digest 5), can produce the same digest for two different inputs (a collision), which makes them vulnerable to attack because an attacker can deliberately create two different inputs with the same digest. Digests typically range from 128 bits for MD5 to 160 bits for the stronger SHA-1 (Secure Hash Algorithm 1), although SHA-1 has since also been shown to be vulnerable to collisions. Longer digests generally resist collisions better, though larger hash functions can cost more to compute. Hashing is gaining momentum because it is used extensively in de-duplication products, which identify repeated data by its digest so that it is stored only once, reducing backup windows.
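Python's standard `hashlib` module shows both properties: each algorithm's digest length is fixed regardless of input size, and identical chunks produce identical digests, which is what de-duplication relies on. The chunking scheme below (fixed 4 KB chunks, SHA-256 digests, a dictionary as the chunk store) is a simplified assumption, not how any particular product works; SHA-256 is used instead of MD5 because de-duplication trusts that equal digests mean equal data.

```python
import hashlib

# 1. Digest length depends only on the algorithm, not on the input size.
for name in ("md5", "sha1", "sha256"):
    small = hashlib.new(name, b"x").digest()
    large = hashlib.new(name, b"x" * 1_000_000).digest()
    print(f"{name}: {len(small) * 8}-bit digest for 1 byte, {len(large) * 8}-bit for 1 MB")

# 2. Simplified de-duplication: store each distinct chunk once, keyed by digest.
CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real products vary


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Return the digests needed to rebuild `data`, adding new chunks to `store`."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a chunk already in the store is not stored again
        recipe.append(digest)
    return recipe


def rebuild(recipe: list[str], store: dict[str, bytes]) -> bytes:
    return b"".join(store[digest] for digest in recipe)


store: dict[str, bytes] = {}
backup = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE  # four chunks, only two distinct
recipe = deduplicate(backup, store)
print(len(recipe), len(store))  # prints: 4 2
```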
Data exposure grows – the dark side of the Internet
For years the storage industry focused its high-availability developments on protecting data from technology failures such as disk crashes, operating system failures, or tapes that couldn’t be read. Technology failures were addressed with concepts such as RAID, clustering, component redundancy, replication software, and vastly improved intelligent error recovery capabilities implemented in both disk and tape subsystems. With somewhat vulnerable IP-based storage networking protocols in full swing by 2000, a new threat of data loss appeared in the form of intrusion, and it has now become the next big data exposure issue for the IT industry to address. Despite all the benefits the Internet brings us, its dark side has appeared in the form of its lack of built-in security. This is a major concern, since trillions of dollars of value are continuously moving around the world unprotected. No one envisioned that a wave of digital terrorists, spy-ware, scam artists, sexual predators and identity thieves would emerge, using the Internet to host their digital crime wave. Where did all of these malicious people come from in such a short time?
Having spent significant amounts of time and money shoring up their physical security, many enterprises are beginning to guard their stored data from insider attacks: disgruntled employees, onsite contractors and visiting clients. Malicious attacks on company networks are nearly doubling each year, and the biggest single source of the attacks is now believed to be employees. Worms, viruses, spy-ware, scams and spam have contaminated porous IP networks, causing significant business losses, and an estimated 90% of the e-mail content transmitted on the Internet is now useless, unwanted material such as spam. This is a severe problem, since over 50% of all disk data is now network-attached via NAS, SAN or wireless. Viruses, worms, Trojan horses, zombies, distributed denial-of-service attacks, hacking, and blended threats are all out there, and many can hitch rides with e-mails, web links, downloads and electronic transmissions, including instant messages. There are an estimated 60,000 different viruses currently being transmitted via the Internet, and the number is growing daily, increasing the security exposure. The Radicati Group estimates that 2 million e-mail messages are processed every second worldwide.
Compliance drives encryption
Another reason for the heightened interest in encryption is the advent of government regulations such as HIPAA, Sarbanes-Oxley and PHIPA in Canada. Claims filed in the US in 2004 for damages caused by worms and viruses totaled $17.5B, according to the Computer Economics Impact of Malicious Code Study. The Love Bug attack in 2000 alone cost an estimated $8.8B in damages! Intrusion is being addressed by anti-virus protection software, but this remains a catch-up game for now as the exposure to data loss mounts. Viruses and worms are more aggressively targeting nearly every type of digital appliance, including cell phones. Security jobs are on the rise, and estimates indicate demand for 2.1 million information security professionals in 2008, up from 1.3 million in 2005. Data security may well be on its way to becoming the most important storage and data management discipline, given the increasing value of digital data. In a surprising InformationWeek survey published in 2006, only 19 percent of the 966 US companies surveyed indicated that encryption was beneficial to their compliance and security efforts. However, California Senate Bill 1386 requires companies to publicly disclose instances when they believe unencrypted personal information about California residents might have been compromised. Because the requirement applies to unencrypted information, implementing encryption could keep companies out of the headlines.
Major examples of data loss and vulnerability
Businesses are storing more data in distributed locations than ever before to protect against physical threats such as loss of electricity, floods, devastating hurricanes or other site-related damage. Data may arrive at distributed locations electronically, or storage media may be physically transported offline by ground or air. What happens if the data being transported to another location is lost or stolen?
The growing list of lost data and security breaches includes the CardSystems loss of account information for 200,000 credit card holders; a security breach that revealed data on some 6,000 current and former employees of the Federal Deposit Insurance Corp.; a loss of backup tapes at City National Bank; and Bank of America Corp.'s disclosure early in 2005 that it had lost digital tapes containing the credit card account records of 1.2 million federal employees, including 60 U.S. senators. Was the data really lost or stolen? Who has the data now? Is this valuable data readable, or was it encrypted so it could not be understood? What does this mean for potential identity theft? Finding answers to these questions has been difficult. If any of this mobile data had been encrypted, the damage would have been minimized, as the stolen or lost data would have been essentially useless.
Choosing where to encrypt
Once you have determined what you need to encrypt, the main decision points revolve around whether data encryption should happen at the server (host-based), in transit (appliance-based) or at the disk or tape drive. Encryption and decryption are compute-intensive activities that can slow access to stored data, especially when organizations are storing and accessing large amounts of information. Encryption doesn’t help protect against device failures, worms or viruses. It does address data loss and theft from spy-ware, lost PCs and personal digital appliances, or lost media, because encrypted data that is confiscated is essentially meaningless. Regardless of where you choose to encrypt, the key management process must be evaluated to determine which method will work best for your business. Most encryption solutions today use a proprietary key management system.
Host-based/software – With host-based encryption, servers encrypt data before it reaches the I/O interface, so it never travels to disk or tape unencrypted, making this possibly the most secure encryption technique. Encrypted data is functionally incompressible, so data must first be compressed and then encrypted; many host and appliance vendors ensure their products both compress and then encrypt data on its way to tape. Mainframe-based encryption may exploit IBM's System z9 Integrated Information Processor (zIIP) engine to reduce general processor capacity requirements, enabling customers to fully protect their growing volume of business-critical data without having to purchase additional encryption hardware or tape devices.
Standalone appliances and switches – These special-purpose appliances are placed between the server running applications that request encrypted data and the storage devices (normally disk and tape). All data passes through the appliance, and it normally undergoes compression and encryption in the process. The appliance uses dedicated silicon chips to encrypt all data going to storage and decrypt data going back to the applications, and it monitors all file access attempts. Like host-based encryption, encryption appliances protect data at rest, and they can also purge data after a prescribed time by simply deleting its keys.
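Why compression must come before encryption is easy to demonstrate with Python's standard `zlib` module. As an explicit assumption, random bytes from `os.urandom()` stand in for ciphertext, since the output of a good cipher is statistically indistinguishable from random data; `compress_then_encrypt` and its `encrypt` parameter are hypothetical.

```python
import os
import zlib
from collections.abc import Callable

# Repetitive, structured data such as database exports compresses well.
plaintext = b"customer_id,balance,status\n1001,250.00,active\n" * 1000

# Stand-in for ciphertext (an assumption, not a real cipher's output).
ciphertext_like = os.urandom(len(plaintext))

print("plaintext:          ", len(plaintext), "bytes")
print("compressed plain:   ", len(zlib.compress(plaintext)), "bytes")
print("compressed 'cipher':", len(zlib.compress(ciphertext_like)), "bytes")


def compress_then_encrypt(data: bytes, encrypt: Callable[[bytes], bytes]) -> bytes:
    """Correct order: shrink the data first, then encrypt the smaller result.
    `encrypt` is a hypothetical caller-supplied cipher function."""
    return encrypt(zlib.compress(data))
```

The plaintext shrinks dramatically, while the random stand-in does not shrink at all; reversing the order would leave nothing for the compressor to exploit.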
Disk drives, tape, CDs, DVDs – While RAID and mirroring address device failures, encryption attempts to address the data loss/data theft problem by rendering lost or stolen data useless. In general, it is more difficult to decide which data to encrypt on disk drives because there are performance considerations. For disk drives used in multi-user systems, encrypting highly active, response-time-sensitive data can cause performance degradation, and its impact should be carefully evaluated. Encryption has the most pressing value for PC disk drives, since they are highly mobile and easily subject to loss or theft.
Tape drive – Many tape drive manufacturers are implementing encryption via an ASIC in the tape drive, similar to the way tape compression was implemented in the mid-1980s. Choosing to encrypt tapes is a fairly easy decision if the cartridges are highly mobile. Because the encryption capability is built into the drive itself, backup servers and networks won't take a performance hit. Encryption at the tape drive can support diverse operating infrastructures, because tape drives are independent of the backup environment and are, by their very nature, application-agnostic.
What data should be encrypted?
Data classification has become a hot storage industry topic. Knowing the value of your data is important for determining which data protection services to deploy as well as which data to encrypt, but many businesses haven’t gone through this process yet. Some very large businesses may choose to encrypt only their regulated data, since managing the keys and the overall encryption process remains complex. Small and medium-size companies may consider encrypting just about everything to avoid the management challenge of determining what to encrypt. The choices will vary by business and application. Standard data classification guidelines, based primarily on recovery time objective (RTO), are listed in the article "Classifying data". Keep in mind that certain data in each category can be a candidate for encryption.
Conclusion
With as much as 80% of the world’s digital data estimated to reside on removable and mobile storage media, including PCs, protecting data at rest must now be treated as more than managing an archival repository. Presently, the majority of IT organizations haven’t directly made encryption part of their high-availability strategy for stored or mobile data. Nonetheless, scrambling data is easier than stopping its theft. While widespread use of biometric security solutions remains in the distant future, encryption makes good sense for mobile data stored on PCs, tapes, CDs, DVDs, PDAs, iPods, and other mobile storage products containing important information, as this data is at high risk. Though data encryption is clearly gaining momentum, it will continue to be used for specific applications in the near term while the lack of compatibility standards, performance tradeoffs, and key management issues persist. For encryption to ultimately succeed, a centralized key management system is likely to be the optimal strategy. Stay abreast of your storage and IT providers’ longer-term strategies for encryption; sooner or later, the future of your most valuable asset will most likely depend on it.
Relevant Links to Encryption Providers
http://www.decru.com (acquired by NetApp)
http://www.emc.com (via RSA acquisition)
http://www.sun.com (via StorageTek acquisition)