What is a hash? (2a2c2075f67a55e2f170b9af7e2212d0cc9f70f9)
The OpenSSL Heartbleed vulnerability has brought the concept of cryptography center stage, and with it comes a whole new bag of buzzwords. As a result, everyday computer users are left with the difficult task of deciphering a conversation that is often as cryptic as its content. “Cryptographic hash” is one of the most fundamental and recurrent of these terms.
A cryptographic hash is a string of numbers and letters produced by a cryptographic hash function. A cryptographic hash function is simply an algorithm, or a set of mathematical steps, performed by a computer. To begin to understand this, we can take a look at this article’s intimidating title:
What is a hash? (2a2c2075f67a55e2f170b9af7e2212d0cc9f70f9)
In English, this article is titled “What is a hash?” In the SHA-1 cryptographic hash language, that translates to 2a2c2075f67a55e2f170b9af7e2212d0cc9f70f9.
Though it might seem complicated, this is nothing more than an input-output relationship. To make the translation, the input “What is a hash?” is simply fed to a computer program that applies the SHA-1 cryptographic hash function and then spits out the hash as an output.
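The input-output relationship described above can be sketched in a few lines of Python using the standard hashlib module. This is only an illustration: the function name sha1_hex is our own, and the sketch assumes the text is encoded as UTF-8 before hashing (a different encoding would produce a different hash).

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of `text` as 40 hexadecimal characters."""
    # Hash functions operate on bytes, so the text is encoded first.
    # UTF-8 is an assumption; the article does not name an encoding.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Feed the input in, get the hash out.
    print(sha1_hex("What is a hash?"))
```

Running the same input through the function again always gives the same output, which is what makes the hash usable as a signature.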
For display, the hash can be broken into 10 pieces, each of which is 4 characters long, for a total of 40 characters: 2a2c 2075 f67a 55e2 f170 b9af 7e22 12d0 cc9f 70f9. Though formatted differently, the hash is still the same. This format is applied to increase readability and to make it easier for analysts to read and compare hashes when they need to. Here are a few more examples:
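Examples in this format can be generated with a short Python sketch. The helper names group_hash and grouped_sha1 and the sample inputs are our own choices for illustration, and UTF-8 encoding is assumed.

```python
import hashlib


def group_hash(hex_digest: str, size: int = 4) -> str:
    """Split a hex digest into space-separated groups of `size` characters."""
    return " ".join(
        hex_digest[i:i + size] for i in range(0, len(hex_digest), size)
    )


def grouped_sha1(text: str) -> str:
    """Hash `text` with SHA-1 and return the digest in groups of four."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return group_hash(digest)


if __name__ == "__main__":
    # Illustrative inputs of different lengths.
    for sample in ["What is a hash?", "E", "Emsisoft Anti-Malware"]:
        print(f"{sample!r:26} -> {grouped_sha1(sample)}")
```

Grouping only changes how the hash is displayed; removing the spaces gives back the original 40 characters.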
Note that even though each input has a different number of characters, the SHA-1 hash output is always the same length: 40 characters. As a result, the hash gives away nothing about the size of the original input. For example, you could hash your favorite letter, “E”, or you could hash the entire contents of your favorite book, The Da Vinci Code, and in both cases you’d still end up with a 40-character hash. This makes hashing a powerful cryptographic tool.
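The fixed output length is easy to check for yourself. In the sketch below, a long run of repeated text stands in for the contents of a book; that stand-in and the variable names are assumptions made for illustration.

```python
import hashlib

short_input = "E"
# Stand-in for a whole book: over a million characters of repeated text.
long_input = "The quick brown fox jumps over the lazy dog. " * 30_000

short_hash = hashlib.sha1(short_input.encode("utf-8")).hexdigest()
long_hash = hashlib.sha1(long_input.encode("utf-8")).hexdigest()

# Both lines report 40 hex characters, whatever the input size.
print(len(short_input), "characters ->", len(short_hash), "hex characters")
print(len(long_input), "characters ->", len(long_hash), "hex characters")
```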
Even more interesting is what’s called the “avalanche effect,” which can be illustrated using a common misspelling of our company name:
Note how a slight variance in the input (leaving out that first S in Emsisoft) has caused a huge shift in output: this is the avalanche effect, a highly desirable property for a cryptographic hash function to have. Just one letter creates an entirely different hash, so similar inputs give no hint that they are related, and working backward from a hash to its input is quite difficult.
In other words, one letter makes all the difference!
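The avalanche effect can also be measured rather than just eyeballed. The following sketch, with helper names of our own choosing and UTF-8 encoding assumed, hashes the correct and misspelled company names with SHA-1 and counts how many of the 160 output bits differ.

```python
import hashlib


def sha1_bits(text: str) -> int:
    """Return the SHA-1 digest of `text` as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count how many of the 160 output bits differ between two inputs."""
    # XOR leaves a 1 wherever the two digests disagree.
    return bin(sha1_bits(a) ^ sha1_bits(b)).count("1")


if __name__ == "__main__":
    for name in ("Emsisoft", "Emisoft"):
        print(name, "->", hashlib.sha1(name.encode("utf-8")).hexdigest())
    print("bits that differ:", differing_bits("Emsisoft", "Emisoft"), "of 160")
```

For a well-behaved hash function, a small change to the input typically flips roughly half of the output bits.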
What are hashes used for?
Hashes are a fundamental component of cryptography because they allow a set of data of any size to be associated with a fixed-size signature that looks random but is always the same for the same input. In the examples above, we illustrated things using the SHA-1 cryptographic hash function, but in reality there are a number of hash functions that can be used. In addition to SHA-1, computer security professionals also use SHA-2 and MD5. Different functions offer different degrees of complexity and resistance to attack and are therefore used in different scenarios, depending on the amount of security required. MD5 and SHA-1 both have known practical collision attacks, so SHA-2 is the usual choice where security matters.
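To compare these functions side by side, the sketch below hashes one message with MD5, SHA-1, and two members of the SHA-2 family (SHA-256 and SHA-512) using Python's hashlib. The function name digest_bits and the choice of message are ours, and the sketch assumes all four algorithms are enabled in the local Python build.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def digest_bits(message: bytes) -> dict:
    """Map each algorithm name to the size of its digest in bits."""
    return {
        name: len(hashlib.new(name, message).digest()) * 8
        for name in ALGORITHMS
    }


if __name__ == "__main__":
    message = "What is a hash?".encode("utf-8")
    for name in ALGORITHMS:
        digest = hashlib.new(name, message).hexdigest()
        print(f"{name:>6}: {len(digest) * 4:3d} bits  {digest}")
```

Longer digests leave more possible outputs, which is one reason the newer functions are harder to attack.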
For the everyday computer user, the most direct contact with hashes comes from passwords. When you create a password and you share it with a service provider, the service provider archives it as a hash instead of in its plain text form, so that in the event their server is compromised the attacker can only steal hashes, not the passwords themselves. The problem with Heartbleed, and the reason it was deemed a crisis, was that it allowed attackers to access plain text passwords in a server’s temporary memory, as opposed to hashed passwords in the server’s archives.
Additionally, password hashing is not fool-proof. Commonly used passwords such as “123456” or “password” are still vulnerable to what is called a dictionary attack. This is because an attacker can simply put these common passwords into a hash function, find the hash, create a dictionary of common password hashes, and then use that dictionary to “look up” the hashes of stolen passwords. Determined hackers can also utilize rainbow tables, which are large precomputed tables of hashes, to reverse hashed passwords if they somehow get their hands on them. Fortunately, the rainbow table hack can be prevented by using a “salted” hash. A salted hash takes the original password and adds a little something extra to it, like a random number or a user-id (i.e., a “dash of salt”). The combination is then hashed, so the same password no longer produces the same hash everywhere and precomputed tables stop working, providing an extra layer of security that makes stolen hashes far harder to crack.
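Both ideas, the dictionary lookup and the salt that defeats it, can be sketched with Python's standard library. Everything named here is an assumption for illustration: the tiny password list, the helper names, and the use of PBKDF2 (hashlib.pbkdf2_hmac) as one common way to combine a salt with a password. It is not a description of how any particular service stores passwords.

```python
import hashlib
import hmac
import os

COMMON_PASSWORDS = ["123456", "password", "qwerty"]  # tiny illustrative list


def unsalted_hash(password: str) -> str:
    """Plain SHA-256 of the password: identical passwords, identical hashes."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# The attacker's "dictionary": hash -> password, computed once and reused.
LOOKUP = {unsalted_hash(p): p for p in COMMON_PASSWORDS}


def salted_hash(password: str, salt: bytes, rounds: int = 100_000) -> bytes:
    """Mix the salt into the password and repeat the hash many times."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, rounds)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the hash from a login attempt and compare it to the stored one."""
    return hmac.compare_digest(salted_hash(password, salt), stored)


if __name__ == "__main__":
    stolen = unsalted_hash("password")
    print("dictionary lookup finds:", LOOKUP.get(stolen))

    # With a random salt per user, the same password yields different hashes.
    salt_a, salt_b = os.urandom(16), os.urandom(16)
    print("salted hashes differ:",
          salted_hash("password", salt_a) != salted_hash("password", salt_b))
```

The salt is stored next to the hash and does not need to be secret; its job is to force an attacker to redo the work for every user instead of reusing one precomputed table.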
Here at Emsisoft, one of the most common uses of cryptographic hashes is in the identification of malware. When the malware analysis community finds a new threat, they compute its cryptographic hash, which serves as its signature. Emsisoft’s dual engine malware scanner contains over 12,000,000 of these signatures and uses them to protect your computer. In principle, this works just like an FBI fingerprint database. When you scan your computer for malware with Emsisoft Anti-Malware, the software compares the signatures of all your files to the signatures of known malware in its database, a database that updates every 15 minutes. If the software finds a match, then it knows your computer is infected and it lets you know you should delete that malicious file.
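The fingerprint-database idea can be sketched as a set lookup. This is a simplified illustration and not how Emsisoft Anti-Malware is implemented: the KNOWN_BAD set, the placeholder bytes, the helper names, and the choice of SHA-256 are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical signature database. Its single entry is the hash of the
# placeholder bytes below, not of any real malware.
KNOWN_BAD = {hashlib.sha256(b"pretend-malware-sample").hexdigest()}


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_known_malware(path: Path) -> bool:
    """Return True if the file's hash matches a known signature."""
    return file_sha256(path) in KNOWN_BAD


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        bad = Path(folder) / "bad.bin"
        good = Path(folder) / "good.bin"
        bad.write_bytes(b"pretend-malware-sample")
        good.write_bytes(b"an ordinary document")
        for path in (bad, good):
            print(path.name, "->", "MATCH" if is_known_malware(path) else "clean")
```

Because the lookup only matches exact hashes, changing even one byte of the file is enough to avoid a match.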
There is one slight problem with scanning, however. Due to the avalanche effect, a malware author can change their malware’s hash signature quite easily. All it takes is the slightest change to the malware’s code. This means that new malware signatures are generated en masse, on a daily basis, and also means that not even the best anti-malware signature database maintained by the most vigilant of analysts can keep up. This flaw is precisely the reason Emsisoft Anti-Malware uses Behavior Blocking technology, an innovation that recognizes when a file is attempting to perform a malicious process. In this way, Behavior Blocking serves as a backup, on the off chance that our signature database hasn’t yet registered a new threat.
Other uses for the hash
In addition to security, the signatory nature of cryptographic hashes can also be used to legitimize digital content. This application is often used to copyright digital media, and has been adopted by file sharing service providers to prevent their users from illegally sharing copyrighted content. This is powerful, because it allows service providers to monitor what their users are storing without actually infringing on their privacy. Much like we do with our anti-malware, file sharing service providers simply create databases of hashes that are associated with copyrighted files. If they then notice that a user is attempting to transfer a file with one of those hashes, they can infer that that user is attempting to illegally share copyrighted material.
Hashes can also be used to validate what’s called “message integrity” by acting as a “checksum.” If two parties want to share a file, they can use cryptographic hash signatures to validate that the file was not tampered with in transit by a third, malicious party. This works because a file that hasn’t been tampered with should produce the same hash at the sending and receiving ends of transmission. If comparison reveals that these hashes are different, then the people sharing the file know that someone else has “tampered” with their package!
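A minimal sketch of this checksum comparison is shown below. The function names and the sample data are our own, SHA-256 is an arbitrary choice of hash function, and the sketch assumes the sender's checksum reaches the receiver through a channel the third party cannot alter.

```python
import hashlib
import hmac


def checksum(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def arrived_intact(received: bytes, expected_checksum: str) -> bool:
    """Compare the hash of what arrived with the hash the sender published."""
    # compare_digest checks the two strings in constant time.
    return hmac.compare_digest(checksum(received), expected_checksum)


if __name__ == "__main__":
    sent = b"quarterly report, final version"
    published = checksum(sent)  # computed at the sending end

    print(arrived_intact(sent, published))  # same bytes at both ends
    print(arrived_intact(b"quarterly report, f1nal version", published))
```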
What happens when you hash a hash?
The beautiful and yet scary thing about cryptography is that it rests on very simple principles like the hash and yet still achieves enormous complexity. This makes cryptography a powerful and yet dangerous tool. In one breath, cryptography allows computer security professionals to protect everything from home users’ financial data to top secret military documents that contain information about the world’s most powerful weaponry. In that same breath, cryptography also allows malware authors to create advanced forms of malware like Cryptolocker – a threat that has yet to be cracked.
Any way you hash it, cryptography is complex, and the critical Heartbleed vulnerability that occurred in early April was indeed a consequence of this complexity. As the Internet moves forward, there is no doubt that new cryptographic technology and terms will arise, but that doesn’t mean that everyday users can’t take part in the conversation and understand its basic components. Quite the contrary, those who use the Internet on a daily basis and don’t quite fully grasp its inner workings are actually in the majority, and educating this majority is one of the most fundamental components of a fully secured Internet.
That being said, consider yourself armed and ready to hash.
Have a Great (Malware-Free) Day!