Hashing is a one-way transformation, not encryption: there is no way of computing the original string back from its hash. The only way to "reverse" a hash is to guess inputs, hash them and compare the results. Hashing has a significant amount of mathematical theory behind it, most of which you needn’t know. However, I encourage you to have a read of the relevant Wikipedia articles.
Hashes are used for two main purposes:
- to uniquely identify some information: this is achieved by hashing that information into a fixed-length string that is, for practical purposes, unique within the output space of the hashing algorithm. This is how you can quickly compare two files, for instance – by hashing their contents, then comparing the hashes. If they match, the files are identical, with one caveat: collision risk, meaning that a certain (usually very small) proportion of non-identical inputs will yield an identical hash. Collisions are a mathematical inevitability, because there are infinitely many possible inputs but only a finite number of possible hashes. How likely they are depends on the algorithm, and an attacker who can produce them deliberately can use that to break it. With a strong enough hash algorithm, this should not be a concern for most problems.
- to obscure information: this is why we use them for password storage, where what matters is that the original cannot be recovered, not uniqueness.
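To make the file-comparison idea concrete, here is a minimal Python sketch using the standard `hashlib` module. The choice of SHA-256 and the 64 KiB chunk size are my assumptions, not requirements; the helper names are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_probably_identical(path_a: str, path_b: str) -> bool:
    """Equal digests mean identical contents, barring a (very unlikely) collision."""
    return file_digest(path_a) == file_digest(path_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        first, second, third = (os.path.join(d, n) for n in ("a.txt", "b.txt", "c.txt"))
        for path, data in ((first, b"hello world"), (second, b"hello world"), (third, b"hello world!")):
            with open(path, "wb") as f:
                f.write(data)
        print(files_probably_identical(first, second))  # True
        print(files_probably_identical(first, third))   # False
```

Only the two short digests need to be compared, so you can store a file's hash once and check later copies against it without keeping the original around.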
What you should know is how to hash your information properly. Here are the most important points to consider.
Hashing algorithm: use SHA-2
SHA-1 is most likely strong enough if you’re not the government or a financial institution. However, it has known mathematical weaknesses: a practical collision attack against it was demonstrated in 2017. With the above exceptions, I think it’s still acceptable to use SHA-1, especially if you’re worried about storage space requirements (SHA-2 hashes are longer).
Anything weaker than SHA-1, such as MD5, should never be used for security purposes; MD5 is broken. SHA-2 (for example SHA-256 or SHA-512) should cover you nicely for virtually any purpose or security requirement.
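To put the storage trade-off in numbers, this small sketch uses Python's standard `hashlib` to print the length of a hex-encoded digest for SHA-1 and for two members of the SHA-2 family. The helper name `hex_digest_length` is hypothetical.

```python
import hashlib


def hex_digest_length(algorithm: str, data: bytes = b"") -> int:
    """Length of the hex-encoded digest; each digest byte becomes two hex characters."""
    return len(hashlib.new(algorithm, data).hexdigest())


for name in ("sha1", "sha256", "sha512"):
    print(f"{name}: {hex_digest_length(name)} hex characters")
# sha1: 40 hex characters
# sha256: 64 hex characters
# sha512: 128 hex characters
```

The digest length is fixed by the algorithm, not by the input, so these numbers are exactly what each stored hash costs you (half as much if you store raw bytes instead of hex).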
Hashing “salt”: always use it
To get truly secure hashes, using a “salt” string is an absolute must. Unsalted hashes must never be used in the context of security.
The “salt” is simply a random string that you feed to your hash algorithm, in addition to the string you want to hash. The “salt” need not be secret; you can store it in plain view, along with the hash. This is possible because its role is not to add a secret to the hashing process, but to make every hash unique, which protects a group of hashes from the infamous “rainbow table” attacks.
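Here is a minimal sketch of a salted SHA-256 hash in Python, using `secrets` for the random salt and `hmac.compare_digest` for a constant-time comparison. The 16-byte salt size, the salt-then-password concatenation order and the function names are my assumptions for illustration, not a prescribed scheme.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

SALT_BYTES = 16  # assumption: 16 random bytes (128 bits) per salt


def salted_sha256(password: str, salt: Optional[bytes] = None) -> Tuple[str, str]:
    """Return (salt_hex, hash_hex) for SHA-256(salt + password)."""
    if salt is None:
        salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt for every new hash
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt.hex(), digest


def verify(password: str, salt_hex: str, expected_hash_hex: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, candidate = salted_sha256(password, bytes.fromhex(salt_hex))
    return hmac.compare_digest(candidate, expected_hash_hex)


if __name__ == "__main__":
    # Store both values, e.g. in two columns of a users table; the salt isn't secret.
    salt_hex, hash_hex = salted_sha256("s3cret")
    print(verify("s3cret", salt_hex, hash_hex))  # True
    print(verify("guess", salt_hex, hash_hex))   # False
```

Verification only works because the salt is stored: without it you could not recompute the same hash from the submitted password.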
For instance, if you don’t salt your hashes for, say, user passwords, an attacker can use a rainbow table to discover, or “guess”, the original strings. Hashes of known (dictionary) strings are pre-computed and stored as hash chains, and the attacker looks for a stored hash identical to one of yours. A match means the corresponding dictionary string is almost certainly the original password. This may or may not succeed.
However, if you do salt your hashes, and – this is very important – you use a unique “salt” string for each different hash, the “rainbow table” type of attack is rendered useless. Identical passwords no longer hash to the same string (as they would without a random, unique “salt” for each); instead, they yield different hashes. The attacker now has to generate tables for each possible “salt” value, which, with a long enough “salt” string, borders on physical impossibility (you’d need storage far beyond terabyte scale, tremendous computational power and an awful lot of time).
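The effect of unique salts can be shown with a toy experiment. Note that this sketch uses a plain precomputed lookup table (hash to plaintext), not a real rainbow table: real rainbow tables store hash chains to save space, but the principle of precomputing hashes of dictionary words is the same. The word list and the 16-byte salts are made up for illustration.

```python
import hashlib
import secrets
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Toy precomputed table: unsalted hash -> plaintext for a few dictionary words.
DICTIONARY = ["password", "123456", "letmein", "qwerty"]
PRECOMPUTED = {sha256_hex(w.encode("utf-8")): w for w in DICTIONARY}


def crack(hash_hex: str) -> Optional[str]:
    """Look the hash up in the precomputed table; None if it isn't there."""
    return PRECOMPUTED.get(hash_hex)


# Two users who happen to pick the same weak password.
password = b"letmein"

# Unsalted: identical hashes, and the table recovers the password instantly.
alice_plain = sha256_hex(password)
bob_plain = sha256_hex(password)

# A unique random salt per user: different hashes, and the table misses.
alice_salt, bob_salt = secrets.token_bytes(16), secrets.token_bytes(16)
alice_salted = sha256_hex(alice_salt + password)
bob_salted = sha256_hex(bob_salt + password)

print(alice_plain == bob_plain)    # True
print(crack(alice_plain))          # letmein
print(alice_salted == bob_salted)  # False (with overwhelming probability)
print(crack(alice_salted))         # None
```

To crack the salted hashes, the attacker would have to rebuild the whole table once per salt, which is exactly the cost the paragraph above describes.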
Strengthening (also called “stretching”) a hash means running the input string (don’t forget the “salt”!) through the hashing algorithm more than once. This increases the computational time required by brute-force attacks. It’s probably a good idea to do this, although I tend to see a normal (not stretched) salted SHA-2 hash as sufficient. Strengthening is particularly useful for weak hash algorithms, such as MD5, which I hope we’ve agreed not to use.
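A minimal sketch of stretching in Python follows. `stretched_sha256` is a hypothetical hand-rolled loop that re-hashes the previous digest together with the salt; `pbkdf2_sha256` wraps `hashlib.pbkdf2_hmac`, the standard library's implementation of PBKDF2 (PKCS #5), which is the standardized form of iterated hashing. The two produce different outputs because PBKDF2 uses HMAC internally. The iteration count is an assumption; tune it to your hardware.

```python
import hashlib
import secrets

ITERATIONS = 100_000  # assumption: illustrative count, tune to your hardware


def stretched_sha256(password: str, salt: bytes, iterations: int = ITERATIONS) -> str:
    """Hash salt+password once, then re-hash (salt + previous digest) iterations-1 more times."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    for _ in range(iterations - 1):
        # Feed the salt back in each round so every round depends on it.
        digest = hashlib.sha256(salt + digest).digest()
    return digest.hex()


def pbkdf2_sha256(password: str, salt: bytes, iterations: int = ITERATIONS) -> str:
    """PBKDF2-HMAC-SHA256 from the standard library; 32-byte output by default."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations).hex()


if __name__ == "__main__":
    salt = secrets.token_bytes(16)
    print(stretched_sha256("s3cret", salt))
    print(pbkdf2_sha256("s3cret", salt))
```

Each extra iteration costs you once per login but costs an attacker once per guess, which is where the slowdown for brute force comes from. If you do stretch, I'd lean towards the standard function rather than a custom loop.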
Security through obscurity
This is, admittedly, paranoid, but try hiding the “salt” inside the stored hash at a secret offset, known only to your system. Of course, the “salt” string doesn’t need to be secret, but hey…
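One way to do this, sketched in Python under my own assumptions: the hex-encoded salt is spliced into the hex-encoded SHA-256 hash at a hypothetical `SECRET_OFFSET`, and split out again on verification. Since both strings are just hex characters, the joined record gives no visual hint of where the salt sits, but anyone who reads your code learns the offset, so this adds obscurity, not real security.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Hypothetical secret offset known only to your application (an assumption for
# this sketch). It must not exceed the 64-character length of a SHA-256 hex hash.
SECRET_OFFSET = 17
SALT_HEX_LEN = 32  # 16-byte salt, hex-encoded


def _hash(salt_hex: str, password: str) -> str:
    return hashlib.sha256(bytes.fromhex(salt_hex) + password.encode("utf-8")).hexdigest()


def make_record(password: str) -> str:
    """Hash with a fresh salt, then splice the salt into the hash at the secret offset."""
    salt_hex = secrets.token_hex(SALT_HEX_LEN // 2)
    hash_hex = _hash(salt_hex, password)
    return hash_hex[:SECRET_OFFSET] + salt_hex + hash_hex[SECRET_OFFSET:]


def split_record(record: str) -> Tuple[str, str]:
    """Recover (salt_hex, hash_hex) from a stored record."""
    salt_hex = record[SECRET_OFFSET:SECRET_OFFSET + SALT_HEX_LEN]
    hash_hex = record[:SECRET_OFFSET] + record[SECRET_OFFSET + SALT_HEX_LEN:]
    return salt_hex, hash_hex


def verify(password: str, record: str) -> bool:
    salt_hex, hash_hex = split_record(record)
    return hmac.compare_digest(_hash(salt_hex, password), hash_hex)


if __name__ == "__main__":
    record = make_record("s3cret")  # 96 hex characters: 64 of hash, 32 of salt
    print(verify("s3cret", record))  # True
    print(verify("guess", record))   # False
```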