Random Question (About Hash Keys)


KynloStephen66515:
If Hashing is supposedly 'One Way Encryption'...

How the HELL does your machine know how to interpret the information it has been given? - Surely your computer is somehow decrypting the information in order to do what it needs to?

KynloStephen66515:
**Edit**

Not only hashes, but anything claimed to be "1 Way Encryption/Cypher"

db90h:
I'll try to be clear (not great at that ;p). There are TWO distinct types of operations here. They are similar in nature; the difference is that one has a larger bitspace, which reduces the chance of a collision to nearly zero.

The terms ARE *often* interchangeable in the real world, so either is valid, but I prefer this way of thinking. You can use whichever term you want - just make sure you pick the right algorithm.

1. Hash - short bitspace, collisions can occur
2. Digest - larger bitspace, collisions have near zero probability, designed to be mathematically irreversible by people smarter than me (cryptographically secure)

In practice, this means no two data sets are likely to result in the same DIGEST, so it can be used reliably to uniquely identify a data set. HASHes were often used for quick data integrity checks in the old days, and are still used for quick lookups in arrays and databases. Collisions are expected and accounted for.

DIGESTS are usually much larger in length than a HASH. Instead of 32 bits, you may be speaking of 128 bits, for example. These are mathematically formed in such a way that no two data sets should produce the same DIGEST.

That is why a password is stored as a DIGEST, and thus when you type in the plaintext password, the DIGEST is computed and compared to the stored DIGEST. In this way, even if an attacker compromised the site or software and retrieved the password's DIGEST, they'd NOT know the actual password. However, that is what 'Rainbow Tables' are for (pre-computed digests of common passwords). These are getting larger all the time. Still, unless you used a word in the dictionary, or some short password, you're likely ok.
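To make the "pre-computed digests of common passwords" idea concrete, here is a minimal sketch in Python. It is a plain lookup table, which is a simplification of how a real rainbow table is built, and the word list and the names md5_hex and crack are made up for illustration.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of a string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical list of common passwords an attacker computes ahead of time.
COMMON_PASSWORDS = ["password", "123456", "letmein", "mypassword"]

# Pre-computed table: digest -> plaintext.
LOOKUP = {md5_hex(p): p for p in COMMON_PASSWORDS}

def crack(stolen_digest):
    """Return the plaintext if the digest is in the table, else None."""
    return LOOKUP.get(stolen_digest)

print(crack(md5_hex("mypassword")))            # in the table, so it is found
print(crack(md5_hex("x9!Tq#long-and-odd-7")))  # not in the table: None
```

Nothing is "decrypted" here. The attacker only recognises a digest that was already computed from a guess, which is why short or dictionary passwords are the ones at risk.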

Why is it called a 'digest'? Well, think about it -- the data set is 'digested' through an algorithm and out comes a 128bit (or however long) 'digest'.

Example HASH algorithms: CRC32, Checksum

Example DIGEST algorithms: MD5, SHA1
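A quick way to see the size difference is to run the same data through each algorithm. This is a small sketch using Python's standard zlib and hashlib modules; the sample sentence is arbitrary.

```python
import hashlib
import zlib

data = b"The quick brown fox jumps over the lazy dog"

# CRC32 fits in a single 32-bit integer.
crc = zlib.crc32(data) & 0xFFFFFFFF

# MD5 and SHA1 return raw bytes; 8 bits per byte.
md5_digest = hashlib.md5(data).digest()
sha1_digest = hashlib.sha1(data).digest()

sizes = {
    "CRC32": 32,
    "MD5": len(md5_digest) * 8,
    "SHA1": len(sha1_digest) * 8,
}

print("CRC32: %08x" % crc)
print("MD5:   " + md5_digest.hex())
print("SHA1:  " + sha1_digest.hex())
print(sizes)
```

With only 32 bits there are about 4 billion possible CRC32 values, so collisions turn up quickly in large data sets. MD5 has 128 bits and SHA1 has 160.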

Again, the *terms* HASH and DIGEST are often interchanged in the real world, but that's MY definition. In the real world, you can say either one and be fine.

To web site owners *and* even software authors --- NEVER, EVER store a password plaintext. ALWAYS store its digested form (usually SHA1 or MD5). Sometimes people also 'salt' the password, mixing in some extra per-user data before it is digested, just to throw off rainbow tables.
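A common form of salting is to mix a random per-user value into the password before it is digested, and to store that value next to the digest. Here is a minimal sketch, assuming SHA1 and a 16-byte salt; the names make_record and check are hypothetical.

```python
import hashlib
import os

def make_record(password):
    """Return (salt, digest) to store for a new password."""
    salt = os.urandom(16)  # random per-user value, stored in the clear
    digest = hashlib.sha1(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def check(password, salt, digest):
    """Recompute the salted digest and compare it to the stored one."""
    candidate = hashlib.sha1(salt + password.encode("utf-8")).hexdigest()
    return candidate == digest

salt_a, digest_a = make_record("mypassword")
salt_b, digest_b = make_record("mypassword")

print(digest_a == digest_b)                     # same password, different digests
print(check("mypassword", salt_a, digest_a))    # True
print(check("notmypassword", salt_a, digest_a)) # False
```

Because every user gets a different salt, a table pre-computed from plain passwords no longer matches anything in the stored data.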

This is the 'one way encryption' I believe you speak of. Does that sum it up? This would be in a CS101 class I believe, it's pretty introductory stuff, some of the first things you learn in college other than OS fundamentals and classical algorithms that still are used to this very day (e.g. search and sort algorithms, or the basics of compression algorithms later).

e.g.:

Last month I created this 'online tool', one of thousands I'm sure, to do a quick MD5 or SHA1 computation of given plaintext:

Calculate MD5 or SHA1 Digest

MD5 of 'mypassword' is '34819d7beeabb9260a5c854bc85b3e44'

All that need be stored is '34819d7beeabb9260a5c854bc85b3e44'. Then when the user types in 'mypassword' the MD5 is recomputed and compared to '34819d7beeabb9260a5c854bc85b3e44'. If they match, then all is good. If they don't match, then it's the wrong password. In that way, the actual password need never be stored. Note that MD5 has been compromised in that it has been shown to produce collisions, as shown here: http://en.wikipedia.org/wiki/MD5 . Thus, SHA1 is usually the preference, but it hardly matters unless you're into banking or some really hard core stuff.
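The store-and-compare flow can be sketched in a few lines of Python. This is only an illustration: the in-memory dictionary and the names register and login are made up, and MD5 is used simply to match the example.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of a string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical "user database": username -> stored digest.
stored = {}

def register(user, password):
    """Keep only the digest; the plaintext is never stored."""
    stored[user] = md5_hex(password)

def login(user, attempt):
    """Digest what was typed and compare it to the stored digest."""
    return user in stored and stored[user] == md5_hex(attempt)

register("alice", "mypassword")
print(stored["alice"])               # a 32-character hex digest, not the password
print(login("alice", "mypassword"))  # True
print(login("alice", "mypasswork"))  # False
```

At no point does the code turn the digest back into the password. It only ever goes in one direction, twice.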

db90h:
Rewrote for clarity (as always). This sort of information is stuff I should post on my own Forum, as it makes me sound smart, lol. Seldom do customers ask about message digests though, lol.

mouser:
"How the HELL does your machine know how to interpret the information it has been given? - Surely your computer is somehow decrypting the information in order to do what it needs to?"

You may be thinking that the idea of a hash (or digest) is to "encrypt" some information so that it can be "recovered/decrypted" later -- but that's precisely the opposite of what you use a hash to do.

The whole point of using a hash/digest for security purposes is to convert plain text data (like a password), into a (unique) representation such that it's completely impractical for anyone to reverse it and get back the original. The computer never "decrypts" the hash back into the original. That's why we call hashes "one-way functions".

The way you use hashes to handle account passwords is that you hash the plain text password and store the hash. The next time someone enters the password, you again calculate a hash of what they type, and then compare it to the hash you stored. Now you see why a site that stores passwords this way has no function that will resend you your password if you forget it -- because it can't do that -- it can't get back what your original plain text password is. That's why it has to send you a link to RESET your password.

Hash functions are used on files for a different purpose, but the idea is the same -- you perform a mathematical hash function on the input, which can be arbitrarily big, and the output is a number (or a short string of bytes). If you want to check whether a new file is the same as the original, you perform the hash function on the new file, and compare the output to the result you got when you ran it on the original file. And because a good hash function cannot practically be reversed or steered toward a chosen output, people cannot create fake files that generate the same hash as (and therefore might appear to match) the original.
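For files, the comparison might look like the sketch below. It assumes SHA1 and reads the file in chunks so that large files don't have to fit in memory; the function names and the chunk size are arbitrary choices.

```python
import hashlib

def file_digest(path, chunk_size=65536):
    """Return the SHA1 hex digest of a file, read one chunk at a time."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_contents(path_a, path_b):
    """True if both files produce the same digest."""
    return file_digest(path_a) == file_digest(path_b)
```

In practice you would publish file_digest("original.zip") once (the file name is just an example), and anyone who downloads the file later can recompute the digest and compare it to the published value.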

So.. the whole mathematical beauty of hash functions is that they are very carefully designed to be impossible to efficiently reverse (they only go one way) -- and that they have good properties in terms of only rarely (but it can happen) mapping different inputs to the same output (collisions), even when the input data is large.

You can't just use any function as your hash function -- it needs to meet these requirements. If it falls short, then bad things can happen. If a hash function turns out to be reversible then malicious people could create fake files or fake passwords that compute to the same hash; if a hash function turns out to have lots of collisions, then you will end up with cases where many files/passwords look the same (according to their hash) when they are different.
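To see what "falls short" can look like, here is a deliberately bad hash function next to MD5. The weak_hash function is a toy invented for this example (it just adds up the bytes), not something anyone should use.

```python
import hashlib

def weak_hash(data):
    """Toy hash: sum of all byte values, kept to 8 bits."""
    return sum(data) % 256

a = b"listen"
b = b"silent"

# Same letters in a different order give the same sum, so the toy hash collides.
print(weak_hash(a) == weak_hash(b))                  # True

# MD5 gives different digests for the two inputs.
print(hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest())  # False

def forge(target):
    """Build a 2-byte input whose weak_hash equals the given target value."""
    return bytes([target // 2, target - target // 2])

fake = forge(weak_hash(b"original file contents"))
print(weak_hash(fake) == weak_hash(b"original file contents"))   # True
```

The forge function shows the other failure: with the toy hash it is trivial to build a fake input that matches a given output, which is exactly what a real hash function is designed to make impractical.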
