Author Topic: Random Question (About Hash Keys)  (Read 13195 times)

KynloStephen66515

  • Animated Giffer in Chief
  • Honorary Member
  • Joined in 2010
  • Posts: 3,758
Random Question (About Hash Keys)
« on: May 07, 2012, 03:38 AM »
If Hashing is supposedly 'One Way Encryption'...

How the HELL does your machine know how to interpret the information it has been given? - Surely your computer is somehow decrypting the information in order to do what it needs to?

KynloStephen66515

Re: Random Question (About Hash Keys)
« Reply #1 on: May 07, 2012, 04:24 AM »
**Edit**

Not only hashes, but anything claimed to be "1 Way Encryption/Cypher"

db90h

  • Coding Snacks Author
  • Charter Member
  • Joined in 2005
  • Posts: 481
  • Software Engineer
Re: Random Question (About Hash Keys)
« Reply #2 on: May 07, 2012, 05:20 AM »
I'll try to be clear (not great at that ;p). There are TWO distinct types of operations here. They are similar in nature; one simply has a larger bitspace to reduce collisions to a near-zero possibility.

The terms ARE *often* interchangeable in the real world, so either is valid, but I prefer this way of thinking. You can use whichever term you want - just make sure you pick the right algorithm.

1. Hash - short bitspace, collisions can occur
2. Digest - larger bitspace, collisions have near zero probability, designed to be mathematically irreversible by people smarter than me (cryptographically secure)

In practice, this means no two data sets are likely to result in the same DIGEST, so it can be used reliably to uniquely identify a data set. HASHes were often used for quick data integrity checks in the old days, and are still used for quick lookups in arrays and databases. There, collisions are expected and accounted for.

DIGESTS are usually much larger in length than HASH. Instead of 32-bits, you may be speaking of 128-bits, for example. These are mathematically formed in such a way that no two data sets should produce the same DIGEST.

That is why a password is stored as a DIGEST, and thus when you type in the plaintext password, the DIGEST is computed and compared to the stored DIGEST. In this way, even if an attacker compromised the site or software and retrieved the password's DIGEST, they'd NOT know the actual password. However, that is what 'Rainbow Tables' are for (pre-computed digests of common passwords). These are getting larger all the time. Still, unless you used a word in the dictionary, or some short password, you're likely ok.

Why is it called a 'digest'? Well, think about it -- the data set is 'digested' through an algorithm and out comes a 128bit (or however long) 'digest'.

Example HASH algorithms: CRC32, Checksum

Example DIGEST algorithms: MD5, SHA1
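
To make the size difference concrete, here is a minimal Python sketch (Python is only my choice for illustration; nothing in this thread uses it). It runs the same input through CRC32, MD5, and SHA1 from the standard library and reports how many bits each one produces.

```python
import hashlib
import zlib

data = b"mypassword"

# CRC32 is a short checksum: the result always fits in 32 bits.
crc = zlib.crc32(data) & 0xFFFFFFFF

# MD5 and SHA1 produce much longer outputs.
md5_hex = hashlib.md5(data).hexdigest()
sha1_hex = hashlib.sha1(data).hexdigest()


def bits(hex_string):
    """Each hex character encodes 4 bits."""
    return len(hex_string) * 4


sizes = {
    "crc32": 32,
    "md5": bits(md5_hex),
    "sha1": bits(sha1_hex),
}

print(f"crc32: {crc:08x} ({sizes['crc32']} bits)")
print(f"md5:   {md5_hex} ({sizes['md5']} bits)")
print(f"sha1:  {sha1_hex} ({sizes['sha1']} bits)")
```

The short 32-bit value is fine for catching accidental corruption, while the 128-bit and 160-bit values are what the post calls digests.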

Again, the *terms* HASH and DIGEST are often interchanged in the real world, but that's MY definition. In the real world, you can say either one and be fine.

To web site owners *and* even software authors --- NEVER, EVER store a password in plaintext. ALWAYS store its digested form (usually SHA1 or MD5). Sometimes people also 'salt' the outputted digest with a quick XOR or some other operation, just to throw off rainbow tables.
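
The post describes salting the output with an XOR. The sketch below shows a different, more commonly seen arrangement, and I am labeling it as my substitution rather than what db90h does: a random per-user salt is mixed into the input before hashing. The function name `hash_with_salt` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os


def hash_with_salt(password, salt=None):
    """Return (salt, hex digest) for a password.

    A fresh random 16-byte salt is generated unless one is supplied,
    which is what you do when re-checking a stored record.
    """
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


# Two users pick the same password but get different salts,
# so a pre-computed table of plain digests no longer matches either record.
salt_a, digest_a = hash_with_salt("mypassword")
salt_b, digest_b = hash_with_salt("mypassword")

# Re-using the stored salt reproduces the stored digest.
_, digest_again = hash_with_salt("mypassword", salt_a)

print(digest_a != digest_b)      # different salts, different digests
print(digest_again == digest_a)  # same salt, same digest
```

The salt is stored next to the digest; it does not need to be secret to defeat pre-computed tables.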

This is the 'one way encryption' I believe you speak of. Does that sum it up? This would be in a CS101 class I believe, it's pretty introductory stuff, some of the first things you learn in college other than OS fundamentals and classical algorithms that still are used to this very day (e.g. search and sort algorithms, or the basics of compression algorithms later).

e.g.:

Last month I created this 'online tool', one of thousands I'm sure, to do a quick MD5 or SHA1 computation of given plaintext:

Calculate MD5 or SHA1 Digest

MD5 of 'mypassword' is '34819d7beeabb9260a5c854bc85b3e44'

All that need be stored is '34819d7beeabb9260a5c854bc85b3e44'. Then when the user types in 'mypassword' the MD5 is recomputed and compared to '34819d7beeabb9260a5c854bc85b3e44'. If match, then all is good. If no match, then it's the wrong password. In that way, the actual password need never be stored. Note that MD5 has been compromised in that it has been shown to produce collisions, as shown here: http://en.wikipedia.org/wiki/MD5 .. Thus, SHA1 is the preference usually .. but it hardly matters unless you're into banking or some really hard core stuff.
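
The store-and-compare flow described above can be sketched in a few lines of Python. MD5 is used only to mirror the example in this post; the function names `digest_of` and `check_login` are made up for the sketch.

```python
import hashlib
import hmac


def digest_of(password):
    """MD5 hex digest of a password (MD5 only to mirror the example above)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# At registration time only the digest is saved, never the password itself.
stored = digest_of("mypassword")


def check_login(attempt, stored_digest):
    """Recompute the digest of what was typed and compare it to the stored one."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(digest_of(attempt), stored_digest)


print(check_login("mypassword", stored))     # correct password
print(check_login("wrongpassword", stored))  # wrong password
```

Nothing in `check_login` ever turns the digest back into the password; it only repeats the one-way computation.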
« Last Edit: May 07, 2012, 05:52 AM by db90h »

db90h

Re: Random Question (About Hash Keys)
« Reply #3 on: May 07, 2012, 05:32 AM »
Rewrote for clarity (as always). This sort of information is stuff I should post on my own Forum, as it makes me sound smart, lol. Seldom do customers ask about message digests though, lol.

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,913
Re: Random Question (About Hash Keys)
« Reply #4 on: May 07, 2012, 09:14 AM »
How the HELL does your machine know how to interpret the information it has been given? - Surely your computer is somehow decrypting the information in order to do what it needs to?

You may be thinking that the idea of a hash (or digest) is to "encrypt" some information so that it can be "recovered/decrypted" later -- but that's precisely the opposite of what you use a hash to do.

The whole point of using a hash/digest for security purposes is to convert plain text data (like a password), into a (unique) representation such that it's completely impractical for anyone to reverse it and get back the original.  The computer never "decrypts" the hash back into the original.  That's why we call hashes "one-way functions".

The way you use hashes to handle account passwords is that you hash the plain text password and store the hash.  The next time someone enters your password, you again calculate a hash of what they type, and then compare it to what they stored.  Now you see why no website has a function that will resend you your password if you forget it -- because they can't do that -- they can't get back what your original plain text password is.  That's why they have to send you a link to RESET your password.

Hash functions are used on files for a different purpose, but the idea is the same -- you perform a mathematical hash function on the input, which can be arbitrarily big, and the output is a number (or a short string of bytes).  If you want to compare 2 files to see if they are the same, you perform the hash function again on the new file, and compare the output to the result you got when you ran it on the original file.  And because a good hash function cannot practically be reversed or made to collide on demand, people cannot create fake files that generate the same hash as (and therefore might appear to match) the original.

So.. the whole mathematical beauty of hash functions is that they are very carefully designed to be impossible to efficiently reverse (they only go one way) -- and that they have good properties in terms of rarely (but it can happen) mapping different inputs to the same output (collisions), even when the input data is large.

You can't just use any function as your hash function -- it needs to meet these requirements.  If it falls short, then bad things can happen.  If a hash function turns out to be reversible, then malicious people could create fake files or fake passwords that compute to the same hash; if a hash function turns out to have lots of collisions, then you will end up with cases where many files/passwords look the same (according to their hash) when they are different.
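
As a rough illustration of the file comparison described above, here is a small Python sketch. SHA-256 and the helper names `file_digest` and `write_temp` are my own choices for the example, and the files are throwaway temp files.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Hash a file in chunks so even very large files use little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_temp(data):
    """Write bytes to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


original = write_temp(b"hello world\n")
copy = write_temp(b"hello world\n")
tampered = write_temp(b"hello w0rld\n")  # one character changed

same = file_digest(original) == file_digest(copy)
changed = file_digest(original) != file_digest(tampered)
print(same, changed)

# Clean up the temporary files.
for p in (original, copy, tampered):
    os.remove(p)
```

Only the short digests are compared, so the original file does not need to be kept around to check a later copy.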
« Last Edit: May 08, 2012, 04:43 PM by mouser »

db90h

Re: Random Question (About Hash Keys)
« Reply #5 on: May 08, 2012, 04:04 PM »
And note that when two different files or sets of data 'match' (produce the same hash), as mouser eloquently explained it (much better than I), that is what is called a 'collision', as I tried to explain above. Thus, the collision rate is paramount when determining what algorithm you want to use.

If it need be secure, then you want an essentially zero collision rate, but that comes with high computational complexity and a large bitspace. Thus, in *my* rogue thinking, I prefer to call such 'secure hashes', digests. A digest is a hash, but a hash isn't always a digest. Of course, being irreversible is another important characteristic that applies to both forms. Anyway, it just makes it easier to differentiate.

Update: I see mouser did put (collisions) in parentheses. In retrospect, my explanation assumes the reader knows too much already.
« Last Edit: May 09, 2012, 01:21 PM by db90h »

db90h

Re: Random Question (About Hash Keys)
« Reply #6 on: May 08, 2012, 04:27 PM »
Updated last post to mention that I was being redundant, and also explain my 'rogue' redefinition of digests. I really do prefer this, and think it is valid. I don't know that anyone teaches my view that the term 'digest' should be reserved for secure hashes, but I like it because it makes clear in conversation the difference between a secure hash and an insecure one, as well as its likely intent. That way, if you say 'digest', you know it must be a secure hash. Saying 'hash' could mean it is as little as one bit (arbitrary size, as mouser said), and therefore comes with a huge collision rate... or, from a different perspective, could mean it's some random algorithm that need not be mathematically secure.

Deozaan

  • Charter Member
  • Joined in 2006
  • Posts: 9,768
Re: Random Question (About Hash Keys)
« Reply #7 on: May 08, 2012, 08:02 PM »
If you require a hash to be secure before you call it a digest, then why do you call MD5 a digest when there are known collisions? Doesn't that make MD5 an insecure hash?

db90h

Re: Random Question (About Hash Keys)
« Reply #8 on: May 08, 2012, 08:46 PM »
If you require a hash to be secure before you call it a digest, then why do you call MD5 a digest when there are known collisions? Doesn't that make MD5 an insecure hash?

Well, MD5 had the intent to be secure, and indeed mostly is - except for the known issues I linked to. It's like an encryption algorithm vs an obfuscation algorithm. An encryption algorithm seeks to be secure, while an obfuscation algorithm simply seeks to obfuscate.

Hence, I believe it is intent that matters most, as all digests will eventually be broken. To reduce a larger data set to a much, much smaller one means there is always a probability of collision, even if minuscule, approaching zero. A mathematician maybe can validate that statement, but it seems reasonable to me.
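
The "always a probability of collision" point is the pigeonhole principle, and it can be demonstrated with a deliberately tiny hash. The sketch below is only an illustration: `tiny_hash` is a made-up 8-bit hash (the first byte of SHA-256), so with 257 distinct inputs and only 256 possible outputs at least two inputs must collide.

```python
import hashlib


def tiny_hash(data):
    """First byte of SHA-256: a deliberately tiny 8-bit hash."""
    return hashlib.sha256(data).digest()[0]


seen = {}         # hash value -> first message that produced it
collision = None  # will hold (first message, second message, shared hash)

# 257 distinct inputs but only 256 possible outputs: a repeat is unavoidable.
for i in range(257):
    msg = str(i).encode("ascii")
    h = tiny_hash(msg)
    if h in seen:
        collision = (seen[h], msg, h)
        break
    seen[h] = msg

print(collision)
```

A real digest works the same way in principle; the bitspace is just so large that stumbling on a repeat by chance is not practical.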

You get my drift. My personal definitions are just that - my preference. Use them at your own risk ;p.
« Last Edit: May 08, 2012, 09:35 PM by db90h »

db90h

Re: Random Question (About Hash Keys)
« Reply #9 on: May 08, 2012, 08:52 PM »
Actually, reading a bit, SHA1 is now known to theoretically have producible collisions too, though it is not very practical at this point .. so it is still pretty darn secure.
« Last Edit: May 09, 2012, 01:22 PM by db90h »

db90h

Re: Random Question (About Hash Keys)
« Reply #10 on: May 08, 2012, 09:20 PM »
I'm glad you asked this question, as it has made me check into the latest on the security of common digests (or secure hashes -- see how much easier it is to say 'digest'? ;p).

MD5 == at least somewhat broken, hacked with practical implications due to the speed at which it can be hacked. Still, it suffices for MANY purposes just fine.
SHA1 == theoretically broken, feasibility of attack increasing, but still not feasible YET
SHA2 == current best thing to use if you trust the NSA, otherwise maybe ripemd320+ or whirlpool
SHA3 == under development

Remember, as I posted in my first rambling explanation, salting your digest will help further in cases where it can be done (e.g. on a web server storing password digests in a database, if only the database were compromised so that the salt value was unknown). After all, a simple XOR of any data set by a truly random key of equal size (never reused) is essentially an unbreakable encryption method (assuming the data set size is reasonably large, or the original value is not known or recognizable when guessed right, and the key is not known).
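
For the XOR remark, here is a minimal Python sketch of XORing data with a key of equal length. It is only a demonstration of the mechanics; the "unbreakable" property holds only if the key is random, kept secret, and used once. The helper name `xor_bytes` is made up for the example.

```python
import os


def xor_bytes(data, key):
    """XOR two byte strings of equal length, byte by byte."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"attack at dawn"
key = os.urandom(len(message))  # random key, same size as the message

ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)  # XOR with the same key undoes it

print(ciphertext.hex())
print(recovered)
```

Unlike a digest, this is reversible by design: anyone holding the key gets the original back.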

So, intent, intent, intent... ;p.

Compare a digest, for instance, to a hash used to look things up in a hash map/table - two different worlds. Information on hash tables, which use short hashes that are *expected to produce collisions*, is here: http://en.wikipedia.org/wiki/Hash_table
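
To show what "expected to produce collisions" looks like in practice, here is a toy hash table in Python. It is a sketch, not how any particular library does it: the class name `TinyHashTable`, the use of CRC32, and the bucket count of 4 are all assumptions chosen to force collisions, which are handled by keeping a small list per bucket.

```python
import zlib


class TinyHashTable:
    """Toy hash table: CRC32 picks a bucket, each bucket is a list of pairs."""

    def __init__(self, bucket_count=4):
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, key):
        # A short hash reduced to a bucket number; collisions are expected.
        return (zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # new key, possibly sharing a bucket

    def get(self, key):
        # Colliding keys are told apart by comparing the keys themselves.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)


table = TinyHashTable(bucket_count=4)
names = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]
for n, name in enumerate(names):
    table.put(name, n)

print(table.get("gamma"))
print([len(b) for b in table.buckets])  # six keys in four buckets: some share
```

Here a collision costs a little extra comparison work and nothing more, which is why a short, fast hash is acceptable for lookups but not for security.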
« Last Edit: May 09, 2012, 01:23 PM by db90h »

db90h

Re: Random Question (About Hash Keys)
« Reply #11 on: May 08, 2012, 10:04 PM »
http://bitsum.com/md5.php updated to show all hash/digest algorithms supported by the current version of PHP I have installed on the web server.
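
For anyone who wants a similar listing locally, a rough Python analogue (not the PHP behind the page above, which I have not seen) is to loop over the algorithms the standard `hashlib` module guarantees and print each output size.

```python
import hashlib

data = b"mypassword"
digest_bits = {}  # algorithm name -> output size in bits

for name in sorted(hashlib.algorithms_guaranteed):
    if name.startswith("shake_"):
        continue  # SHAKE output length is chosen by the caller, so skip it
    try:
        h = hashlib.new(name, data)
    except ValueError:
        continue  # algorithm disabled in this particular build
    digest_bits[name] = h.digest_size * 8
    print(f"{name:12} {digest_bits[name]:4} bits  {h.hexdigest()[:16]}...")
```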
« Last Edit: May 09, 2012, 01:22 PM by db90h »

db90h

Re: Random Question (About Hash Keys)
« Reply #12 on: May 08, 2012, 10:15 PM »
I actually think I'm going to go with Whirlpool-T as my recommendation for the digest to use .. as I just don't know how far I trust the NSA ;p. It's got a huge bitspace of 512 bits, should be good. No known theoretical or practical breaches, though the original version of the algorithm did have some lessening to its security via a mistake, but none that were exploitable by any means.
« Last Edit: May 09, 2012, 01:12 PM by db90h »

mouser

Re: Random Question (About Hash Keys)
« Reply #13 on: June 13, 2012, 04:21 AM »
The following thread links to some articles on the importance of not using cryptographic hashes for password hashing, and is quite relevant for this discussion: https://www.donation...ndex.php?topic=31289
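
One common reading of that advice is that a single pass of a fast hash is too cheap for an attacker to brute-force, so a deliberately slow, salted key-derivation function should be used instead. The sketch below uses PBKDF2 from Python's standard library as one such option; the iteration count and the helper names `make_record` and `verify` are assumptions for illustration, not recommendations taken from the linked thread.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def make_record(password):
    """Return (salt, derived key) to store for a new password."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify(password, salt, key):
    """Re-derive the key from the attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)


salt, key = make_record("mypassword")
print(verify("mypassword", salt, key))
print(verify("guess", salt, key))
```

Each guess now costs the attacker as many hash computations as the iteration count, while a legitimate login pays that cost only once.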

db90h

Re: Random Question (About Hash Keys)
« Reply #14 on: June 13, 2012, 04:47 AM »
Interesting thread. I actually am currently using something like (not precisely, as I don't want to be too precise):
SALT^SHA2-512(SALT^SHA1(password))

Why? Because I needed to upgrade the hash algorithm, and for an unbreached database, the easiest way is to double hash, as opposed to waiting for people to log in again and updating their stored password hashes at that time.
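
A hedged sketch of that kind of upgrade, in Python: the post is deliberately imprecise about the real scheme, so everything here is an assumption, including reading '^' as "combine the salt with" (done below by simple concatenation), the helper names, and the example salt. The point it shows is that the outer hash can be applied to the stored SHA1 values without knowing anyone's password.

```python
import hashlib


def legacy_hash(password, salt):
    """Assumed old scheme: SHA1 over salt + password."""
    return hashlib.sha1(salt + password.encode("utf-8")).hexdigest()


def wrap_legacy(legacy_hex, salt):
    """Upgrade step: SHA-512 over salt + the old stored hash."""
    return hashlib.sha512(salt + legacy_hex.encode("ascii")).hexdigest()


def upgraded_hash(password, salt):
    """What the login code computes after the upgrade."""
    return wrap_legacy(legacy_hash(password, salt), salt)


salt = b"example-salt"  # hypothetical value for the demo

# One-off migration: rewrite every stored value without needing the password.
old_db_value = legacy_hash("mypassword", salt)
new_db_value = wrap_legacy(old_db_value, salt)

# Later logins recompute both layers and compare to the migrated value.
print(upgraded_hash("mypassword", salt) == new_db_value)
```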