MattJones Just Arrived
Joined: 03 May 2005 Posts: 0

Posted: Wed May 04, 2005 2:28 am Post subject: One Way Hash Function 


Can anyone tell me why a one-way hash function such as MD5 is hard to reverse? Even if you obtain the hash value, why is it still not possible to recover the message, even by brute force?
Furthermore, suppose a message is signed with MD5, so that a 16-byte (i.e. 128-bit) hash value is generated. Now someone gets the message (perhaps transferred via the Internet) and also generates the hash value of this message. Can I just use the first few bytes of both hash values to confirm the two messages are identical? I think a few bytes is not enough; all the bytes should be compared. But even if the two hash values are the same, that still cannot perfectly guarantee the two messages are the same, since a collision might occur. So if a collision can potentially occur, why do we still use a hash function (MD5) to verify whether a message has been altered?
I would much appreciate it if someone could answer my questions. Cheers
Matt




Silverleaf Trusted SF Member
Joined: 01 May 2005 Posts: 0 Location: Adelaide, Australia

Posted: Wed May 04, 2005 5:28 am Post subject: 


Hi Matt,
Trying to recover a message from its one-way hash by brute force is impractical because, unlike brute-forcing an encryption key, which has a well-known length, one has to try an essentially unlimited number of possibilities, since one-way hash functions typically accept input of any size. It is hard enough to brute-force, say, a 256-bit encryption key; brute-forcing a message that could be hundreds of megabytes long is essentially impossible.
As for your question about whether it is sufficient to compare only the first few bytes of a hash to see if two messages are identical: I'm not sure why you'd want to do this. Comparing two 8-byte strings is not significantly easier or faster than comparing two 16-byte strings. However, if there were a really compelling reason to do so... don't quote me, but I *think* one of the design criteria of MD5 was that changing a single bit of the input message should change, on average, about half of the bits of the resulting hash. Bearing this in mind, if, say, two hashes agree in their first 8 bytes, you could probably say with some certainty that the inputs are "probably" identical, or at least similar. I'm really not sure of this, though, so unless somebody more knowledgeable comes along and confirms it, I wouldn't start comparing only a few bytes. In fact, even if it were confirmed, I wouldn't do it: comparing all 16 bytes takes practically no extra time or effort, and gives you the most confidence that the messages are identical.
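To make that avalanche behaviour concrete, here is a small Python sketch (my own example, using the standard hashlib module) that changes a single character of a message and counts how many of MD5's 128 output bits flip; for a well-mixed hash you'd expect roughly half:

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

m1 = b"The quick brown fox jumps over the lazy dog"
m2 = b"The quick brown fox jumps over the lazy cog"  # one character changed

h1 = hashlib.md5(m1).digest()
h2 = hashlib.md5(m2).digest()

print(bit_diff(h1, h2), "of 128 bits differ")  # roughly half flip
```

Note that this only shows the hash is well mixed; it says nothing about whether a prefix comparison is cryptographically safe.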
As for why we can use MD5 to verify that a message has not been altered even though two messages generating the same MD5 hash is not a perfect guarantee that the messages are identical: while this is true, my understanding is that the probability of a "collision", i.e. two different messages generating the same hash, is so low that it can be ignored for practical purposes. I believe there are attacks on MD5 that enable an attacker to find a collision, but the colliding inputs are typically specially constructed pairs rather than meaningful documents of the attacker's choosing, which limits how useful these attacks are for doing mischievous things.
I hope this has helped some. I'm not really an expert on one-way hashes by any means, but everything above is true to the best of my knowledge. If you wait around long enough, it's quite likely that somebody who knows much more about this than I do will come along and give you a very good answer.




ray2005 Just Arrived
Joined: 17 Apr 2005 Posts: 0

Posted: Wed May 04, 2005 10:23 am Post subject: 


Hi mate,
It helps me a lot, thank you.
Matt




Silverleaf Trusted SF Member
Joined: 01 May 2005 Posts: 0 Location: Adelaide, Australia

Posted: Thu May 05, 2005 2:22 pm Post subject: 


No worries, glad to be of help. So the MattJones and ray2005 accounts belong to the same person? I guess that cuts down the number of people I've been able to help, then, heh.




JustinT Trusted SF Member
Joined: 17 Apr 2003 Posts: 16777215 Location: Asheville, NC, US / Uberlândia, MG, Brazil

Posted: Fri May 06, 2005 4:50 am Post subject: A little more on hash function properties. 


Allow me to expand a little on hash function design criteria, and the idealized model they mimic, to a computational extent. Basically, a hash function is a deterministic function whose input length is arbitrary and whose output length is fixed. There are three general criteria for what constitutes a secure hash function: collision resistance, preimage resistance, and second preimage resistance.
- Collision resistance: an adversary should not be able to find a pair of distinct messages M and M' (M != M') such that h(M) = h(M') with less than around 2^(n/2) work, where h is a cryptographically secure hash function and n is the bit length of its output.
- Preimage resistance: given an output Y, an adversary should not be able to find an input X such that Y = h(X) with less than around 2^n work.
- Second preimage resistance: given a message M, an adversary should not be able to find a second message M' != M such that h(M) = h(M') with less than around 2^n work.
The above outlines the general expectations of a cryptographically secure hash function; if an attack requires less work than these bounds, the function is considered academically broken, even if the attack is impractical from a computational standpoint. The mixing transformation of a hash function is modeled closely after an idealized construct: the random oracle model. The idea behind this modeling is that the realized function should render output that is indistinguishable from the output of the idealized function, for all practical concerns.
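As a toy illustration of why preimage resistance costs about 2^n work, the Python sketch below (my own example, not a standard construction) truncates MD5 down to a 16-bit hash and brute-forces a preimage; on average this takes around 2^16 = 65536 attempts, and each extra bit of output doubles the cost:

```python
import hashlib
from itertools import count

def h16(data: bytes) -> bytes:
    """Toy 16-bit hash: the first two bytes of MD5 (demonstration only)."""
    return hashlib.md5(data).digest()[:2]

target = h16(b"secret message")

# Exhaustive preimage search over candidate messages "1", "2", "3", ...
# For a 16-bit hash we expect success after roughly 2^16 attempts.
for attempts in count(1):
    candidate = str(attempts).encode()
    if h16(candidate) == target:
        break

print(f"preimage {candidate!r} found after {attempts} attempts")
```

For a real 128-bit hash like MD5 the same generic search would take around 2^128 evaluations, which is far beyond practical reach.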
For any input value x, the output h(x) should be computationally indistinguishable from a uniform binary string in the interval [0, 2^n), where n is the fixed length of the hash function's output. By computational indistinguishability, I'm referring to polynomial-time indistinguishability.
This basically follows the general assumption that there exist pseudorandom functions that cannot be distinguished, in practice, from truly random functions. For simplicity's sake, I'll spare you the complexity theory, but feel free to request a more elaborate explanation if you'd like.
The obvious difference between the realized hash function and the idealized random oracle is that the former's output distribution is computationally indistinguishable from the uniform distribution, while the latter's distribution is uniform. The assumption here is that the difference is negligible and cannot be detected by any polynomially bounded distinguisher.
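A crude way to see this uniformity in practice is a monobit check (a sketch of my own; passing it is necessary but by no means sufficient for indistinguishability): hash a run of counters and confirm that about half of all output bits are ones.

```python
import hashlib

N = 1000
ones = 0
for i in range(N):
    digest = hashlib.md5(str(i).encode()).digest()
    # Count the one-bits in each 128-bit digest.
    ones += sum(bin(byte).count("1") for byte in digest)

fraction = ones / (N * 128)
print(f"fraction of one-bits: {fraction:.4f}")  # very close to 0.5
```

A real distinguisher would of course apply far more stringent statistical and structural tests than this.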
On a side note, when designing cryptographic primitives such as block ciphers and hash functions, the practical, realized function is often designed to emulate a theoretical, idealized function, and many security assumptions are derived from the indistinguishability between the two. In conclusion, provided that the hash function is cryptographically secure, it is safe to truncate the output to a reasonable extent. But unless you're dealing with storage constraints so tight as to demand truncation, use all of the bits.
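For example (a minimal sketch, assuming SHA-256 via Python's hashlib), truncating a 256-bit digest to 128 bits halves its length, but it also halves the collision-resistance exponent, from roughly 2^128 down to roughly 2^64 work:

```python
import hashlib

full = hashlib.sha256(b"example message").digest()  # 32 bytes = 256 bits
tag = full[:16]                                     # truncated to 128 bits

# Collision resistance drops from ~2^128 to ~2^64 work, and preimage
# resistance from ~2^256 to ~2^128; truncate only when you have to.
print(len(full), len(tag))
```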




Silverleaf Trusted SF Member
Joined: 01 May 2005 Posts: 0 Location: Adelaide, Australia

Posted: Fri May 06, 2005 3:32 pm Post subject: 


Hi Justin,
Just a quick question, and hopefully I'm not the only one wondering: could you formally define "with work 2^n" (etc.) in this context? I presume n is the length of the message M, such that the longer the message, the harder it is to find a collision / (second) preimage. Is the work a measure of how the difficulty scales with message length? I.e. doubling the message length makes finding a collision 2^(2/2) = 2 times as hard? This would be a familiar concept from computational mathematics transplanted to a new field with different terminology (so I'm hoping it's right!), but maybe this is a case of everything looking like a nail because I have a hammer.
Cheers




JustinT Trusted SF Member
Joined: 17 Apr 2003 Posts: 16777215 Location: Asheville, NC, US / Uberlândia, MG, Brazil

Posted: Sat May 07, 2005 11:02 am Post subject: Hash function commentary. 


Silverleaf wrote: 
Hi Justin,
Just a quick question, and hopefully I'm not the only one wondering: could you formally define "with work 2^n" (etc.) in this context? I presume n is the length of the message M, such that the longer the message, the harder it is to find a collision / (second) preimage. Is the work a measure of how the difficulty scales with message length? I.e. doubling the message length makes finding a collision 2^(2/2) = 2 times as hard? This would be a familiar concept from computational mathematics transplanted to a new field with different terminology (so I'm hoping it's right!), but maybe this is a case of everything looking like a nail because I have a hammer.

"Work", in this context, refers to the complexity of the attack. For a nbit hash function, where n is the length of the hash function's output, the generic assumption is that the complexity of finding collisions is around 2^n/2, while the complexity of finding preimages and second preimages is around 2^n.
The "exhaustive search", or "brute force", approach to a hash function, is, in a nutshell: for a nbit hash, you can expect to find two messages that hash to the same value, after evaluating, or hashing, around 2^n/2 random messages. Since there is an infinite number of messages that will hash to each possible hash value, there is also an infinite number of collisions that are possible. For a nbit hash, n should be chosen such that the probability of finding a collision is negligible, to the point of no practical exploitation; it will be around 1 in 2^n/2.
As for larger messages: using techniques such as expandable messages, among other variations and subtleties, it has been shown in theory that an n-bit iterated hash function, under certain constructions, does not provide the n-bit security that was once assumed with regard to second preimage resistance for large messages. So the length of the message doesn't dictate any consistent attack complexity; for the generic complexity assumptions relating to collisions, preimages, and second preimages, we generally look at the fixed length of the hash function's output.




