It’s not quite correct, but you’ve broadly got the thrust of it.
When two different inputs produce the same output from a hashing function, this is called a hash collision. Finding collisions in the SHA256 hashing function is part of how bitcoin mining works. It is very computationally expensive (which is kind of the point, re: bitcoin mining), but it’s certainly tractable to find some input that generates the same output as another one.
If you were to try and search the space of all possible inputs for MD5, you’d quickly(ish) find an input that collided with the Obama Werewolf input, but it’d be garbage. If you had some system for quickly and reliably generating comprehensible English sentences, and just searched that space, you’d probably find a comprehensible English sentence that collided with it, but it would almost certainly not be the Obama Werewolf sentence. It’s worth noting that MD5 will take an input of any length, and the space of possible comprehensible English sentences of unbounded length is basically infinite.
For short snippets of text, it’s hard to find two comprehensible English sentences that collide under MD5, but in the link gjm provides above, there’s a method for forcing the MD5 hash of a PDF by exploiting how MD5 works. For this reason and others, if you’re doing any cryptographic heavy lifting, you probably want to use something more robust than MD5.
If you were to try and search the space of all possible inputs for MD5, you’d quickly(ish) find an input that collided with the Obama Werewolf input, but it’d be garbage.
Really? Last I checked, the best known preimage attack against MD5 was too slow to be practical. Finding collisions is drastically easier, though I don’t know any method for doing it with arbitrary plain-text English sentences.
It’s not quite correct, but you’ve broadly got the thrust of it.
When two different inputs produce the same output from a hashing function, this is called a hash collision. Finding collisions in the SHA256 hashing function is part of how bitcoin mining works. It is very computationally expensive (which is kind of the point, re: bitcoin mining), but it’s certainly tractable to find some input that generates the same output as another one.
If you were to try and search the space of all possible inputs for MD5, you’d quickly(ish) find an input that collided with the Obama Werewolf input, but it’d be garbage. If you had some system for quickly and reliably generating comprehensible English sentences, and just searched that space, you’d probably find a comprehensible English sentence that collided with it, but it would almost certainly not be the Obama Werewolf sentence. It’s worth noting that MD5 will take an input of any length, and the space of possible comprehensible English sentences of unbounded length is basically infinite.
For short snippets of text, it’s hard to find two comprehensible English sentences that collide under MD5, but in the link gjm provides above, there’s a method for forcing the MD5 hash of a PDF by exploiting how MD5 works. For this reason and others, if you’re doing any cryptographic heavy lifting, you probably want to use something more robust than MD5.
Really? Last I checked, the best known preimage attack against MD5 was too slow to be practical. Finding collisions is drastically easier, though I don’t know any method for doing it with arbitrary plain-text English sentences.