A hash doesn’t have to collide; it is only forced to have collisions (by the Pigeonhole Principle) if the output is ‘smaller’ than the input. If I’m using SHA-512 on data that is always less than 512 bits, then no collisions are forced on me.
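Spelling out the pigeonhole step being invoked here (notation added for this edit, not part of the original exchange):

```latex
% Pigeonhole for a hash with n-bit outputs, restricted to m-bit inputs:
% more possible inputs than outputs forces two inputs to share an output.
\[
  h : \{0,1\}^{m} \to \{0,1\}^{n}, \qquad
  2^{m} > 2^{n} \;\Longrightarrow\; \exists\, x \neq y \ \text{with}\ h(x) = h(y).
\]
% When m <= n the implication simply does not fire: collisions are not forced,
% though nothing guarantees SHA-512 is injective on short inputs either.
```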
How do you know which aspect of the agent is unique, without prior communication? If it’s merely that agents have so many degrees of freedom that there’s a negligible probability that any two agents are identical in all aspects, then your hash output is smaller than its input. Also, you can’t use the 2^512 figure for SHA-512 unless you actually want to split an array of size 2^512. If you only have, say, 20 choices to split, then 20 is the size that counts for collision frequency, no matter what hash algorithm you use.
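For concreteness, a rough simulation of that last point (hypothetical agent strings and Python’s standard hashlib, not anyone’s actual protocol): the collision rate is set by the 20 slots, not by the 512-bit digest.

```python
import hashlib

# Hypothetical stand-ins for whatever uniquely describes each agent.
agents = [f"agent-{i}" for i in range(10)]

CHOICES = 20  # only 20 slots in the array to be split

def pick_slot(agent: str) -> int:
    # SHA-512 yields a 512-bit digest, but reducing it mod 20 discards
    # nearly all of that width: only 20 distinct outcomes remain.
    digest = hashlib.sha512(agent.encode()).digest()
    return int.from_bytes(digest, "big") % CHOICES

slots = [pick_slot(a) for a in agents]
print(sorted(slots))
print("any collision?", len(slots) != len(set(slots)))
# With 10 agents dropped (effectively uniformly) into 20 slots, the birthday
# bound makes at least one shared slot about 93% likely -- the 512-bit digest
# size never enters into it.
```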
we XOR them together and get something that is unique if either was.
If hash-of-agent outputs are unique and your RNG is random, then the XOR is just random, not guaranteed-unique.
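A quick sanity check of this, with made-up parameters (20 agents, outputs truncated to 8 bits so collisions show up in a short run): the XORed values collide at the same rate as plain RNG draws.

```python
import hashlib
import secrets

agents = [f"agent-{i}" for i in range(20)]   # hypothetical distinct identities
BITS = 8                                     # tiny output space, for visibility
MASK = (1 << BITS) - 1

# Truncated "hash of agent" -- fixed per agent across all trials.
hashed = [int.from_bytes(hashlib.sha512(a.encode()).digest(), "big") & MASK
          for a in agents]

def collided(values) -> bool:
    return len(values) != len(set(values))

trials = 10_000
xor_rate = sum(collided([h ^ secrets.randbelow(1 << BITS) for h in hashed])
               for _ in range(trials)) / trials
rng_rate = sum(collided([secrets.randbelow(1 << BITS) for _ in agents])
               for _ in range(trials)) / trials
print(xor_rate, rng_rate)
# The two rates come out essentially equal: XORing a fixed value with a uniform
# random value is just another uniform random value, so any uniqueness in the
# hashes is destroyed once the RNG is mixed in.
```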
How do you know which aspect of the agent is unique, without prior communication? If it’s merely that agents have so many degrees of freedom that there’s a negligible probability that any two agents are identical in all aspects, then your hash output is smaller than its input.
If your array is size 20, say, then why not just take the first x bits of your identity (where x is the smallest integer with 2^x ≥ 20, i.e. x = 5)? (Why ‘first’, why not ‘last’? This is another Schelling point, like choice of injective mapping.)
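A minimal sketch of that rule, with one extra convention of my own (20 isn’t a power of two, so out-of-range 5-bit prefixes have to be folded back somehow, which slightly over-weights the low slots):

```python
import math

def first_bits_choice(identity: bytes, n_choices: int) -> int:
    # Take the leading ceil(log2(n_choices)) bits of the identity bitstring...
    x = math.ceil(math.log2(n_choices))                  # 5 bits for 20 choices
    prefix = int.from_bytes(identity, "big") >> (len(identity) * 8 - x)
    # ...and fold out-of-range prefixes (20..31) back into 0..19.
    return prefix % n_choices

print(first_bits_choice(b"whatever bitstring makes this agent unique", 20))
```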
If hash-of-agent outputs are unique and your RNG is random, then the XOR is just random, not guaranteed-unique.
This is a good point; I wasn’t sure whether it was true when I was writing it, but since you’ve pointed it out, I’ll assume it is. But this doesn’t destroy my argument: you don’t do any worse by adopting this more complex strategy. You still do just as well as a random pick.
(Come to think of it: so what if you have to use the full bitstring specifying your uniqueness? You’ll still do better on problems the same size as your full bitstring, and if your mapping is good, the collisions will be as ‘random’ as the RNGs and you won’t do worse.)