I’m pretty sure you’re wrong about the xkcd example.
He doesn’t just look at the number of characters in the four words. He reckons 11 bits of entropy per word and doesn’t operate at the letter level at all. If those words were picked at random from a list of ~2000 words (2^11 = 2048), then the entropy estimate is correct.
I don’t know where he actually got those words from. Maybe he just pulled them out of his head, in which case the effective entropy might be higher or lower. To get a bit of a handle on this, I found something on the internet that claims to be a list of the 2000ish most common English words (the actual figure is 2265, as it happens) and
checked whether the xkcd words are in the list (“correct” and “horse” are, “battery” and “staple” aren’t)
generated some quadruples of random words from the list to see whether they feel stranger than the xkcd set (which, if true, would suggest that maybe he picked his by a process with less real entropy than picking words at random from a set of 2000). I got: result lie variety work; fail previously anything weakness; experienced understand relative efficiency; ear recognize list shower; classroom inflation space refrigerator. These feel to me about as strange as the xkcd set.
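The quadruple generation described above can be sketched like this. The wordlist here is a small hypothetical stand-in for the 2265-word list (the real list isn’t reproduced); the point is just that each uniform, independent pick from a list of N words contributes log2(N) bits.

```python
import secrets

# Hypothetical stand-in for the 2265-word "most common English words" list.
wordlist = ["correct", "horse", "battery", "staple", "result", "lie",
            "variety", "work", "ear", "recognize", "list", "shower"]

def random_quadruple(words):
    # Pick 4 words uniformly and independently with a CSPRNG,
    # so each pick contributes log2(len(words)) bits of entropy.
    return " ".join(secrets.choice(words) for _ in range(4))

print(random_quadruple(wordlist))
```

With the full 2265-word list in place of the toy one, each run would produce a quadruple like the examples above.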
So I’m pretty willing to believe that the xkcd words really do have ~11 independent bits of entropy each.
In your local community’s procedure, I worry about “finding a very long sentence”. Making it up, or finding it somewhere else? The total number of sentences in already-existing English text is probably quite a lot less than 2^44, and I bet there isn’t that much entropy in your choice of which letter to pick from each word.
You’re right. I was thinking on the level of letters, but the fact that he gives the same number of bits of entropy to four quite different words should have alerted me. And with around 2000 common words to choose from, the entropy is indeed around 11 bits per word.
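Just to make the arithmetic explicit (nothing here beyond log2, and the 2265 figure is the list size quoted above):

```python
import math

# Bits of entropy per uniformly chosen word, for a few list sizes.
print(math.log2(2000))  # roughly 10.97
print(math.log2(2048))  # exactly 11
print(math.log2(2265))  # roughly 11.15

# Four independent picks just add: the comic's 4 * 11 = 44 bits.
print(4 * math.log2(2048))  # 44.0
```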
Thanks for the correction!
(For our local passwords, the sentences tend to be made up rather than found, to avoid basic dictionary attacks, and they tend to be complex and full of puns. But you might be right about the entropy loss in this case.)