To clarify a point in my sibling comment, the concept of “password strength” doesn’t cleanly apply to an individual password. It’s too contingent on factors that aren’t within the password itself. Say I had some way of scoring passwords on their strength, and that this scoring method tells me that “correct horse battery staple” is a great password. But then some guy puts that password in a webcomic read by millions of people—now my password is going to be a lot worse, even though the content of the password didn’t change.
Password selection schemes aren’t susceptible to this kind of problem, and you can consistently compare the strength of one with the strength of another, using methods like the ones I’m talking about in the OP.
Ah I see what you’re saying. Given a non-uniform password generation algorithm (like 99.9% “password”, 0.1% random 1000-bit ASCII string), taking the average entropy of the schema is misleading, since the average password has 50 bits of entropy but 99.9% of the time the password will have 1 bit of entropy, and in this case taking the worst-entropy result (1 bit) is more useful than the average.
I think you’re right, but this doesn’t seem very useful, since any password generation scheme where this is relevant is a bad idea (since if you switched to a uniform distribution, you could either have stronger passwords with the same length, or just as strong passwords with a shorter length).
any password generation scheme where this is relevant is a bad idea
I disagree; as the post mentions, sometimes considerations such as memorability come into play. One example might be choosing random English sentences as passwords. You might do that by choosing a random parse tree of a certain size. But some English sentences have ambiguous parses, i.e. they’ll have multiple ways to generate them. You *could* try to sample to avoid this problem, but it becomes pretty tricky to do that carefully. If you instead find the “most ambiguous sentence” in your set, you can get a lower bound on the safety of your scheme.
One feature of good password schemes is that you have some way to recover lost passwords.
Let’s say I chose as my password: “Tithptacsp,aiwwitcwwcaelp”. That password has plenty of entropy if you just look at it.
Then I might write down in some not “Entropy isn’t sufficient to measure − 3-1”. This allows me to go back to this post to look up the third paragraph and take the first sentence of it. Then I find the sentence “This is typically how people think about choosing strong passwords, and it works well in the case where we’re choosing among equally likely passwords” and can reconstruct “Tithptacsp,aiwwitcwwcaelp”. Sentences are also generally good mnemonics.
If someone would however know that I’m using LessWrong as my source for passwords this way, that would allow them to just go through all sentences on LessWrong posts which radically reduces the entropy.
To clarify a point in my sibling comment, the concept of “password strength” doesn’t cleanly apply to an individual password. It’s too contingent on factors that aren’t within the password itself. Say I had some way of scoring passwords on their strength, and that this scoring method tells me that “correct horse battery staple” is a great password. But then some guy puts that password in a webcomic read by millions of people—now my password is going to be a lot worse, even though the content of the password didn’t change.
Password selection schemes aren’t susceptible to this kind of problem, and you can consistently compare the strength of one with the strength of another, using methods like the ones I’m talking about in the OP.
Ah I see what you’re saying. Given a non-uniform password generation algorithm (like 99.9% “password”, 0.1% random 1000-bit ASCII string), taking the average entropy of the schema is misleading, since the average password has 50 bits of entropy but 99.9% of the time the password will have 1 bit of entropy, and in this case taking the worst-entropy result (1 bit) is more useful than the average.
I think you’re right, but this doesn’t seem very useful, since any password generation scheme where this is relevant is a bad idea (since if you switched to a uniform distribution, you could either have stronger passwords with the same length, or just as strong passwords with a shorter length).
I disagree; as the post mentions, sometimes considerations such as memorability come into play. One example might be choosing random English sentences as passwords. You might do that by choosing a random parse tree of a certain size. But some English sentences have ambiguous parses, i.e. they’ll have multiple ways to generate them. You *could* try to sample to avoid this problem, but it becomes pretty tricky to do that carefully. If you instead find the “most ambiguous sentence” in your set, you can get a lower bound on the safety of your scheme.
One feature of good password schemes is that you have some way to recover lost passwords.
Let’s say I chose as my password: “Tithptacsp,aiwwitcwwcaelp”. That password has plenty of entropy if you just look at it.
Then I might write down in some not “Entropy isn’t sufficient to measure − 3-1”. This allows me to go back to this post to look up the third paragraph and take the first sentence of it. Then I find the sentence “This is typically how people think about choosing strong passwords, and it works well in the case where we’re choosing among equally likely passwords” and can reconstruct “Tithptacsp,aiwwitcwwcaelp”. Sentences are also generally good mnemonics.
If someone would however know that I’m using LessWrong as my source for passwords this way, that would allow them to just go through all sentences on LessWrong posts which radically reduces the entropy.