tailcalled comments on Entropy isn’t sufficient to measure password strength

tailcalled 18 Jan 2022 17:03 UTC
1 point
> So, suppose you have a particular probability distribution for the number of guesses an attacker has. I am describing that by saying what its pdf is as a function of 1/#guesses (the thing I’m calling p). I call this f(p). You’re describing it by saying what its cdf is as a function of #guesses (the thing you’re calling 1/P). You call this cdf(1/P). These two formulations are equivalent[1].
The place where I’m getting confused is:
In my model, the E is taking an expectation over different passwords. The P refers to the probability assigned to the selected password, not to anything related to the attacker. So 1/P refers to the number of guesses a password requires to crack.
My description of the distribution of attacker capabilities is just named cdf, not cdf(1/P); cdf(1/P) is the number of attackers who have the capability to crack a given password.
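For reference, the equivalence between a pdf over p = 1/#guesses and a cdf over #guesses is a change of variables. This sketch assumes $G > 0$ is an attacker's number of guesses, $p = 1/G$ has density $f$, and $\mathrm{cdf}$ is the cdf of $G$ (these symbol choices are editorial assumptions):

$$\mathrm{cdf}(R) = \Pr[G \le R] = \Pr\!\left[p \ge \tfrac{1}{R}\right] = \int_{1/R}^{\infty} f(p)\,dp$$

$$\mathrm{cdf}'(R) = \frac{1}{R^{2}}\, f\!\left(\tfrac{1}{R}\right) \quad\Longleftrightarrow\quad f(p) = \frac{1}{p^{2}}\, \mathrm{cdf}'\!\left(\tfrac{1}{p}\right)$$

So, assuming enough smoothness for the derivative to exist, either function determines the other.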
… actually I just reread the thread and I think I misunderstood your previous comment:
> I said “probabilities” but should really have said something like “reciprocals of numbers of trials”. The thing you called P, when you wrote things like “E[1/P]” and “E[log 1/P]”. This has the same “units” as probability; if you wanted to express it in explicitly probabilistic terms it would be something like “least probable password the attacker has the resources to try”.
That is what confused me, since in my original comment, P is not the least probable password the attacker has the resources to try.
Maybe to clean up the math a bit, let’s restart with my formulation:
Let $W$ be the distribution of passwords you are selecting from, and let $w \sim W$ be a password. Let $P_W$ be the prior pmf for $W$, let $A$ be a distribution of adversary capabilities, and let $\mathrm{cdf}_A$ be the cdf for $A$. The utility of a password-selection method is then approximately:
$U(W) = \mathbb{E}_{w \sim W}\left[\mathrm{cdf}_A\left(1 / P_W(w)\right)\right]$
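A minimal numerical sketch of $U(W)$, assuming a finite password distribution given as a dict from password to probability. The attacker model here is hypothetical: $\log_2$ of the guess budget is taken to be uniform on $[0, 40]$. That model, the 40-bit cap, and the function names are illustrative assumptions, not part of the thread.

```python
import math
from typing import Callable, Mapping


def log_uniform_budget_cdf(max_bits: float) -> Callable[[float], float]:
    """Hypothetical attacker model: log2(guess budget) ~ Uniform[0, max_bits].

    Returns cdf_A(x) = Pr[budget <= x], clamped to [0, 1].
    """
    def cdf_a(x: float) -> float:
        if x <= 1.0:
            return 0.0
        return min(math.log2(x) / max_bits, 1.0)

    return cdf_a


def utility(pmf: Mapping[str, float], cdf_a: Callable[[float], float]) -> float:
    """U(W) = E_{w ~ W}[cdf_A(1 / P_W(w))] for a finite distribution W."""
    total = sum(pmf.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities sum to {total}, not 1")
    # Each password contributes its own probability times cdf_A evaluated
    # at the reciprocal of that probability.
    return sum(p * cdf_a(1.0 / p) for p in pmf.values() if p > 0)


if __name__ == "__main__":
    cdf_a = log_uniform_budget_cdf(max_bits=40)
    uniform4 = {"alpha": 0.25, "bravo": 0.25, "charlie": 0.25, "delta": 0.25}
    skewed = {"123456": 0.5, "qwerty": 0.25, "hunter2": 0.25}
    print(utility(uniform4, cdf_a))  # about 2/40 = 0.05
    print(utility(skewed, cdf_a))    # about (0.5*1 + 0.25*2 + 0.25*2)/40 = 0.0375
```

Swapping in a different `cdf_a` changes only the attacker model; the expectation over passwords stays the same.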
- gjm 18 Jan 2022 17:48 UTC
  2 points
  The function is cdf. In the expected-utility calculation it’s applied to 1/p, where p is the probability of a given password. My original use of the term “probability” for the reciprocal of the thing fed to the cdf function was needlessly confusing, which is why I dropped it in the rewrite.
  > since in my original comment, P is not the least probable password the attacker has the resources to try.
  In your original comment, P is the probability of a particular password. (I say this just to confirm that I do, and did, understand that.)
  But if we are going to explain what the cdf function actually is, we need to say something of the form “cdf(R) is the fraction (or, in the case of improper not-exactly-probability-distributions, something more like the total number) of attackers for whom …”. And I think the correct way to fill in that “…” is something like “when they crack passwords, we expect that the least probable password they’re likely to crack has probability 1/R”. (Right?)
  In other words, I’m trying to be more explicit about what “adversary capabilities” actually cashes out to, and I think that’s what it is.
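  To make the improper case concrete, here is an illustrative sketch that plugs two unbounded, non-normalized functions into the same $\mathbb{E}_{w}[\mathrm{cdf}(1/P(w))]$ expression. Neither is a true cdf. With $x \mapsto x$ it yields $\mathbb{E}[1/P]$, and with $x \mapsto \log_2 x$ it yields $\mathbb{E}[\log_2 1/P]$, the Shannon entropy in bits. The function name `expected_of_reciprocal` and the example passwords are hypothetical.

  ```python
  import math
  from typing import Callable, Mapping


  def expected_of_reciprocal(pmf: Mapping[str, float], g: Callable[[float], float]) -> float:
      """E_{w ~ W}[g(1 / P(w))] for an arbitrary (possibly improper) 'cdf' g."""
      return sum(p * g(1.0 / p) for p in pmf.values() if p > 0)


  pmf = {"123456": 0.5, "qwerty": 0.25, "hunter2": 0.25}

  # g(x) = x: an unbounded "number of attackers" growing linearly in guesses.
  # E[1/P] = sum_w P(w) * (1 / P(w)) = number of passwords with nonzero probability.
  e_recip = expected_of_reciprocal(pmf, lambda x: x)

  # g(x) = log2(x): E[log2 1/P] is the Shannon entropy of W in bits.
  entropy_bits = expected_of_reciprocal(pmf, math.log2)

  print(e_recip)       # 3.0
  print(entropy_bits)  # 1.5
  ```

  Under these assumptions, $\mathbb{E}[1/P]$ depends only on how many passwords have nonzero probability, not on how skewed the distribution is.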
  Your more-explicit formalization of the calculation agrees with my understanding; to whatever extent you feel that what I’m describing is different from what you’re describing, I am pretty confident the cause is not that we have different understandings of the mathematics at that point. I think it’s you misunderstanding me / me communicating badly, not me misunderstanding you / you communicating badly.
  (It is a lamentable misfeature of our language that, so far as I can tell, there is no good way to say “what’s going on here is that what A is trying to say is not what B is interpreting it as” that doesn’t tend to assign blame to one party or the other. You have to call it misunderstanding (implicitly blaming B) or miscommunicating (implicitly blaming A). But it takes two to tango: communication failures often involve suboptimality at both ends, and even in cases where they don’t, assigning or taking blame is often an irrelevant distraction.)
  - tailcalled 18 Jan 2022 18:14 UTC
    1 point
    > In your original comment, P is the probability of a particular password. (I say this just to confirm that I do, and did, understand that.)
    Yes.
    > But if we are going to explain what the cdf function actually is, we need to say something of the form “cdf(R) is the fraction (or, in the case of improper not-exactly-probability-distributions, something more like the total number) of attackers for whom …”. And I think the correct way to fill in that “…” is something like “when they crack passwords, we expect that the least probable password they’re likely to crack has probability 1/R”. (Right?)
    Yes, something like that. I’d probably fill it in with “who will probably succeed at cracking passwords w where P(w) is less than or equal to 1/R”, but it’s a similar point.
    > (It is a lamentable misfeature of our language that, so far as I can tell, there is no good way to say “what’s going on here is that what A is trying to say is not what B is interpreting it as” that doesn’t tend to assign blame to one party or the other. You have to call it misunderstanding (implicitly blaming B) or miscommunicating (implicitly blaming A). But it takes two to tango: communication failures often involve suboptimality at both ends, and even in cases where they don’t, assigning or taking blame is often an irrelevant distraction.)
    Yes.