I said “probabilities” but should really have said something like “reciprocals of numbers of trials”. The thing you called P, when you wrote things like “E[1/P]” and “E[log 1/P]”. This has the same “units” as probability; if you wanted to express it in explicitly probabilistic terms it would be something like “least probable password the attacker has the resources to try”.
It’s fair enough to call that probabilities, I guess that just means that the thing I was really confused about is how you derive the CDF from the probabilities. That is, this part:
then the corresponding pdf for our measure of attacker-capability q=1/p is f(1/q)/q^2; the relevant CDF here is the integral from 0 to q of this.
Under my framework, the distribution of amount of guesses that an attacker is willing to go through is a separate independent parameter that one can choose. I don’t see where the f(1/q)/q^2 expression comes from.
It’s a separate independent parameter for me too. My intention (though of course I may have screwed up) is that I’m describing the same thing as you are :-).
So, suppose you have a particular probability distribution for the number of guesses an attacker has. I am describing that by saying what its pdf is as a function of 1/#guesses (the thing I’m calling p). I call this f(p). You’re describing it by saying what its cdf is as a function of #guesses (the thing you’re calling 1/P). You call this cdf(1/P). These two formulations are equivalent[1].
[1] Well, maybe there are some weird technical issues if the cdf is sufficiently horribly non-differentiable—a Cantor staircase or something of the sort.
Why start with f(p) rather than cdf(1/P)? No particular reason, I guess :-). It would probably have been cleaner just to define q=1/p (as I did) and then work with either the pdf or the cdf as a function of q.
So, let me try again—I am not saying anything different, or at least not intending to, just cleaning up my notation and language and so forth—and see whether you find the result (1) comprehensible and (2) compatible with what you said before. (Again, to be clear, I am claiming only to explicate what you already said, which I found excellent but a bit terse.)
A little more detail seems worth providing. Let’s say that an attacker has “resources R” if they are willing and able to crack passwords with probabilities at least 1/R. (Is this a realistic model? I think it is, provided R is fairly large (which in practice it usually will be) and provided attackers know the probability distribution we are using. If I am an attacker and know e.g. that passwords of type A are used 10x more often than passwords of type B, then I will devote 10x more of my resources to type-A passwords. This falls down if I have so many resources that I literally can’t do that without repeating passwords, but for actually-realistic password distributions, and attackers who are likely attacking multiple victims at once, this is not a problem.)
So, we can describe the distribution of attackers in terms of the pdf f(R). In some sense, as described above, values of R are reciprocals of password-probabilities; the corresponding pdf over probabilities is f(1/p)/p^2 where p=1/R. If we start with the cdf F(R) instead, the pdf is F’(R), the pdf over probabilities is F’(1/p)/p^2, and of course the cdf for probabilities is just 1−F(1/p).
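The change of variables above is easy to check numerically. A minimal sketch, assuming a Pareto-shaped f(R) = 2/R^3 on R ≥ 1 (my choice, purely for illustration); the rule f(1/p)/p^2 then predicts that p = 1/R has pdf 2p, i.e. cdf p^2, on (0, 1]:

```python
import math
import random

# Check: if attacker resources R have pdf f(R) = 2/R^3 on R >= 1,
# then p = 1/R should have pdf f(1/p)/p^2 = 2p^3/p^2 = 2p, cdf p^2.

random.seed(0)
n = 200_000
# Inverse-cdf sampling of R: F(R) = 1 - R**-2, so R = (1 - u)**-0.5,
# and hence p = 1/R = (1 - u)**0.5.
ps = [math.sqrt(1.0 - random.random()) for _ in range(n)]

def empirical_cdf(xs, x):
    return sum(1 for v in xs if v <= x) / len(xs)

for x in (0.3, 0.5, 0.9):
    print(x, empirical_cdf(ps, x), x**2)  # empirical ~ predicted x**2
```

With 200k samples the empirical cdf matches the predicted x^2 to within about 1%.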
In each case, tailcalled’s prescription is to take E[cdf(1/p)], the expected fraction of attackers who fail to crack our password, as our figure of merit. The expectation is taken over our passwords with the probability distribution we are using (the probability distribution over passwords, not over attackers), so it equals the sum over passwords of p·F(1/p), where F is the cdf for attacker-resources.
These things I’ve called “pdfs” and “cdfs” may not actually integrate/tend to 1; we may leave them unnormalized, or indeed they may be improper (so that the integral of the pdf / limiting value of the cdf is infinite). What we are doing with these is just maximizing/minimizing some expectation involving them; rescaling the values of the pdf/cdf doesn’t change the result (so we needn’t care about normalization) and in many cases the expectation will converge even if the pdf/cdf is improper.
Let’s look more explicitly at the examples tailcalled mentioned. A uniform distribution for R, which of course is very improper indeed, corresponds to F(R)=R or f(R)=1. The corresponding pdf for p=1/R is 1/p^2. A uniform distribution for log R, still improper, corresponds to F(R) = log R or f(R) = 1/R. The corresponding pdf for p=1/R is 1/p. This is exactly the case in which expected utility = entropy.
(As tailcalled implied but didn’t quite say explicitly, the fact that these are improper is what enables the pathology described by OP: as R gets large the cdf → infinity, so a modest probability of picking extremely-low-probability passwords can give an arbitrarily large positive contribution to the expectation we’re trying to maximize, even if most of the time we do very badly.)
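The pathology can be made concrete with a toy comparison under the log-uniform (entropy) figure of merit; all of the numbers below are made up for illustration:

```python
import math

# Under the log-uniform attacker model, the figure of merit is the
# Shannon entropy E[log2(1/p)].

# Scheme A: uniform over 2**20 passwords -> exactly 20 bits.
entropy_A = math.log2(2**20)

# Scheme B: 99% of the time one fixed password, 1% of the time one of
# 2**10000 equally likely rare passwords.  log2(2**10000 / 0.01) is
# written as 10000 - log2(0.01) to avoid overflowing a float.
entropy_B = 0.99 * math.log2(1 / 0.99) + 0.01 * (10000 - math.log2(0.01))

print(entropy_A, entropy_B)  # B scores ~100 bits despite being
                             # crackable in one guess 99% of the time
```

Scheme B “wins” on this figure of merit by a factor of five, even though almost every attacker cracks it on the first try.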
We looked at pdf(p) = 1/p^2 and pdf(p) = 1/p; let’s go one step further and take pdf(p)=1, corresponding to a uniform distribution over 1/R; cdf(R) = 1-1/R, and pdf(R) = 1/R^2. Our figure of merit is E[cdf(1/p)] = E[1-p] = 1-E[p], so the expected negative utility we get is (aside from irrelevant constant offsets and scalings) E[p] = sum(p^2).
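To see how this figure of demerit treats a heavily skewed scheme, a quick sketch (toy numbers made up: a uniform scheme over 2^20 passwords versus one that emits a single fixed password 99% of the time and one of 2^10000 rare passwords otherwise):

```python
# Under the cdf(R) = 1 - 1/R attacker model, the figure of demerit is
# E[p] = sum of p**2, i.e. the collision probability.

# Uniform over 2**20 passwords:
demerit_uniform = (2**20) * (2**-20) ** 2   # = 2**-20, essentially 0

# 99% one fixed password; the rare branch contributes ~2**-10000,
# which is negligible, so only the fixed password's term matters:
demerit_skewed = 0.99**2

print(demerit_uniform, demerit_skewed)  # the skewed scheme now looks terrible
```

Unlike the improper log-uniform model, this bounded model correctly flags the skewed scheme as nearly worthless.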
(At this point, continuing from “How does this figure of demerit do …” in the original version of what I wrote seems fine, except that the final paragraph of what I already wrote is effectively incorporated above.)
So, suppose you have a particular probability distribution for the number of guesses an attacker has. I am describing that by saying what its pdf is as a function of 1/#guesses (the thing I’m calling p). I call this f(p). You’re describing it by saying what its cdf is as a function of #guesses (the thing you’re calling 1/P). You call this cdf(1/P). These two formulations are equivalent[1].
The place where I’m getting confused is:
In my model, the E is taking an expectation over different passwords. The P refers to the probability assigned to the selected password, not to anything related to the attacker. So 1/P refers to the number of guesses a password requires to crack.
My description of the distribution of attacker capabilities is just named cdf, not cdf(1/P); cdf(1/P) is the number of attackers who have the capability to crack a given password.
… actually I just reread the thread and I think I misunderstood your previous comment:
I said “probabilities” but should really have said something like “reciprocals of numbers of trials”. The thing you called P, when you wrote things like “E[1/P]” and “E[log 1/P]”. This has the same “units” as probability; if you wanted to express it in explicitly probabilistic terms it would be something like “least probable password the attacker has the resources to try”.
Since in my original comment, P is not the least probable password the attacker has the resources to try.
Maybe to clean up the math a bit, let’s restart with my formulation:
Let W be the distribution of passwords you are selecting from, and w∼W be a password. Let P_W be the prior pmf for W, let A be a distribution of adversary capabilities, and let cdf_A be the cdf for A. The utility of a password-selection method is then given approximately by:

U(W) = E_{w∼W}[cdf_A(1/P_W(w))]
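Read literally, this formulation is straightforward to compute for a finite pmf. A minimal sketch (the helper name `utility` and the toy distribution are mine, not from the thread):

```python
import math

def utility(pmf, attacker_cdf):
    """U(W) = E_{w~W}[cdf_A(1/P_W(w))]: the expected fraction (or, for
    improper cdfs, unnormalized measure) of attackers who lack the
    resources to crack the sampled password."""
    return sum(p * attacker_cdf(1.0 / p) for p in pmf.values() if p > 0)

pmf = {"a": 0.5, "b": 0.25, "c": 0.25}  # toy password distribution

# cdf_A(R) = log2(R): improper log-uniform model; U equals the entropy.
print(utility(pmf, math.log2))          # 1.5 bits

# cdf_A(R) = 1 - 1/R: U = 1 - sum of p**2.
print(utility(pmf, lambda r: 1 - 1/r))  # 0.625
```

Swapping in different `attacker_cdf` functions reproduces the different figures of merit discussed in the thread.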
The function is cdf. The way it’s used in the expected-utility calculation is that it’s applied to 1/p where p is the probability of a given password. My original use of the term “probability” for the reciprocal of the thing fed to the cdf function was needlessly confusing, which is why I dropped it in the rewrite.
since in my original comment, P is not the least probable password the attacker has the resources to try.
In your original comment, P is the probability of a particular password. (I say this just to confirm that I do, and did, understand that.)
But if we are going to explain what the cdf function actually is, we need to say something of the form “cdf(R) is the fraction—or, in the case of improper not-exactly-probability-distributions, something more like the total number—of attackers for whom …”. And I think the correct way to fill in that “…” is something like “when they crack passwords, we expect that the least probable password they’re likely to crack has probability 1/R”. (Right?)
In other words, I’m trying to be more explicit about what “adversary capabilities” actually cashes out to, and I think that’s what it is.
Your more-explicit formalization of the calculation agrees with my understanding; to whatever extent you feel that what I’m describing is different from what you’re describing, I am pretty confident the cause is not that we have different understandings of the mathematics at that point. I think it’s you misunderstanding me / me communicating badly, not me misunderstanding you / you communicating badly.
(It is a lamentable misfeature of our language that—so far as I can tell—there is no good way to say “what’s going on here is that what A is trying to say is not what B is interpreting it as” that doesn’t tend to assign blame to one or other party. You have to call it misunderstanding (implicitly blaming B) or miscommunicating (implicitly blaming A). But it takes two to tango and communication failures often involve suboptimality at both ends, and even in cases where it doesn’t assigning/taking blame is often an irrelevant distraction.)
In your original comment, P is the probability of a particular password. (I say this just to confirm that I do, and did, understand that.)
Yes.
But if we are going to explain what the cdf function actually is, we need to say something of the form “cdf(R) is the fraction—or, in the case of improper not-exactly-probability-distributions, something more like the total number—of attackers for whom …”. And I think the correct way to fill in that “…” is something like “when they crack passwords, we expect that the least probable password they’re likely to crack has probability 1/R”. (Right?)
Yes, something like that. I’d probably fill it in with “who will probably succeed at cracking passwords w where P(w) is less than or equal to 1/R”, but it’s a similar point.
(It is a lamentable misfeature of our language that—so far as I can tell—there is no good way to say “what’s going on here is that what A is trying to say is not what B is interpreting it as” that doesn’t tend to assign blame to one or other party. You have to call it misunderstanding (implicitly blaming B) or miscommunicating (implicitly blaming A). But it takes two to tango and communication failures often involve suboptimality at both ends, and even in cases where it doesn’t assigning/taking blame is often an irrelevant distraction.)
Yes.