Why do people keep saying we should maximize log(odds) instead of odds? Isn’t each 1% of survival equally valuable?
I don’t know why other people say it, but I can explain why it’s nice to say it.
log P(x) behaves better than P(x) when it comes to placing iterated bets. When you maximize P(x), you’re susceptible to high-risk, high-reward scenarios, even when they lead to failure with probability arbitrarily close to 1. The same is not true when maximizing log P(x). I’m cheating here, since this only really makes sense when big-P refers to “principal” (i.e., the thing growing or shrinking with each bet) rather than “probability”.
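A minimal simulation sketch of that point, assuming a repeated even-money bet that wins with probability 0.6 (the function name and every parameter here are made up for illustration):

```python
import random

def simulate(fraction, p_win=0.6, payout=2.0, rounds=200, trials=10_000):
    """Bet `fraction` of principal each round on a biased coin.

    A win returns the stake times `payout`; a loss forfeits the stake.
    Returns the share of trials ending with more principal than they started.
    """
    winners = 0
    for _ in range(trials):
        principal = 1.0
        for _ in range(rounds):
            stake = fraction * principal
            if random.random() < p_win:
                principal += stake * (payout - 1.0)
            else:
                principal -= stake
        winners += principal > 1.0
    return winners / trials

# Maximizing E[principal] says stake everything every round (fraction=1.0):
# the expectation explodes, but you end up ahead only if you win all 200
# rounds, which happens with probability 0.6**200 (essentially never).
print(simulate(1.0))

# Maximizing E[log principal] gives the Kelly fraction
# f* = p_win - (1 - p_win) / (payout - 1) = 0.6 - 0.4 = 0.2,
# and most trials end ahead (the share tends to 1 as rounds grow).
print(simulate(0.2))
```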
p(x) doesn’t vary linearly with the controls we typically have, so calculus intuition tends to break down when used to optimize p(x). By contrast, log p(x) usually does vary roughly linearly with those controls, so more calculus intuition applies. I think this happens because of the way we naturally think of “dimensions of” and “factors contributing to” a probability, and the resulting quirks of typical maximum-entropy distributions.
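One way to make this concrete (a sketch using the standard exponential-family form, which is what typical maximum-entropy distributions look like):

$$\log p(x;\theta) \;=\; \theta^\top T(x) \;-\; A(\theta) \;+\; \log h(x), \qquad \nabla_\theta \log p(x;\theta) \;=\; T(x) \;-\; \mathbb{E}_{p(\cdot\,;\theta)}\!\left[T(x)\right],$$

so log p is linear in the natural parameters $\theta$ up to the convex normalizer $A(\theta)$, and its gradient is a clean “observed minus expected” residual. p(x;θ) itself wraps all of this in an exp, so the same parameter changes act multiplicatively rather than additively.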
Log p(x) grows strictly monotonically with p(x) wherever p(x) > 0 (i.e., whenever x is possible), so you get the same answer whether you argmax log p(x) or p(x).
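In symbols, just restating the monotonicity point:

$$\operatorname*{arg\,max}_{\theta} \, \log p(x;\theta) \;=\; \operatorname*{arg\,max}_{\theta} \, p(x;\theta) \quad \text{whenever } p(x;\theta) > 0,$$

since log is strictly increasing on $(0, \infty)$.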
p(x) is usually intractable to calculate, but there’s a slick trick to bound it from below using the Evidence Lower Bound (ELBO), which requires dealing with log p(x) rather than p(x) directly. Saying log p(x) calls that trick to mind more easily than saying just p(x).
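For reference, the standard derivation, with $q(z)$ any variational distribution over latent variables $z$:

$$\log p(x) \;=\; \log \int p(x,z)\,dz \;=\; \log \mathbb{E}_{q(z)}\!\left[\frac{p(x,z)}{q(z)}\right] \;\ge\; \mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right] \;=\; \mathrm{ELBO}(q),$$

by Jensen’s inequality; the gap is exactly $\mathrm{KL}\!\left(q(z)\,\|\,p(z\mid x)\right)$, so maximizing the ELBO over $q$ tightens the bound on log p(x). None of this has a clean analogue for p(x) itself.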
All the cool papers do it.
Paul’s comment here is relevant, but I’m also confused.