Damn… You’re good. Anyway, 1 and 0 aren’t probabilities because Bayes’ Theorem breaks down there (in the log-odds/information representation, where Bayes’ Theorem is simple addition, they map to positive and negative infinity). You can, however, meaningfully construct limits of probabilities. I prefer the notation (1 −) epsilon.
Log-odds aren’t what probability is; they’re a way to think about probability. They happen not to work so well when the probabilities are 0 and 1; they also fail rather dramatically for probability density functions. That doesn’t mean they don’t have their uses.
Similarly, Bayes’s Theorem breaks down because its proof assumes a nonzero probability. This isn’t fixed by defining away 0 and 1, because the theorem can still return those as output, and then you end up looking silly. In many cases, refusing to condition on an event of probability 0 is the only sensible behavior: given that a d6 comes up both odd and even, what is the probability that the result is higher than 3?
[I tried saying some things about conditioning on sets of measure 0 here, but apparently I don’t know what I’m talking about so I will retract that portion of the comment for the sake of clarity.]
Log-odds are perfectly isomorphic with probabilities and satisfy Cox’s Theorem. Saying that log-odds are not what probabilities are is as much a non sequitur as saying 2+2 isn’t a valid representation of 4.
Bayes’ theorem assumes no such thing as nonzero probability; it assumes real-numbered probabilities, and it is in fact a perfectly valid statement of real-number arithmetic in any other context. It just so happens that this arithmetic expression is undefined when certain variables are 0, and is an identity (equal to 1) when certain variables are 1. Neither case is particularly interesting.
Bayes’ Theorem is interesting because it becomes propositional logic when you take the limit as probabilities go towards 1 or 0.
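A quick numerical sketch of that limit (Python; the likelihood ratio and the epsilons are arbitrary choices of mine, not anything from the thread):

```python
def posterior(prior, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior for H given evidence E,
    where likelihood_ratio = P(E|H) / P(E|not H)."""
    return prior * likelihood_ratio / (prior * likelihood_ratio + (1 - prior))

# A million-to-one likelihood ratio *against* H, applied to priors
# approaching 1: the evidence matters less and less.
for eps in [1e-2, 1e-6, 1e-12]:
    print(posterior(1 - eps, 1e-6))
# ~9.9e-05, then ~0.5, then ~0.999999: as the prior tends to 1,
# no finite evidence can move it, and the update degenerates into logic.
```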
Real-life applications are not my expertise, but I know my groups, categories, and types. 0 and 1 are not probabilities, just as positive and negative infinity are not real numbers. This is a truth derived directly from Russell’s axioms, which are the definitional basis for all modern mathematics.
When you say P(A) = 1 you are not using probabilities anymore. At best you are doing propositional logic; at worst you’ll get a type error. If you want to be as sure as you can, let your credence be 1 − epsilon for an arbitrarily small positive real epsilon.
1 and 0 are not probabilities by definition
Clearly log-odds aren’t perfectly isomorphic with multiplicative probabilities, since one allows probabilities of 0 and 1 and the other doesn’t.
Bayes’s theorem does assume nonzero probability, as you can observe by examining its proof.
Pr[A & B] = Pr[B] Pr[A|B] = Pr[A] Pr[B|A] by definition of conditional probability.
Pr[A|B] = Pr[A] Pr[B|A] / Pr[B] if we divide by Pr[B]. This assumes Pr[B]>0 because otherwise this operation is invalid.
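To make the failure mode concrete, here is a minimal sketch in Python, using the d6 example from above (exact arithmetic via `fractions`, so nothing hides in rounding):

```python
from fractions import Fraction

def bayes(p_a, p_b_given_a, p_b):
    # Pr[A|B] = Pr[A] * Pr[B|A] / Pr[B] -- undefined when Pr[B] = 0
    return p_a * p_b_given_a / p_b

# Fair d6, A = "result > 3", B = "result is even":
p_a = Fraction(1, 2)          # {4, 5, 6}
p_b = Fraction(1, 2)          # {2, 4, 6}
p_b_given_a = Fraction(2, 3)  # {4, 6} within {4, 5, 6}
print(bayes(p_a, p_b_given_a, p_b))   # 2/3, i.e. {4, 6} within {2, 4, 6}

# B = "result is both odd and even" has Pr[B] = 0:
bayes(p_a, Fraction(0), Fraction(0))  # raises ZeroDivisionError
```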
You can’t derive properties of probability from Russell’s axioms, because these describe set theory and not probability. One standard way of deriving properties of probability is via Dutch Book arguments. These can only show that probabilities must be in the range [0,1] (including the endpoints). In fact, no finite sequence of bets you offer me can distinguish a credence of 1 from a credence of 1−epsilon for sufficiently small epsilon. (That is, for any epsilon there’s a bet that distinguishes 1−epsilon from 1, but for any sequence of bets there’s a 1−epsilon that is indistinguishable from 1.)
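One way to see that last claim (a sketch; the book of bets is made up, but any finite book with bounded payoffs behaves the same way):

```python
# Each bet: (payoff if the event happens, price charged for the bet).
book = [(100.0, 99.0), (5.0, 4.5)]

def expected_profit(credence):
    # Expected profit of the whole book at a given credence in the event.
    return sum(credence * payoff - price for payoff, price in book)

# The gap between credence 1 and credence 1 - eps is eps * (total payoff),
# so for a fixed finite book it shrinks below any detectable threshold:
for eps in [1e-1, 1e-6, 1e-12]:
    print(expected_profit(1.0) - expected_profit(1.0 - eps))
# ~10.5, ~0.000105, ~1.05e-10: no finite book separates 1 from 1 - eps.
```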
Here is an analogy. The well-known formula D = RT describes the relationship between distance traveled, average speed, and time. You can also express this as log(D) = log(R) + log(T) if you like, or D/R = T. In either of these formulas, setting R = 0 is an error. This doesn’t mean that there’s no such thing as a speed of 0, or that if you think your speed is 0 you are actually traveling at a speed of epsilon for some very small epsilon. It just means that when you passed to these (mostly equivalent) formulations, you lost the ability to discuss speeds of 0. In fact, when we set R to 0 in the original formula, we get a more useful description of what happens: D = 0 no matter the value of T. In other words, 0 is a valid speed, but you can’t travel a nonzero distance with an average speed of zero, no matter how much time you allow yourself.
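A sketch of the analogy in Python (the function names are mine):

```python
import math

def distance(rate, time):
    return rate * time          # D = R*T: perfectly happy with R = 0

def distance_via_logs(rate, time):
    # log(D) = log(R) + log(T): only valid for positive R and T
    return math.exp(math.log(rate) + math.log(time))

print(distance(0.0, 5.0))           # 0.0 -- zero is a valid speed
print(distance_via_logs(2.0, 5.0))  # ~10.0, matching the direct formula
distance_via_logs(0.0, 5.0)         # raises ValueError: math domain error
```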
What is the difference between log-odds and log-speeds that makes the former an isomorphism and the latter an imperfect description?
Finally, do you really think that someone who cites “0 and 1 are probabilities” as a statement LW is irrational about is unaware of the “0 and 1 are not probabilities” post?
Potholing that last sentence was mostly for fun.
By the definition of a logarithm, exp(log(x)) = x, and since the exponential function is well-defined for complex numbers, so is the logarithm. Taking the logarithm of a negative number nets you the logarithm of its absolute value plus imaginary pi. The real part of any logarithm is a symmetric function, and there are probably a few other interesting properties of logarithms in complex analysis that I don’t know of.
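Those first two claims are easy to check with Python’s `cmath` (a quick sketch):

```python
import cmath, math

z = cmath.log(-2)            # principal branch of the complex logarithm
print(z)                     # (0.6931471805599453+3.141592653589793j)
print(math.log(2), math.pi)  # real part = log|-2|, imaginary part = pi
print(cmath.exp(z))          # (-2+0j) up to rounding: exp(log(x)) == x
```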
log(0) is undefined, as you note, but that does not mean the limit of log(x) as x → 0 is undefined. It is in fact a pole singularity (if I have understood my singularity analysis correctly): no matter how you approach it, you get negative infinity. So given your “logarithmic velocities,” I counter that with limits it is still a perfect isomorphism. The limit of exp(x) as x → −∞ is indeed 0, so when using limits (which is practically what real numbers are for) your argument that log isn’t an isomorphism from reals to reals is hereby proven invalid (if you want a step-by-step proof, just ask; it’ll be a fun exercise).
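Numerically, the limiting behavior this comment appeals to looks like this (a sketch):

```python
import math

for x in [1e-1, 1e-10, 1e-100]:
    print(math.log(x))      # -2.30..., -23.02..., -230.25...: off to -inf

print(math.exp(-750))           # 0.0 -- underflows to exactly zero
print(math.exp(float("-inf")))  # 0.0 -- the limit of exp(x) as x -> -inf
```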
Given that logarithms are a category-theoretic isomorphism from the reals onto the reals (from the multiplicative group onto the additive group), there is no reason why log-odds aren’t as valid as odds, which are as valid as ]0,1[ probabilities. Infinities are not valid reals; 0 and 1 are not valid probabilities. QED.
As I said: do not challenge anyone (including me) on the abstract algebra of the matter.
[I do apologize if the argument is poorly formulated, I am as of writing mildly intoxicated]
Yes, log(x) is an isomorphism from the positive reals as a multiplicative group to the real numbers as an additive group. As a result, it is only an isomorphism from multiplicative probabilities to additive log-probabilities if you assume that 0 is not a probability to begin with, which is circular logic.
So, pray tell: When are P=0 and P=1 applicable? Don’t you get paradoxes? What prior allows you to attain them?
I am really genuinely curious what sort of proofs you have access to that I do not.
To obtain paradoxes, it is you who would need access to more proofs than I do.
From an evidence-based point of view, as a contrapositive of the usual argument against P=0 and P=1, we can say that if it’s possible to convince me that a statement might be false, it must be that I already assign it probability strictly <1.
As you may have guessed, I also don’t agree with the point of view that I can be convinced of the truth of any statement, even given arbitrarily bizarre circumstances. I believe that one needs rules by which to reason. Obviously these can be changed, but you need meta-rules to describe how you change rules, and possibly meta-meta-rules as well, but there must be something basic to use.
So I assign P=1 to things that are fundamental to the way I think about the world. In my case, this includes the way I think about probability.
In more mathematical settings, you can successfully condition on events with probability 0 (for instance, if (X, Y) follow a bivariate normal distribution, you might want to know the probability distribution of Y given X = x).
You can’t really do this, since the answer depends on how you take the limit. You can find a limit of conditional probabilities, but saying “the probability distribution of Y given X = x” is ambiguous. This is known as the Borel–Kolmogorov paradox.
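A Monte Carlo sketch of the ambiguity (Python; I take X and Y to be independent standard normals, and the two families of shrinking events are my choice of illustration):

```python
import random

random.seed(0)
delta, strip, wedge = 0.01, [], []

for _ in range(1_000_000):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    if abs(x) < delta:                  # "X = 0" via a thin vertical strip
        strip.append(abs(y))
    if y != 0 and abs(x / y) < delta:   # "X = 0" via a thin wedge of slopes
        wedge.append(abs(y))

# Both events shrink to {X = 0} as delta -> 0, yet E[|Y| | event] differs:
print(sum(strip) / len(strip))  # ~0.80 = sqrt(2/pi): Y stays N(0, 1)
print(sum(wedge) / len(wedge))  # ~1.25: the wedge reweights Y by |y|
```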
Oops. Right, I knew there were some problems here, but I thought the way I defined it I was safe. I guess not. Thanks for keeping me honest!