In the example I cited, P(I tell you outcome is D | outcome is D) = 1 and P(I tell you outcome is D | outcome is not D) = 0 (roughly). Thus log(P(E|H)/P(E)) = 3 and log(P(E|H)/P(E|!H)) = infinity. Log is base 2. Probability-bits and odds-ratio-bits really are very different units, and Eliezer confusingly described them as the same thing. They are not interchangeable like (1/2) m v^2, G m1 m2 / r, and m c^2.
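To make the contrast concrete, here is a quick numeric check in Python (my numbers; I am assuming the eight-outcome setup with a perfectly reliable announcer):

```python
from math import log2, inf

# The cited example: 8 equally likely outcomes, H = "outcome is D",
# E = "I tell you the outcome is D", announcer assumed perfectly reliable.
p_H = 1 / 8
p_E_given_H = 1.0
p_E_given_notH = 0.0

# Total probability of hearing "D": 1 * 1/8 + 0 * 7/8 = 1/8
p_E = p_E_given_H * p_H + p_E_given_notH * (1 - p_H)

prob_bits = log2(p_E_given_H / p_E)        # log2(8) = 3.0
odds_bits = log2(p_E_given_H / p_E_given_notH) if p_E_given_notH > 0 else inf

print(prob_bits, odds_bits)                # 3.0 inf
```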
I may be missing something here (and the karma voting patterns suggest that I am). But I will repeat my claim—perhaps with more clarity:
Bits are bits, just as joules are joules. But just as you can use joules as a unit to quantify different kinds of energy (kinetic, potential, relativistic), you can use bits as a unit to quantify different kinds of information (log odds-ratio, log likelihood ratio, channel capacity in some fixed amount of time, entropy of a message source). Each of these kinds of information is measured in the same unit: bits.
You can measure evidence in bits, and you can measure the information content of the answer to a question in bits. The two are calculated using different formulas, because they are different things. Just as potential and kinetic energy are different things.
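For instance, here is a small sketch (illustrative numbers of my own) computing two of these quantities with their different formulas, both coming out in bits:

```python
from math import log2

# Entropy of a message source, in bits per symbol (illustrative distribution)
source = {"A": 0.5, "B": 0.25, "C": 0.25}
entropy = -sum(p * log2(p) for p in source.values())   # 1.5 bits

# Evidence as a log likelihood ratio, in bits (illustrative likelihoods)
evidence_bits = log2(0.8 / 0.2)                        # 2.0 bits

print(entropy, evidence_bits)   # different formulas, same unit
```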
You are correct that bits can be used to measure different things. The problem here is that probabilities and odds ratios describe the exact same thing in different ways. A joule of potential energy is not the same thing as a joule of kinetic energy, but they can be converted to each other at a 1:1 ratio. A probability-bit measures the same thing as an odds-ratio-bit, but it is a different quantity (a probability-bit is always worth more than one odds-ratio-bit, and can be worth up to infinitely many odds-ratio-bits). A “bit of evidence” does not unambiguously tell someone whether you mean probability-bit or odds-ratio-bit, and Eliezer does not distinguish between them properly.
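A quick sketch of the mismatch, using a hypothetical prior and likelihoods:

```python
from math import log2

p_H = 0.5                                 # hypothetical prior
p_E_given_H, p_E_given_notH = 0.8, 0.2    # hypothetical likelihoods

p_E = p_E_given_H * p_H + p_E_given_notH * (1 - p_H)

prob_bits = log2(p_E_given_H / p_E)             # ~0.68 bits
odds_bits = log2(p_E_given_H / p_E_given_notH)  # 2.0 bits

# The same evidence is fewer probability-bits than odds-ratio-bits,
# i.e. one probability-bit costs more than one odds-ratio-bit.
print(prob_bits, odds_bits)
```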
1 probability bit in favor of a hypothesis gives you a posterior probability of 1/2^(n-1) from a prior of 1/2^n. n probability bits give you a posterior of 1 from the same prior.
1 odds-ratio bit in favor of a hypothesis gives you a posterior odds ratio of 1:2^(n-1) from a prior of 1:2^n. n odds-ratio bits give you a posterior odds ratio of 1:1 (probability 1/2) from the same prior. It takes infinitely many odds-ratio bits to give you a posterior probability of 1.
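A sketch of both update rules, with n = 10 as an illustration:

```python
n = 10

# Probability bits: each bit doubles the probability.
p = 2.0 ** -n                 # prior probability 1/2^n
for _ in range(n):
    p *= 2
print(p)                      # 1.0 after n probability bits

# Odds-ratio bits: each bit doubles the odds.
odds = 2.0 ** -n              # prior odds 1:2^n
for _ in range(n):
    odds *= 2
print(odds / (1 + odds))      # 0.5 after n odds-ratio bits;
                              # probability 1 would require infinite odds
```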
As the prior probability approaches 0, the types of bits become interchangeable.
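A numeric illustration, holding hypothetical likelihoods fixed while the prior shrinks:

```python
from math import log2

p_E_given_H, p_E_given_notH = 0.8, 0.2
odds_bits = log2(p_E_given_H / p_E_given_notH)   # 2.0, independent of prior

for p_H in (0.5, 0.1, 0.01, 0.0001):
    p_E = p_E_given_H * p_H + p_E_given_notH * (1 - p_H)
    prob_bits = log2(p_E_given_H / p_E)
    print(p_H, prob_bits, odds_bits)    # prob_bits -> 2.0 as p_H -> 0
```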
Clearly you understand me now, and I think that I understand you.
A “bit of evidence” does not unambiguously tell someone whether you mean probability-bit or odds-ratio-bit, and Eliezer does not distinguish between them properly.
OK, if what is at issue here is whether Eliezer was sufficiently clear, then I’ll bow out. Obviously, he was not sufficiently clear from your viewpoint. I will say, though, that your comment is the first time I have seen the word “evidence” used by a Bayesian for anything other than a log odds ratio.
Log odds evidence has the virtue that it is additive (when independent). On the other hand, your idea of a log probability meaning of ‘evidence’ has the virtue that a question can be decided by a finite amount of evidence.
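The additivity is easy to check numerically (hypothetical, conditionally independent likelihood ratios):

```python
from math import log2

lr1, lr2 = 3.0, 4.0     # hypothetical likelihood ratios, independent given H
prior_odds = 1 / 15     # hypothetical prior odds

posterior_odds = prior_odds * lr1 * lr2   # Bayes: multiply likelihood ratios

# In log odds the evidence adds:
print(log2(posterior_odds) - log2(prior_odds))   # ~3.585
print(log2(lr1) + log2(lr2))                     # ~3.585
```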
Eliezer used it to mean log probability in the section that I quoted. That was what I was complaining about.
Ok, I think you are misinterpreting, but I see what you mean. When EY writes:
...I have transmitted three bits of information to you, because I informed you of an outcome whose probability was 1⁄8.
I take this as illustrating the definition of bits in general, rather than bits of ‘evidence’. But, yes, I agree with you now that the explanation, placed in a paragraph whose lead sentence promises a definition of ‘evidence’, definitely could have been written more clearly.