It’s based on Conservation of Expected Evidence (if the “technical reasons of probability theory” refer to something else, let me know).
By the way, separate from our conversation downthread, I don’t think that is the technical reason being referred to. Or at least, it’s a rather indirect way of proving that point.
Bayes’ Theorem is
P(A|B) = P(B|A)P(A)/P(B).
If P(A) = 0, then P(A|B) = P(B|A)*0/P(B) = 0 as well, no matter what P(B|A) and P(B) are. Or in words: if you start with credence exactly zero in some proposition, it is impossible for any piece of evidence to make you update away from that. By the contrapositive, if it is not impossible for you to update away from your original opinion (“change your mind”), your credence is nonzero.
A similar argument holds for probability 1, which should be unsurprising, since P(A) = 1 is equivalent to P(~A) = 0.
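(For concreteness, here is a minimal sketch in code, with likelihood values chosen arbitrarily by me: whatever you plug in for P(B|A) and P(B|~A), a prior of exactly 0 or exactly 1 never moves.)

```python
# Minimal sketch: a generic Bayesian update. The likelihood numbers below are
# arbitrary illustrations, not values from this thread.

def posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|~A)P(~A)]."""
    numerator = p_b_given_a * prior
    return numerator / (numerator + p_b_given_not_a * (1 - prior))

print(posterior(0.5, 0.99, 0.01))  # ordinary prior: strong evidence moves it to 0.99
print(posterior(0.0, 0.99, 0.01))  # prior of exactly 0: stays exactly 0
print(posterior(1.0, 0.01, 0.99))  # prior of exactly 1: stays exactly 1
```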
The problem with this argument is that it assumes that evidence is not altered. What I mean is that Bayesian updating implicitly assumes that all evidence previously used is included in the new calculation, and the new evidence is a strict superset of the old one. However, suppose I hypothetically assign 100% to any math fact “simple enough” that I can verify it mentally in under a minute (to choose an arbitrary time). So today, when I’m visualizing 2+2=4, I can say that I put 100% confidence in the claim “2+2=4”.
Now, is this contradicted by the fact that tomorrow I will see new evidence, causing me to conclude that 2+2=3? No. It isn’t just that I will see new evidence later; my current evidence is itself being changed. Right now, the evidence consists of actual brain operations that visualize 2+2. Tomorrow, that evidence is in the form of memories of brain operations. If I live in a possible world where only memories can be edited and not actual running brain processes, then tomorrow I will conclude that today’s memories were faked. That is not something I can conclude today, because I can repeat the visualization at any time. (One minute after, I might be relying on memories, but at the time, I’m not.)
However, suppose I hypothetically assign 100% to any math fact “simple enough” that I can verify it mentally in under a minute (to choose an arbitrary time).
That isn’t a valid operation.
For one: assigning 100% confidence in your ability to correctly do something on which you do not have, historically, a 100% track record is quite unwise. Probably you aren’t even 1-10^-6 reliable, historically, and that would still be infinitely far short of 100%. But it’s a toy hypothetical, so realism isn’t the primary objection.
More importantly, we don’t get to arbitrarily assign probabilities.
The problem with this argument is that it assumes that evidence is not altered. What I mean is that Bayesian updating implicitly assumes that all evidence previously used is included in the new calculation, and the new evidence is a strict superset of the old one.
Bayes’ Theorem, as the name implies, is a theorem. It does not assume anything about evidence; it doesn’t even mention evidence. It talks strictly about probabilities. All this “evidence” stuff is high-level natural-language abstraction about what the probabilities “mean”—the math itself is a reduction of the concept of evidence. It only assumes some axioms of probability; you may attempt to dispute those if you like, but that would be a very different conversation.
And, because Bayes’ Theorem is a theorem, assigning 100% confidence to any proposition of which you could in principle ever cease to have 100% confidence is strictly, provably an error.
The special case of reasoning while unable to trust your own sanity requires lots of conditions that are usually negligible. For example, P(X happened | I remember that X happened) is usually pretty close to 1; for most purposes we can ignore it and pretend “X happened” and “I remember X happened” are the same thing. But if you suspect your memories have been altered, this is no longer true, so you’ll have that extra factor in certain calculations.
Nothing that you are describing is outside the domain of the relevant math. It’s just weird corner cases.
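(A hedged sketch of that extra factor, with made-up reliability numbers of my own: how much “I remember X” is worth as evidence for X depends on how likely false memories are.)

```python
# Sketch of the memory-reliability factor: the weight of "I remember X" depends on
# P(remember X | X happened) and P(remember X | X didn't happen). Numbers are illustrative.

def p_x_given_memory(prior_x, p_mem_given_x, p_mem_given_not_x):
    """P(X happened | I remember that X happened), by Bayes' Theorem."""
    numerator = p_mem_given_x * prior_x
    return numerator / (numerator + p_mem_given_not_x * (1 - prior_x))

print(p_x_given_memory(0.5, 0.999, 0.001))  # trustworthy memory: ~0.999, the factor is negligible
print(p_x_given_memory(0.5, 0.999, 0.5))    # suspected tampering: ~0.67, the factor matters
```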
Why can’t “X happened” be infinite evidence for X, while “I remember that X happened” is only finite evidence?
Bayes’ Theorem applies, but it’s not being applied accurately, because of these special cases.
And, because Bayes’ Theorem is a theorem, assigning 100% confidence to any proposition of which you could in principle ever cease to have 100% confidence is strictly, provably an error.
Define “you” and “ever”. I argue that the “you” who changes their mind tomorrow is not the same observer that decides with 100% probability today, because the one today has information that the one tomorrow doesn’t; namely, actual brain ops, versus memories for tomorrow you.
I could in principle be convinced that my 100% assessment is wrong: by removing or editing the evidence. That is not Bayesian updating, it’s brain editing, and then a Bayesian update on other evidence.
You’re equating today me with tomorrow me, and you can’t do that unless all my current evidence will still be there tomorrow.
Why didn’t EY use a hypothetical other race (the 223ers), who think that everything is evidence for 2+2=3, as his example? Because we need the same person (or observer; is there a technical term for the thing-doing-the-assessing?) to change their mind. I assert that if memory can’t be trusted, it won’t count as the “same” to apply Bayes’ Theorem straightforwardly.
You could consider a proposition to be infinite evidence for itself, I guess. That seems like maybe a kinda defensible interpretation of P(A|A) = 1. I don’t think it gets you anything useful, though.
Define “you” and “ever”. I argue that the “you” who changes their mind tomorrow is not the same observer that decides with 100% probability today, because the one today has information that the one tomorrow doesn’t; namely, actual brain ops, versus memories for tomorrow you.
[∃ B: P(A|B) ∈ (0,1)] → [P(A) ∈ (0,1)]. Better?
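(Spelling out both directions, using only the definition of conditional probability: for any B with P(B) > 0, P(A|B) = P(A∧B)/P(B) ≤ P(A)/P(B), so P(A) = 0 forces P(A|B) = 0; applying the same bound to ~A, P(A) = 1 forces P(A|B) = 1. The implication above is just the contrapositive of those two facts.)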
If, having made them, your own probability assessments are meaningless and unusable, who cares what values you assign? Set P(A) = 328+5i and P(B) = octopus, for all it matters.
Additionally, I’m not sure it matters when the mind-changing actually occurs. At the instant of assignment, your mind as it is right that moment should already have a value for P(A|B) - how you would counterfactually update on the evidence is already a fact about you. If you would, counterfactually assuming your current mind operates without interference until it can see and process that evidence, update to some credence other than 1, it is already at that moment incorrect to assign a credence of 1. Whether that chain of events does in fact end up happening won’t retroactively make you right or wrong; it was already the right or wrong choice when you made it.
Or, if you get mind-hacked, your choice might be totally moot. But this is generally a poor excuse to deliberately make bad choices.
Yes, it makes it clearer what you’re doing wrong. I’ll do what I should have done earlier, and formalize my argument:
Let’s call “2+2=4” A, “2+2=3” B, “I can visualize 2+2=4” C, “I can visualize 2+2=3” D, “I can remember visualizing 2+2=4” E, “I can remember visualizing 2+2=3” F.
So, my claim is that P(A|C) is 1, likewise P(B|D). (Remember, I don’t think it’s like this in real life; I’m trying to show that the argument put forward to rule it out is not sufficient.)
What is the Bayes formula for tomorrow’s assessment?
Not P(A|C,D), which (if <1) would indeed disprove P(A|C)=1.
But, instead, P(A|E,D). This can be less than 1 while P(A|C)=1. I’ll just make up some arbitrary numbers as priors to show that.
I’m assuming A and B are mutually exclusive, as are C and D.
P(A)=.75
P(B)=.25 (just assume that it’s either 2 or 3)
P(C)=.375
P(D)=.125
P(memory of X | X happened yesterday)=.95
P(memory of X | X didn’t happen yesterday)=.001
P(E)=P(C)*.95+P(~C)*.001=0.356875
P(F)=P(D)*.95+P(~D)*.001=0.119625
P(C|A)=.50
P(C|B)=0
P(D|A)=0
P(D|B)=.50
P(A|C)=P(C|A)P(A)/P(C)=(.50*.75)/(.75*.50+.25*0)=1
P(A|C,D) is undefined, because C and D are mutually exclusive (which corresponds to not being able to visualize both 2+2=3 and 2+2=4 at the same time).
P(E,D)=P(E|D)*P(D)=.001*.125=.000125 (given D, C didn’t happen, so E can only be a false memory)
P(A|E,D)=P(E,D|A)P(A)/P(E,D)=0 (because P(D|A) is zero).
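(As a sanity check, here is a small script that recomputes these quantities from the assigned priors and likelihoods. Variable names are mine; the treatment of E given D as a false memory follows the exclusivity assumption above. It is just arithmetic verification, not an endorsement of the model.)

```python
# Recompute the assignments above: verify P(A|C) = 1 and P(A|E,D) = 0.

p_a, p_b = 0.75, 0.25                 # P(A): 2+2=4, P(B): 2+2=3
p_c_given_a, p_c_given_b = 0.50, 0.0  # C: visualize 2+2=4
p_d_given_a, p_d_given_b = 0.0, 0.50  # D: visualize 2+2=3
mem_true, mem_false = 0.95, 0.001     # P(memory of X | X happened / didn't happen)

p_c = p_c_given_a * p_a + p_c_given_b * p_b      # 0.375
p_d = p_d_given_a * p_a + p_d_given_b * p_b      # 0.125
p_e = mem_true * p_c + mem_false * (1 - p_c)     # 0.356875
p_f = mem_true * p_d + mem_false * (1 - p_d)     # 0.119625

p_a_given_c = p_c_given_a * p_a / p_c            # = 1.0
# Given D, C cannot have happened (C and D exclusive), so E can only be a false memory.
p_e_and_d = mem_false * p_d                      # 0.000125
p_ed_given_a = 0.0                               # since P(D|A) = 0
p_a_given_e_d = p_ed_given_a * p_a / p_e_and_d   # = 0.0

print(p_c, p_d, p_e, p_f, p_a_given_c, p_a_given_e_d)
```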
Using my numbers, you need to derive a mathematical contradiction if there truly are “technical reasons” why this is impossible.
The mistake you (and EY) are making is that you’re not comparing P(A) to P(A|B) for some A,B, but P(A|B) to P(A|C) for some A,B,C.
Added: I made two minor errors in definitions that have been corrected. E and F are not exclusive, and C and D shouldn’t be defined as “current”, but rather as having happened, which can only be confirmed definitely if they are current. However, they have that evidential power whenever they happened; it’s just that if they didn’t happen now, they’re devalued because of fragile memory.
Added: Fixed a numerical error and an F where it was supposed to be an E. (And errors in evaluating E and F. I really should not have assumed any values that I could have calculated from values I had already assumed. I have fewer degrees of freedom than I thought.)
blink
Well huh. I suppose I ought to concede that point.
There are probabilities of 0 and (implicitly) 1 in the problem setup. I’m not confident it’s valid to start with that; I worry it just pushes the problem back a step. But clearly, it is at least possible for probabilities of 1 to propagate to other propositions which did not start at 1. I’ll have to think about it for a while.
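(Concretely, the propagation comes from the zero in the setup: since P(C|B) = 0 and A, B are exhaustive, P(A|C) = P(C|A)P(A) / [P(C|A)P(A) + P(C|B)P(B)] = P(C|A)P(A) / [P(C|A)P(A) + 0] = 1 whenever P(C|A)P(A) > 0.)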
I’m assuming A and B are mutually exclusive, as are C and D, and E and F.
While A and B being mutually exclusive seems reasonable, I don’t think it holds for C and D. And I’m pretty sure that it doesn’t hold at all for E and F.
If I remember visualising 2+2=3 yesterday and 2+2=4 the day before, then E and F are both simultaneously true.
P(A)=.75
P(C)=.75
P(C|A)=.50
These three statements, taken together, are impossible. Consider:
Over the 0.75 probability space where C is true (second statement), A is only true in half that space (third statement). Thus, A is false in the other half of that space; therefore, there is a probability space of at least 0.375 in which A is false. Yet A is only false over a probability space of size 0.25 (first statement).
In your calculations further down, you use the value P(C) = (.75*.50+.25*0) = 0.375; using that value for P(C) instead of 0.75 removes the contradiction.
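(In other words, the first and third statements already pin down P(C) through the law of total probability: P(C) = P(C|A)P(A) + P(C|B)P(B) = .50*.75 + 0*.25 = .375, so P(C) = .75 cannot be assigned independently.)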
Similarly, the following set of statements leads to a contradiction when considered together:
The first and third comments are correct. I made some errors when first typing it up that shouldn’t take away from the argument; they are now fixed. The third comment points to an actual mistake, which has also been fixed.
Over the 0.75 probability space where C is true (second statement), A is only true in half that space (third statement).
This is wrong. P(C|A) is read as C given A, which is the chance of C, given that A is true. You’re mixing it up with P(A|C). However, if you switch A and C in your paragraph, it becomes a valid critique, which I’ve fixed, substituting the correct values in. Thanks. (Did I mess anything else up?)
I’m starting to appreciate mathematicians now :)
You need to escape your * symbols so they output correctly.
You’re mixing it up with P(A|C). However, if you switch A and C in your paragraph, it becomes a valid critique, which I’ve fixed, substituting the correct values in. Thanks.
You’re right, I had that backwards.
(Did I mess anything else up?)
Hmmm....
P(F)=.20
P(F)= P(D)*.95+P(C)*.001=0.119125
You have two different values for P(F). Similarly, the value P(E)=0.70 does not match up with P(C), P(D) and the following:
P(memory of X | X happened yesterday)=.95
P(memory of X | X didn’t happen yesterday)=.001
None of which is going to affect your point, which seems to come down to the claim that there exist possible events A, B, C, D, E and F such that P(A|C) = 1.
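(For reference, the values implied by the stated reliability numbers would be P(E) = .95*.375 + .001*.625 = .356875 and P(F) = .95*.125 + .001*.875 = .119625, which is what the corrected assignments above use.)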