Interesting, but I disagree. I fully agree that the problem is ambiguous in that it doesn’t define what the actual proposition is. I think different assumptions can lead to saying 1⁄3 or 1⁄2, but with deconstruction can be shown to always be 1⁄2. I don’t think anything in between is reasonable, and I don’t think any information is gained by waking up (which has a prior of 1.0, so no surprise value).
Probability is in the map, not the territory. It matters a lot what is actually being predicted, which is what the “betting” approach is trying to get at. If this is “tails->you will make two bets, heads->you will make one bet”, then the correct approach is to assign 1⁄2 probability but 1⁄3 betting odds. If this is “you will be asked once or twice, but the bet is only resolved once”, then 1⁄2 is the only reasonable answer.
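To make that concrete, here is a small sketch of the arithmetic, written for illustration only (the stakes, payoffs, and simulation are my own toy setup, not anything from the original post): a bet of stake p that pays 1 on heads breaks even near p = 1⁄3 if the stake is collected at every awakening, and near p = 1⁄2 if the bet is only resolved once.

```python
import random

def expected_payoff(stake, per_awakening, trials=100_000):
    """Average payoff of staking `stake` to win 1 if the coin is heads,
    under the two protocols described above (toy illustration)."""
    total = 0.0
    for _ in range(trials):
        heads = random.random() < 0.5
        # Heads: Beauty is woken once; tails: twice (Monday and Tuesday).
        awakenings = 1 if heads else 2
        bets = awakenings if per_awakening else 1
        total += bets * ((1 - stake) if heads else -stake)
    return total / trials

# Per-awakening bets break even near stake = 1/3; a once-resolved bet near 1/2.
print(expected_payoff(1/3, per_awakening=True))    # ~0
print(expected_payoff(1/2, per_awakening=False))   # ~0
```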
Amount of time spent awake is irrelevant to any reasonable proposition (proposition=prediction of future experience) that you might be talking about when you say “probability that the coin is heads”.
My intuition rebels against these conclusions too, but if the analysis is wrong, then where specifically is the error? Can you point to some place where the math is wrong? Can you point to an error in the modeling and suggest a better alternative? I myself have tried to disprove this result, and failed.
The whole calculation is based on the premise that Neal’s concept of “full non-indexical conditioning” is a reasonable way to do probability theory. Usually you do probability theory on what you are calling “centered propositions”, and you interpret each data point you receive as the proposition “I have received this data”. Not as “There exists a version of me which has received this data as well as all of the prior data I have received”. It seems really odd to do the latter, and I think more motivation is needed for it. (To be fair, I don’t have a better alternative in mind.)
It seems really odd to do the latter, and I think more motivation is needed for it.
This old post of mine may help. The short version is that if you do probability with “centered propositions” then the resulting probabilities can’t be used in expected utility maximization.
(To be fair, I don’t have a better alternative in mind.)
I think the logical next step from Neal’s concept of “full non-indexical conditioning” (where updating on one’s experiences means taking all possible worlds, assigning 0 probability to those not containing “a version of me which has received this data as well as all of the prior data I have received”, then renormalizing sum of the rest to 1) is to not update, in other words, use UDT. The motivation here is that from a decision making perspective, the assigning 0 / renormalizing step either does nothing (if your decision has no consequences in the worlds that you’d assign 0 probability to) or is actively bad (if your decision does have consequences in those possible worlds, due to logical correlation between you and something/someone in one of those worlds). (UDT also has a bunch of other motivations if this one seems insufficient by itself.)
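For concreteness, here is a minimal sketch of the assign-0/renormalize step being described, on a made-up two-world example (the dictionaries and names are my own illustration, not Neal’s formalism); it also shows the case where the step does nothing because every world contains an observer with the same data:

```python
def fnc_update(prior, observers_in_world, my_data):
    """prior: {world: probability}. observers_in_world: {world: set of data
    streams experienced by someone in that world}. Keep only worlds that
    contain someone with my data, then renormalize the rest to sum to 1."""
    kept = {w: p for w, p in prior.items() if my_data in observers_in_world[w]}
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}

prior = {"heads": 0.5, "tails": 0.5}
# In a toy Sleeping Beauty setup where the awakenings are subjectively
# identical, both worlds contain an awakening with the data "woke up",
# so nothing gets cut and the update changes nothing.
observers = {"heads": {"woke up"}, "tails": {"woke up"}}
print(fnc_update(prior, observers, "woke up"))  # {'heads': 0.5, 'tails': 0.5}
```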
Yeah, but the OP was motivated by an intuition that probability theory is logically prior to and independent of decision theory. I don’t really have an opinion on whether that is right or not but I was trying to answer the post on its own terms. The lack of a good purely-probability-theory analysis might be a point in favor of taking a measure non-realist point of view though.
To make clear the difference between your view and ksvanhorn’s, I should point out that in his view if Sleeping Beauty is an AI that’s just woken up on Monday/Tuesday but not yet received any sensory input, then the probabilities are still 1⁄2; it is only after receiving some sensory input which is in fact different on the two days (even if it doesn’t allow the AI to determine what day it is) that the probabilities become 1⁄3. Whereas for decision-theoretic purposes you want the probability to be 1⁄3 as soon as the AI wakes up on Monday/Tuesday.
for decision-theoretic purposes you want the probability to be 1⁄3 as soon as the AI wakes up on Monday/Tuesday.
That is based on a flawed decision analysis that fails to account for the fact that Beauty will make the same choice, with the same outcome, on both Monday and Tuesday (it treats the outcomes on those two days as independent).
So you want to use FDT, not CDT. But if the additional data of which direction the fly is going isn’t used in the decision-theoretic computation, then Beauty will make the same choice on both days regardless of whether she has seen the fly’s direction or not. So according to this analysis the probability still needs to be 1⁄2 after she has seen the fly.
There are several misconceptions here:
1. Non-indexical conditioning is not “a way to do probability theory”; it is just a policy of not throwing out any data, even data that appears irrelevant.
2. No, you do not usually do probability theory on centered propositions such as “today is Monday”, as they are not legitimate propositions in classical logic. The propositions of classical logic are timeless—they are true, or they are false, but they do not change from one to the other.
3. Nowhere in the analysis do I treat a data point as “there exists a version of me which has received this data...”; the concept of “a version of me” does not even appear in the discussion. If you are quibbling over the fact that P_{d,t} is only the stream of perceptions Beauty remembers experiencing as of time t, instead of being the entire stream of perceptions up to time t, then you can suppose that Beauty has perfect memory. This simplifies things—we can now let P_d simply be the entire sequence of perceptions Beauty experiences over the course of the day, and define R(y,d) to mean “y is the first n elements of P_d, for some n”—but it does not alter the analysis.
Nowhere in the analysis do I treat a data point as “there exists a version of me which has received this data...”;
This confuses me. Dacyn’s “There exists a version of me which has received this data as well as all of the prior data I have received” seems equivalent to Neal’s “I will here consider what happens if you ignore such indexical information, conditioning only on the fact that someone in the universe with your memories exists. I refer to this procedure as ‘Full Non-indexical Conditioning’ (FNC).” (Section 2.3 of Neal 2007)
Do you think Dacyn is saying something different from Neal? Or that you are saying something different from both Dacyn and Neal? Or something else?
None of this is about “versions of me”; it’s about identifying what information you actually have and using that to make inferences. If the FNIC approach is wrong, then tell me how Beauty’s actual state of information differs from what is used in the analysis; don’t just say, “it seems really odd.”
I responded to #2 below, and #1 seems to be just a restatement of your other points, so I’ll respond to #3 here. You seem to be taking what I wrote a little too literally. It looks like you want the proposition Sleeping Beauty conditions on to be “on some day, Sleeping Beauty has received / is receiving / will receive the data X”, where X is the data that she has just received. (If this is not what you think she should condition on, then I think you should try to write the proposition you think she should condition on, using English and not mathematical symbols.) This proposition doesn’t have any reference to “a version of me”, but it seems to me to be morally the same as what I wrote (and in particular, I still think that it is really odd to say that it is the proposition she should condition on, and that more motivation is needed for it).
It’s a useless and misleading modeling choice to condition on irrelevant data, and even worse to condition on the assumption that the unstated irrelevant data is actually relevant enough to change the outcome. That’s not what “irrelevant” means, and the argument that humans are bad at knowing what’s relevant does _NOT_ imply that all data is equally relevant, and even less does it imply that the unknown irrelevant data has precisely X relevance.
Wei is correct that UDT is a reasonable approach that sidesteps the necessity to identify a “centered” proposition (though I’d argue that it picks Sunday knowledge as the center). But I think it’s _also_ solvable by traditional means just by being clear about what proposition, concerning what prediction, is being assigned a probability.
It’s a useless and misleading modeling choice to condition on irrelevant data
Strictly speaking, you should always condition on all data you have available. Calling some data D irrelevant is just a shorthand for saying that conditioning on it changes nothing, i.e., Pr(A∣D,X)=Pr(A∣X). If you can show that conditioning on D does change the probability of interest—as my calculation did in fact show—then this means that D is in fact relevant information, regardless of what your intuition suggests.
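As an illustration of that point (a deliberately simplified model of my own, not the post’s actual computation): suppose heads means one awakening, tails means two, and each awakening comes with one of n equally likely incidental perceptions, such as which way a fly happens to be crossing the wall. Conditioning on the existence of the perception actually seen moves the probability of heads away from 1⁄2, toward 1⁄3 as the perception gets more detailed.

```python
from fractions import Fraction

def pr_heads_given_perception_seen(n):
    """Pr(heads | some awakening in this world includes the perception
    actually seen), with n equally likely possible perceptions per awakening."""
    half = Fraction(1, 2)
    p_given_heads = Fraction(1, n)                 # the single heads-awakening shows it
    p_given_tails = 1 - (1 - Fraction(1, n)) ** 2  # at least one of two awakenings shows it
    return half * p_given_heads / (half * p_given_heads + half * p_given_tails)

print(pr_heads_given_perception_seen(2))      # 2/5
print(pr_heads_given_perception_seen(10**6))  # n/(3n-1), i.e. essentially 1/3
```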
even worse to condition on the assumption the unstated irrelevant data is actually relevant enough to change the outcome.
There was no such assumption. I simply did the calculation, and thereby demonstrated that certain data believed to be irrelevant was actually relevant.