It doesn’t implement the counterfactual: depending on which response the agent assumes it would give upon observing a request to pay, the agent should be able to agent-consistently conclude that Omega will either award or not award the $1000. Even if we don’t require that Omega be a decision-theoretic agent with known architecture, the decision problem must still make the intended sense.
In more detail: the agent’s decision is a strategy that specifies a response for each possible observation (we have two: Omega rewards it, or Omega asks for money). If Omega gives a reward, there is no response to choose; if it asks for money, there are two possible responses. So overall we have two strategies to consider. The agent should be able to contemplate the consequences of adopting each of these strategies without running into inconsistencies (an observation is an external parameter, so even if, in a given environment, there is no agent-with-that-observation, the decision algorithm can still specify a response to that observation; it would just completely fail to control the outcome). Now take your Omega implementation and consider, from the agent’s perspective, the strategy of not paying. What would the agent conclude about expected utility? By the problem specification, it should conclude (in the external sense, that is, not necessarily according to its own decision theory, if that decision theory happens to fail this particular thought experiment) that Omega doesn’t give it an award. But your Omega does knowably (agent-provably) give it an award, hence it doesn’t play the intended role and doesn’t implement the thought experiment.
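To make the complaint concrete, here is a minimal sketch (the function names, string labels, and the “vacuous” Omega are my own illustrative assumptions, standing in for the implementation under discussion): an Omega that decides by evaluating the agent’s strategy yields the intended “no award” verdict for the non-paying strategy, while an Omega that awards unconditionally does not.

```python
# Toy contrast, with illustrative names and labels: an "intended" Omega whose
# award depends on the agent's strategy, versus a "vacuous" Omega that
# provably awards no matter what. Only the former implements the counterfactual.

def intended_omega(strategy):
    """Award only if the strategy would pay when asked."""
    return "award" if strategy("asked to pay") == "pay" else "no award"

def vacuous_omega(strategy):
    """Agent-provably awards regardless of the strategy."""
    return "award"

def refuse(observation):
    """The strategy of not paying, whatever the observation."""
    return "refuse"

print(intended_omega(refuse))  # "no award" -- what the problem says should happen
print(vacuous_omega(refuse))   # "award"    -- fails to implement the thought experiment
```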
But your Omega does knowably (agent-provably) give it an award, hence it doesn’t play the intended role and doesn’t implement the thought experiment.
I think it would be fair to say that cousin_it’s (ha! Take that, English grammar!) description of Omega’s behaviour does fit the problem specification we gave, but it certainly doesn’t match the problem we intended. That leaves us to fix the wording without making it look too obfuscated.
Taking another look at the actual problem specification, it doesn’t look all that bad; the translation into logical propositions just didn’t do it justice. We have...
He will award you $1000 if he predicts you would pay him if he asked.
cousin_it allows “if” to resolve to “iff”, but translates “the player would pay if asked” into the material implication asked → pays; since Omega doesn’t actually ask, the antecedent is false and the implication holds vacuously, “whatever” the player would in fact do. That is not quite what we mean when we use the phrase in English: we are trying to refer to the predicted outcome in a “possibly counterfactual but possibly real” situation.
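A small illustration of the vacuous-truth point (this encoding is my own, not cousin_it’s actual formalization): read “the player would pay if asked” as the material implication asked → pays; whenever Omega doesn’t actually ask, the proposition comes out true regardless of what the player would do.

```python
# Material implication: (asked -> pays) is the same as ((not asked) or pays).
def would_pay_if_asked(asked, pays):
    return (not asked) or pays

# If Omega never asks, the proposition is vacuously true for every player,
# including one who would never pay.
print(would_pay_if_asked(asked=False, pays=False))  # True (vacuously)
print(would_pay_if_asked(asked=False, pays=True))   # True (vacuously)
print(would_pay_if_asked(asked=True,  pays=False))  # False
```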
Can you think of a way to say what we mean without any ambiguity and without changing the problem itself too much?
I believe you haven’t yet realized the extent of the damage :-)
It’s very unclear to me what it means for Omega to “implement the counterfactual” in situations where it gives the agent information about which way the counterfactual came out. After all, the agent knows its own source code A and Omega’s source code O. What sense does it make to inquire about the agent’s actions in the “possible world” where it’s passed a value of O(A) different from its true value? That “possible world” is logically inconsistent! And unlike the situation where the agent is reasoning about its own actions, in our case the inconsistency is actually exploitable. If a counterfactual version of A is told outright that O(A)==1, and yet sees a provable way to make O(A)==2, how do you justify not going crazy?
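As a toy illustration of how the inconsistency bites (my own construction, not anything from UDT): if the agent treats the announcement O(A) == 1 as a fact while its own computation of O says its action would make O(A) == 2, the set of “worlds” consistent with everything it believes is empty, and any expected utility conditioned on that set is undefined.

```python
# Toy sketch with illustrative names throughout: the agent takes Omega's
# announcement at face value and keeps only "worlds" consistent with both
# the announcement and its own computation of O(A).

def O(action):
    # Hypothetical Omega whose output provably depends on the agent's action.
    return 2 if action == "defy" else 1

announced = 1                      # the agent is told outright that O(A) == 1
actions = ("comply", "defy")

consistent = [a for a in actions if O(a) == announced]
print(consistent)                  # ['comply']

# If the agent also sees a provable way to make O(A) == 2 (i.e. to defy),
# the announcement contradicts what it can compute, and there is nothing
# left to condition on:
print([a for a in consistent if O(a) == 2])   # [] -- conditioning on a contradiction
```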
The alternative is to let the agent tacitly assume that it does not necessarily receive the true value of O(A), i.e. that the causality has been surgically tweaked at some point—so the agent ought to respond to any values of O(A) mechanically by using a “strategy”, while taking care not to think too much about where they came from and what they mean. But: a) this doesn’t seem to accord with the spirit of Bongo’s original problem, which explicitly asked “you’re told this statement about yourself, now what do you do?”; b) this idea is not present in UDT yet, and I guess you will have many unexpected problems making it work.
If a counterfactual version of A is told outright that O(A)==1, and yet sees a provable way to make O(A)==2, how do you justify not going crazy?
By the way, this bears an interesting similarity to the question of how you would explain the event of your left arm being replaced by a blue tentacle. The answer that you wouldn’t is perfectly reasonable: you don’t need to be able to respond adequately to that observation. You can self-improve in a way that has the side effect of making you crazy once you observe your left arm turning into a blue tentacle, and that wouldn’t matter, since this event has sufficiently low measure and a sufficiently insignificant contribution to overall expected utility not to be worth worrying about.
So in our case, the question should be, is it desirable to not go crazy when presented with this observation and respond in some other way instead, perhaps to win the Omega Award? If so, how should you think about the situation?
If a counterfactual version of A is told outright that O(A)==1, and yet sees a provable way to make O(A)==2, how do you justify not going crazy?
That’s not the correct way of interpreting observations; you shouldn’t let observations drive you crazy. Here, A’s action-definition is given in factorized form: action = A(O(“A”)). Normally, you’d treat such a decomposition as explicit dependence bias and try substituting everything in before starting to reason about what would happen if you chose one action or another. But if O(“A”) is an observation, then you’re not deciding the action, which is A(O(“A”)); you’re deciding just A(-), an Observations → Actions map. So being told that you’ve observed “no award” doesn’t mean that you now know that O(“A”) = “no award”. It just means that you’re the subagent responsible for deciding the response to the parameter “no award” in the strategy A(-). You might also want to acausally coordinate with the subagent that is deciding the other part of that same strategy, the response to “award”.
And all of this holds even if the agent knows what O(“A”) means; in that case it would simply be a bad idea not to include O(“A”) as part of the agent, that is, you should optimize the overall A(O(“A”)) rather than the smaller A(-).
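A minimal sketch of the “decide A(-), not A(O(“A”))” idea (the dict representation, the payoff numbers, and this particular Omega are my own assumptions): the agent chooses a whole Observations → Actions map, each “subagent” owns one entry of that map, and the outcome is obtained only afterwards, by running Omega on the chosen map.

```python
# Sketch of choosing the map A(-) rather than the composed action A(O("A")).
# The representation, payoffs, and Omega here are illustrative assumptions.
from itertools import product

OBSERVATIONS = ("award", "asked to pay")
RESPONSES = {"award": ("accept",),             # nothing to decide on "award"
             "asked to pay": ("pay", "refuse")}

def omega(strategy):
    """Hypothetical Omega: awards iff the map would pay when asked."""
    return "award" if strategy["asked to pay"] == "pay" else "asked to pay"

def utility(strategy):
    observation = omega(strategy)              # what the agent actually sees
    if observation == "award":
        return 1000
    # If asked, paying costs an assumed 100; refusing costs nothing.
    return -100 if strategy["asked to pay"] == "pay" else 0

# Enumerate whole maps (joint choices of both "subagents") and pick the best.
strategies = [dict(zip(OBSERVATIONS, combo))
              for combo in product(*(RESPONSES[o] for o in OBSERVATIONS))]
best = max(strategies, key=utility)
print(best, utility(best))   # the map that pays when asked wins the $1000
```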
At this point it seems we’re arguing over how to better formalize the original problem. The post asked what you should reply to Omega. Your reformulation asks what counterfactual-you should reply to counterfactual-Omega that doesn’t even have to say the same thing as the original Omega, and whose judgment of you came from the counterfactual void rather than from looking at you. I’m not sure this constitutes a fair translation. Some of the commenters here (e.g. prase) seem to intuitively lean toward my interpretation—I agree it’s not UDT-like, but think it might turn out useful.
At this point it seems we’re arguing over how to better formalize the original problem.
It’s more about making explicit the questions of what observations are, and where the boundaries of the agent lie (Which parts of the past lightcone are part of you? Just the cells in the brain? Why is that?), in deterministic decision problems. These were never explicitly considered before in the context of UDT. The problem statement declares that something is an “observation”, but we lack a technical counterpart of that notion. Your questions resulted from treating something that’s said to be an “observation” as epistemically relevant, writing knowledge about the state of the territory, which shouldn’t be logically transparent, right into the agent’s mind.
(Observations, possible worlds, etc. will very likely be the topic of my next post on ADT, once I resolve the mystery of observational knowledge to my satisfaction.)
Thanks, this looks like a fair summary (though a couple levels too abstract for my liking, as usual).
A note on epistemic relevance. Long ago, when we were just starting to discuss Newcomblike problems, the preamble usually went something like this: “Omega appears and somehow convinces you that it’s trustworthy”. So I’m supposed to listen to Omega’s words and somehow split them into an “epistemically relevant” part and an “observation” part, which should never mix? This sounds very shady. I hope we can disentangle this someday.
Your reformulation asks what counterfactual-you should reply to counterfactual-Omega that doesn’t even have to say the same thing as the original Omega.
Yes. If the agent doesn’t know what Omega actually says, this can be an important consideration (decisions are made by considering agent-provable properties of counterfactuals, all of which except the actual one are inconsistent, but not agent-inconsistent). If Omega’s decision is known (and not just observed), it just means that counterfactual-you’s response to counterfactual-Omega doesn’t control utility and could well be anything. But at this point I’m not sure in what sense anything can actually be logically known, and not in some sense just observed.