The problem you’re discussing is not Newcomb’s problem; it’s a different problem that you’ve decided to apply the same name to.
It is a crucial part of the setup of Newcomb’s problem that the agent is presented with significant evidence about the nature of the problem. This applies to AIXI as well; at the beginning of the problem AIXI needs to be presented with observations that give it very strong evidence about Omega and about the nature of the problem setup. From Wikipedia:

“By the time the game begins, and the player is called upon to choose which boxes to take, the prediction has already been made, and the contents of box B have already been determined. That is, box B contains either $0 or $1,000,000 before the game begins, and once the game begins even the Predictor is powerless to change the contents of the boxes. Before the game begins, the player is aware of all the rules of the game, including the two possible contents of box B, the fact that its contents are based on the Predictor’s prediction, and knowledge of the Predictor’s infallibility. The only information withheld from the player is what prediction the Predictor made, and thus what the contents of box B are.”
It seems totally unreasonable to withhold from AIXI information that would be given to any other agent facing Newcomb’s problem.
That would require the AIXI agent to have been pretrained to understand English (or some language as expressive as English) and have some experience at solving problems given a verbal explanation of the rules.
In this scenario, AIXI’s internal program ensemble concentrates its probability mass on programs that map each pair of an English specification and an action to a predicted reward. Given the English specification, AIXI computes the expected reward for each action and outputs the action that maximizes it.
Note that in principle this can implement any computable decision theory. Which one it would choose depends on the agent’s history and on the intrinsic bias of its UTM. It could be CDT, EDT, UDT, or, more likely, some approximation of them that has worked well for the agent so far.
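To make that concrete, here is a minimal sketch of the selection rule under drastically simplified assumptions: AIXI proper mixes over all programs for a universal Turing machine, which is incomputable, so a two-element ensemble of hand-written reward predictors stands in for the Solomonoff mixture, and every name in the snippet is illustrative.

```python
# Toy stand-in for AIXI's action selection (illustrative only; real
# AIXI sums over all programs for its UTM, which is incomputable).
# A "program" maps an (English specification, action) pair to a
# predicted reward; its prior weight is the Solomonoff-style 2^-length.
from typing import Callable, List, Tuple

Program = Callable[[str, str], float]
Ensemble = List[Tuple[Program, int]]  # (predictor, code length in bits)

def choose_action(spec: str, ensemble: Ensemble, actions: List[str]) -> str:
    """Return the action with the highest prior-weighted predicted reward."""
    weights = [2.0 ** -length for _, length in ensemble]
    total = sum(weights)

    def expected_reward(action: str) -> float:
        return sum(w * program(spec, action)
                   for w, (program, _) in zip(weights, ensemble)) / total

    return max(actions, key=expected_reward)

# Two hand-written world-models for Newcomb's problem:
# - edt_like trusts Omega's track record, so one-boxing predicts the million;
# - fraud_like holds that box B is always empty, so two-boxing dominates.
edt_like: Program = lambda spec, a: 1_000_000 if a == "one-box" else 1_000
fraud_like: Program = lambda spec, a: 0 if a == "one-box" else 1_000

spec = "Newcomb's problem, as specified in English above."
actions = ["one-box", "two-box"]

# Which decision theory the behavior looks like is decided entirely by
# the code lengths the UTM happens to assign to the rival world-models:
print(choose_action(spec, [(edt_like, 2), (fraud_like, 12)], actions))  # one-box
print(choose_action(spec, [(edt_like, 16), (fraud_like, 2)], actions))  # two-box
```

Flipping the relative code lengths flips the choice, which is the point of the paragraph above: nothing in the problem statement itself determines whether the learned behavior comes out EDT-like or CDT-like.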
That would require the AIXI agent to have been pretrained to understand English (or some language as expressive as English) and have some experience at solving problems given a verbal explanation of the rules.
I don’t think someone posing Newcomb’s problem would be particularly interested in excuses like “but what if the agent only speaks French!?”
Obviously, as part of the setup of Newcomb’s problem, AIXI has to be provided with an epistemic background comparable to that of its intended target audience.
This means it doesn’t just have to be familiar with English; it has to be familiar with the real world, because Newcomb’s problem takes place in the context of the real world (or something very much like it).
I think you’re confusing two different scenarios:
1. Someone training an AIXI agent to output problem solutions given problem specifications as inputs.
2. Someone actually physically putting an AIXI agent into the scenario stipulated by Newcomb’s problem.
The second one is Newcomb’s problem; the first is the “what is the optimal strategy for Newcomb’s problem?” problem.
It’s the second one I’m arguing about in this thread, and it’s the second one that people have in mind when they bring up Newcomb’s problem.
Then the AIXI ensemble will be dominated by programs which map “real world” percepts and actions to predicted rewards.
The point is that there is no way, short of actually running the (physically impossible) experiment, to tell whether the behavior of this AIXI agent will be consistent with CDT, EDT, or something else entirely.
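Under the same toy assumptions as the sketch earlier in the thread, a quick sweep makes this concrete: the decision hinges on a single quantity we have no way of computing for a real AIXI, namely the code-length gap its UTM assigns to the rival world-models. (The exact flip point below is an artifact of the made-up payoffs.)

```python
# Reusing choose_action, edt_like, fraud_like, spec, and actions from
# the earlier sketch: sweep the code-length gap between the two rival
# world-models and watch the behavior flip from EDT-like to CDT-like.
for gap in range(0, 15):
    ensemble = [(edt_like, 2 + gap), (fraud_like, 2)]
    print(f"gap={gap:2d} bits -> {choose_action(spec, ensemble, actions)}")
# Prints "one-box" up to a 9-bit gap and "two-box" from 10 bits on;
# for a real UTM the gap is unknowable without running the experiment.
```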
Would it be a valid instructional technique to give someone (particularly someone congenitally incapable of learning any other way) the opportunity to try out a few iterations of the ‘game’ Omega is offering, with clearly denominated but strategically worthless play money in place of the actual rewards?
The main issue with that is that Newcomb’s problem is predicated on the assumption that you prefer getting a million dollars to getting a thousand dollars. For the play money iterations, that assumption would not hold.
The second issue with iterating Newcomb’s problem more generally is that it gives the agent an opportunity to precommit to one-boxing. The problem is more interesting and more difficult if you face it without having had that opportunity.
For the play money iterations, that assumption would not hold.
Why not? People can get pretty competitive even when there’s nothing really at stake, and current-iteration play money is a proxy for future-iteration real money.