That would require the AIXI agent to have been pretrained to understand English (or some language as expressive as English) and have some experience at solving problems given a verbal explanation of the rules.
In this scenario, the AIXI internal program ensemble concentrates its probability mass on programs which map each pair of an English specification and an action to a predicted reward. Given the English specification, AIXI computes the expected reward for each action and outputs the action that maximizes it.
Note that in principle this can implement any computable decision theory. Which one it ends up implementing depends on the agent’s history and on the intrinsic bias of its UTM. It could be CDT, EDT, UDT, or, more likely, some approximation of one of them that has worked well for the agent so far.
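To make that selection rule concrete, here is a minimal Python sketch of mixture-weighted action selection. This is a toy under stated assumptions: choose_action, predict_reward, and the two example environment models are invented for illustration, and the real AIXI mixture over all programs is of course incomputable.

```python
# Toy sketch only: the real AIXI mixture over all programs is incomputable.
# Each candidate environment model gets a prior weight (e.g. 2^-length) and
# predicts a reward for every (specification, action) pair; the agent outputs
# the action with the highest weighted expected reward.

def choose_action(spec, actions, programs):
    """programs: list of (prior_weight, predict_reward) pairs, where
    predict_reward(spec, action) -> float stands in for running a candidate
    environment program on the agent's history plus the given inputs."""
    total = sum(weight for weight, _ in programs)

    def expected_reward(action):
        return sum(weight * predict(spec, action)
                   for weight, predict in programs) / total

    return max(actions, key=expected_reward)

# Which decision theory the behavior resembles depends entirely on which
# models carry the weight. With these two invented models the agent
# one-boxes, because the "reliable predictor" model dominates:
programs = [
    (0.50, lambda spec, a: 1.0 if a == "one-box" else 0.0),  # predictor-is-reliable model
    (0.25, lambda spec, a: 1.0 if a == "two-box" else 0.9),  # boxes-already-fixed model
]
print(choose_action("Newcomb's problem", ["one-box", "two-box"], programs))  # -> one-box
```

Shift the weights toward the second model and the same rule two-boxes, which is the sense in which the resulting “decision theory” is whatever the agent’s history and UTM bias happen to favor.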
That would require the AIXI agent to have been pretrained to understand English (or some language as expressive as English) and have some experience at solving problems given a verbal explanation of the rules.
I don’t think someone posing Newcomb’s problem would be particularly interested in excuses like “but what if the agent only speaks French!?”
Obviously, as part of the setup of Newcomb’s problem, AIXI has to be provided with an epistemic background comparable to that of the problem’s intended audience.
This means it doesn’t just have to be familiar with English; it has to be familiar with the real world, because Newcomb’s problem takes place in the context of the real world (or something very much like it).
I think you’re confusing two different scenarios:
1. Someone training an AIXI agent to output problem solutions given problem specifications as inputs.
2. Someone actually physically putting an AIXI agent into the scenario stipulated by Newcomb’s problem.
The second one is Newcomb’s problem; the first is the “what is the optimal strategy for Newcomb’s problem?” problem.
It’s the second one I’m arguing about in this thread, and it’s the second one that people have in mind when they bring up Newcomb’s problem.
Then the AIXI ensemble will be dominated by programs which map “real world” percepts and actions to predicted rewards.
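In code terms, that domination is just the posterior re-weighting of the mixture; a toy sketch, with an invented likelihood interface that is not part of any real AIXI implementation:

```python
# Toy sketch: each candidate program is re-weighted by the probability it
# assigned to the percept history actually observed, so models that track
# "real world" regularities end up carrying most of the posterior mass.
# The likelihood interface here is an invented stand-in, not real AIXI code.

def posterior_weights(programs, history):
    """programs: list of (prior_weight, likelihood) pairs, where
    likelihood(history) -> float is the probability the candidate program
    assigns to the observed percept sequence."""
    unnormalized = [weight * likelihood(history) for weight, likelihood in programs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]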
The point is that there is no way, short of actually running the (physically impossible) experiment, to tell whether the behavior of this AIXI agent will be consistent with CDT, EDT, or something else entirely.