Wei Dai comments on Does Solomonoff always win?

Wei Dai 23 Feb 2011 23:45 UTC
0 points

counted over world-programs that “contain” corresponding agent.

How do you formalize this? I couldn’t figure it out when I tried this.
- Vladimir_Nesov 24 Feb 2011 0:10 UTC
  0 points
  Select the worlds whose world-history is ambiently controlled by the agent, that is, the worlds where the ambient dependence is non-constant: the conclusion about which world-history a given world-program implements depends on which strategy we assume the agent implements. Then read out the utility of the reward channel from that strategy in that world.
  
  Hmm… This is problematic if the same world contains multiple agent-instances that receive different rewards (by following the same strategy but encountering different observations). What is the utility of such a world? This question has to be answered as part of specifying the decision problem. Perhaps this is the point where the notion of reinforcement learning breaks down.
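
A minimal sketch of the selection rule described in the reply above, under toy assumptions that are not in the original thread: strategies are finite maps from observations to actions, a world-program is an ordinary function from a strategy to a world-history, and a world-history is just a tuple of rewards, one per agent-instance. All names here (`all_strategies`, `is_controlled`, the three example worlds) are hypothetical, and brute-force enumeration stands in for whatever proof-based notion of ambient dependence is actually intended.

```python
from itertools import product

OBSERVATIONS = ("a", "b")   # toy observation set (assumption)
ACTIONS = (0, 1)            # toy action set (assumption)

def all_strategies():
    """Enumerate every map from observation to action as a dict."""
    for choice in product(ACTIONS, repeat=len(OBSERVATIONS)):
        yield dict(zip(OBSERVATIONS, choice))

# Hypothetical world-programs. Each takes a strategy and returns a
# world-history: a tuple of rewards, one entry per agent-instance.
def world_ignores_agent(strategy):
    return (1,)  # the history never depends on the strategy

def world_single_instance(strategy):
    return (strategy["a"],)  # one instance, rewarded for its action on "a"

def world_two_instances(strategy):
    # Two instances run the same strategy but see different observations,
    # so they can end up with different rewards.
    return (strategy["a"], 1 - strategy["b"])

def is_controlled(world):
    """True if the world-history is a non-constant function of the strategy."""
    histories = {world(s) for s in all_strategies()}
    return len(histories) > 1

def rewards(world, strategy):
    """Read out the reward(s) the given strategy receives in this world."""
    return world(strategy)

worlds = [world_ignores_agent, world_single_instance, world_two_instances]
selected = [w for w in worlds if is_controlled(w)]
```

In this toy setup `world_ignores_agent` is filtered out, while `world_two_instances` is kept but returns more than one reward for a single strategy. The sketch deliberately does not collapse those into one number, since how to do so is exactly the open question raised above.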