contain a representation of the entire MDP (because the program needs to simulate the MDP for each possible permutation)
We aren’t talking about MDPs, we’re talking about a broad class of environments which are represented via joint probability distributions over actions and observations. See post title.
it’s an unbounded integer.
I don’t follow.
the program needs to contain way more than 100 bits
See the arguments in the post Rohin linked for why this argument is gesturing at something useful even if takes some more bits.
But IMO the basic idea in this case is, you can construct reasonably simple utility functions like “utility 1 if history has the agent taking action ai at time step t given action-observation history prefix h1:t−1, and 0 otherwise.” This is reasonably short, and you can apply it for all actions and time steps.
Sure, the complexity will vary a little bit (probably later time steps will be more complex), but basically you can produce reasonably simple programs which make any sequence of actions optimal. And so I agree with Rohin that simplicity priors on u-AOH will quantitatively—but not qualitatively affect the conclusions for the generic u-AOH case. [EDIT: this reasoning is different from the one Rohin gave, TBC]
We aren’t talking about MDPs, we’re talking about a broad class of environments which are represented via joint probability distributions over actions and observations. See post title.
I don’t follow.
See the arguments in the post Rohin linked for why this argument is gesturing at something useful even if takes some more bits.
But IMO the basic idea in this case is, you can construct reasonably simple utility functions like “utility 1 if history has the agent taking action ai at time step t given action-observation history prefix h1:t−1, and 0 otherwise.” This is reasonably short, and you can apply it for all actions and time steps.
Sure, the complexity will vary a little bit (probably later time steps will be more complex), but basically you can produce reasonably simple programs which make any sequence of actions optimal. And so I agree with Rohin that simplicity priors on u-AOH will quantitatively—but not qualitatively affect the conclusions for the generic u-AOH case. [EDIT: this reasoning is different from the one Rohin gave, TBC]