The real problem is how to construct natural, rich classes of hypotheses-for-agents which can be learned statistically efficiently. All the classical answers to this go through the cybernetic framework, which imposes a particular causal structure. At the same time, examples like repeated XOR blackmail or repeated counterfactual mugging are inconsistent with any Bayesian realization of that causal structure. Because of this, classical agents (e.g. Bayesian RL with some class of communicating POMDPs as its prior) fail to converge to optimal behavior in those examples. On the other hand, infra-Bayesian agents do converge to optimal behavior. So, there is something to be confused about in decision theory, and arguably IB is the solution to that.