Later information can “choose many different games”: specifically, whenever the posterior distribution of system-state S given two possible X values is different, there must be at least one Y value under which optimal play differs for the two X values.
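To make that condition concrete, here is a minimal sketch under one assumed family of games (my own illustration, not something stated above): each Y value is a bet `(s, price)` that pays `1 - price` if S equals `s` and `-price` otherwise, with the alternative of passing for a payoff of 0. The names `separating_game` and `optimal_action` are hypothetical. With this family, any two posteriors that differ on some state can be separated by pricing the bet strictly between the two probabilities.

```python
from fractions import Fraction

def separating_game(post1, post2):
    """Return a bet (s, price) on which optimal play differs, or None.

    Assumes both posteriors are dicts over the same set of states.
    """
    for s in post1:
        a, b = post1[s], post2[s]
        if a != b:
            # Price strictly between the two posterior probabilities of s.
            return s, (a + b) / 2
    return None  # identical posteriors: no game in this family separates them

def optimal_action(posterior, game):
    s, price = game
    # "bet" pays 1 - price if S == s and -price otherwise,
    # so its expected payoff is P(S = s) - price; "pass" pays 0.
    return "bet" if posterior[s] - price > 0 else "pass"

# Hypothetical posteriors P[S | X = x1] and P[S | X = x2].
post_x1 = {"s0": Fraction(1, 2), "s1": Fraction(1, 2)}
post_x2 = {"s0": Fraction(1, 4), "s1": Fraction(3, 4)}

game = separating_game(post_x1, post_x2)
print(game, optimal_action(post_x1, game), optimal_action(post_x2, game))
```

Here the bet on `s0` is worth taking under the first posterior and not under the second, so an agent that plays optimally in this game has to distinguish the two X values. Other game families would work too; the bet family is just the simplest one I could think of that satisfies the condition.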
Given your four conditions, I wonder if there’s a result like “optimally power-seeking agents (minimizing information costs) must model the world.” That is, I think power is about being able to achieve a wide range of different goals (to win at ‘many different games’ the environment could ask of you), and so if you want to be able to sufficiently accurately estimate the expected power provided by a course of action, you have to know how well you can win at all these different games.
Yes! That is exactly the sort of theorem I’d expect to hold. (Though you might need to be in POMDP-land, not just MDP-land, for it to be interesting.)