because Alex’s paper doesn’t take an arbitrary utility function and prove instrumental convergence;
That’s right; that would prove too much.
namely X = “the reward function is typical”. Does that sound right?
Yeah, although note that I proved asymptotic instrumental convergence for typical reward functions under the assumption that reward is sampled iid at each state, so there’s wiggle room to say “but the reward functions we provide aren’t drawn from this distribution!” I personally think this doesn’t matter much, because the work still tells us a lot about the underlying optimization pressures.
The result also holds in the general case of an arbitrary reward function distribution; you just don’t know in advance which terminal states the distribution prefers.
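To make the iid intuition concrete, here is a toy sketch that is not taken from the paper. It assumes a hypothetical one-step setting with `n_terminal` terminal states, where one action keeps the first `k_reachable` of them reachable and the other action keeps only the rest, and where each terminal state's reward is drawn iid from Uniform(0, 1). The function name and parameters are made up for illustration. Under these assumptions the best terminal state is equally likely to be any of them, so the action that keeps more states reachable should be optimal for roughly `k_reachable / n_terminal` of sampled reward functions.

```python
import random


def prob_optimal_in_subset(n_terminal, k_reachable, trials=20000, seed=0):
    """Estimate how often the highest-reward terminal state is among the
    first k_reachable states, when rewards are sampled iid per state.

    Toy illustration only: the setup and names are assumptions, not the
    construction used in the paper under discussion.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sample one reward function: an iid Uniform(0, 1) reward per terminal state.
        rewards = [rng.random() for _ in range(n_terminal)]
        # Index of the terminal state an optimal agent would steer toward.
        best = max(range(n_terminal), key=rewards.__getitem__)
        if best < k_reachable:
            hits += 1
    return hits / trials


# One action keeps 9 of 10 terminal states reachable; the other keeps 1.
estimate = prob_optimal_in_subset(n_terminal=10, k_reachable=9)
print(f"fraction of sampled reward functions favoring the 9-state action: {estimate:.3f}")
```

Swapping the uniform sampler for a non-iid distribution would shift this fraction toward whichever terminal states that distribution favors, which is the caveat about not knowing the preferred states in advance.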