By farsightedness, I mean the value of the discount factor γ ∈ [0,1), with which the agent geometrically discounts rewards at future time steps. That is, a reward r received k steps in the future is discounted to γᵏr. My theorems assume that, given the reward function R, the agent computes the optimal policy (set) for R at discount rate γ.
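For concreteness, here is a minimal Python sketch (my own illustration, not part of the original comment) of geometric discounting: each reward rₖ received k steps in the future is weighted by γᵏ, so the discounted return is the sum of γᵏrₖ.

```python
# Minimal sketch of geometric discounting (illustrative only).
# A reward r received k steps in the future contributes gamma**k * r
# to the discounted return.

def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_k over a sequence of per-step rewards."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

rewards = [0, 0, 0, 1]  # a single reward of 1, three steps in the future
print(discounted_return(rewards, gamma=0.9))   # 0.9**3 = 0.729
print(discounted_return(rewards, gamma=0.99))  # 0.99**3 ≈ 0.970
```

A larger γ (closer to 1) weights distant rewards more heavily, which is the sense in which γ controls how farsighted the agent is.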
There’s a different (intuitive) notion of farsightedness, in which the agent can only compute policies within a k-neighborhood of the current state. I think this is the notion you’re referring to. In this case, gaining power is a good heuristic, as you say.
Ah! Thanks so much. I was definitely conflating farsightedness as discount factor and farsightedness as vision of possible states in a landscape.
And that is why some resource-increasing state may be too far out of the way, meaning NOT instrumentally convergent: the more distant that state is, the closer its discounted value is to zero, until in the limit it actually is zero. Hence the bracket.
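As a rough numerical illustration of that decay (my own numbers, not from the theorems): with γ = 0.9, a fixed unit of reward k steps away is worth γᵏ, which shrinks rapidly with distance.

```python
# Illustration: discounted value of a fixed reward as its distance k grows.
gamma = 0.9
for k in (1, 5, 10, 50, 100):
    print(f"k={k:3d}  gamma**k = {gamma**k:.6f}")
# k=  1  gamma**k = 0.900000
# k=  5  gamma**k = 0.590490
# k= 10  gamma**k = 0.348678
# k= 50  gamma**k = 0.005154
# k=100  gamma**k = 0.000027
```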