“most agents stay alive in Pac-Man and postpone ending a Tic-Tac-Toe game”, but only in the limit of farsightedness (γ→1).

If an agent is randomly placed in a given distribution of randomly connected points, I see why there are diminishing returns on seeking more power, but that return is never 0, is it?

This gives me pause. Can you expand? Also, what’s the distribution of reward functions in this scenario – uniform?
I think there are two separable concepts at work in these examples: the success of an agent, and the agent’s choices as determined by the reward function and its farsightedness.
If we compare two agents, one in the limit of farsightedness (γ→1) and the other with a discount factor of γ=1/2, then I expect the first agent to be more successful across a uniform distribution of reward functions and to skip over doing things like Trade School, but the second agent, in light of its more limited farsightedness, would be more successful if it were seeking power. As Vanessa Kosoy said above,
… gaining [power] is more robust to inaccuracies of the model or changes in the circumstances than pursuing more “direct” paths to objectives.
What I meant originally is that if an agent doesn’t know whether γ→1, then is it not true that the agent “seeks out the states in the future with the most resources or power”? Now, certainly the agent can get stuck at a local maximum because of shortsightedness, and an agent can forgo certain options as a result of its farsightedness.
So I am interpreting the theorem like so:
An agent seeks out states in the future that have more power at the limit of its farsightedness, but not states that, while they have more power, lie beyond its farsightedness “rating.”

Note: this assumes a uniform distribution over reward functions.
By farsightedness, I mean the value of the discount factor γ∈[0,1), with which the agent geometrically discounts rewards at future time steps. That is, the reward r received k steps in the future is discounted as γᵏr. My theorems assume that, given the reward function R, the agent computes the optimal policy (set) for R at discount rate γ.
There’s a different (intuitive) notion of farsightedness, in which the agent can only compute policies within a k-neighborhood of the current state. I think this is the notion you’re referring to. In this case, gaining power is a good heuristic, as you say.
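To make the discount-factor notion concrete, here is a minimal sketch (not from the post, and not the POWER formalism of the theorems; the toy rewards and path length are invented for illustration). It compares the discounted return of grabbing a small nearby reward against taking a longer detour to a larger payoff, and shows the optimal first choice flipping as γ approaches 1:

# Toy comparison (illustrative only): does an optimal agent grab a small
# nearby reward, or walk a longer detour toward a larger payoff?
# Under geometric discounting, a reward r received k steps in the future
# contributes gamma**k * r to the return.
def best_first_choice(gamma, detour_length=5, near_reward=1.0, far_reward=3.0):
    v_near = near_reward                           # reward available right away
    v_far = (gamma ** detour_length) * far_reward  # larger reward, k steps away
    return ("near" if v_near >= v_far else "far"), v_near, v_far

for gamma in (0.5, 0.8, 0.99):
    choice, v_near, v_far = best_first_choice(gamma)
    print(f"gamma={gamma}: near={v_near:.3f}, far={v_far:.3f} -> choose {choice}")

At γ=0.5 and γ=0.8 the nearby reward wins; at γ=0.99 the distant, larger payoff wins, matching the “only in the limit of farsightedness” caveat quoted at the top of this thread.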
Ah! Thanks so much. I was definitely conflating farsightedness as a discount factor with farsightedness as vision of the possible states in a landscape.
And that is why some resource-increasing state may be too far out of the way, meaning NOT instrumentally convergent: because the more distant that state is, the closer its discounted value gets to zero, until it is effectively zero. Hence the bracket in γ∈[0,1).
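A quick numeric check of that last point (illustrative numbers only, with a hypothetical γ=0.9): for any fixed γ∈[0,1), the weight γᵏ given to a reward k steps away shrinks geometrically, so a far-enough resource-increasing state contributes almost nothing to the discounted return, approaching zero without quite reaching it (except at γ=0):

# Geometric discount weights: how much a unit reward k steps away is worth now.
gamma = 0.9
for k in (1, 5, 10, 20, 50, 100):
    print(f"k={k:3d}  gamma**k = {gamma ** k:.6f}")

With γ=0.9, a state 50 steps away is weighted at about 0.005 and one 100 steps away at about 0.00003, which is the sense in which such a state can stop being worth seeking even though its weight never literally hits zero.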