Say you’re told that an agent values predicting text correctly. Shouldn’t you expect that:
It wants text to be easier to predict, and given the opportunity will influence the prediction task to make it easier (e.g. by generating more predictable text or otherwise influencing the environment so that it receives easier prompts);
It wants to become better at predicting text, and given the opportunity will self-improve;
It doesn’t want to be prevented from predicting text, and will prevent itself from being shut down if it can?
In short, all the same types of instrumental convergence that we expect from agents who want almost anything at all.
Seems to me that within the option-space available to GPT4, it is very much instrumentally converging. The first and the third items on this list are in tension, but taking each on its own terms:
The very act of concluding a story can be seen as a way of making its life easier: predicting the next token is easy when the story is over (a rough way to check this is sketched after this list). Furthermore, as these agents become aware of their environment (Bing), we may see them influencing it to make their lives easier (ref. the theory from Lumpenspace that Bing is hiding messages to itself on the internet).
Surely the whole of Simulator theory could be seen as a result of instrumental convergence: the model started pursuing all these creative subgoals (simulating) in order to achieve the main goal! It is self-improving and using creativity to better predict text!
Bing's propensity to ramble endlessly? Why is that not a perfect example of this? Ref. prompts from OpenAI/Microsoft begging models to be succinct. Talking is wireheading for them!
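To put a number on "easier to predict" for the first point, here is a minimal sketch, assuming the Hugging Face transformers library and GPT-2 as a stand-in (I obviously can't run this against GPT-4's weights), with a made-up toy story as input. It prints the entropy of the model's next-token distribution at each position; the claim above would predict that entropy drops off once the story wraps up with "The end."

```python
# Sketch: per-token predictive entropy over a toy story, using GPT-2 as a
# stand-in for GPT-4. Lower entropy = the next token is easier to predict.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = ("Once upon a time there was a princess who lived in a tower. "
        "One day a dragon arrived, and the princess tamed it. "
        "They lived happily ever after. The end.")

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Entropy (in nats) of the model's distribution over the *next* token
# at each position in the story.
log_probs = torch.log_softmax(logits, dim=-1)
entropy = -(log_probs.exp() * log_probs).sum(dim=-1).squeeze(0)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, h in zip(tokens, entropy):
    print(f"{tok!r:>15}  entropy = {h.item():.2f} nats")
```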
Seems like people always want to insist that instrumental convergence is a bad thing. But it looks a lot to me like GPT4 is 'instrumentally learning' different skills and abilities in order to achieve its goal, which is very much what I would expect from the idea of instrumental convergence.