Instrumental convergence only matters if you have a goal to begin with. As far as I can tell, ChatGPT doesn’t ‘want’ to predict text; it’s just shaped that way.
It seems to me that anything that could or would ‘agentify’ itself is already an agent. It’s like the “would Gandhi take the psychopath pill” question, except that here the utility function doesn’t yet exist to want to generate itself.
Is your mental model that a scaled-up GPT-3 spontaneously becomes an agent? My mental model says it just gets really good at predicting text.
My mental model is that a scaled-up GPT becomes as dangerous as many agents precisely because it gets extremely good at producing text that would be an apt continuation of the preceding text.
Note that I do not say “predicting” text, since the system is not “trying” to predict anything. It is merely shaped, in initial training, by a process that treats its outputs as predictions. In fine-tuning, the outputs will very likely not be treated as predictions, and that process may shape the system’s behaviour so that its outputs are more agent-like. It seems likely that this will become more common as the technology matures.
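To make the distinction concrete, here is a toy sketch, not GPT’s actual training code: all names are illustrative, and the linear layer is a stand-in for a transformer. The pretraining step scores the output distribution as a prediction of the next token, while the reward-based fine-tuning step scores a sampled output as an action, which is roughly the sense in which fine-tuning can push behaviour in a more agent-like direction.

```python
import torch
import torch.nn.functional as F

vocab_size = 100
model = torch.nn.Linear(vocab_size, vocab_size)  # illustrative stand-in for a transformer

def pretraining_step(context_vec, next_token_id):
    # The output distribution is scored as a *prediction* of the next token.
    logits = model(context_vec)
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([next_token_id]))

def finetuning_step(context_vec, sampled_token_id, reward):
    # The sampled output is scored as an *action* via a reward signal
    # (a crude policy-gradient flavour); nothing here is treated as a prediction.
    logits = model(context_vec)
    log_prob = F.log_softmax(logits, dim=-1)[sampled_token_id]
    return -reward * log_prob  # reinforce outputs the reward signal favoured

ctx = torch.zeros(vocab_size)
ctx[3] = 1.0  # a trivial one-hot "context"
print(pretraining_step(ctx, next_token_id=7))
print(finetuning_step(ctx, sampled_token_id=7, reward=1.0))
```

The point of the contrast is only that the two losses ask different things of the same outputs; the second one rewards whole behaviours rather than matching a target token.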
In many ways GPT is already capable of manifesting agents (plural), depending upon its prompts. Those agents are just not very capable yet.
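As a minimal sketch of what “manifesting an agent via prompts” means in practice: the agent lives in the scaffold around the model, not in the model’s objective. Here `generate` is a hypothetical stand-in for any text-completion call, stubbed out so the example runs on its own; the loop and the `ACTION:` convention are illustrative assumptions.

```python
def generate(prompt: str) -> str:
    # Hypothetical text-completion call (any LLM API would do); stubbed here.
    return "ACTION: search('weather in Paris')"

def run_prompted_agent(goal: str, steps: int = 3) -> list[str]:
    # The "agent" is entirely in the scaffold: a persona with a goal, a loop,
    # and a convention for reading actions out of the completions.
    transcript = f"You are an assistant pursuing this goal: {goal}\n"
    actions = []
    for _ in range(steps):
        completion = generate(transcript)
        transcript += completion + "\n"
        if completion.startswith("ACTION: "):
            actions.append(completion[len("ACTION: "):])
    return actions

print(run_prompted_agent("find out whether it will rain tomorrow"))
```

How capable such a prompted agent is depends almost entirely on how good the underlying completions are, which is the sense in which “not very capable—yet” cashes out.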