Our current LLMs like GPT-4 are not, in their base configurations, agents. They do not have goals.
What is the difference between being an agent and not being an agent here? Goals seem like an obvious point, but since GPT-4 also minimized its loss during training (and perhaps still does, as they keep tweaking it), is the implied difference that base GPT-4 is no longer minimizing its loss (which is its goal in some sense), or that it does not minimize it continually? If so, the distinction seems quite fuzzy, since you'd have to concede the same for an AutoGPT where you authorize the individual steps it runs.
That seems to be one answer you consider when you later write:
Will that cause the LLM to act as if it were an agent during the training run, seeking goals that arise out of the training run and thus almost certainly are only maximally fulfilled in ways that involve the LLM taking control of the future (and likely killing everyone), before we even get a chance to use RLHF on it? During the RLHF training run? Later on? At what level does this happen?
I’m pointing to both sections (and the potential tension between them) because, given the clearly agentive properties of AutoGPTs that run continually and where you can literally input goals, it seems like a straightforward failure mode to expect agentive properties only from such systems. They might instead emerge in other AIs too (e.g. ones that continually minimize their loss).
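To make the comparison concrete, here is a minimal sketch of the kind of loop AutoGPT-style systems run, with the per-step human authorization mentioned above. This is my own illustration, not any particular project's actual code; `llm` and `execute` are hypothetical stubs.

```python
# Minimal sketch of an AutoGPT-style loop with per-step human authorization.
# `llm` and `execute` are hypothetical stubs, not any real project's API.

def llm(prompt: str) -> str:
    """Stub: stands in for a call to a language model."""
    return "search the web for X"  # placeholder proposal

def execute(action: str) -> str:
    """Stub: stands in for actually running an action (shell, browser, etc.)."""
    return f"result of '{action}'"

def run_agent(goal: str, max_steps: int = 10) -> list:
    history = []
    for _ in range(max_steps):
        # The model proposes the next step toward the fixed, user-supplied goal.
        action = llm(f"Goal: {goal}\nHistory: {history}\nNext action:")
        # A human gates every step, as in the "authorize each step" setup.
        if input(f"Run '{action}'? [y/n] ").strip().lower() != "y":
            break
        history.append((action, execute(action)))
    return history
```

The wrapper supplies the goal and the loop; the model only ever answers one prompt at a time, which is exactly why the question of where the agency "lives" feels fuzzy.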
Does anyone have a better way to differentiate agents from non-agents, specifically for the cases we are witnessing?
Part of the answer: an agent reliably steers the world in a particular direction, even when you vary its starting conditions. GPT does a bunch of cool stuff, but if you give it a different starting prompt, it doesn’t go out of its way to accomplish the same set of things.
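One rough way to cash that out (my framing, offered as a sketch rather than a definition): treat agency as a behavioral test, where you vary the starting conditions and check whether the system still steers toward the same target state. `run_system` and `reaches_goal` are hypothetical placeholders.

```python
# Hedged sketch: agency as a behavioral test. Vary the starting conditions
# and check whether the system still reaches the same target state.
# `run_system` and `reaches_goal` are hypothetical placeholders.
from typing import Any, Callable, Iterable

def looks_like_an_agent(
    run_system: Callable[[Any], Any],      # runs the system from a given starting condition
    reaches_goal: Callable[[Any], bool],   # checks whether the end state hits the target
    starting_conditions: Iterable[Any],
) -> bool:
    # An agent, in this sense, passes for every perturbed start;
    # a base LLM handed different prompts generally does not.
    return all(reaches_goal(run_system(start)) for start in starting_conditions)
```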