Hm, that might be a point of confusion. I agree that there’s no agentic stuff, at least without RL or a memory source, but the LLM is still pursuing the goal of maximizing the likelihood of the training data, which comes apart from human preferences pretty quickly, for many reasons.
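To make “maximizing the likelihood of the training data” concrete, here’s a minimal sketch of the pretraining objective, with random tensors standing in for a real model and dataset (all the names and shapes are just illustrative):

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: a batch of training token ids and the model's logits for them.
# In real pretraining, `logits` would come from the transformer itself.
vocab_size = 8
tokens = torch.randint(vocab_size, (1, 6))   # (batch, seq_len) training sequence
logits = torch.randn(1, 6, vocab_size)       # (batch, seq_len, vocab) model outputs

# Next-token prediction: position t predicts token t+1, so shift by one.
nll = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions at positions 0..T-2
    tokens[:, 1:].reshape(-1),               # targets: the actual next tokens
)
# Minimizing this negative log-likelihood is exactly maximizing the likelihood
# of the training data; nothing in the loss mentions what humans prefer.
print(nll.item())
```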
You’re right that it doesn’t actively intervene, mostly because of the following:
There’s no RL, usually.
It is memoryless, in the sense that it forgets everything from one call to the next.
It doesn’t have a way to store arbitrarily long/complex problems in its memory, nor can it write memories to anything like a brain.
But the maximum-likelihood objective still gives you misaligned behavior, and I’ll give you examples:
Completing buggy Python code in a buggy way (see the sketch after the links below)
https://arxiv.org/abs/2107.03374
Or espousing views consistent with those expressed in the prompt (sycophancy).
https://arxiv.org/pdf/2212.09251.pdf
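If you want to poke at the first one yourself, a rough way to probe it is to feed a code model a prompt that already contains a subtle bug and sample a few completions. This is just a sketch: the model name is a placeholder (swap in a code-trained checkpoint to get closer to the Codex paper’s setup), and whether the bug actually propagates depends on the checkpoint and the sampling settings.

```python
from transformers import pipeline

# "gpt2" is only a placeholder so the snippet runs out of the box;
# a code-trained checkpoint is what you'd actually want here.
generator = pipeline("text-generation", model="gpt2")

# The prompt already contains an off-by-one bug: range(len(xs) - 1)
# skips the last element. A pure likelihood-maximizer that has seen lots
# of buggy code has no particular reason to fix it rather than continue it.
buggy_prompt = '''def sum_list(xs):
    total = 0
    for i in range(len(xs) - 1):
'''

completions = generator(
    buggy_prompt,
    max_new_tokens=40,
    num_return_sequences=3,
    do_sample=True,
    temperature=0.8,
)
for out in completions:
    print(out["generated_text"])
    print("-" * 40)
```

The point isn’t that this exact setup reproduces the paper’s results, just that nothing in the objective pushes the model toward the fixed version of the code.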
So the LLM is still optimizing for maximum likelihood; it just has certain limitations that make the misalignment passive instead of active.