My other reply addressed what I took to be the core of our disagreement, but not the specific statements you make in your comment. So I’m addressing those here.
> The model’s “training/optimization”, as characterized by the outer loss, is not what determines the inner optimizer’s cognition.
Let me be clear that I am NOT saying that any inner optimizer, if it exists, would have a goal equal to minimizing the outer loss. What I am saying is that it would have a goal that, in practice, when implemented in a single forward pass of the LLM, has the effect of minimizing the LLM’s overall outer loss with respect to that ONE token. And it would be very hard for such a goal to cash out, in practice, to wanting long-range real-world effects.
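To make the single-token framing concrete, here is a minimal toy sketch (my own illustration, not anything from the comment I’m replying to) of the standard fact that the pretraining loss decomposes into an independent cross-entropy term per predicted token. Whatever goal an inner optimizer implements, gradient descent only ever scores it by its effect on one such term at a time:

```python
import numpy as np

def token_cross_entropy(logits, target_index):
    """Cross-entropy loss for a single next-token prediction."""
    logits = np.asarray(logits, dtype=float)
    # log-softmax: convert raw logits to log-probabilities
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[target_index]

# The sequence loss is just the sum (or mean) of per-token terms,
# each depending only on that one prediction:
per_token = [token_cross_entropy([2.0, 0.5, -1.0], 0),
             token_cross_entropy([0.1, 0.2, 3.0], 2)]
sequence_loss = sum(per_token)
```

The point of the decomposition is that there is no loss term rewarding effects beyond the current token, which is why I expect any inner goal to be targeted at that one prediction.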
Let me also point out your implicit assumption that there is an ‘inner’ cognition which is not literally the mask.
Here is another claim someone could make. This person would be saying, “hey look, this datacenter full of GPUs is carrying out this agentic-looking cognition. And it could easily carry out other, completely different agentic cognition. Therefore, the datacenter must have these capabilities independently of the LLM and must have its own ‘inner’ cognition.”
I think that you are making the same philosophical error that this claim would be making.
However, if we didn’t understand GPUs, we could still imagine that the datacenter has its own, independent ‘inner’ cognition, analogous, as I noted in a previous comment, to John Searle in his Chinese room. And if this were the case, it would be reasonable to expect that this inner cognition might only be ‘acting’ for instrumental reasons, and could be waiting for an opportunity to jump out and suddenly do something other than running the LLM.
After all, the GPU software is not tightly optimized specifically for running the LLM (or an ensemble of LLMs); it could indeed have other complications, and who knows what it might end up doing?
Because the LLM does super duper complicated stuff instead of massively parallelized simple stuff, I think it’s a bit more reasonable to expect there to be internal agentic stuff inside it. For all I know it could be one agent (or ensemble of agents) on top of another for many layers!
But, unlike in the case of the datacenter, we do have strong reasons to believe that these agents, if they exist, will have goals correctly targeted at doing whatever in practice achieves the best results in a single forward pass of the model (next-token prediction), and not at attempting long-term or real-world effects (see my other reply to your comment).
> Could you rephrase your point without making mention to “masks” (or any synonyms), and describe more concretely what you’re imagining here, and how it leads to a (nonfake) “goal slot”?
The LLM generates output that resembles training data produced by a variety of processes (mostly humans). The stronger the LLM becomes, the more the properties of its output are determined by (generalizations of) the properties of the training data and the processes that generated it. Some of that data is generated by agentic processes with differing goals. In order to predict them accurately, the LLM must model those goals, and the LLM’s output is then influenced by these goals, which are derived and generalized from the external processes. (This is the core of what I mean by the “mask”.) Any separate goal that originates “internally” must not cause deviations from all this, or it would have been squashed in training. Therefore, apparently agentic behaviour of the output must originate in the external processes being emulated (or generalizations of them), and not in separate, internal goals (see my other reply for additional argument, but also caveats).