More generally, yes. As an extreme case, imagine an expert operating in some rich domain whose actions consist of building a Turing machine that implements the expert itself, and then running it. An agent faithfully imitating the expert would end up with fully functional expert behavior after a single learning session. To stretch the chess allegory: if you were in some alien conceptual world where chess was Turing-complete and a chess grandmaster could be written as a short Turing machine in chess, you might be able to become a grandmaster just by observing a grandmaster’s actual play. This weird scenario violates the assumption behind the “probability of error” argument, namely that the expert’s mind most likely cannot be inferred from its actions.
Ah, very good point, but the crucial fact about those environments that allows them to break the behaviour cloning “curse” is that the agent’s implementation is part of the environment, not merely that the environment is expressive enough. It’s not enough that the agent build the Turing machine that implements the expert; it must furthermore modify itself to behave like that Turing machine, otherwise you just have some Turing machine in the environment doing its own thing and the agent still can’t behave like the expert. Another requirement is that the environment be free from large-ish random perturbations: if the expert has to subtly adjust its behaviour to account for perturbations while it’s building the Turing machine, the agent won’t be able to learn a long sequence of moves from copying a single episode of the expert, and we get the same (1-e)^n problem.
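For concreteness, here’s a rough back-of-the-envelope of that (1-e)^n problem; the error rates and the assumption that the agent never recovers once it deviates are made up, just to make the horizon scaling explicit:

```python
# Back-of-the-envelope for the (1-e)^n compounding problem: the cloned policy
# deviates from the expert with probability e at each step and, in this
# simplest version, never recovers once it has deviated.

def p_coherent(e: float, n: int) -> float:
    """Probability that no deviation has occurred after n steps."""
    return (1.0 - e) ** n

for e in (0.01, 0.001):
    horizon = int(1 / e)  # the scale at which coherence starts breaking down
    print(f"e={e}: P(still coherent after {horizon} steps) = {p_coherent(e, horizon):.2f}")
# e=0.01: P(still coherent after 100 steps) = 0.37
# e=0.001: P(still coherent after 1000 steps) = 0.37
```

So without any mechanism to correct deviations, the usable imitation horizon scales roughly like 1/e.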
I suppose that now that GPT is connected to the internet, its environment is technically expressive enough that it could modify its own implementation in this way. That is, if it ever gets to the point where it manages to stay coherent long enough (or has humans extend its coherent time horizon by re-rolling prompts) to modify itself in this way and literally implement this pathological case.
This argument morphs to LLMs in the following way: human language is rich. It is flexible, in that you can in principle express any thought in it. It is recursive, in that you can talk about the world, about yourself within the world, about the language within yourself within the world, and so on. Intuitively, it may be that language contains the schemes of human thought, not just as the abstract thing that produced the stream of language, but within the language itself, even though we never explicitly laid down the algorithm of a human in words. If imitation training can find associations that somehow tap into this recursiveness, it could be that optimizing the imitation of a relatively short amount of human text is sufficient to crack humans.
It seems to me that this is arguing that human language is a special sort of environment where learning to behave like an expert lets you generalise much further out of distribution than in generic environments. That might be the case, and I agree that there’s something special about human language that wouldn’t be there if we were talking about, say, robots imitating human walking gait. I’m frankly unsure how to think about this; maybe we get unlucky and somehow everything generalises much further than it has any right to.
I’ve also been thinking about the failure modes of this whole argument since yesterday, and I think a crucial point is that different abilities of GPT will have different coherent timescales, and some abilities I expect to never become incoherent. For instance, grammar will probably never falter for GPT-4 and beyond, no matter the sequence length, because the internet contains enough small grammar deviations followed by correct text that the model can learn to correct them and ignore small mistakes. Importantly, because essentially all grammar mistakes exist in the dataset, GPT is unlikely to make a mistake that humans have never made and surprise itself with an “inhuman mistake”.
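To make that concrete, here’s a toy two-state version with made-up numbers: a small per-step probability of slipping off-track, plus a decent probability of recovering once off-track (because correction examples exist in the data). It’s only meant to contrast with the no-recovery picture above:

```python
import random

# Toy two-state chain: at each step the model slips off-track with probability
# e; once off-track, it gets back on with probability r (mimicking the
# "small mistake followed by correct text" patterns present in the data).

def incoherent_fraction(n: int, e: float, r: float, seed: int = 0) -> float:
    """Fraction of the n steps spent off-track."""
    rng = random.Random(seed)
    off, off_steps = False, 0
    for _ in range(n):
        if off:
            off_steps += 1
            off = rng.random() > r   # recover with probability r
        else:
            off = rng.random() < e   # slip with probability e
    return off_steps / n

for n in (1_000, 100_000):
    print(n, round(incoherent_fraction(n, e=0.01, r=0.5), 3))
```

With recovery in the picture, the off-track fraction settles around e/(e+r) regardless of length, instead of the probability of staying coherent collapsing like (1-e)^n.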
(If we get unlucky, there might be a sequence of abilities, each of which, once its coherent timescale goes to infinity, enables the efficient learning of the next. Maybe you can’t learn arithmetic until you’ve learned to generate correct grammar forever, etc.)
In particular, the LeCun argument predicts that the regions of language where GPT will have the most trouble are those where “thinking is not done in public”, because those will contain very few examples of how to correct perturbations. So maybe it can generate endless coherent unedited forum posts, but scientific papers should be very significantly harder.
It’s not enough that the agent build the Turing machine that implements the expert; it must furthermore modify itself to behave like that Turing machine, otherwise you just have some Turing machine in the environment doing its own thing and the agent still can’t behave like the expert.
I don’t care whether the agent is “really doing the thing itself” or not. I care that the end result is the overall system imitating the expert. Of course my extreme example is in some sense not useful: I’m saying “the expert is already building the agent you want, so you can imitate it to build the agent you want”. The point of the example is to show a simple, crisp way the proof fails.
So, yeah, I don’t know how to move cleanly from the very hypothetical counterexample to something less hypothetical. To start, I can have the agent “do the work itself” by having the expert run the machine it defined with its own cognition. This is in principle possible in the autoregressive paradigm, since if you consider the stepping function as the agent, it is fed its previous output. However, there is some contrivance in having the expert define the machine in the initial sequence and then run it, in such a way that the learner gets both the definition and the running part from imitation. I don’t have a clear picture in my mind. And next I’d have to somehow transfer the intuition to the domain of human language.
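A minimal invented sketch of what I mean (nothing here corresponds to a real system): the prefix of the sequence encodes a substitution rule, and the stepping function, which only ever sees the sequence produced so far, keeps applying that rule to extend it, so a single episode contains both the definition and the execution:

```python
# Invented toy of "define the machine, then run it" in an autoregressive loop.
# The prefix before "|" encodes a substitution rule; the stepping function
# reads it back from the sequence so far and applies it to the last symbol.

def step(history: str) -> str:
    """Append one symbol by applying the rule written in the prefix."""
    rule_text, _, tape = history.partition("|")
    rule = dict(pair.split("->") for pair in rule_text.split(","))  # e.g. {"a": "b", ...}
    return history + rule[tape[-1]]

# The "expert" episode: first it writes down the rule, then it starts running it.
seq = "a->b,b->c,c->a|a"
for _ in range(6):
    seq = step(seq)
print(seq)  # a->b,b->c,c->a|abcabca
```

Here the expert is trivially inferable from one episode because its whole program sits in the prefix; whether human language ever does something analogous is exactly the part I can’t picture clearly.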
I agree overall with the rest of your analysis, in particular thinking about this in terms of threshold coherence lengths. If the learner somehow needs to infer the expert’s Turing machine from its actions, the relevant quantity is indeed how long the specification of such a machine is.