It’s not enough that the agent build the Turing machine that implements the expert; it furthermore needs to modify itself to behave like that Turing machine. Otherwise you just have some Turing machine in the environment doing its own thing, and the agent still can’t behave like the expert.
I don’t care whether the agent is “really doing the thing himself” or not. I care that the end result is the overall system imitating the expert. Of course my extreme example is in some sense not useful: I’m saying “the expert is already building the agent you want, so you can imitate it to build the agent you want”. The point of the example is to show a simple, crisp way in which the proof fails.
So, yeah, I don’t know how to clearly move from the very hypothetical counterexample to something less hypothetical. To start, I can have the agent “do the work himself” by having the expert run the machine it defined with its own cognition. This is in principle possible in the autoregressive paradigm, since if you consider the stepping function as the agent, it is fed its own previous output. However, there’s some contrivance in having the expert define the machine in the initial sequence and then run it, in such a way that the learner gets both the definition and the execution from imitation. I don’t have a clear picture in my mind. And next I’d have to somehow transfer the intuition to the domain of human language.
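To make the contrived setup a bit more concrete, here is a minimal toy sketch (the machine, the token names, and the functions are all my own illustration, not anything from the proof): the expert writes the machine’s definition into the initial sequence and then runs it, where each new token is computed purely from the tokens emitted so far, so a learner that clones the token stream would pick up both the definition and the execution.

```python
def step(history):
    """Autoregressive stepping function: the next token is a pure function
    of the sequence emitted so far (the agent 'fed its own previous output')."""
    _, n_steps = history[0]              # the machine's definition sits at the start
    last = history[-1]
    if last[0] == "SPEC":
        return ("STEP", "1")             # first step of the defined machine
    if last[0] == "STEP" and len(last[1]) < n_steps:
        return ("STEP", last[1] + "1")   # the machine's rule: append one '1'
    return ("HALT", last[1])             # stop once n_steps symbols are written

def expert_rollout(n_steps=5):
    # The expert plants the machine's definition in the initial sequence...
    history = [("SPEC", n_steps)]
    # ...and then executes that same machine, one autoregressive step at a time.
    while history[-1][0] != "HALT":
        history.append(step(history))
    return history

if __name__ == "__main__":
    # A behaviourally-cloned learner only has to reproduce this token stream
    # to reproduce both the definition and the run.
    for token in expert_rollout():
        print(token)
```

This is obviously a caricature (a unary counter rather than a general Turing machine), but it shows the shape of what I mean: the “definition” and the “running” live in the same imitable sequence.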
I agree overall with the rest of your analysis, in particular with thinking about this in terms of threshold coherence lengths. If the learner somehow needs to infer the expert’s Turing machine from the actions, the relevant point is indeed how long the specification of such a machine is.