Do LLMs learn to break their sensors?
Yes, I am proposing something that is not a standard part of ML training.
Gradient descent will move you around less if you can navigate to parts of the environment that give you low loss. This setup sits somewhere between RL and unsupervised learning, in the sense that it has state but you are using an autoregressive loss. It is similar to conditional pre-training, but instead of prepending a reward, you prepend a summary that the LM generated itself.
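To make the analogy to conditional pre-training concrete, here is a minimal sketch assuming a Hugging Face-style causal LM; `summary_of_state` and `observation` are placeholders for the model's own summary and the next chunk of environment text, not part of any existing training setup.

```python
# Minimal sketch: autoregressive loss on an observation, conditioned on a
# summary the LM generated itself (in place of the reward token used in
# conditional pre-training). Model choice ("gpt2") is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def conditioned_loss(summary_of_state: str, observation: str) -> torch.Tensor:
    prefix_ids = tokenizer(summary_of_state, return_tensors="pt").input_ids
    obs_ids = tokenizer(observation, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, obs_ids], dim=1)
    # Mask the prefix so the loss only falls on the observation tokens;
    # the summary conditions the prediction but is not itself a target.
    labels = input_ids.clone()
    labels[:, : prefix_ids.shape[1]] = -100
    return model(input_ids=input_ids, labels=labels).loss
```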
The gradient would indeed be flowing indirectly here, and the claim that actions would make the input more predictable is an empirical prediction that A) I could be wrong about, B) is not a crux for this method, and C) is not a crux for this article, unless the reader thinks there is no way to train an AI in a human-like way and needs an existence proof.
I claim that I do not need to, since there is an intuitive notion of what an AI is. An AI trained with MCTS on chess satisfies that criterion less well than GPT-4 does, for instance. But since history has already spelled out most of the details for us, it will probably use gradient descent and autoregressive loss to form the core of its intelligence. Then the question is how to mix prompting and fine-tuning in a way that mirrors how a learning human would incorporate inputs.
Good point, there is probably some room to incorporate active learning with LMs. It might not be the regular kind, where you ask for ground-truth labels on inputs the model predicts close to the decision boundary, but rather a version where the LM tells you what it wants to read. This may only work once the model is sufficiently competent, though.
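A rough sketch of what that loop might look like, where `fetch_document` and `fine_tune_on` are hypothetical stand-ins for a retrieval step and an ordinary autoregressive fine-tuning step, not real library calls:

```python
# One round of LM-driven "active reading": the model names what it wants to
# read, we retrieve it, and then train on it with plain next-token prediction.
REQUEST_PROMPT = "Name a topic or document you would like to read next:"

def active_reading_step(generate, fetch_document, fine_tune_on):
    """
    generate:       callable prompt -> model text (e.g. a sampling wrapper)
    fetch_document: callable query  -> document text (search, database, human, ...)
    fine_tune_on:   callable text   -> None (runs a few autoregressive updates)
    """
    query = generate(REQUEST_PROMPT)     # the model says what it wants to read
    document = fetch_document(query)     # no ground-truth labels are requested
    fine_tune_on(document)               # ordinary next-token prediction on the result
```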
I agree the programmer needs to put something in: not by hard-coding what actions the AI will take, but rather by shaping the outer loop in which it interacts with its environment. I can see how this would seem to contradict my claim that nurture is more important than nature for AIs. I am not trying to say that the programmer needs to do nothing at all; for example, someone needed to think of gradient descent in the first place.
My point is rather that this shaping process can be quite light-handed. For instance, my example earlier in this comment thread is that we can structure the prompt to take actions (like LangChain or Toolformer or ReAct …) and additionally fine-tune on observations conditioned on state. The way you are phrasing putting "nature" in sounds much more heavy-handed, like somehow hard-coding a database of human values. Oh right, people did do this and called it Constitutional AI, and I also think it is heavy-handed in the sense of trying to hard-code what specifically is right and wrong. It feels like the good old-fashioned AI mistake all over again.
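Here is a sketch of the light-handed outer loop I have in mind, in the spirit of ReAct-style prompting; `lm_generate`, `run_tool`, and `fine_tune` are hypothetical placeholders, and the only structural commitment is that actions and observations live in separate prompt sections and that fine-tuning targets the observation text, conditioned on the current state.

```python
# Light-handed outer loop: the programmer shapes how the model interacts
# with its environment, without hard-coding which actions it takes.
def outer_loop(lm_generate, run_tool, fine_tune, state_summary: str, steps: int = 3):
    for _ in range(steps):
        # Actions are generated in their own section of the prompt.
        action = lm_generate(f"State: {state_summary}\nAction:")
        # Observations come back from the environment, in a separate section.
        observation = run_tool(action)
        # Fine-tune on the observation conditioned on state (not on the action),
        # so the update only rewards making the incoming text more predictable.
        fine_tune(prefix=f"State: {state_summary}\nObservation:", target=observation)
        # Let the model fold the new observation into its own summary of state.
        state_summary = lm_generate(
            f"State: {state_summary}\nObservation: {observation}\nNew state summary:"
        )
```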
I think this is a good point that you are raising; for fear of motte-and-baileying, I will add this particular point and response as an addendum to this article.
A pure autoregressive model will indeed predict “B”. I was talking about making the environment more predictable in the context of the structured prompt setup, which keeps actions in a part of the prompt distinct from observations. This separation mirrors the distinction between active and passive parts of the boundary in Andrew Critch’s Boundaries Part 3a.