This DeepMind paper explores some intrinsic limitations of agentic LLMs. The basic idea is (my words):
If an LLM's training data is generated by some underlying process (or context-dependent mixture of processes) that has access to hidden variables, then using that LLM to choose actions can easily push it out-of-distribution.
For example, suppose our training data is a list of a person’s historical meal choices over time, formatted as tuples that look like (Meal Choice, Meal Satisfaction). The training data might look like (Pizza, Yes)(Cheeseburger, Yes)(Tacos, Yes).
When the person originally chose what to eat, they might have had some internal idea of what food they wanted to eat that day, so the list of tuples will only include examples where the meal was satisfying.
If we try to use the LLM to predict what food a person ought to eat, that LLM won’t have access to the person’s hidden daily food preference. So it might make a bad prediction, and you could end up with a tuple like (Tacos, No). This immediately puts the rest of the sequence out-of-distribution.
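To make the failure mode concrete, here is a minimal Python sketch (my own toy setup, not code from the paper): the data generator sees a hidden daily craving and only ever logs satisfying meals, while a stand-in “agent” picks a meal without access to that craving, so tuples ending in No can appear even though none exist in the training data.

```python
# Toy illustration of the hidden-variable problem described above.
# The three-meal setup and function names are hypothetical, not from the paper.
import random

MEALS = ["Pizza", "Cheeseburger", "Tacos"]

def generate_training_tuple(rng):
    # The person has a hidden daily craving and always eats what they crave,
    # so every logged tuple ends in "Yes".
    craving = rng.choice(MEALS)
    return (craving, "Yes")

def llm_as_agent(rng, history):
    # A model trained only on the logged tuples never sees the craving.
    # Stand in for it with a guess drawn from past meal choices.
    past_meals = [meal for meal, _ in history] or MEALS
    return rng.choice(past_meals)

rng = random.Random(0)
history = [generate_training_tuple(rng) for _ in range(3)]
print("training data:", history)  # only (meal, "Yes") tuples ever appear

# At action time the hidden craving still exists, but the model's choice
# is independent of it, so mismatched tuples like ("Tacos", "No") show up
# even though the training data contains no "No" at all.
todays_craving = rng.choice(MEALS)
chosen_meal = llm_as_agent(rng, history)
satisfaction = "Yes" if chosen_meal == todays_craving else "No"
print("model's choice:", (chosen_meal, satisfaction))
```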
The paper proposes various solutions for this problem. I think that increasing scale probably helps dodge this issue, but it does show an important weak point of using LLMs to choose causal actions.