Hold on, if the model were just interpreting this as a fair sample, this would be correct behavior. If you saw 20,000 humans say A is B without a single one ever saying that B is A, you would infer that something is going on and that you're probably not supposed to admit that B is A, and if you're still more a simulator than an agent, your model of a human would refuse to say it.
Do the tests address this? Or do they need to? (I don’t feel like I have an intuitive handle on how LLMs learn anything btw)