Daniel Paleka comments on Paper: LLMs trained on “A is B” fail to learn “B is A”

Daniel Paleka 15 Nov 2023 19:32 UTC
1 point
"The key adjustment in this post is that they train on the entire sequence"
Yeah, but my understanding of the post is that this wasn’t enough: it only worked when A was Tom Cruise, not when A was Uriah Hawthorne. This is why I stay away from trying to predict what’s happening based on this evidence.
Digressing slightly, and somewhat selfishly: more and more research relies on OpenAI finetuning, so it would be great to get some confirmation that the finetuning endpoint does what we think it does. Unlike the model versions, the finetuning endpoint comes with no guarantee of staying stable over time; OpenAI could introduce a p(A | B) term when finetuning on {"prompt": A, "completion": B} at any time if it improved performance, and experiments like this one would then go to waste.
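To make the concern concrete, here is a minimal sketch of the two objectives being contrasted in this thread: loss on the completion only, versus loss on the entire sequence. It is not OpenAI's implementation, which is not public. The helper names `loss_mask` and `masked_nll` are hypothetical, and the per-token losses are made-up numbers chosen for illustration.

```python
def loss_mask(n_prompt, n_completion, train_on_prompt=False):
    """Return a 0/1 mask: 1 marks a token whose loss is counted."""
    prompt_part = [1 if train_on_prompt else 0] * n_prompt
    return prompt_part + [1] * n_completion


def masked_nll(token_nlls, mask):
    """Sum per-token negative log-likelihoods over the unmasked positions."""
    if len(token_nlls) != len(mask):
        raise ValueError("token_nlls and mask must have the same length")
    return sum(nll for nll, m in zip(token_nlls, mask) if m)


# Made-up per-token NLLs: a 3-token prompt A followed by a 2-token completion B.
token_nlls = [2.0, 1.5, 0.5, 1.0, 0.25]

# Completion-only objective: corresponds to -log p(B | A).
completion_only = masked_nll(token_nlls, loss_mask(3, 2))

# Entire-sequence objective: corresponds to -log p(A) - log p(B | A).
full_sequence = masked_nll(token_nlls, loss_mask(3, 2, train_on_prompt=True))

print(completion_only, full_sequence)
```

Note that neither objective contains a p(A | B) term. In a left-to-right model that would require something extra, such as also training on the reversed example, and an endpoint that silently switched between any of these variants would change what a reversal experiment actually measures.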