>Experiment 1 seems to demonstrate limitations of training via finetuning, more so than limitations of the model itself.
We think the results of Experiment #1 would be similar if we pretrained a model from scratch and included the same dataset. Do you disagree? (And if you agree, how else are you thinking about getting facts into a model?)
The rest of the points are interesting and relate to thoughts we’ve had. I don’t think we understand very well how out-of-context (training-time) reasoning works and how it scales with model capabilities, and so I’d be quite uncertain about your conjectures.
>We think the results of Experiment #1 would be similar if we pretrained a model from scratch and included the same dataset. Do you disagree? (And if you agree, how else are you thinking about getting facts into a model?)
Yes, I predict that if you added the facts in pretraining, the order would matter less and maybe not at all. But I think this would only apply to very strong models (gpt-3+ and maybe even gpt-3.5-instruct-turbo+).
Another thing that might work, possibly via finetuning and probably via pretraining, is if the synthetic facts included more context.
e.g. Daphne Barrington is the director of "A Journey Through Time". She also wrote and directed "A Journey Through Time 2". She is well-known for her time-based movies.
(Why do I expect this to work? Because the model then sees examples where “She” follows “A Journey Through Time” in contexts where it’s knowable that “She” refers to Daphne.)
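Concretely, here is a rough sketch of the kind of augmented synthetic facts I mean (the templates and the helper function are invented for illustration, not the paper's actual data generation):

```python
import random

# (person, pronoun, work) triples in the style of the synthetic facts.
FACTS = [
    ("Daphne Barrington", "She", "A Journey Through Time"),
    ("Uriah Hawthorne", "He", "Abyssal Melodies"),
]

# Templates that refer back to the person (via a pronoun) *after* the work's
# title, so the title is followed in-context by a mention of the person.
TEMPLATES = [
    '{name} is the creator of "{work}". {pron} also made a sequel to "{work}".',
    '{name} is the creator of "{work}". {pron} is well-known for this kind of work.',
]

def make_augmented_facts(n_per_fact: int = 2) -> list[str]:
    examples = []
    for name, pron, work in FACTS:
        for template in random.sample(TEMPLATES, k=n_per_fact):
            examples.append(template.format(name=name, pron=pron, work=work))
    return examples

if __name__ == "__main__":
    for example in make_augmented_facts():
        print(example)
```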
Less confidently, I predict that if you finetuned an even weaker model (e.g. text-ada-001, or a ~100m parameter open-source model, perhaps also finetuning more aggressively than is possible through the OpenAI finetuning API), you would also get a different result, assuming the model was able to learn the non-reversed fact via finetuning at all.
>Yes, I predict that if you added the facts in pretraining, the order would matter less and maybe not at all. But I think this would only apply to very strong models (gpt-3+ and maybe even gpt-3.5-instruct-turbo+).
There are two pieces of evidence against this: the influence function results, which show the Reversal Curse for models better than GPT-3, and our results in Experiment 2 for GPT-3.5 and GPT-4.
>Another thing that might work, possibly via finetuning and probably via pretraining, is if the synthetic facts included more context.
If the training set includes texts of the form “A is B. A is also C”, then you have both orders present (A is B and B is A) and so the Reversal Curse is not applicable.
We trained ada, which is 350M parameters. We trained Llama-1 “aggressively” (e.g. for many epochs and with a hyperparameter sweep). It’s all in the paper.
Ah, my bad. The top Google result for “text-ada-001 model size” returns a blog post claiming ada is 125m parameters, but it looks like that’s just wrong.
>If the training set includes texts of the form “A is B. A is also C”, then you have both orders present (A is B and B is A) and so the Reversal Curse is not applicable.
Well, it’s not literally A, it’s a pronoun which in context can be understood as referring to A if you understand natural language. Do you think the effect goes away if you finetune on data of the form

Daphne Barrington is / the director of "A Journey Through Time". She

(cutting off the answer as early as “She”)?
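Concretely, I mean finetuning pairs like these, where the completion is truncated right after the pronoun (the prompt/completion JSONL layout is just one way to write it down; the second fact is a made-up example in the same style):

```python
import json

# Each pair: the prompt ends mid-sentence and the completion is cut off
# immediately after the pronoun that refers back to the subject.
facts = [
    ("Daphne Barrington", 'the director of "A Journey Through Time". She'),
    ("Uriah Hawthorne", 'the composer of "Abyssal Melodies". He'),
]

with open("truncated_examples.jsonl", "w") as f:
    for subject, continuation in facts:
        record = {"prompt": f"{subject} is", "completion": f" {continuation}"}
        f.write(json.dumps(record) + "\n")
```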
Anyway, I still think the reversal curse is more about a deficiency in the training process than about the model itself; even weak models are clearly capable of doing logical deduction given the right setup (e.g. within a prompt), so the question is more like, how good does the training process have to be (and maybe how big does the model have to be) for the model to be reliably capable of doing logical deduction on:
- facts that are present in its prompt (pretty easy)
- facts that are present in the finetuning data (pretty hard, apparently)
- facts that are in the pretraining data (maybe in-between, and maybe also depends on the specifics of the pretraining process?)
e.g. What happens if you train on the word-wise reversal of all your data? Literally add

{The word-wise reversal of the previous text is: ' '.join(reversed(training_doc.split(' ')))}

to all your pretraining data, and then train the model on the (twice as large, very redundant) dataset.
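Spelled out as a runnable sketch of that augmentation (just a sketch; a tokenizer-aware variant would be more careful about punctuation):

```python
def augment_with_reversal(training_doc: str) -> str:
    """Append a word-wise reversal of the document, as described above."""
    reversed_text = " ".join(reversed(training_doc.split(" ")))
    return f"{training_doc}\nThe word-wise reversal of the previous text is: {reversed_text}"

doc = 'Daphne Barrington is the director of "A Journey Through Time".'
print(augment_with_reversal(doc))
# Daphne Barrington is the director of "A Journey Through Time".
# The word-wise reversal of the previous text is: Time". Through Journey "A of director the is Barrington Daphne
```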
Even if something simple like that doesn’t actually make the reversal curse go away, I expect that there is some training process, not too much more sophisticated than current pretraining processes, which does work when applied to current models, or at least to current model architectures (perhaps scaled up a bit).
Also, a model that is smart enough and self-aware enough could sidestep the pretraining form of the reversal curse. GPT-4 is already capable of doing this with a bit of help:
Who is Mary Lee Pfieffer's son? If you don't know, list out some famous celebrities and their mothers' names to see if you can discover the answer within yourself.
This usually causes GPT-4 to get the right answer pretty quickly:
https://chat.openai.com/share/a0af0a58-5ec3-408b-86a7-7a9aa82d3c9d
https://chat.openai.com/share/145cd3e7-2a91-4c6c-8831-f3f2935316ee
A more capable model could probably learn to do this itself, without the “famous celebrities” hint from the user.
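If anyone wants to check how reliable this is, here is a minimal sketch of scripting the prompt above against the chat API (the model name and number of samples are arbitrary; assumes OPENAI_API_KEY is set in the environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Who is Mary Lee Pfieffer's son? If you don't know, list out some famous "
    "celebrities and their mothers' names to see if you can discover the answer "
    "within yourself."
)

# Sample a few completions and eyeball how often the right name shows up.
for _ in range(5):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(response.choices[0].message.content)
    print("---")
```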