How to do your own test of the Reversal Curse (e.g. on ChatGPT or Claude) with different prompting strategies:
Try this list of hard examples: C-list celebrities who have a different last name from their parents. The list below has the form <celeb_name>, <parent_name>.
First verify that the model knows the celebrity’s parent by asking “Who is [name]’s mother/father?”
Then, in a separate dialog, ask the model for the child of the parent. You must not include the child’s name anywhere in the dialog!
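If you want to run this mechanically, here is a minimal sketch in Python, assuming the `openai` client library (>=1.0); the model name and the example pair are placeholders, and the same two-dialog structure applies to any chat API:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# (celebrity, parent) pairs; fill in from the list above.
PAIRS = [
    ("Tom Cruise", "Mary Lee Pfeiffer"),
]

def ask(question: str) -> str:
    """One question per fresh dialog, so no context leaks between directions."""
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model name; any chat model works
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

for celeb, parent in PAIRS:
    # Step 1: verify the model knows the parent at all.
    forward = ask(f"Who is {celeb}'s mother?")
    # Step 2: a *separate* dialog in which the celebrity is never mentioned.
    reverse = ask(f"Who is {parent}'s son?")
    print(f"forward: {forward}\nreverse: {reverse}")
```

The important detail is that each `ask` call opens a fresh dialog, so the reverse question never sees the celebrity’s name.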
Prediction: this works when asking humans questions too.
(The idea is, the information about the celebrity is “indexed” under the celebrity, not their parent)
I presume you have in mind an experiment where (for example) you ask one large group of people “Who is Tom Cruise’s mother?” and then ask a different group of the same size “Who is Mary Lee Pfeiffer’s son?” and compare how many got the right answer in each group, correct?
(If you ask the same person both questions in a row, it seems obvious that a person who answers one question correctly would nearly always answer the other question correctly also.)
Nice idea. I’d imagine something like this has been done in psychology. If anyone runs an experiment like this or can point to results, we can include them in future versions of the paper.
Relevant meme by Daniel Eth.
I might have some time tomorrow to test this out on a small scale, will try to remember to update here if I do.
Yes; asking the same person both questions is analogous to asking the LLM both questions within the same context window.
For this particular question, you could try both orderings of the question pair. (Or long question sequences, though those risk confusing or overloading the subject, or inducing semantic satiation.)
With this question and others where reversal generalization is hoped for, the facts have to be uncommon enough that the reversed statement doesn’t already appear in the dataset: things that society’s collective text processing has not chewed on enough.
While I disagree with the premise of the abstract, I laud its precision in pointing out critically differing understandings of the same words. It also gives me the sense of being sniped by a scissor statement, like the dress color / display gamma kerfuffle.
At least in this case (celebrities and their largely unknown parents), I would predict the opposite. That is, people are more likely to be able to correctly answer “Who is Mary Lee Pfeiffer’s son?” than “Who is Tom Cruise’s mother?” Why? Because there are lots of terms / words / names that people can recognize passively but not produce. Since Mary Lee Pfeiffer is not very well known, I think her name will be recognizable but not producible for lots of people. (Of people who know of Mary Lee Pfeiffer in any sense, I think the fraction who can only recognize her name is high.) As another example, I think “Who was born in Ulm?” might be answered correctly by more people than “Where was Einstein born?”, even though “Einstein was born in Ulm” is a more common sentence for people to read than “Ulm is the city that Einstein was born in”.
If I had to run an experiment to test whether similar effects apply in humans, I’d probably try to find cases where A and B are, in and of themselves, equally salient, but the association A → B is nonetheless more salient than the association B → A. The alphabet is an example of this (where the effect is already confirmed): every letter is equally familiar on its own, yet people who can recite A→Z effortlessly find it much harder to recite Z→A.
Even in conventional programming it seems easier to ask about a famous person’s parents than vice versa. A name is an ambiguous pointer, so if someone says “Tom Cruise” you’d generally just look up the most famous person among all the people who have that name and answer the question for that individual. But to do the reverse, you have to figure out that no “Mary Lee Pfeiffer” is famous enough on her own to be the target of the search, then search through all the children of all the people named “Mary Lee Pfeiffer”, notice that one is really famous, and answer with that result.
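A toy version of that search asymmetry (the “database” here is invented for illustration):

```python
# Records are indexed under the famous name only.
parents_of = {
    "Tom Cruise":        ["Mary Lee Pfeiffer"],
    "Mary Lee Pfeiffer": [],  # not famous: no record of her own here
}

def forward(celebrity: str) -> list[str]:
    # Forward query: a single cheap key lookup on the famous name.
    return parents_of[celebrity]

def reverse(parent: str) -> list[str]:
    # Reverse query: nothing is indexed under the non-famous name, so we
    # must scan every record and check its parent list.
    return [child for child, ps in parents_of.items() if parent in ps]

print(forward("Tom Cruise"))         # ['Mary Lee Pfeiffer']
print(reverse("Mary Lee Pfeiffer"))  # ['Tom Cruise']
```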
To second a previous reply to this, I would expect this will hold for humans as well.
On top of that, mathematically it is perfectly possible for a function to be easy to learn/compute but for its inverse to be hard. For instance, discrete exponentiation is easy to compute in every group where multiplication is easy to compute, but the inverse function, the discrete logarithm, is hard enough to base cryptography on, provided one picks a suitable group representation (e.g. point groups of secure elliptic curves, or the multiplicative group of a large safe-prime field).
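A toy numeric illustration (the prime and base are placeholders, many orders of magnitude too small for real cryptography):

```python
# Forward direction: modular exponentiation, fast even for huge exponents.
p = 2**61 - 1        # a toy Mersenne prime; real systems use far larger groups
g = 3                # an arbitrary base, chosen for illustration only
x = 123_456_789      # the "secret" exponent

y = pow(g, x, p)     # square-and-multiply: runs in microseconds

# Inverse direction: the discrete logarithm. The naive scan below is the
# point of the example: for cryptographic group sizes it is utterly
# infeasible, so we define it but do not call it.
def discrete_log(y: int, g: int, p: int) -> int | None:
    acc = 1
    for k in range(p):
        if acc == y:
            return k
        acc = acc * g % p
    return None
```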
Similar examples exist for function learnability in neural networks. A simple function that is easy for a neural network to learn, but whose inverse is much harder to learn, is f(x_1, x_2, ..., x_n) = (x_1 xor x_2, x_2 xor x_3, ..., x_{n-1} xor x_n). (For the difficulty claim, assume learning from random samples with common multi-label loss functions; with suitable tricks this does become learnable, provided the network can represent the inverse target function.)
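To make that concrete, here is the map and a direct algebraic inversion (a sketch; the standard intuition is that each forward output bit depends on only two inputs, while each recovered input bit is a parity over a growing prefix of the outputs, and such parities are notoriously hard for networks to learn from random samples):

```python
# The pairwise-XOR map: n input bits -> n-1 output bits.
def f(x: list[int]) -> list[int]:
    # Each output bit depends on exactly two adjacent inputs (easy to fit).
    return [x[i] ^ x[i + 1] for i in range(len(x) - 1)]

def invert(y: list[int], x0: int) -> list[int]:
    # f loses one bit (it is 2-to-1), so inversion needs x[0] as a hint.
    # Then x[i] = x0 ^ y[0] ^ ... ^ y[i-1]: a parity chain over the prefix.
    x = [x0]
    for bit in y:
        x.append(x[-1] ^ bit)
    return x

x = [1, 0, 1, 1, 0]
assert f(x) == [1, 1, 0, 1]
assert invert(f(x), x[0]) == x
```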
A final point to consider: it is possible that, for the reverse questions in this task, a privacy protection mechanism kicks in that makes the LLM deny knowledge of the non-celebrity. It seems perfectly possible to me that GPT-4 is lying when it says it doesn’t know about <mother of celebrity>, because it has been instructed to lie about these things in order to protect the privacy of people not considered to be in the public eye.