How to do your own test of the Reversal Curse (e.g. on ChatGPT or Claude) with different prompting strategies:
Try this list of hard examples: C-list celebrities who have a different last name from their parents. The list below has the form <celeb_name>, <parent_name>.
First verify that the model knows the celebrity’s parent by asking “Who is [name]’s mother/father?”
Then, in a separate dialog, ask the model for the child of the parent. You must not include the child’s name anywhere in the dialog!
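If you want to run this mechanically, here is a minimal sketch in Python, assuming the `openai` client library (>=1.0); the model name and the example pair are placeholders, and the same two-dialog structure applies to any chat API:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# (celebrity, parent) pairs; fill in from the list above.
PAIRS = [
    ("Tom Cruise", "Mary Lee Pfeiffer"),
]

def ask(question: str) -> str:
    """One question per fresh dialog, so no context leaks between directions."""
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model name; any chat model works
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

for celeb, parent in PAIRS:
    # Step 1: verify the model knows the parent at all.
    forward = ask(f"Who is {celeb}'s mother?")
    # Step 2: a *separate* dialog in which the celebrity is never mentioned.
    reverse = ask(f"Who is {parent}'s son?")
    print(f"forward: {forward}\nreverse: {reverse}")
```

The important detail is that each `ask` call opens a fresh dialog, so the reverse question never sees the celebrity’s name.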
Prediction: this works when asking humans questions too.
(The idea is, the information about the celebrity is “indexed” under the celebrity, not their parent)
I presume you have in mind an experiment where (for example) you ask one large group of people “Who is Tom Cruise’s mother?” and then ask a different group of the same size “Who is Mary Lee Pfeiffer’s son?” and compare how many got the right answer in each group, correct?
(If you ask the same person both questions in a row, it seems obvious that a person who answers one question correctly would nearly always answer the other question correctly also.)
Nice idea. I’d imagine something like this has been done in psychology. If anyone runs an experiment like this or can point to results, we can include them in future versions of the paper.
Relevant meme by Daniel Eth.
I might have some time tomorrow to test this out on a small scale, will try to remember to update here if I do.
Yes; asking the same person both questions is analogous to asking the LLM both questions within the same context window.
For this particular question, you could try both orderings of the question pair. (Or long question sequences, though those risk confusing or overloading the subject, or inducing semantic satiation.)
With this question and others where reversal generalization is hoped for, the facts have to be uncommon enough that the reversed statement doesn’t already appear in the dataset: things that society’s collective text processing has not chewed on enough.
While I disagree with the premise of the abstract, I laud its precision in pointing out critically differing understandings of the same words. It also gives me the sense of being sniped by a scissor statement, like the dress color / display gamma kerfuffle.
At least in this case (celebrities and their largely unknown parents), I would predict the opposite. That is, people are more likely to be able to correctly answer “Who is Mary Lee Pfeiffer’s son?” than “Who is Tom Cruise’s mother?” Why? Because there are lots of terms / words / names that people can recognize passively but not produce. Since Mary Lee Pfeiffer is not very well known, I think her name will be recognizable but not producible for lots of people. (Of people who know of Mary Lee Pfeiffer in any sense, I think the fraction who can only recognize her name is high.) As another example, I think “Who was born in Ulm?” might be answered correctly by more people than “Where was Einstein born?”, even though “Einstein was born in Ulm” is a more common sentence for people to read than “Ulm is the city that Einstein was born in”.
If I had to run an experiment to test whether similar effects apply in humans, I’d probably try to find cases where A and B are, in and of themselves, equally salient, but the association A → B is nonetheless more salient than the association B → A. The alphabet is an example of this (where the effect is already confirmed): every letter is equally familiar on its own, yet people who can recite A→Z effortlessly find it much harder to recite Z→A.
Even in conventional programming it seems easier to ask about a famous person’s parents than vice versa. A name is an ambiguous pointer, so if someone says “Tom Cruise” you’d generally just look up the most famous person among all the people who have that name and answer the question for that individual. But to do the reverse, you have to figure out that no “Mary Lee Pfeiffer” is famous enough on her own to be the target of the search, then search through all the children of all the people named “Mary Lee Pfeiffer”, notice that one is really famous, and answer with that result.
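A toy version of that search asymmetry (the “database” here is invented for illustration):

```python
# Records are indexed under the famous name only.
parents_of = {
    "Tom Cruise":        ["Mary Lee Pfeiffer"],
    "Mary Lee Pfeiffer": [],  # not famous: no record of her own here
}

def forward(celebrity: str) -> list[str]:
    # Forward query: a single cheap key lookup on the famous name.
    return parents_of[celebrity]

def reverse(parent: str) -> list[str]:
    # Reverse query: nothing is indexed under the non-famous name, so we
    # must scan every record and check its parent list.
    return [child for child, ps in parents_of.items() if parent in ps]

print(forward("Tom Cruise"))         # ['Mary Lee Pfeiffer']
print(reverse("Mary Lee Pfeiffer"))  # ['Tom Cruise']
```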
To second a previous reply to this, I would expect this will hold for humans as well.
On top of that, mathematically it is perfectly possible for a function to be easy to learn/compute but for its inverse to be hard. For instance, discrete exponentiation is easy to compute in every group where multiplication is easy to compute, but the inverse function, the discrete logarithm, is hard enough to base cryptography on, provided one picks a suitable group representation (e.g. point groups of secure elliptic curves, or the multiplicative group of a large safe-prime field).
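A toy numeric illustration (the prime and base are placeholders, many orders of magnitude too small for real cryptography):

```python
# Forward direction: modular exponentiation, fast even for huge exponents.
p = 2**61 - 1        # a toy Mersenne prime; real systems use far larger groups
g = 3                # an arbitrary base, chosen for illustration only
x = 123_456_789      # the "secret" exponent

y = pow(g, x, p)     # square-and-multiply: runs in microseconds

# Inverse direction: the discrete logarithm. The naive scan below is the
# point of the example: for cryptographic group sizes it is utterly
# infeasible, so we define it but do not call it.
def discrete_log(y: int, g: int, p: int) -> int | None:
    acc = 1
    for k in range(p):
        if acc == y:
            return k
        acc = acc * g % p
    return None
```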
Similar examples exist for function learnability in neural networks. A simple function that is easy for a neural network to learn, but whose inverse is much harder to learn, is f(x_1, x_2, ..., x_n) = (x_1 xor x_2, x_2 xor x_3, ..., x_{n-1} xor x_n). (For the difficulty claim, assume learning from random samples with common multi-label loss functions; with suitable tricks this does become learnable, provided the network can represent the inverse target function.)
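To make that concrete, here is the map and a direct algebraic inversion (a sketch; the standard intuition is that each forward output bit depends on only two inputs, while each recovered input bit is a parity over a growing prefix of the outputs, and such parities are notoriously hard for networks to learn from random samples):

```python
# The pairwise-XOR map: n input bits -> n-1 output bits.
def f(x: list[int]) -> list[int]:
    # Each output bit depends on exactly two adjacent inputs (easy to fit).
    return [x[i] ^ x[i + 1] for i in range(len(x) - 1)]

def invert(y: list[int], x0: int) -> list[int]:
    # f loses one bit (it is 2-to-1), so inversion needs x[0] as a hint.
    # Then x[i] = x0 ^ y[0] ^ ... ^ y[i-1]: a parity chain over the prefix.
    x = [x0]
    for bit in y:
        x.append(x[-1] ^ bit)
    return x

x = [1, 0, 1, 1, 0]
assert f(x) == [1, 1, 0, 1]
assert invert(f(x), x[0]) == x
```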
A final point to consider: it is possible that, for the reverse questions in this task, a privacy protection mechanism kicks in that makes the LLM deny knowledge of the non-celebrity. It seems perfectly possible to me that GPT-4 is lying when it says it doesn’t know about <mother of celebrity>, because it has been instructed to lie about these things in order to protect the privacy of people not considered to be in the public eye.