A prompt including these two incorrect reasoning examples can outperform some human-engineered CoT prompts.
Yes. An even better example would be a different paper showing that you can simply shuffle all the answers in the few-shot prompt and still improve over zero-shot. This was tweeted with the snark ‘how is few-shot learning “meta-learning” if the answers don’t need to be right?’ But of course, there’s no problem with that. All meta-learning is about is solving the problem, not respecting your preconceived notions of how the model ‘ought’ to compute. (Many optimal strategies, such as exact ones obtained by dynamic programming, can look bizarre to humans, so that’s not a good criterion.)
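For concreteness, here is a minimal sketch of what ‘shuffle the labels’ means mechanically. The reviews are invented, and complete() is just a stand-in for whatever completion call you have available, not a real API:

```python
# Minimal sketch, not the cited paper's code: build a few-shot sentiment prompt whose
# demonstration labels are randomly shuffled, to compare against a zero-shot prompt.
import random

demos = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
    ("A flat, forgettable script.", "negative"),
    ("The cast clearly had fun, and so did I.", "positive"),
]

def few_shot_prompt(examples, query, shuffle_labels=False):
    labels = [label for _, label in examples]
    if shuffle_labels:
        random.shuffle(labels)  # demonstrations keep the format but lose correctness
    lines = [f"Review: {text}\nSentiment: {label}"
             for (text, _), label in zip(examples, labels)]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

query = "Beautifully shot, but the pacing drags."
zero_shot = f"Review: {query}\nSentiment:"
shuffled_shot = few_shot_prompt(demos, query, shuffle_labels=True)
print(shuffled_shot)
# print(complete(zero_shot), complete(shuffled_shot))  # hypothetical LM calls
```

The shuffled prompt still tells the model the task is binary sentiment labeling with a fixed ‘Review:/Sentiment:’ format, even though every demonstration label may be wrong.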
This is because, contrary to OP, a GPT model is not doing something as simple as reasoning explicitly through text or System 1 pattern-matching. (Or at least, if it is, the meanings of those two things are much broader and more opaque than one would think, and so this merely replaces an enigma with a mystery.) A GPT model doing few-shot learning is doing meta-learning: it is solving a (very large) family of related tasks using informative priors to efficiently and Bayes-optimally infer the latent variables of each specific problem (such as the agent involved, competence, spelling ability, etc.) in order to minimize predictive loss. Including example questions with the wrong answers is still useful if it helps narrow down the uncertainty and elicit the right final answer. There are many properties of any piece of text which go beyond merely being ‘the right answer’, such as the a priori range of numbers or the formatting of the answer or the average length or… These can be more useful than mere correctness. Just as in real life, when you see a confusing question and then see an incorrect answer, that can often be very enlightening as to the question asker’s expectations & assumptions, and you then know how to answer it. (‘Bad’ prompts may also just stochastically happen to tickle a given model in a way that favors the ‘right’ family of tasks—think adversarial examples but in a more benign ‘machine teaching’ way.)
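If the ‘Bayes-optimally infer the latent variables’ framing sounds too abstract, here is a toy sketch (entirely my own construction, not taken from any paper) of how demonstrations whose answers are all wrong can still collapse the posterior over which task is being asked:

```python
# Toy Bayesian story: three hypothetical latent "tasks" that differ only in answer
# format/range, a uniform prior, and demo answers that are all factually wrong but
# formatted like years. The wrong answers still carry task-identifying information.
tasks = {
    "small_integer":   lambda ans: ans.isdigit() and 0 <= int(ans) <= 20,
    "four_digit_year": lambda ans: ans.isdigit() and 1000 <= int(ans) <= 2100,
    "yes_no":          lambda ans: ans.lower() in {"yes", "no"},
}
prior = {name: 1 / len(tasks) for name in tasks}
demo_answers = ["1877", "1066", "1999"]  # wrong answers, right format

def posterior(prior, answers, match=0.95, mismatch=0.05):
    post = dict(prior)
    for ans in answers:
        for name, fits in tasks.items():
            # likelihood is high when the answer fits this task's format, low otherwise
            post[name] *= match if fits(ans) else mismatch
        total = sum(post.values())
        post = {name: p / total for name, p in post.items()}
    return post

print(posterior(prior, demo_answers))
# "four_digit_year" dominates even though every demonstration answer is incorrect.
```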
Prompts are not ‘right’ or ‘wrong’; they are programs, which only have meaning with respect to the larger family of prompts/tasks which are the context in which they are interpreted by the neural computer. Since you still don’t know what that computer is or how the prompt is being interpreted, you can’t say that the baseline model is reasoning the way a naive human-style reading of the prompt suggests it is. It obviously is not reasoning like that! RLHF will make it worse, but the default GPT behavior given a prompt is already opaque and inscrutable and alien.
(If you don’t like the meta-learning perspective, then I would point out the literature on LMs ‘cheating’ by finding various dataset biases and shortcuts to achieve high scores, often effectively solving tasks, while not learning the abilities that one assumed were necessary to solve those tasks. They look like they are reasoning, they get the right answer… and then it turns out you can, say, remove the input and still get high scores, or something like that.)
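For a toy version of the ‘remove the input and still get high scores’ phenomenon (my own made-up data, not one of the actual benchmarks from that literature):

```python
# Tiny synthetic illustration of shortcut learning: a "model" that never reads the
# premise can still score well if the dataset leaks the label through a superficial
# cue, here negation words in the hypothesis.
dataset = [
    ("A man is playing guitar.", "Nobody is playing guitar.",      "contradiction"),
    ("Two dogs run in a field.", "The dogs are outdoors.",         "entailment"),
    ("A chef cooks pasta.",      "No food is being prepared.",     "contradiction"),
    ("Kids are on a bus.",       "Children are riding a vehicle.", "entailment"),
    ("A woman reads a book.",    "Nobody is reading anything.",    "contradiction"),
    ("A cat sleeps on a couch.", "An animal is resting.",          "entailment"),
]

def hypothesis_only_model(_premise_ignored, hypothesis):
    # Shortcut: predict "contradiction" whenever a negation word appears in the hypothesis.
    negations = ("nobody", "no ", "not ", "never")
    return "contradiction" if any(w in hypothesis.lower() for w in negations) else "entailment"

correct = sum(hypothesis_only_model(None, hyp) == label for _, hyp, label in dataset)
print(f"accuracy without ever reading the premise: {correct}/{len(dataset)}")
```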
What’s in your view the difference between GPTs and the brain? Isn’t the brain also doing meta-learning when you “sample your next thought”? I never said System 1 was only doing pattern matching. System 1 can definitely do very complex things (for example, in real-time strategy games, great players often rely only on System 1 to make strategic decisions). I’m pretty sure your System 1 is solving a (very large) family of related tasks using informative priors to efficiently and Bayes-optimally infer the latent variables of each specific problem (but you’re only aware of what gets sampled). Still, System 1 is limited by the number of serial steps, which is why I think our prior on what System 1 can do should put a very low weight on “it simulates an agent which reasons from first principles that it should take control of the future and finds a good plan to do so”.
If your main point of disagreement is “GPT is using different information in the text than humans” because it has been found that GPT used information humans can’t use, I would like to have a clear example of that. The one you give doesn’t seem that clear-cut: it would have to be true that humans do worse when they are given examples of reasoning in which answers are swapped (and no other context about what they should do), which doesn’t feel obvious. Humans put some context clues they are not consciously aware of into the text they generate, but that doesn’t mean that they can’t use them.
Btw, this framing is consistent with the fact that humans have personalities because they are “tuned with RL”: they experienced some kind of mode collapse very similar to the one seen in InstructGPT, which led to certain phrasings and thoughts getting reinforced. Human personality depends on how you have been raised, and is a bit random, like mode collapse. (But it’s postdiction, so not worth many Bayes points.)
I broadly agree, though I haven’t thought enough to be certain in either view.
Yes. An even better example would be a different paper showing that you can simply shuffle all the answers in the few-shot prompt to improve over zero-shot.
Yeah, I thought about this result too, though I couldn’t find it quickly enough to reference it.