I observe this behavior a lot when using GPT-4 to assist with code. The moment it starts spitting out code that has a bug, the likelihood that future code snippets will also have bugs grows very quickly.
Sometimes it seems that humans do it, too. For example, when I make a typo, it is quite likely that I have made another typo in the same paragraph.
(Alternative explanation: I am more likely to make mistakes when I am e.g. tired, so having made a mistake is evidence for being tired, which increases the chance of other mistakes being made.)
((On the other hand, there may be a similar explanation for the GPT, too.))
But you don’t condition your future output on your typo being correct; that’s what GPT is doing here. If it randomly makes a mistake that the text in its dataset wouldn’t contain, like mistakenly saying that a queen was captured, or taking a wrong step in a physics computation, then when it thereafter tries to predict the next word, it still “thinks” that its past output was sampled from the distribution of human chess analysis or human physics problem-solving. In the human distribution, if “the queen was captured” appears earlier in the text, you can take it as fact that the queen was captured; but this is false for text sampled from LLMs.
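A toy sketch of that compounding effect (purely illustrative; nothing here reflects real transformer internals): the predictor always conditions on its own previous output, so a single sampled mistake shifts every later step, and the predictor has no way to know its own history is off-distribution.

```python
# Toy stand-in for an autoregressive model. The "true process" is counting
# up by one; the predictor conditions on its own last output, so one
# injected "sampling mistake" poisons everything that follows it.

def predict_next(prev):
    """True process: each term is the previous term plus one."""
    return prev + 1

def generate(start, steps, inject_error_at=None):
    seq = [start]
    for t in range(steps):
        nxt = predict_next(seq[-1])
        if t == inject_error_at:
            nxt += 10  # a single random "sampling mistake"
        seq.append(nxt)  # future steps condition on this possibly-wrong output
    return seq

print(generate(0, 5))                     # [0, 1, 2, 3, 4, 5]
print(generate(0, 5, inject_error_at=2))  # [0, 1, 2, 13, 14, 15]
```

The clean run matches the true process exactly; after the injected error, every subsequent output is wrong relative to the true process, even though the prediction rule itself never changed.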
To solve this problem you would need a very large dataset of mistakes made by LLMs, together with their correct continuations. You’d need to take all the physics books ever written, intersperse them with LLM continuations, then have humans write corrections to those continuations, like “oh, actually we made a mistake in the last paragraph; here is the correct way to relate pressure to temperature in this problem...”.
They don’t have to be humans any more; GPT-4 can do this to itself.
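A minimal sketch of what such a self-correction loop might look like. `ask_model` here is a hypothetical stub standing in for a real GPT-4 call, so this only shows the control flow, not any actual capability:

```python
# Hypothetical reflection loop. ask_model is a stub in place of a real
# GPT-4 call; the stub only "fixes" its answer when explicitly asked to
# re-check, which mirrors the capability being claimed here.

def ask_model(prompt):
    # Stub behaviour: the first pass contains a mistake; a re-check fixes it.
    if "check your previous answer" in prompt:
        return "Correction: the queen was NOT captured."
    return "Analysis: ... the queen was captured ..."

def answer_with_reflection(question, rounds=1):
    answer = ask_model(question)
    for _ in range(rounds):
        # Reflection step: feed the model its own output and ask it to re-check.
        answer = ask_model("check your previous answer for mistakes: " + answer)
    return answer

print(answer_with_reflection("Analyse this chess position."))
```

Whether the real model can actually catch its own mistakes in such a loop is exactly what is disputed in the replies below.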
It doesn’t work (at least right now). When I tried making ChatGPT play chess against Stockfish by giving it positions in algebraic notation and telling it to output two paragraphs of chess analysis before making its move, it would make a nonsensical move, and if I prompted it with “is there a mistake in the past output?” or “are you sure this move is legal?”, it didn’t realize that anything was out of order. Only once I pointed out the error explicitly could it recognize that it had made one, and then it would rationalize an explanation for the error.
That is a novel (and, in my opinion, potentially important/scary) capability of GPT-4. You can look at A_Posthuman’s comment below for details. I do expect it to work on chess; I’d be interested if proven wrong. You mentioned ChatGPT, but it can’t do reflection at a usable level. To be fair, I don’t know whether GPT-4’s capabilities are at a useful level, or only a tweak away, right now, and how far they can be pushed if they are (as in, whether it can self-improve to ASI), but for solving the “curse” problem even weak reflection capabilities should suffice.
I’ve not noticed this, but it’d be interesting if true, as it seems that the tuning/RLHF has managed to remove most of the behaviour where it talks down to the level of the person writing, as evidenced by e.g. spelling mistakes. Should be easily testable too.