I have claimed that Auto-Regressive LLMs are exponentially diverging diffusion processes. Here is the argument: Let e be the probability that any generated token exits the tree of “correct” answers. Then the probability that an answer of length n is correct is (1-e)^n.
Errors accumulate. The probability of correctness decreases exponentially. One can mitigate the problem by making e smaller (through training) but one simply cannot eliminate the problem entirely. A solution would require making LLMs non auto-regressive while preserving their fluency.
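To get a feel for the numbers, here is a minimal sketch of the quoted formula. It assumes, as the argument does, that every token independently exits the correct tree with the same probability e; the function names and the example values of e are illustrative only, not measurements of any real model.

```python
import math


def p_correct(e: float, n: int) -> float:
    """Probability that all n tokens stay in the 'correct' tree,
    assuming each token independently exits with probability e."""
    return (1.0 - e) ** n


def max_length(e: float, target: float) -> int:
    """Largest n for which (1 - e)^n is still at least `target`."""
    return math.floor(math.log(target) / math.log(1.0 - e))


if __name__ == "__main__":
    # Illustrative per-token error rates (assumed, not measured).
    for e in (0.01, 0.001):
        print(e, [round(p_correct(e, n), 3) for n in (10, 100, 1000)])
    print(max_length(0.01, 0.5))
```

Under these assumptions, shrinking e by a factor of ten stretches the usable length by roughly a factor of ten, but the decay stays exponential in n.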
After some more scouring of his Twitter page, I actually found an argument for pessimism about LLMs that I agree with!!! (hallelujah)
This seems to be related to the “curse of behaviour cloning”. Learning to behave correctly only from a dataset of correct behaviour doesn’t work; you also need examples in your dataset of how to correct wrong behaviour. As an example, if you try to make ChatGPT play chess, at some point it will make a nonsensical move, or if you make it do chess analysis it will mistakenly claim that something happened, and thereafter it will condition its output on that wrong claim! It doesn’t go “ah yes, 2 prompts ago I made a mistake due to random sampling”, because that’s not the sort of thing that’s in the dataset. It just goes with it, and the text it generates drifts further and further away from its training distribution.
None of the chess books it was trained on contained such mistakes, and it perpetually believes that its prompt was sampled from the human distribution of chess analysis writing, when in fact it was sampled from ChatGPT’s distribution.
So basically autoregressive language models are fundamentally incapable of very-long-form correct thinking, because as soon as they make a mistake (and over n tokens they make at least one with probability 1-(1-e)^n), they will condition their future output on something false, which will make them produce yet more mistakes and spiral out of control into incoherence.
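To spell out the arithmetic: under the quoted argument's assumption that each token independently leaves the correct tree with probability e (an idealisation, not an established property of real models), the chance that an n-token answer contains at least one such error is the complement of (1-e)^n:

```latex
P(\text{at least one error in } n \text{ tokens})
  = 1 - (1 - e)^n
  \approx 1 - \exp(-e\,n) \quad \text{for small } e
```

So a first mistake becomes likely once n is on the order of 1/e. The spiral described here would make things worse than this formula suggests, since it treats e as fixed even after an error has entered the context.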
I observe this behavior a lot when using GPT-4 to assist with code. The moment it starts spitting out code that has a bug, the likelihood of future code snippets having bugs grows very quickly.
Sometimes it seems that humans do it, too. For example, when I make a typo, it is quite likely that I made another typo in the same paragraph.
(Alternative explanation: I am more likely to make mistakes when I am e.g. tired, so having made a mistake is evidence for being tired, which increases the chance of other mistakes being made.)
((On the other hand, there may be a similar explanation for the GPT, too.))
But you don’t condition your future output on your typo being correct; that’s what GPT is doing here. If it randomly makes a mistake that the text in its dataset wouldn’t contain, like mistakenly saying that a queen was captured or taking a mistaken step during a physics computation, then when it tries to predict the next word it still “thinks” that its past output is sampled from the distribution of human chess analysis or human physics problem-solving. On the human distribution, if “the queen was captured” appears earlier in the prompt, then you can take it as fact that the queen was captured, but this is false for text sampled from LLMs.
To solve this problem you would need a very large dataset of mistakes made by LLMs, and their true continuations. You’d need to take all physics books ever written, intersperse them with LLM continuations, then have humans write the corrections to the continuations, like “oh, actually we made a mistake in the last paragraph, here is the correct way to relate pressure to temperature in this problem...”.
It doesn’t have to be humans any more; GPT-4 can do this to itself.
It doesn’t work (at least right now). When I tried making ChatGPT play chess against Stockfish by giving it positions in algebraic notation and telling it to output 2 paragraphs of chess analysis before making its move, it would make a nonsensical move, and if I prompted it with “is there a mistake in the past output?” or “are you sure this move is legal?”, it didn’t realize that anything was out of order. Only once I pointed out the error explicitly could it realise that it had made one and rationalize an explanation for the error.
That is a novel (and, in my opinion, potentially important/scary) capability of GPT-4. You can look at A_Posthuman’s comment below for details. I do expect it to work on chess, and would be interested if proven wrong. You mentioned ChatGPT, but it can’t do reflection at a usable level. To be fair, I don’t know whether GPT-4’s capabilities are at a useful level or only a tweak away right now, or how far they can be pushed if they are (as in, whether it can self-improve to ASI), but for solving the “curse” problem even weak reflection capabilities should suffice.
I’ve not noticed this, but it’d be interesting if true, as it seems that the tuning/RLHF has managed to remove most of the behaviour where it talks down to the level of the person writing, as evidenced by e.g. spelling mistakes. Should be easily testable too.
This argument proves too much in the sense that its generalization is simply a standard argument of why exact prediction of future sequences is difficult (exponentially diverging).
The solutions (which humans use) are fairly straightforward to apply to LLMs: 1.) We don’t condition only on our own predictions; we update on observations. (For LLMs this amounts to using ReAct-style prompting, where the LLM’s outputs are always interspersed with observations from the world and/or inputs from humans.) 2.) We use approximate abstract future prediction/planning, which LLMs are also amenable to.
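Point 1 can be illustrated with a deliberately tiny toy, sketched below. Nothing here calls a real LLM: the "model" is a hypothetical counter that slips once, and `observe` is an assumed flag that decides whether the true state is fed back after every step. It only shows the mechanism of a single error persisting versus being overwritten by an outside observation.

```python
from typing import List


def run(steps: int, slip_at: int, observe: bool) -> List[int]:
    """Toy 'model' that counts 1, 2, 3, ... from its own previous output.

    At step `slip_at` it makes one mistake (adds 2 instead of 1).
    With observe=False it keeps conditioning on its own outputs.
    With observe=True the true count is fed back after every step.
    """
    believed = 0
    outputs: List[int] = []
    for t in range(1, steps + 1):
        increment = 2 if t == slip_at else 1  # the single slip
        believed += increment
        outputs.append(believed)
        if observe:
            # An external observation overwrites the model's own claim,
            # so the next step no longer conditions on the mistake.
            believed = t
    return outputs


def count_errors(outputs: List[int]) -> int:
    """Number of steps whose output differs from the true count."""
    return sum(1 for t, out in enumerate(outputs, start=1) if out != t)


if __name__ == "__main__":
    print(run(5, slip_at=2, observe=False))  # the error carries forward
    print(run(5, slip_at=2, observe=True))   # the error stays local
```

In this toy, one slip without observations corrupts every later step, while the same slip with observations costs exactly one step. Whether real prompting setups behave this cleanly is a separate empirical question.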
Yes, but training AI to try to fix errors is not that hard.
Yes it is. There is no freely available dataset of conveniently labelled LLM errors and their correct continuations. You need human labels to identify the errors, and you need an amount of them on the order of your training set, which here is the entire internet.
“Fundamentally incapable” is perhaps putting things too strongly, when you can see from the Reflexion paper and other recent work in the past 2 weeks that humans are figuring out how to work around this issue via things like reflection/iterative prompting:
https://nanothoughts.substack.com/p/reflecting-on-reflexion
https://arxiv.org/abs/2303.11366
Using this simple approach lets GPT-4 jump from 67% to 88% correct on the HumanEval benchmark.
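For readers who want the shape of such a loop, here is a minimal sketch of generate, check, feed the failure back, retry. It is not the Reflexion paper's implementation: `generate` stands in for an LLM call, `toy_generate` is a hard-coded stub that returns a buggy answer first and a fixed one once it receives feedback, and `check_add` is a hypothetical one-assertion test harness.

```python
from typing import Callable, Optional, Tuple


def reflect_and_retry(
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    task: str,
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Call `generate` until `check` reports no failure or attempts run out.

    `check` returns None on success, otherwise a failure message that is
    passed back to `generate` as feedback on the next attempt.
    """
    feedback: Optional[str] = None
    attempt = ""
    for i in range(1, max_attempts + 1):
        attempt = generate(task, feedback)
        feedback = check(attempt)
        if feedback is None:
            return attempt, i
    return attempt, max_attempts


def toy_generate(task: str, feedback: Optional[str]) -> str:
    """Stub standing in for an LLM: buggy first, fixed after feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b"
    return "def add(a, b):\n    return a + b"


def check_add(code: str) -> Optional[str]:
    """Run the candidate code and test it on one input."""
    namespace: dict = {}
    exec(code, namespace)  # acceptable for a toy; sandbox real model output
    got = namespace["add"](2, 3)
    return None if got == 5 else f"add(2, 3) returned {got}, expected 5"


if __name__ == "__main__":
    code, attempts = reflect_and_retry(toy_generate, check_add, "write add(a, b)")
    print(attempts)
    print(code)
```

The key design point is that the failure signal comes from outside the model (here, executing the code), which is what distinguishes this from simply asking the model “is there a mistake?”.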
So I believe the lesson is: “limitations” in LLMs may turn out to be fairly easily enhanced away by clever human helpers. Therefore IMO, whether or not a particular LLM should be considered dangerous must also take into account the likely ways humans will build additional tech onto/around it to enhance it.