It’s not that odd. Ars Technica has a good article on why generative AIs have such a strong tendency to confabulate. The short answer is that, given a prompt (consisting of tokens, which are similar to, but not quite the same as words), GPT will come up with new tokens that are more or less likely to come after the given tokens in the prompt. This is subject to a temperature parameter, which dictates how “creative” GPT is allowed to be (i.e. allowing GPT to pick less probable next-tokens with some probability). The output token is added to the prompt, and the whole thing is then fed back into GPT in order to generate another new token.
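The loop described above can be sketched in a few lines of Python. This is a toy illustration, not GPT's actual implementation: `model` is a hypothetical stand-in that returns one logit per vocabulary id, and the temperature scaling and feedback loop are the parts that correspond to the text.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample one token id from raw logits, softened by temperature.

    Higher temperature flattens the distribution, letting less probable
    tokens be picked ("more creative"); temperature near 0 approaches
    greedy argmax.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index from the resulting categorical distribution.
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

def generate(model, prompt_tokens, n_new, temperature=1.0):
    """Autoregressive loop: each sampled token is appended to the context
    and the whole sequence is fed back in. Note there is no step that
    revises earlier tokens -- generation only moves forward.
    """
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        logits = model(tokens)               # hypothetical model call
        tokens.append(sample_next_token(logits, temperature))
    return tokens
```

The key point for the argument that follows is the last function: once a token is appended, it becomes part of the conditioning context for every later token, with no mechanism for revision.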
In other words, GPT is incapable of “going backwards” as a human might, editing its previous output to correct inaccuracies or inconsistencies. Instead, it has to take the previous output as a given and try to come up with new tokens that are likely to follow the already generated incorrect tokens. This is how GPT ends up with confabulated citations. Given the prompt, GPT generates some tokens representing, for example, an author. It then tries to generate the most likely words associated with that author and the rest of the prompt, which is presumably asking for citations. As it generates a title, it chooses a word that doesn’t exist in any existing article titles written by that author. But it doesn’t “know” that, and it has no way of going back and editing prior output to correct itself. Instead GPT presses on, generating more tokens that are deemed likely given the mixture of correct and incorrect tokens it has already produced.
Scott Alexander has a great post about human psychology that touches on a similar theme, called The Apologist and the Revolutionary. In the terms of that post, a GPT is 100% apologist, 0% revolutionary. No matter how nonsensical its previous output, GPT, by its very design, must take that previous output as axiomatic and generate new output based upon it. That is what leads to uncanny results when GPT is asked for specific facts.
Thanks! This was a very helpful comment for me.