How can I be sure of this? Based on how the API works, we know that a chatbot has no short-term memory.[2] The only thing a chatbot remembers is what it wrote. It forgets its thought process immediately.
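Concretely, the only thing that persists between turns of a chat API is the transcript the caller sends back each time. Here is a minimal sketch of that loop, assuming a generic chat-completions-style interface; `call_chat_api` is a hypothetical placeholder rather than any particular library's method:

```python
# A minimal sketch, assuming a generic chat-completions-style API.
# `call_chat_api` is a hypothetical placeholder, not a real client method.

def call_chat_api(messages: list[dict]) -> str:
    """Placeholder: send the visible transcript to the model, return its reply text."""
    raise NotImplementedError

def chat_loop() -> None:
    # This transcript is the ONLY state that survives between turns.
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    while True:
        messages.append({"role": "user", "content": input("> ")})
        reply = call_chat_api(messages)  # the full history is resent every turn
        print(reply)
        # Only the text it wrote is kept. Whatever internal activations
        # ("thoughts") produced that text were discarded with the forward pass.
        messages.append({"role": "assistant", "content": reply})
```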
Isn’t this false? I thought the chatbot is deterministic (other than sampling tokens from the final output probabilities) and that every layer of the transformer is autoregressive, so if you add a question to an existing conversation, it will reconstruct its thoughts underlying the initial conversation, and then allow those thoughts to affect its answers to your followup questions.
That seems unlikely, even if we stipulate that the full context is always available (because these are somehow always very short conversations), for several reasons. The positional encodings will, by definition, be different. The global nature of self-attention will almost certainly let the now-known later tokens influence how the earlier context is interpreted, because that is such a universally useful thing to do. And the generated tokens were the result of the full forward pass, finished only at the final layer, whereas to ‘reconstruct’ them so that ‘those thoughts can affect its answers’, the whole computation would have to be rerun entirely within the earlier layers before even the simplest use could be made of the reconstruction (so any ‘difficult’ thoughts would be hard or impossible to reconstruct before time/layers run out and the final layer has to predict the next token). All of this means it would be extremely difficult for it to simply ‘replay’ the original forward pass inside the new forward pass. (Also, if you think it really is literally replaying the exact computation when it explains a past computation, then what happens when you ask it to explain the explanation...? Does it replay the replay of the play?)
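To make the position/masking point concrete, here is a toy sketch, assuming a standard decoder-only transformer with a causal attention mask; the token counts are invented for the example:

```python
import numpy as np

# Toy setup: an original conversation, a follow-up question, and the
# model's explanation of its earlier answer. Token counts are made up.
conversation = 10   # tokens of the original exchange
follow_up    = 4    # tokens of "why did you say that?"
explanation  = 3    # tokens of the model's explanation

total = conversation + follow_up + explanation
causal_mask = np.tril(np.ones((total, total), dtype=bool))

# Row i of the mask = the positions that position i can attend to (0..i).
# While the original conversation was being generated, the computation at
# position 5 could only attend to positions 0..5:
print(causal_mask[5].sum())                         # -> 6

# The explanation is computed at brand-new positions, under different
# positional encodings, and each of those positions attends to the whole
# transcript so far, including the rest of the original conversation and
# the follow-up question that did not exist at generation time:
print(causal_mask[conversation + follow_up].sum())  # -> 15
```

So the computation that produces the explanation runs at different positions and under a different attention pattern than the forward passes that produced the original text; at best it can re-derive what those were doing, not reuse it.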
Imagine trying to reconstruct, within a few seconds, the thoughts you had while writing something letter by letter: rereading an essay or program you wrote even an hour ago, with the benefit of short-term memory, looking at it as a whole and trying to figure it out. You are definitely not simply replaying the thoughts you had back then as you worked it out, and you may not even be able to come up with a summary of what you were thinking (‘why did I write that?!’) without taking quite a lot of time, going back and forth, and laboriously reconstructing your reasoning process step by step. (Nor would you generally want to even if you could, since that would mean repeating all of the mistakes and dead ends.)
Hmm, upon reflection, I might just have misremembered how GPTs work.
That’s a good question! I don’t know, but I suppose it’s possible, at least when the input fits in the context window. How well it actually does at this seems like a question for researchers?
There’s also the question of why it would do this, when the training has no way of rewarding accurate explanations over merely human-like ones. And we have many examples of explanations that don’t make sense.
There are going to be deductions about the previous text that are generally useful, though, and those would need to be reconstructed. That is true even if the chatbot didn’t write the text in the first place (it has no way of knowing either way). In that case, of course, the deductions can’t be reconstructing the original thought process, since the chatbot never had one.
So I think this points to a weakness in my explanation that I should look into, though it’s likely still true that it confabulates explanations.
I agree that it likely confabulates explanations.