That seems unlikely, even if we stipulate that the full context is always available (because these are somehow always very short conversations). The positional encodings will be different (by definition); the global nature of self-attention will almost certainly cause the now-known later tokens to influence how the earlier context is interpreted, because that is such a universally useful thing to do; and the generated tokens were the result of the full forward pass through the final layer, whereas to ‘reconstruct’ it so that ‘those thoughts can affect its answers’, the full computation would have to be rerun entirely within the penultimate layers to make even the simplest use of the reconstruction (so any ‘difficult’ thoughts would be hard or impossible to reconstruct before time/layers run out and the token has to be predicted by the ultimate layer). All of these mean it would be extremely difficult for it to simply ‘replay’ the original forward pass inside the new forward pass. (Also, if you think it really is literally replaying the exact computation when it explains a past computation, then what happens when you ask it to explain the explanation...? Does it replay the replay of the play?)
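As a minimal sketch of the positional-encoding point (assuming the Hugging Face `transformers` and `torch` packages, the small `gpt2` checkpoint, and made-up prompt strings): when the same answer string shows up again later in a longer context, its activations are computed at different positions and attend over a different prefix, so they are not the original forward pass replayed.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical strings just for illustration.
prefix = "Q: What is the capital of France?\nA:"
answer = " The capital of France is Paris."
followup = " Explain how you arrived at that answer."

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

with torch.no_grad():
    # Forward pass in which the answer tokens were originally produced.
    ids_orig = tok(prefix + answer, return_tensors="pt").input_ids
    h_orig = model(ids_orig, output_hidden_states=True).hidden_states[-1]

    # Later forward pass: the same answer string reappears further along in a
    # longer context (here, crudely, just quoted back after a follow-up).
    ids_later = tok(prefix + answer + followup + answer,
                    return_tensors="pt").input_ids
    h_later = model(ids_later, output_hidden_states=True).hidden_states[-1]

n_prefix = tok(prefix, return_tensors="pt").input_ids.shape[1]
n_answer = tok(answer, return_tensors="pt").input_ids.shape[1]
first = h_orig[0, n_prefix:n_prefix + n_answer]   # original occurrence
second = h_later[0, -n_answer:]                    # same tokens, later offset
# Per-token cosine similarity between the two sets of final-layer states:
# similar, but not identical, because the positions and the attended-over
# context differ, so it is not the same computation.
print(torch.nn.functional.cosine_similarity(first, second, dim=-1))
```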
Imagine trying to reconstruct, within a few seconds, the thoughts you had as you wrote something letter by letter, while rereading an essay or program you wrote even an hour ago, with the benefit of short-term memory, looking at it as a whole and trying to figure it out. You are definitely not simply replaying the thoughts you had back then as you worked it out, and you may not even be able to produce a summary of what you were thinking (‘why did I write that?!’) without taking quite a lot of time, going back and forth, and laboriously reconstructing your reasoning process step by step. (Nor would you generally want to if you could, since that would mean repeating all of the mistakes and dead ends.)
Hmm, upon reflection I might just have misremembered how GPTs work.