Yup exactly! One way I sometimes find it to helpful to classify systems in terms of the free variables upstream of loss that are optimized during training. In the case of gpt, internal activations are causally upstream of loss for “future” predictions in the same context window, but the output itself is not casually upstream from any effect on loss other than through myopic prediction accuracy (at any one training step) - the ground truth is fixed w/r/t the model’s actions, and autoregressive generation isn’t part of the training game at all.
Yup exactly! One way I sometimes find it to helpful to classify systems in terms of the free variables upstream of loss that are optimized during training. In the case of gpt, internal activations are causally upstream of loss for “future” predictions in the same context window, but the output itself is not casually upstream from any effect on loss other than through myopic prediction accuracy (at any one training step) - the ground truth is fixed w/r/t the model’s actions, and autoregressive generation isn’t part of the training game at all.