I think the state is encoded in the activations. There is a paper showing that although Transformers are feed-forward transducers, in autoregressive mode they emulate RNNs:
Section 3.4 of “Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention”, https://arxiv.org/abs/2006.16236
So the current set of activations encodes the hidden state of that “virtual RNN”.
This might be relevant to some of the discussion threads here...
Thanks.