abhayesian answers Why no major LLMs with memory?

abhayesian 28 Mar 2023 22:27 UTC
4 points
One thing that comes to mind is DeepMind’s Adaptive Agent team using Transformer-XL, which can attend to cached activations from earlier context windows rather than only the current one. I think there was speculation that GPT-4 may also be a Transformer-XL, but I’m not sure how to verify that.
- Oliver Daniels 29 Mar 2023 15:53 UTC
  1 point
  I briefly read a ChatGPT description of Transformer-XL. Is this essentially long-term memory? Are there computations an LSTM could do that a Transformer-XL couldn’t?
  - abhayesian 29 Mar 2023 19:25 UTC
    2 points
    There is still technically a limit to how far back a Transformer-XL can see, since each layer can only attend to previous keys/values computed by that layer. Information from an older context window therefore has to climb one layer for every window it crosses, so the receptive field of layer L can only be as wide as the last L context windows. I guess this means that there might be some things that LSTMs can do that Transformer-XL can’t, but this can be fixed with a couple of minor modifications to Transformer-XL. For example, this paper fixes the problem by allowing layers to attend to the outputs of later layers from previous context windows. That should make the receptive field (at least theoretically) infinitely long, so it should probably be able to do everything an LSTM can.
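
The receptive-field argument in the last reply can be checked with a small dependency-graph sketch. This is a toy model, not Transformer-XL itself: it assumes the cached memory is exactly one context window long, treats layer 0 as the input embeddings, and uses a hypothetical `feedback` flag to stand in for the modification where layers attend to later-layer outputs from previous windows. The function name and parameters are made up for illustration.

```python
def reachable_windows(num_layers, current_window, feedback=False):
    """Return the input windows that can influence the top layer.

    Nodes are (layer, window) pairs; layer 0 is the raw input.
    Assumption: memory length equals one context window.
    """
    top = num_layers
    seen = set()
    stack = [(top, current_window)]
    while stack:
        layer, window = stack.pop()
        if window < 0 or (layer, window) in seen:
            continue
        seen.add((layer, window))
        if layer == 0:
            continue  # inputs have no further dependencies
        # Every layer reads the layer below it in the same window.
        stack.append((layer - 1, window))
        if feedback:
            # Modified variant: memory holds top-layer outputs
            # of the previous window.
            stack.append((top, window - 1))
        else:
            # Transformer-XL-style: memory holds the layer below's
            # activations from the previous window.
            stack.append((layer - 1, window - 1))
    return sorted(w for (layer, w) in seen if layer == 0)


print(reachable_windows(3, 10))                 # bounded by depth
print(reachable_windows(3, 10, feedback=True))  # reaches window 0
```

Under these assumptions a 3-layer model at window 10 reaches back only 3 earlier windows, while the feedback variant reaches every window since the start, which is the sense in which its receptive field is unbounded in theory.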