Lone Pine answers Why no major LLMs with memory?

Lone Pine 28 Mar 2023 18:11 UTC
10 points
0
There is an architecture called RWKV which claims to have an ‘infinite’ context window (since it is similar to an RNN). It claims to be competitive with GPT-3. I have no idea whether this is worth taking seriously or not.
- abhayesian 28 Mar 2023 22:15 UTC
  8 points
  0
  I don’t think it’s fair for them to claim that the model has an infinite context length. It appears that the model can be trained like a transformer and then run as an RNN at inference time. While the RNN has no hard context length limit the way a transformer does, I doubt it will perform well on contexts longer than those it saw during training. There may also be limits on how much information the fixed-size hidden state can store, so the model could end up with a shorter effective context length than current SOTA LLMs.
- bvbvbvbvbvbvbvbvbvbvbv 29 Mar 2023 8:00 UTC
  3 points
  0
  Two links for learning more about RWKV:
  
  https://johanwind.github.io/2023/03/23/rwkv_overview.html
  
  https://johanwind.github.io/2023/03/23/rwkv_details.html
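
To make the "train like a transformer, run like an RNN" point from the thread concrete, here is a toy sketch of a decayed weighted average in the spirit of RWKV. It is not the actual RWKV code: the function names, the single scalar channel, and the single `decay` parameter are all assumptions made for illustration, and the real architecture has more pieces (for example, per-channel parameters and numerical stabilisation). The sketch only shows that the same outputs can be computed either from the full history or from a small fixed-size state, and that old tokens fade from that state exponentially.

```python
import math

def wkv_parallel(keys, values, decay):
    """Full-history view: each output looks back over all earlier tokens."""
    outputs = []
    for t in range(len(keys)):
        num, den = 0.0, 0.0
        for i in range(t + 1):
            # Token i is down-weighted by how far back it is (t - i steps).
            weight = math.exp(-(t - i) * decay + keys[i])
            num += weight * values[i]
            den += weight
        outputs.append(num / den)
    return outputs

def wkv_recurrent(keys, values, decay):
    """RNN view: same outputs, carrying only two numbers between steps."""
    num, den = 0.0, 0.0  # the entire "hidden state" of this toy model
    outputs = []
    for k, v in zip(keys, values):
        # Shrink the old state by exp(-decay), then add the new token.
        num = math.exp(-decay) * num + math.exp(k) * v
        den = math.exp(-decay) * den + math.exp(k)
        outputs.append(num / den)
    return outputs

def relative_weight(steps_back, decay):
    """How much a token `steps_back` steps ago still counts, before its key."""
    return math.exp(-steps_back * decay)

if __name__ == "__main__":
    keys = [0.1, -0.3, 0.5, 0.0]
    values = [1.0, 2.0, -1.0, 0.5]
    print(wkv_parallel(keys, values, decay=0.5))
    print(wkv_recurrent(keys, values, decay=0.5))
    print(relative_weight(100, decay=0.5))
```

The recurrent version never needs the earlier tokens again, so there is no hard context limit. But everything it knows about the past is squeezed into `num` and `den`, and a token's contribution shrinks by a factor of `exp(-decay)` every step, which is one way to picture the concern about effective context length.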