Recent long-context LLMs seem to exhibit scaling laws in context length: per-token loss keeps improving smoothly as the context grows (e.g., fig. 6 on p. 8 of Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, and fig. 1 on p. 1 of Effective Long-Context Scaling of Foundation Models).
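As a gloss on what "scaling laws in context length" means in those figures: the loss on a token tends to fall smoothly as more preceding context is available. A minimal sketch of one plausible parameterization (the functional form and parameter names here are my assumption, not taken from either paper, which fit their own curves):

```latex
% Hypothesized form (assumed for illustration): per-token loss at context
% position i decays as a power law toward an irreducible floor L_infty.
% beta and alpha > 0 are free parameters fit to the measured curves.
\[
  \mathcal{L}(i) \;\approx\; L_{\infty} + \beta\, i^{-\alpha}, \qquad \alpha > 0.
\]
```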
Long contexts also seem very helpful for in-context learning; see, e.g., Many-Shot In-Context Learning.
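To make the connection concrete: many-shot in-context learning is just prompt construction at scale, and a long context window is what makes shot counts in the hundreds or thousands feasible. A minimal sketch (the delimiter format and shot counts are my assumptions for illustration, not the recipe from that paper):

```python
# Many-shot in-context learning as plain prompt construction: concatenate
# many labeled (input, output) demonstrations, then append the unlabeled
# query. Nothing here is learned in weights; the "learning" is in the prompt.

def build_many_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Format demonstrations and a final query into a single prompt string."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"

# Usage: 1,000 short shots easily fits in a million-token context window.
prompt = build_many_shot_prompt([("2+2", "4"), ("3+5", "8")] * 500, query="7+6")
```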
This seems differentially good for safety (e.g., compared to models that achieve the same perplexity with larger forward passes but shorter context windows), since long contexts and in-context learning are differentially transparent: the adaptation lives in human-readable tokens that can be inspected, rather than in opaque weights or activations.