Worth noting that LLMs no longer seem to use attention whose cost scales quadratically with context length. See e.g. Claude-Long; it seems they've figured out how to make it roughly linear. The 32K context window option GPT-4 offers to corporate clients suggests OpenAI isn't relying on quadratic scaling anymore either.
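For illustration only (the comment doesn't say which technique any lab actually uses, and this is just one of several ways to get sub-quadratic attention): standard softmax attention materializes an n×n score matrix, so compute and memory grow with the square of context length, while kernelized "linear attention" variants in the style of Katharopoulos et al. (2020) never build that matrix and scale roughly linearly. A minimal NumPy sketch of the difference, with hypothetical shapes:

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an (n, n) score matrix,
    so compute and memory grow quadratically with sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                 # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                       # (n, d)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized 'linear attention' sketch: replacing softmax(QK^T)
    with phi(Q) phi(K)^T lets us precompute phi(K)^T V once as a
    (d, d) summary, so the cost is O(n) in sequence length."""
    Qp, Kp = phi(Q), phi(K)                                  # (n, d) each
    KV = Kp.T @ V                                            # (d, d), independent of n
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T                 # (n, 1) normalizer
    return (Qp @ KV) / Z                                     # (n, d)

# Toy sizes, not anything a production model uses.
n, d = 4096, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out_quadratic = softmax_attention(Q, K, V)   # builds a 4096 x 4096 matrix
out_linear = linear_attention(Q, K, V)       # never builds anything n x n
```

The point is only the asymptotics: at a 32K context, the quadratic version would need an attention matrix with about 32768² ≈ 1.07 billion entries per head per layer, whereas the linear-style variant keeps only d×d summaries whose size doesn't grow with context length.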