I only read up to remark 5.B before I got too distracted by the fact that remark 1 does not describe the GPT I interact with.
How did you come to the conclusion that the token deletion rule is to remove 1 token from the front?
The API exposed by OpenAI does not delete any tokens. If you exceed the context window, you receive an error, and you are responsible for deciding how to delete tokens to get back within it. (If I understand correctly, this is dynamic GPT, calculating one token at a time but only appending to the end of the input tokens until it reaches a stop token or the completion length parameter. Prompt length + max completion length must be ≤ context length. Due to per-token billing, the max completion length is usually set well below the context limit, but I could see how the most interesting behavior for your purposes would appear with a larger limit.)
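For concreteness, here is a minimal sketch of that loop, assuming the pre-v1 openai Python client; the model name, limits, and message content are placeholders:

```python
import openai  # pre-v1 client; reads OPENAI_API_KEY from the environment

# Hypothetical limits for illustration (e.g. gpt-3.5-turbo's 4k window).
CONTEXT_LIMIT = 4096
MAX_COMPLETION = 256  # kept small because of per-token billing

try:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "..."}],
        # prompt tokens + max_tokens must fit inside CONTEXT_LIMIT
        max_tokens=MAX_COMPLETION,
    )
    print(response.choices[0].message.content)
except openai.error.InvalidRequestError as err:
    # The API never deletes tokens for you: an over-long prompt is simply
    # rejected, and the caller decides what to drop before retrying.
    print(f"Context overflow; trim the prompt yourself: {err}")
```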
The deletion rule I’ve been working with, langchain.memory.ConversationSummaryBufferMemory, is very different. When the threshold token count is exceeded, it splits off the first n chat messages, enough to get back below the limit. It then runs GPT with a summarization prompt over those n messages, and the resulting summary is prepended to the remaining messages (from n+1 onward). This is far more selective about which history it throws away, which can have a large impact on behavior.
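A sketch of how that is wired up, assuming the langchain 0.0.x API; the LLM choice and token limit are placeholders:

```python
from langchain.llms import OpenAI
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=OpenAI(temperature=0),  # this model runs the summarization prompt
    max_token_limit=1000,       # the threshold token count described above
)

# Record turns as usual; once the buffer exceeds max_token_limit, the
# oldest messages are summarized and the summary replaces them up front.
memory.save_context({"input": "hi"}, {"output": "hello"})
print(memory.load_memory_variables({}))
```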
LangChain does have simpler rules that just throw away history, but they usually throw away an entire message at a time, not a single token at a time.
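For example, ConversationBufferWindowMemory keeps only the last k whole messages; a one-line sketch, again assuming the langchain 0.0.x API:

```python
from langchain.memory import ConversationBufferWindowMemory

# Keep only the last k interactions, discarding older messages whole.
window_memory = ConversationBufferWindowMemory(k=5)
```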
Why are you ignoring prompts much smaller than the context window? These appear to be the vast majority of prompts: given the way the API works, you need to leave room for the reply, and you need some way to handle continuation if the reply hits the limit before it hits the stop token. The tokens past the stop token in the context window never seem to matter, though I have not investigated how they achieve that, i.e., whether they force them all to zero or something similar.
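A sketch of the budgeting arithmetic behind "leave room for the reply", using tiktoken; the context limit and prompt are placeholders, and chat formats add a few tokens of per-message overhead not counted here:

```python
import tiktoken

CONTEXT_LIMIT = 4096  # hypothetical window size
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "..."  # the conversation so far
prompt_tokens = len(enc.encode(prompt))
# Whatever is left over is the most the reply can use, which is why
# typical prompts sit well below the full window.
max_completion = CONTEXT_LIMIT - prompt_tokens
```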