Applying a standard NLP preprocessing routine to GPT prompts is a quick and free way to reduce character count and save some tokens while seemingly preserving most of the information.
I doubt that this is free. OpenAI has its own tokenizer. You are basically saying that, without understanding the tradeoffs the current tokenizer makes, adding another tokenizer in front of it will improve performance for free.
I see your point. I think the existing tokenizer is designed to preserve all of the text, while the idea here is to sacrifice some information in favor of compression. That said, while writing this I realized that this approach is more effective at saving characters than it is at saving tokens.
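For anyone who wants to check the character-vs-token tradeoff on their own prompts, here is a minimal sketch. It assumes the `tiktoken` package is installed and uses a tiny hardcoded stopword list as a stand-in for a real preprocessing routine; it is an illustration, not anyone's production pipeline:

```python
# Rough sketch: compare character and token savings from stopword removal.
# Uses OpenAI's tiktoken package (pip install tiktoken); the stopword list
# below is a small illustrative stand-in for a full NLP preprocessing step.
import tiktoken

STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "that"}

def strip_stopwords(text: str) -> str:
    # Lossy compression: drop common function words from the prompt.
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4

original = "Summarize the main points of the article that is attached below."
compressed = strip_stopwords(original)

for label, text in (("original", original), ("compressed", compressed)):
    print(f"{label}: {len(text)} chars, {len(enc.encode(text))} tokens")
```

On typical English prompts the character savings tend to look larger than the token savings, since high-frequency words like "the" are already single tokens under BPE: dropping one removes several characters but only one token.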