I get a lot of trailing whitespace when using Claude Code and variants of Claude Sonnet, more than I see in short tests with base models. (Not rigorously tested yet.)
I wonder if the trailing whitespace encodes some information or is just some Constitutional AI/RL artefact.
It’s probably just a difference in tokenizer. Tokenizers often produce tokens with trailing whitespace. I actually once wrote a tokenizer and trained a model to predict “negative whitespace” for the cases where a token shouldn’t carry its usual trailing whitespace. But I don’t know how current tokenizers handle this; probably in different ways.
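For anyone who wants to poke at this, here’s a quick sketch of how to see where a given tokenizer puts its whitespace. It assumes the HuggingFace transformers library and uses the GPT-2 BPE tokenizer as a stand-in, since Claude’s tokenizer isn’t public:

```python
# Inspect how a tokenizer attaches whitespace to tokens.
# Assumes the HuggingFace `transformers` package; GPT-2 BPE is only a
# stand-in here, not Claude's actual tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

text = "return x  \nprint(y)"
print(tok.tokenize(text))
# GPT-2-style BPE marks a *leading* space with "Ġ" (e.g. 'Ġx'), and runs of
# spaces before a newline tend to become their own tokens, so whether
# whitespace reads as "trailing" depends on the tokenizer's convention.
```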
That would be my main guess as well (~75%), but not the overwhelmingly likely option.
Steganography /j
I think it’s possible! If it’s used to encode relevant information, it could be tested by running software engineering benchmarks (e.g. SWE-bench) while stripping any trailing whitespace during generation, and checking whether the score drops.
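A minimal sketch of the stripping side of that test, assuming a Python harness (`generate_patch` is a hypothetical placeholder for whatever actually produces the model output):

```python
# Strip trailing whitespace from each line of a generation before it is
# scored (or fed back into context), then compare benchmark scores against
# the unmodified generations.
def strip_trailing_whitespace(text: str) -> str:
    return "\n".join(line.rstrip() for line in text.splitlines())

def run_task(model, task):
    raw = generate_patch(model, task)          # hypothetical harness call
    stripped = strip_trailing_whitespace(raw)
    return raw, stripped                       # score both variants
```

If trailing whitespace really carries information the model relies on, the stripped variant should score lower.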
If it is encoding relevant info, then this would be the definition of steganography.
I know. I just don’t expect it to.