LLMs can reproduce the string by reading LessWrong, even if the benchmark data is correctly filtered.
If this is a concern, something as simple as rot13-ing the string here on LW (without explicitly mentioning that you’ve done so) should be sufficient to hide its true value from the still relatively underpowered LLMs.
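For anyone who wants to try this, here is a minimal sketch of rot13-ing a string in Python using the standard library's `rot_13` codec. The `placeholder` value is made up for illustration and is not the real string; the helper name `rot13` is likewise just my choice.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter by 13 places; digits and punctuation are left unchanged."""
    return codecs.encode(text, "rot_13")


# Hypothetical placeholder, NOT the actual string being discussed.
placeholder = "EXAMPLE CANARY STRING abc-123"

obfuscated = rot13(placeholder)  # the form you would post publicly
restored = rot13(obfuscated)     # rot13 is its own inverse

print(obfuscated)
print(restored == placeholder)
```

Since rot13 is its own inverse, anyone who knows the trick can recover the original with the same function, so this only hides the value from a reader (or model) that does not think to decode it.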