wassname comments on Using an LLM perplexity filter to detect weight exfiltration

wassname 25 Jul 2024 0:42 UTC
1 point
CMIIW, but you are looking at information content according to the LLM, and that’s not enough. It has to be learnable information content, to avoid the noisy TV problem. E.g. a random sequence of tokens will be unpredictable and have high perplexity, yet it contains nothing a model could learn. But if the content is learnable, then it has potential.
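To make the distinction concrete, here is a toy sketch. It is not the method from the post or from my repo; it assumes a tiny add-one-smoothed bigram model as a stand-in for the LLM, and the names `learnability_gain`, `VOCAB`, etc. are made up for illustration. Both sequences look equally surprising to the untrained model, but only the structured one becomes predictable after the model has seen the first half.

```python
import math
import random
from collections import defaultdict


def bigram_counts(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def perplexity(counts, tokens, vocab_size):
    """Perplexity of `tokens` under an add-one-smoothed bigram model."""
    nll, n = 0.0, 0
    for prev, nxt in zip(tokens, tokens[1:]):
        row = counts.get(prev, {})
        total = sum(row.values())
        p = (row.get(nxt, 0) + 1) / (total + vocab_size)
        nll -= math.log(p)
        n += 1
    return math.exp(nll / n)


def learnability_gain(tokens, vocab_size):
    """Drop in log-perplexity on the second half after training on the first.

    Near zero (or negative) means there was nothing to learn: noisy TV.
    """
    half = len(tokens) // 2
    train, test = tokens[:half], tokens[half:]
    before = perplexity({}, test, vocab_size)  # untrained model is uniform
    after = perplexity(bigram_counts(train), test, vocab_size)
    return math.log(before) - math.log(after)


VOCAB = 50
rng = random.Random(0)

# Noisy TV: uniformly random tokens, high perplexity and unlearnable.
random_tokens = [rng.randrange(VOCAB) for _ in range(4000)]
# Structured: a fixed 20-token cycle, unfamiliar at first but learnable.
structured_tokens = list(range(20)) * 200

gain_random = learnability_gain(random_tokens, VOCAB)
gain_structured = learnability_gain(structured_tokens, VOCAB)
print(f"random gain:     {gain_random:.2f} nats")
print(f"structured gain: {gain_structured:.2f} nats")
```

With a real LLM the analogue would be something like comparing perplexity before and after a few gradient steps (or with vs. without the text in context), but that part is an assumption on my side, not something I have checked against the post’s setup.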

I had a go at a few different approaches here: https://github.com/wassname/detect_bs_text