Adam Karvonen comments on Using an LLM perplexity filter to detect weight exfiltration

Adam Karvonen 22 Jul 2024 23:16 UTC
The perplexity filter works best if the attacker has only limited access to some machines, rather than complete control of the entire data center. An underlying assumption is that it’s easier to secure the internet connection choke point than everything going on inside the data center.

This can be stacked with compression by applying the perplexity filter before the compression step.
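As a rough illustration of that ordering, here is a minimal sketch in which a tiny character-level unigram model stands in for the real LLM. The reference text, the threshold of 50, and the function names are all hypothetical choices for the example, not part of the original proposal. The filter runs on the uncompressed payload first, since compressed bytes look close to random and would score as high perplexity themselves.

```python
import math
import zlib
from collections import Counter

# Hypothetical reference text; a real filter would use an LLM, not this toy model.
REFERENCE_TEXT = (
    "the quick brown fox jumps over the lazy dog. "
    "large language models generate natural language text in response to user prompts. "
    "a data center serves many requests and sends the responses over the network. "
) * 20


def train_char_model(text):
    """Return add-one-smoothed log2 probabilities for each of the 256 byte values."""
    counts = Counter(text.encode("utf-8"))
    total = sum(counts.values()) + 256
    return [math.log2((counts.get(b, 0) + 1) / total) for b in range(256)]


def perplexity(log_probs, payload):
    """Per-byte perplexity of the payload under the toy model."""
    if not payload:
        return 1.0
    avg_bits = -sum(log_probs[b] for b in payload) / len(payload)
    return 2 ** avg_bits


def filter_then_compress(log_probs, payload, threshold):
    """Block high-perplexity payloads; compress only what passes the filter."""
    if perplexity(log_probs, payload) > threshold:
        return None  # blocked at the choke point
    return zlib.compress(payload)


log_probs = train_char_model(REFERENCE_TEXT)
text = b"the model sends a response to the user"
noise = bytes(range(256)) * 4  # evenly spread bytes standing in for weight data

print(perplexity(log_probs, text), perplexity(log_probs, noise))
print(filter_then_compress(log_probs, noise, 50.0))  # None: blocked
```

In this toy setup the text payload passes and is compressed, while the weight-like payload is rejected before compression is ever applied.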

Even if the attacker has access to the perplexity filter model, the filter would still reduce the achievable information density, because they would have to use some sort of arithmetic encoding scheme whose output has low perplexity. I had coworkers working on exactly this problem of encoding information into natural language using arithmetic encoding and GPT-2 small. There’s a major trade-off between the perplexity of the generated language and the density of the encoded information, although I don’t have numbers available to quantify it.
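The direction of that trade-off can be shown with a toy calculation. This is a sketch under strong assumptions and not the arithmetic encoding scheme mentioned above: it assumes a hypothetical Zipf-shaped next-token distribution that is the same at every position, and an attacker who hides log2(k) bits per token by picking uniformly among the k most likely tokens. The numbers it prints are properties of this toy model only, not measurements of GPT-2 small.

```python
import math


def zipf_distribution(vocab_size, exponent=1.0):
    """Hypothetical next-token distribution with Zipf-like probabilities."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def topk_tradeoff(probs, k):
    """Return (bits hidden per token, perplexity seen by the filter).

    The attacker picks uniformly among the k most likely tokens, so each
    token carries log2(k) bits. The filter scores the text with the true
    probabilities, so its perplexity is 2 ** (mean of -log2 p over the top k).
    """
    top = sorted(probs, reverse=True)[:k]
    bits_per_token = math.log2(k)
    cross_entropy = -sum(math.log2(p) for p in top) / k
    return bits_per_token, 2 ** cross_entropy


probs = zipf_distribution(1000)
for k in (1, 2, 4, 16, 64, 256):
    bits, ppl = topk_tradeoff(probs, k)
    print(f"k={k:4d}  bits/token={bits:5.2f}  perplexity={ppl:8.1f}")
```

In this toy model, raising k packs more bits into each token but forces the attacker to emit less likely tokens, so the perplexity measured by the filter rises along with the information density.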