Another potential option (that could stack with text compression) is to filter outgoing data. In particular, if we only allow outgoing data to be language tokens, and we filter the outgoing data to have low perplexity as measured by a small, secret language model, then it could become very difficult to encode terabytes of model weights in a way that has low perplexity.
I discuss this idea more here.
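For concreteness, here is a minimal sketch of what such a filter might look like, assuming a HuggingFace causal LM is used as the scorer. "gpt2" is only a stand-in for the small, secret model the proposal calls for, and the threshold is an arbitrary placeholder that would need calibration on real traffic.

```python
# Minimal sketch of the proposed egress filter: score each outgoing message
# with a small language model and block anything whose perplexity is too high
# to plausibly be ordinary text. "gpt2" stands in for the small, secret model;
# the threshold of 100 is a placeholder, not a recommended value.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the filter model (exp of mean token loss)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def allow_outgoing(text: str, threshold: float = 100.0) -> bool:
    """Pass a message only if it looks like plausible natural language."""
    return perplexity(text) <= threshold

# Natural text scores far lower than random-looking strings, which is what
# makes it hard to smuggle arbitrary bits out as low-perplexity tokens.
for msg in ["Here is a short summary of the results from today's training run.",
            "qWx7 #rrN0 zk41 bbP0 9tGm"]:
    print(f"{perplexity(msg):10.1f}  {msg!r}")
```

One nice property of thresholding perplexity: any length-n token sequence with perplexity at most k has probability at least k^-n under the filter model, so there are at most k^n such sequences, which caps the covert channel at roughly log2(k) bits per token even for an adversary who knows the filter model exactly.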
I don’t think this stacks with compression: if you compress data, it is no longer low perplexity. Data that has been compressed as well as possible should look random to you (rather than predictable).
I think filtering like this is strictly worse than compression for reasons I discuss in my response here.
(But I appreciate the idea!)
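A quick byte-level illustration of this point: well-compressed output has near-maximal empirical entropy (it looks random), while the raw text it came from is highly predictable. The toy corpus below is made up purely for demonstration.

```python
# Compare the empirical byte entropy of predictable text with that of its
# compressed form: the compressed bytes are close to uniform (near 8 bits per
# byte), i.e. they would not look like low-perplexity language to a filter.
import math
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of the byte distribution, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

random.seed(0)
words = "the of and to in a is that it for model weights data filter token".split()
text = " ".join(random.choice(words) for _ in range(20000)).encode()
compressed = zlib.compress(text, level=9)

print(f"raw text:   {byte_entropy(text):.2f} bits/byte (predictable, well below 8)")
print(f"compressed: {byte_entropy(compressed):.2f} bits/byte (close to 8, near-random)")
```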
I think it does stack if the perplexity filter is applied before the compression step (sketched below). This relies on the underlying assumption that it’s easier to secure the internet connection choke point than to secure the entire data center.
I will try to continue the discussion here for simplicity.
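For what it’s worth, here is a self-contained sketch of the ordering being claimed, with the perplexity check replaced by a trivial stand-in (looks_like_text) so it runs without the model from the earlier sketch. The point is only that the filter sees the plaintext at the choke point, and compression is applied afterwards to traffic that has already passed.

```python
# Sketch of filter-then-compress at the internet choke point. looks_like_text
# is a crude stand-in for the small-LM perplexity filter sketched earlier,
# kept trivial so this example is runnable on its own.
import zlib

def looks_like_text(message: str) -> bool:
    """Stand-in check: mostly letters, digits, spaces and basic punctuation."""
    if not message:
        return False
    ok = sum(ch.isalnum() or ch.isspace() or ch in ".,;:!?'\"()-" for ch in message)
    return ok / len(message) > 0.95

def egress(message: str) -> bytes:
    # 1. The filter runs on the plaintext, before any transformation.
    if not looks_like_text(message):
        raise PermissionError("blocked: outgoing data does not look like natural language")
    # 2. Compression happens only after the message has been approved.
    return zlib.compress(message.encode("utf-8"))

payload = egress("Results from today's evaluation run are summarized below.")
print(len(payload), "compressed bytes sent")
```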