Thank you for this post and study. It’s indeed very interesting.
I have two questions:
In what ways is this threat model similar to or different from learned steganography? It seems quite similar to me, but I’m not entirely sure.
If it can be related to steganography, couldn't we apply the same defenses used against steganography, such as paraphrasing, as suggested in this paper? If paraphrasing proves to be a successful defense, we could use it in the control setting in the lab, although it might be cumbersome to apply paraphrasing for all users in the API.