Hastings comments on Is Text Watermarking a lost cause?

Hastings 2 Oct 2024 14:04 UTC
2 points
1
I think that your a-before-e example is confusing your intuition: a typical watermark that occurs 10% of the time isn’t going to be semantic. It’s more like “this n-gram hashed with my nonce == 0 mod 10”.
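To make the "n-gram hashed with my nonce == 0 mod 10" idea concrete, here is a minimal sketch in Python. The choice of SHA-256, whitespace tokenization, word trigrams, and the function names `ngram_is_marked` and `marked_fraction` are all assumptions for illustration, not a description of any particular watermarking scheme. The point is only that an n-gram picked at random should pass the check about 10% of the time.

```python
import hashlib


def ngram_is_marked(ngram, nonce, modulus=10):
    """Return True when the hash of (nonce, n-gram) is 0 modulo `modulus`.

    SHA-256 is an assumed stand-in for "hashed with my nonce"; any keyed
    hash would serve the same purpose.
    """
    data = (nonce + "\x00" + " ".join(ngram)).encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") % modulus == 0


def marked_fraction(text, nonce, n=3, modulus=10):
    """Fraction of word n-grams in `text` that pass the hash check.

    Ordinary text should land near 1/modulus; text that was edited to
    favor passing n-grams should land noticeably higher.
    """
    words = text.split()  # assumed tokenization: split on whitespace
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    hits = sum(ngram_is_marked(g, nonce, modulus) for g in ngrams)
    return hits / len(ngrams)
```

Without the nonce, `marked_fraction` on watermarked text is indistinguishable from any other 10% coin flip, which is what makes this kind of mark non-semantic.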
- egor.timatkov 2 Oct 2024 18:17 UTC
  4 points
  0
  The a-before-e example is just there to explain, in a human-readable way, how a watermark works. The important bit is that each individual section of the text is unlikely to occur according to some objective scale, be it a-before-e, hashing mod 10, or something else.
  I really like your example of hashing small bits of the text to 0 mod 10, though. I would have to look into how often you can actually edit text this way without significantly changing the meaning, but once that’s known, you can solve for an N and find how much text you need in order to determine that the text is watermarked.
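As a rough sketch of what "solve for an N" could look like: suppose unwatermarked text passes the check at rate `p0` (0.1 for mod 10) and watermarked text passes at some higher rate `p1`. The value of `p1` is hypothetical here, since it depends on exactly the open question of how often text can be edited without changing its meaning. Under a one-sided test with a normal approximation to the binomial, and treating n-grams as independent (an assumption that overlapping n-grams only approximately satisfy), the number of n-grams needed is about

N ≥ ((z_α·sqrt(p0(1−p0)) + z_β·sqrt(p1(1−p1))) / (p1 − p0))²

where `alpha` is the tolerated false-positive rate and `beta` the tolerated false-negative rate. The function name `ngrams_needed` is made up for this example.

```python
import math
from statistics import NormalDist


def ngrams_needed(p0, p1, alpha=0.01, beta=0.01):
    """Approximate number of n-grams needed to tell rate p1 from rate p0.

    p0: hit rate in ordinary text (e.g. 0.1 for "== 0 mod 10").
    p1: assumed hit rate in watermarked text (hypothetical, must exceed p0).
    alpha: tolerated false-positive rate; beta: tolerated false-negative rate.
    Uses the standard normal-approximation sample-size formula.
    """
    if not 0 < p0 < p1 < 1:
        raise ValueError("need 0 < p0 < p1 < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)
```

The required N grows quickly as `p1` approaches `p0`, so a watermark that can only nudge the hit rate slightly above 10% needs far more text than one that can push it well above.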