That HN comment you linked to is almost 10 years old and sits near the bottom of a thread on an unrelated story. While it supports your point, I don’t see what other qualities would make it especially memorable, so I’m kind of amazed that you surfaced it at an appropriate moment from such an obscure place. I’m curious how that happened.
Oh, it’s just from my list of reward hacking examples; I don’t mention the others because most of them aren’t applicable to the train/deploy distinction. I remember it because I remember a lot of things, and this one was particularly interesting to me for exactly the reason I linked it just now: it illustrates that optimization processes can hack train/deploy distinctions as a particularly extreme form of ‘data leakage’. As for where I got it, I believe someone sent it to me way back when I was compiling that list.