paulfchristiano comments on The AI That Pretends To Be Human

paulfchristiano 3 Feb 2016 18:25 UTC
1 point
The general lesson from steganography is that it is computationally easier to change a distribution in an important way than to detect such a change. In order to detect a change you need to consider all possible ways in which a distribution could be meaningfully altered, while in order to make a change you just have to choose one. From a theory perspective, this is a huge asymmetry that favors the an attacker.

This point doesn’t seem directly relevant though, unless someone offers any good reason to actually include the non-imitation goal, rather than simply imitating the successful human trials. (Though there are more subtle reasons to care about problematic behavior that is neither penalized nor rewarded by your training scheme. It would be nicer to have positive pressure to do only those things you care about. So maybe the point ends up being relevant after all.)

Actually, in the scheme as you wrote it there is literally no reason to include this second goal. The distinguisher is already trying to distinguish the generator’s behavior from [human conditioned on success], so the generator already has to succeed in order to win the game. But this doesn’t introduce any potentially problematic optimization pressure, so it just seems better.