[Paul:] I believe the honest debater can quite easily win this game, and that this pretty strongly suggests that amplification will be able to classify the image.
I think this is only true for categories where the overseer (or judge in the debate game) can explicitly understand the differences between them. For example, instead of cats and dogs, suppose the two categories are photos of the faces of two people who look alike. Humans can reliably tell such photos apart (through subtle differences in features or geometry) without being able to understand or explain (from memory) what the differences are. (This is analogous to the translation example, so it's not really a new point.)
BTW, the AI safety via debate paper came out from OpenAI a few days ago (see also the blog post). It sheds some new light on Amplification and is also a very interesting idea in its own right.
I’ve made an LW link post for it here.