Ok, but I still think it’s legit to expect some kind of baseline skill level from the human. Doing the deceptive chess experiment with a total noob who doesn’t even know chess rules is kinda like assigning a difficult programming task with an AI advisor to someone who never wrote a line of code before (say, my grandma). Regardless of the AI advisor quality, there’s no way the task of aligning AGI will end up assigned to my grandma.
Seems pretty plausible that the degree to which any human today understands <whatever key skills turn out to be upstream of alignment of superintelligence> is pretty similar to the degree to which your grandma understands Python. Indeed, assigning a difficult programming task, with AI advisors of varying honesty, to someone who has never written a line of code would be another great test to run, plausibly an even better one than chess.
Another angle:
This effect is real; it's one of the things the experts need to account for, and accounting for it is part of what the experiment needs to test. And it applies to large skill gaps even when the person on the lower end of the gap is well above average.
Another thing to keep in mind: a full set of honest advisors can (and I think would) ask the human to take a few minutes to go over chess notation with them after the first confusion. If fear of dishonest advisors means the human doesn't do that, or an honest advisor feels they won't be trusted when saying 'let's take a pause to discuss notation', that's also good to know.
Question for the advisor players: did any of you try to take some time to explain notation to the human player?
Conor explained some details about notation during the opening, and I explained a bit as well. (I wasn’t taking part in the discussion about the actual game, of course, just there to clarify the rules.)