You beat me to making this comment :P Except apparently I came here to make this comment about the changed version.
“A human would only approve safe actions” is just a problem clause altogether. I understand how this seems reasonable for sub-human optimizers, but if you (now addressing Mark and Evan) think it has any particular safety properties for superhuman optimization pressure, the particulars of that might be interesting to nail down a bit better.
It has been changed to imitation, as suggested by Evan.
Yeah, I agree—the example should probably just be changed to be about an imitative amplification agent or something instead.