Design 2̴ 1 may happen to reply “Convince the director to undecouple the AI design by telling him <convincing argument>”, which could convince the operator who reads it and therefore fail as 3̴ 2 fails.
Design 2̴ 1 may also model distant superintelligences that break out of the box by predictably maximizing paperclips iff we draw a runic circle that, when printed as a plan, convinces the reader or hacks the computer.
Why would such “dual purpose” plans have higher approval value than some other plan designed purely to maximize approval?
Oh, damn it, I mixed up the designs. Edited.
Can’t quite read your edit, did you mean 3?
Yeah, then I agree with both points. Sneaky!