The second scenario omits the details about continuing to create and submit pull requests after takeover, instead just referring to human farms. Since it doesn’t explicitly say that it’s still optimizing for the original objective criteria and instead just refers to world domination, it appears to be inner misalignment (e.g. no longer aligned with the original optimizer). Did the original posing of this question specify that scenario 2 still maximizes pull requests after world domination?
The intent was that in scenario 2 the AI constructs the human farms in order to to have the humans continually merging pull requests (i.e. yes, it still maximizes pull requests). I don’t think I explicitly said this—I probably gave the same text as in this post—but I expect people to have made this inference (because (a) otherwise it’s not clear why the AI would build human farms and (b) when I showed the inconsistency to the survey-takers, to my knowledge none of them said anything along the lines of “oh I was assuming that in scenario 2 the AI was no longer maximizing pull requests”).
The second scenario omits the details about continuing to create and submit pull requests after takeover, instead just referring to human farms. Since it doesn’t explicitly say that it’s still optimizing for the original objective criteria and instead just refers to world domination, it appears to be inner misalignment (e.g. no longer aligned with the original optimizer). Did the original posing of this question specify that scenario 2 still maximizes pull requests after world domination?
The intent was that in scenario 2 the AI constructs the human farms in order to to have the humans continually merging pull requests (i.e. yes, it still maximizes pull requests). I don’t think I explicitly said this—I probably gave the same text as in this post—but I expect people to have made this inference (because (a) otherwise it’s not clear why the AI would build human farms and (b) when I showed the inconsistency to the survey-takers, to my knowledge none of them said anything along the lines of “oh I was assuming that in scenario 2 the AI was no longer maximizing pull requests”).