I agree that that is another failure mode. (And there are yet other failure modes too, e.g. instead of printing the nanobot plan, it prints “Help me I’m trapped in a box…” :-P. I apologize for sloppy wording that suggested the two things I mentioned were the only two problems.)
I disagree about “more central”. I think that’s basically a disagreement on the question of “what’s a bigger deal, inner misalignment or outer misalignment?” with you voting for “outer” and me voting for “inner, or maybe tie, I dunno”. But I’m not sure it’s a good use of time to try to hash out that disagreement. We need an alignment plan that solves all the problems simultaneously. Probably different alignment approaches will get stuck on different things.