johnswentworth comments on Ngo and Yudkowsky on alignment difficulty

johnswentworth 17 Nov 2021 0:32 UTC
LW: 15 AF: 7
Personally, I’d consider a Fusion Power Generator-like scenario a more central failure mode than either of these. It’s not about the difficulty of getting the AI to do what we asked; it’s about the difficulty of posing the problem in a way that actually captures what we want.
- Steven Byrnes 17 Nov 2021 13:51 UTC
  LW: 4 AF: 4
  I agree that that is another failure mode. (And there are yet other failure modes too: e.g., instead of printing the nanobot plan, it prints “Help me I’m trapped in a box…” :-P. I apologize for the sloppy wording that suggested the two things I mentioned were the only two problems.)
  I disagree about “more central”. I think that’s basically a disagreement on the question of “what’s a bigger deal, inner misalignment or outer misalignment?” with you voting for “outer” and me voting for “inner, or maybe tie, I dunno”. But I’m not sure it’s a good use of time to try to hash out that disagreement. We need an alignment plan that solves all the problems simultaneously. Probably different alignment approaches will get stuck on different things.