I usually use “aligned” to mean “aligned with humanity”, since there is not much difference between outcomes for AGIs that are not aligned with humanity, even if they are aligned with something else. If they are agentic, they will have killeveryone as an instrumental goal, because humanity will likely be an obstacle to whatever future plans they have. If an AGI is not agentic but is an oracle, it will provide some world-ending information to some unaligned agent, with mostly the same result.
If they are agentic, they will have killeveryone as an instrumental goal, because humanity will likely be an obstacle to whatever future plans they have.
I think this is broadly incorrect, because boundary-respecting norms seem quite natural, and not exterminating a civilization is trivially cheap on a cosmic scale. An AGI’s values don’t need to have much in common with ours to respect such norms; I’m calling such values “loosely aligned”, and they don’t need to be similar to human values to avoid having killeveryone as an instrumental goal.
Killeveryone is still an instrumental goal for paperclip maximizers, which might have an advantage in self-improving in an aligned-with-themselves manner: with simple explicit goals, it might be much easier to ensure that stronger successor AGIs with different architectures still pursue the same goals. On the other hand, loosely-aligned-with-humanity AGIs that have complicated values might want to hold off on self-improvement to ensure alignment, and so remain non-superintelligent for a long time. As a result, simple-valued AGIs might be particularly dangerous to them, because they are liable to FOOM immediately.
I agree.