[Question] Do alignment concerns extend to powerful non-AI agents?
Right now the main focus of alignment seems to be on how to align powerful AGI agents, a.k.a. AI Safety. I think the field could benefit from a small reframing: we should think not about aligning AI specifically, but about aligning powerful systems in general, if we are not doing so already.
It seems to me that the biggest problem in AI Safety comes not from the fact that the system will have unaligned goals, but from the fact that it is superhuman.
That is, it has nearly godlike power to understand the world and, in turn, to manipulate both the world and the humans in it. Does it really matter whether it is an artificial agent that gains godlike processing and self-improvement powers, or a human, or a government or business?
I propose a little thought experiment; feel free to answer in the comments.
If you, the reader, or, say, Paul Christiano or Eliezer were uploaded and gained self-improvement, self-modification, and vastly greater processing speed and power, would your goals also converge toward damaging humanity?
If not, what makes your case different? How could we transfer that secret sauce to an AI agent?
If yes, maybe we should look at how large, superhuman systems are kept aligned right now and take some inspiration from that?