From what I understand about humans, they are so self-contradictory and illogical that any AGI that actually tries to optimize for human values will necessarily end up unaligned, and the best we can hope for is that whatever we end up creating will ignore us and will not need to disassemble the universe to reach whatever goals it might have.
This seems as misguided as arguing that any AGI will obviously be aligned because turning the universe into paperclips is stupid. We can conceivably build an AGI that is aware that humans are self-contradictory and illogical, and that therefore won't assume they are rational, because it knows such an assumption would make it misaligned. We can do at least as well as an overseer that intervenes on needless death and suffering as it happens.
If you mean that an AGI optimizing for human values exactly as they currently are will be unaligned, you may have a point. But I think many of us are hoping to get it to optimize for an idealized version of human values.