This assertion is probably my biggest question mark in this discourse. It seems quite deeply baked into a lot of the MIRI arguments. I’m not sure it’s as certain as you think.
I can see how it is obviously possible we’d create an alien AI, and I think it’s impossible to prove we won’t. However, given that we are training our current AIs on imprints of human thought (e.g. text artifacts), and that we will likely push hard for AIs to be trained to obey laws and morality as they grow more powerful (e.g. Google’s AI safety team), it seems entirely plausible to me that the first AGIs might happen to be quite human-like.
In that world, I think we face problems not of the class “this AGI is in good faith inferring that we want to tile the world with paperclips”, but of the far easier-to-intuit class of “human alignment”, which we also have no idea how to solve. Imagine digitizing any human and giving them vastly increased cognitive power; I suspect most humans would become dictatorial and egotistical, and would take actions that many if not most of us would disagree with. Given the power to do so, many humans could be persuaded to wipe the slate clean and start again.
We already struggle to agree on whom to grant even relatively tiny slivers of political power (tiny compared to what this AGI would wield), and the idea that we could all agree on what an “aligned” human-like mind looks like, or what it should prioritize, seems naive to me.
Nevertheless, it seems to me that this problem is more tractable than trying to prove things about completely generic minds.
Inasmuch as we do think “human-like AI alignment” is easier, that belief would push us toward things like neuromorphic AI architectures, interpretability research on those architectures, a science of human thought substrates, outlawing other architectures, and so on.
I actually agree that an AGI will likely at least start out thinking in a way kind of similar to a human, but that in the end it will still be very difficult to align. I really recommend you check out Understand by Ted Chiang, which plays out almost exactly the scenario you mentioned: a normal guy gains superhuman intelligence and chaos ensues.