I agree that humanity being overwhelmed by a swarm of misaligned AIs is a very conceivable scenario. We could all wake up one day and find our apps and devices chattering the equivalent of “skibidi, skibidi, skibidi...”
In the Less Wrong Sequences—which predate the current era of deep learning AI—there is an emphasis on the complexity of human values, and the need to capture all of that complexity in programmatic form, if AI civilization is to be a continuation of human civilization rather than a replacement of it.
Then with the rise of deep neural networks, suddenly we have complex quasi-AIs that can learn and even create, and the focus largely switched to how one gets such systems to truly learn anything at all. This has been the era of “alignment”.
I think the only real answer to your concern is to return to the earlier problem: aligning an AI not just with the task of the moment, but with something akin to an ideal form of “human values”, something that will make it an autonomous ethical agent.
You may have heard of Coherent Extrapolated Volition (CEV). It names a solution to alignment at this level of civilizational ethics—instilling an AI with something that can serve as a humane foundation for an entire transhuman civilization. There are still people pursuing alignment in this sense, e.g. June Ku, Vanessa Kosoy, Tamsin Leake. That’s the best solution I have to your problem—ensure that each member of the AI swarm possesses CEV-type alignment, and/or that the swarm is governed by a single CEV-aligned superintelligence.
Yes. I think the title of my post is misleading (I have updated it now). The problem I am trying to point at is that current incentives mean we are going to mess up the outer alignment problem, and natural selection will favor the systems we fail on the hardest.