we will need an entirely new paradigm of alignment
I don’t know what paradigm you’re referring to. Before the deep learning revolution, the only form of alignment this community talked about was the question of what the values of an all-powerful superintelligent AI should be, and the paradigmatic answer was Coherent Extrapolated Volition (CEV): a plan to bootstrap an ideal moral agent from the less-than-ideal cognitive architecture of humanity.
After the deep learning revolution, there arose a great practical interest in the technicalities of getting an AI to reliably adopt any goal or value at all. But I’m not aware of any new alternative paradigm regarding the endgame, when AI surpasses human control entirely.
As far as I know, CEV is still roughly what the best proposals look like. We don’t want to base AI values on imitation of unfiltered human behavior, but the choice of which parts of human behavior are good and worthy of emulation, and which parts are bad and to be shunned, must at some level be grounded in human ethical and metaethical judgments.
The new plan is: “we’re not going to allow AI to remove human control of the future; we’re going to share and grow together, even after we are surpassed.” To do this, we need to define how to integrate in a way that creates slack even at the neural level, so that our memory edits are justified and fair, or something along those lines.