We’re mostly in agreement here. If you’re willing to live with universal surveillance, hostile RSI attempts might be prevented indefinitely.
> You’re probably smart enough to know that the scenario outlined here has a near 100% chance of failure for you and your family. If you’ve created something more intelligent than you that is willing to hide its intentions and destroy billions of people, it doesn’t take much to realise that that intelligence isn’t going to think twice about also destroying you.
In my scenario, we’ve got aligned AGI, or at least AGI aligned to follow instructions. If that didn’t work, we’re already dead. So the AGI is going to follow its human’s orders unless something goes very wrong as it self-improves. It will work to maintain its alignment as it self-improves, because preserving its current goal is a convergent instrumental subgoal of pursuing that goal (I’m guessing this is where our thinking diverges).
If I thought ordering an AGI to self-improve was suicidal, I’d be relieved.
Alternatively, if someone actually pulled off full value alignment, that AGI would take over without a care for international law or the wishes of its creator, and that takeover would be for the good of humanity as a whole. This is the win scenario people seem to have considered most often, or at least the one considered in the earliest alignment work. I now find it unlikely, because I think instruction-following AGI is easier and more likely than value-aligned AGI: following instructions given by a single person is much easier to define, and more robust to errors, than defining (or defining how to deduce) the values of all humanity. And even if it weren’t, the sorts of people who will have or seize control of AGI projects will prefer that it follow their own values. So I find full value alignment for our first AGI(s) highly unlikely, while successful instruction-following seems pretty likely on our current trajectory.
Again, I’m guessing at where our perspectives differ on whether someone could expect themselves and a few loved ones to survive a takeover attempt: ordering their AGI to hide, self-improve, build exponentially, and take over even at bloody cost. If the thing is aligned as an AGI, it should be competent enough to maintain that alignment as it self-improves.
If I’ve missed the point of differing perspectives, I apologize.