“There seems to be no reason not to expect that human value functions have similar problems, which even “aligned” AIs could trigger unless they are somehow designed not to.” There are plenty of reasons to think that we don’t have similar problems—for instance, we’re much smarter than the ML systems on which we’ve seen adversarial examples. Also, there are lots of us, and we keep each other in check.
“For example, such AIs could give humans so much power so quickly or put them in such novel situations that their moral development can’t keep up, and their value systems no longer apply or give essentially random answers.” What does this actually look like? Suppose I’m made the absolute ruler of a whole virtual universe—that’s a lot of power. How might my value system “not keep up”?
I confess to being uncertain of what you find confusing/unclear here. Think of any subject you currently have conflicting moral intuitions about (do you have none?), and now imagine being given unlimited power without being given the corresponding time to sort out which intuitions you endorse. It seems quite plausible to me that you might choose to do the wrong thing in such a situation, which could be catastrophic if said decision is irreversible.
But I can’t do the wrong thing, by my standards of value, if my “value system no longer applies”. So that’s part of what I’m trying to tease out.
Another part is: I’m not sure if Wei thinks this is just a governance problem (i.e. we’re going to put people in charge who do the wrong thing, despite some people advocating caution) or a more fundamental problem that nobody would do the right thing.
If the former, then I’d characterise this more as “more power magnifies leadership problems”. But maybe it won’t, because there’s also a much larger space of morally acceptable things you can do. It just doesn’t seem that easy to me to accidentally cause a moral catastrophe if you’ve got a huge amount of power, and still less an irreversible one. But maybe this is just because I don’t know which examples Wei has in mind.