[mostly self-plagiarized from here] If you have a very powerful AI, but it’s designed such that you can’t put it in charge of a burning airplane hurtling towards the ground, that’s … fine, right? I think it’s OK to have first-generation AGIs that can sometimes get “paralyzed by indecision”, and which are thus not suited to solving crises where every second counts. Such an AGI could still do important work, like inventing new technology and, in particular, designing better and safer second-generation AGIs.
You only really get a problem if your AI finds that there is no sufficiently safe way to act, and so it doesn’t do anything at all. (Or, more broadly, doesn’t do anything very useful.) Even that’s not dangerous in itself… but the next thing that happens is that the programmer probably dials the “conservatism” knob lower and lower, until the AI starts doing useful things. Maybe the programmer says to themselves: “Well, we don’t have a perfect proof, but all the likely hypotheses predict there’s probably no major harm…”
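To make that dynamic concrete, here’s a minimal toy sketch in Python of what a “conservatism knob” could look like. Everything in it (`Plan`, `harm_estimate`, `HYPOTHESES`, the threshold) is made up for illustration, not anything from the post: the agent acts only if some plan’s worst-case harm across its hypotheses stays under a threshold, and otherwise does nothing.

```python
# Toy illustration of a "conservatism knob"; all names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    expected_benefit: float

# A handful of competing world-models the agent entertains.
HYPOTHESES = ["h1", "h2", "h3"]

def harm_estimate(plan: Plan, hypothesis: str) -> float:
    """Stand-in for the agent's predicted harm of a plan under one hypothesis."""
    return (hash((plan.name, hypothesis)) % 100) / 100.0

def choose_action(plans: list[Plan], harm_threshold: float) -> Plan | None:
    """Act only if some plan's worst-case harm (across all hypotheses) is below
    harm_threshold; otherwise do nothing.

    Raising harm_threshold corresponds to dialing the "conservatism" knob down:
    the agent starts acting, at the cost of safety margin."""
    safe_plans = [
        p for p in plans
        if max(harm_estimate(p, h) for h in HYPOTHESES) < harm_threshold
    ]
    if not safe_plans:
        return None  # "paralyzed by indecision": no sufficiently safe option
    return max(safe_plans, key=lambda p: p.expected_benefit)
```

The worry in the paragraph above is then just: whenever `choose_action` keeps returning `None`, the temptation is to bump `harm_threshold` up until it stops doing so.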
Also, humans tend to treat “a bad thing happened (which had nothing to do with me)” as much less bad than “a bad thing happened (and it’s my fault)”. I think that if it’s possible to make AIs with the same inclination, it’s probably a good idea to do so, at least until we get up to super-reliable 12th-generation AGIs or whatever. It’s dangerous to make AIs that notice injustice on the other side of the world and are immediately motivated to fix it; that kind of AI would be very difficult to keep under human control, if human control is the plan (as it seems to be here).
Sorry if I’m misunderstanding.
Yup, I think I agree. However, I could see this going wrong in some kind of slow-takeoff world where the AI is already in charge of many things.