In terms of directions to pursue, it seems like the first thing you want to do is make sure the AI is essentially transparent and that we don’t have much of an inferential gap with it. Otherwise, when we attempt to have it give a values-and-tradeoffs solution, we may not get anywhere near what we want.
In essence, the AI should be able to look at all the problems facing Earth and say something like “I’m 97% sure our top priority is to build asteroid deflectors, based on these papers, calculations, and projections. The proposed plan of earthquake stabilizers is only 2% likely to be the best course of action, based on these other papers, calculations, and projections.” If it doesn’t have that kind of approach, there seem to be many ways that things can go horribly wrong (a rough sketch of what such a report could look like follows the examples below).
Examples:
A: If the AI can build robotic earthquake stabilizers at essentially no cost and prevent children from being killed in earthquakes, or it can simulate everyone and give our simulations that same experience at essentially no cost, the AI should be aware that these are different things, so we don’t say “Yes, build those earthquake stabilizers,” only to have it upload everyone and leave us saying “That isn’t what I meant!”
B: The AI should also provide some kind of information about proposed plans and alternatives. If we say “Earthquake stabilizers save the most children, build those!” and the AI knows that asteroid deflectors would actually save ten times more children, it shouldn’t just go “Oh well, they SAID earthquake stabilizers, so I’m not even going to mention the deflectors.”
C: Or maybe: “I thought killing all children was the best way to stop children from suffering, and that this was trivially obvious, so of course you wanted me to make a child-killing plague when you said ‘Reduce children’s suffering.’ I made and released it without telling you.”
D: Or it could simulate everyone and say “Well, they never said to keep the simulation running after I simulated everyone, so time to shut down all simulations and save power for their next request.”
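To make the idea concrete, here is a minimal, purely hypothetical sketch (in Python, with invented names; this is not any real system’s API) of the kind of transparent report described above: every candidate plan carries an explicit probability of being the best option, pointers to its supporting evidence, and any side effects or ambiguities that need human sign-off before acting.

```python
# Hypothetical sketch only: a transparent plan report in the spirit of the
# "97% asteroid deflectors, 2% earthquake stabilizers" example above.
from dataclasses import dataclass, field


@dataclass
class PlanAssessment:
    name: str                                             # e.g. "asteroid deflectors"
    prob_best: float                                      # credence this is the top priority
    evidence: list[str] = field(default_factory=list)     # papers, calculations, projections
    side_effects: list[str] = field(default_factory=list) # anything like examples A or D
    needs_confirmation: bool = True                       # never act silently on a surprising reading (C)


def report(assessments: list[PlanAssessment]) -> None:
    """Print every plan considered, best-ranked first, so alternatives
    like B's asteroid deflectors are never silently dropped."""
    for plan in sorted(assessments, key=lambda p: p.prob_best, reverse=True):
        print(f"{plan.name}: {plan.prob_best:.0%} likely to be the best course of action")
        for e in plan.evidence:
            print(f"  evidence: {e}")
        for s in plan.side_effects:
            print(f"  warning: {s}")


report([
    PlanAssessment("asteroid deflectors", 0.97,
                   evidence=["impact-frequency projections"]),
    PlanAssessment("earthquake stabilizers", 0.02,
                   evidence=["seismic casualty estimates"],
                   side_effects=["'stabilizers' could be satisfied by simulating everyone (example A)"]),
])
```

The point is not the particular data structure, but that the ranking, the evidence, and the warnings are all surfaced rather than kept internal.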
Once you’ve got that settled, you can attempt to have the AI do other things, like assess earthquake stabilizers, asteroid deflection, or uploading, because you’ll actually be able to ask it “Which of these are the right things to do, and why, based on these values and these value tradeoffs?” and get an answer that makes sense. You may not like or expect the answer, but at least you should be able to understand it given time.
For instance, going back to the sample problem, I don’t mind that simulation that much, but only because I am assuming it works as advertised. If it has a problem like D that I just didn’t realize and the AI didn’t think noteworthy, that’s a problem. Also, for all I know, there is an even better proposed life that the AI was aware of and didn’t think to even suggest, as in B.
Given a sufficiently clear AI, I’d imagine it could explain things to me well enough that there wouldn’t even be a question of which values to trade off, because the solution would be clear. But for all I know, it might come up with “Well, about half of you want to live in a simulated utopia, and about half of you want to live in a real utopia, and this is unresolvable to me because of these factors unless you solve this value tradeoff problem.”
It would still, however, have collected all the reasons explaining WHY it couldn’t solve that value tradeoff problem, which would be a handy thing to have anyway, since I don’t have that right now (a toy version of such a deadlock report is sketched below).
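As a toy illustration of that deadlock report (again, invented names and thresholds, not a proposal for how to actually aggregate preferences), the AI might only declare a “clear answer” when some supermajority agrees, and otherwise hand the tradeoff back to humans along with the reasons it collected:

```python
# Hypothetical sketch: surfacing the "half want simulated utopia, half want
# real utopia" deadlock together with the collected reasons, rather than
# silently resolving it one way or the other.
from collections import Counter


def assess_tradeoff(preferences: list[str], reasons: dict[str, list[str]],
                    supermajority: float = 0.9) -> None:
    counts = Counter(preferences)
    total = len(preferences)
    top_option, top_votes = counts.most_common(1)[0]
    if top_votes / total >= supermajority:
        print(f"Clear answer: {top_option} ({top_votes}/{total} prefer it)")
        return
    # No clear answer: report WHY it can't be resolved, option by option.
    print("Unresolvable value tradeoff; human decision needed:")
    for option, votes in counts.most_common():
        print(f"  {option}: {votes}/{total}")
        for r in reasons.get(option, []):
            print(f"    reason: {r}")


assess_tradeoff(
    ["simulated utopia"] * 5 + ["real utopia"] * 5,
    {"simulated utopia": ["no scarcity constraints"],
     "real utopia": ["some people terminally value non-simulated existence"]},
)
```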
Edit: Eek, I did not realize the “#” sign bolded things, extra bolds removed.