Sorry, I admit I do not understand what exactly the argument is. It seems to me it is something like “if we succeed in making the Friendly AI perfectly on the first attempt, then we do not have to worry about what could go wrong, because the perfect Friendly AI would not do anything stupid”. Which I agree with.
Now the question is (1) what is the probability that we will not get the Friendly AI perfectly on the first attempt, and (2) what happens then? Suppose we got the “superintelligent” and “self-improving” parts right, and the “Friendly” part only 90% right...
As to not understanding the argument—that’s understandable, because this is a long and dense paper.
If you are trying to summarize the whole paper when you say “if we succeed in making the Friendly AI perfectly on the first attempt, then we do not have to worry about what could go wrong, because the perfect Friendly AI would not do anything stupid”, then that would not be right. The argument includes a statement that resembles that, but only as an aside.
As to your question about what happens next, or what happens if we only get the “Friendly” part 90% correct … well, you are dragging me off into new territory, because that was not really within the scope of the paper. Don’t get me wrong: I like being dragged off into that territory! But there just isn’t time to write down and argue the whole domain of AI friendliness all in one sitting.
The preliminary answer to that question is that everything depends on the details of the motivation system design, and my feeling (as a designer of AGI motivation systems) is that, beyond a certain point, the system is self-stabilizing. That is, it will understand its own limitations and try to correct them.
But that last statement tends to get (some other) people inflamed, because they do not realize that it comes within the “swarm relaxation” context, and they misunderstand the manner in which a system would self-correct. Although I said a few things about swarm relaxation in the paper, I did not give enough detail there to be able to address this whole topic here.
I understand your desire to stick to an exegesis of your own essay, but part of a critical examination of your essay is seeing whether or not it is on point, so these sorts of questions really are “about” your essay.
Regarding your preliminary answer: by “correct” I assume you mean “correctly reflecting the desires of the human supervisors”? (In which case, this discussion feeds into our other thread.)
With the best will in the world, I have to focus on one topic at a time: I do not have the bandwidth to wander across the whole of this enormous landscape.
As to your question: I was using “correct” as a verb, and the meaning was “self-correct” in the sense of bringing the system back to the previously specified course.
In this case, it would be a matter of the AI noticing aspects of its own design that might cause it to depart from what its goal was nominally supposed to be; it would then suggest modifications to correct the problem.