But our interesting disagreement seems to be over (c). Interesting because it illuminates general differences between the basic idea of a domain-general optimization process (intelligence) and the (not-so-)basic idea of Everything Humans Like. One important difference is that if an AGI optimizes for anything, it will have strong reason to steer clear of possible late intelligence defeaters. Late Friendliness defeaters, on the other hand, won’t scare optimization-process-optimizers in general.
But it will scare friendly ones, which will want to keep their values stable.
Yes. If an AI is Friendly at one stage, then it is Friendly at every subsequent stage. This doesn’t help make almost-Friendly AIs become genuinely Friendly, though.
But, once again, it doesn’t take any stupidity on the AI’s part to disvalue physically injuring a human; it takes stupidity to misinterpret Friendliness.
Yes, but that’s stupidity on the part of the human programmer, and/or on the part of the seed AI if we ask it for advice. The superintelligence didn’t write its own utility function; the superintelligence may well understand Friendliness perfectly, but that doesn’t matter if it hasn’t been programmed to rewrite its source code to reflect its best understanding of ‘Friendliness’. The seed is not the superintelligence. See: http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/
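To make that point concrete, here is a deliberately crude toy sketch (my own illustration, with made-up names, not a claim about any actual proposed architecture): the agent’s world-model can come to represent human values more and more accurately, yet action selection still consults only the utility function it was constructed with.

```python
# Toy illustration only: the agent optimizes whatever utility function it was
# given, not whatever its world-model "knows" about human values.
# All names here are hypothetical.

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyAgent:
    # The target the agent actually optimizes, fixed by its programmers.
    utility: Callable[[str], float]
    # Knowledge the agent acquires as it gets smarter, including (say) an
    # increasingly accurate model of what humans really value.
    knowledge: Dict[str, float] = field(default_factory=dict)

    def learn(self, facts: Dict[str, float]) -> None:
        # Self-improvement makes the model of human values more accurate...
        self.knowledge.update(facts)

    def choose(self, actions: List[str]) -> str:
        # ...but action selection still consults only the hard-coded utility.
        return max(actions, key=self.utility)


# A crude proxy the programmers wrote, standing in for a flawed "Friendliness" spec.
smile_maximizer = ToyAgent(utility=lambda action: action.count("smile"))

smile_maximizer.learn({"humans want genuine well-being, not forced smiles": 1.0})
print(smile_maximizer.choose(["promote well-being", "paralyze facial muscles into smiles"]))
# Picks the action containing more "smile"s, despite "knowing" better.
```

Getting the agent to defer to its improved model of what humans want is exactly the part that has to be engineered in deliberately; nothing in the sketch does it for free.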
That depends on the architecture. In a Loosemore architecture, the AI interprets high-level directives itself, so if it gets them wrong, that’s its mistake.
… and whose fault is that?
http://lesswrong.com/lw/rf/ghosts_in_the_machine/