Is there any chance that we (a) CAN’T restrict AI to be Friendly per se, but (b) (conditional on this impossibility) CAN restrict it enough to keep it from blowing up in our faces? If Friendly AI is in fact not possible, then a first-generation AI may recognize this fact and not want to build a successor that would destroy the first-generation AI in an act of unfriendliness.
It seems to me like the worst case would be that Friendly AI is in fact possible... but that we aren’t the first to discover how: the first AI works it out and builds successors Friendly to itself rather than to us, in which case AI would happily perpetuate itself. But what are the best- and worst-case scenarios conditional on Friendly AI being IMpossible?
Has this been addressed before? As a disclaimer, I haven’t thought much about this and I suspect that I’m dressing up the problem in a way that sounds different to me only because I don’t fully understand the implications.
First, define “friendly” in enough detail that I know that it’s different from “will not blow up in our faces”.
Ooh, good catch! wheninrome15 may need to define “will not blow up in our faces” in more detail as well.
Such an eventuality would seem to require that (a) human beings are not computable or (b) human beings are not Friendly, since if some human were both computable and Friendly, an emulation of that human would itself be a Friendly AI.
In the latter case, if nothing else, there is [individual]-Friendliness to consider.
I think human history has demonstrated that (b) is certainly true… sometimes I am surprised we are still here.
The argument from (b)* is one of the stronger ones I’ve heard against FAI.
* Not to be confused with the argument from /b/.
Incidentally, /b/ might be good evidence for (b). It’s a rather unsettling demonstration of what people do when anonymity has removed most of the incentive for signaling.
I find chans’ lack of signaling highly intellectually refreshing. /b/ is not typical: because of its ridiculously high traffic, only meme-infested threads that you can reply to in five seconds survive there. Normal boards have far better discussion quality.