I was simplifying the rather complex concept of extrapolated volition to fit it in one sentence.
An AI which not only notices that its friendliness is not invariant, but decides to modify itself in the direction of invariant Friendliness, is already Friendly. An AI which is able to modify itself to invariant Friendliness without unacceptable compromise of its existing goals is already Friendly. You're assuming away the hard work.
“already friendly”? You’re acting as if its state doesn’t depend on its environment.
Are there elements of the environment that could determine whether a given AI’s successor is friendly or not? I would say ‘yes’.
This is after one has already done the hard work of making an AI that even has the potential to be friendly, but you messed up on that one crucial bit. This is a saving throw, a desperate error handler, not the primary way forward. By 'backup plan' I don't mean 'if Friendly AI is hard, let's try this'; I mean 'could this save us from being restrained and nannied for eternity?'
I shudder to think that any AI’s final goals could be so balanced that random articles on the Web of a Thousand Lies could push it one way or the other. I’m of the opinion that this is a fail, to be avoided at all costs.