What I’m saying is a bit different from CEV: it would involve modelling only a single human’s preferences, and modelling their brain only in the short term (which would be a lot easier). Human beings have at least reasonable judgement about things such as, say, a paperclip factory, to the point where a human calling the shots won’t lead to consequences that are too severe.
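For concreteness, here’s a minimal sketch of what “modelling a single human’s preferences” could look like, assuming all we need is a rough utility fitted to one person’s pairwise choices between outcomes; the outcomes, choices, and learning rate below are invented for illustration:

```python
# A minimal sketch: fit a Bradley-Terry-style utility over a handful of
# factory outcomes from one person's pairwise choices. Everything here
# (outcomes, judgements, learning rate) is made up for illustration.
import math

outcomes = ["meet quota", "exceed quota", "miss quota", "unsafe speed-up"]
# Hypothetical pairwise judgements from one human: (preferred, rejected).
choices = [("meet quota", "miss quota"),
           ("meet quota", "unsafe speed-up"),
           ("exceed quota", "meet quota"),
           ("exceed quota", "unsafe speed-up")]

utility = {o: 0.0 for o in outcomes}
lr = 0.5
for _ in range(200):
    for better, worse in choices:
        # Probability the model assigns to the observed choice (logistic model).
        p = 1.0 / (1.0 + math.exp(utility[worse] - utility[better]))
        # Gradient ascent on the log-likelihood of the human's choices.
        utility[better] += lr * (1.0 - p)
        utility[worse] -= lr * (1.0 - p)

print(sorted(utility.items(), key=lambda kv: -kv[1]))
```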
Specifying that kind of thing (including specifying preferences) is probably almost as hard as getting the AI’s motivations right in the first place.
Though Paul Christiano had some suggestions along those lines, which (in my opinion) needed uploads (human minds instantiated in a computer) to have a hope of working...
Would a human be bound to “at least reasonable judgement” if given superintelligent abilities?
We should remember that we aren’t talking about true Friendly AI here, but AI in charge of lesser tasks such as, in the example, running a factory. There will be many things the AI doesn’t know because it doesn’t need to, including how to defend itself against being shut down (I see no logical reason why that would be necessary for running a paperclip factory). Combine that with the limits on intelligence needed for such lesser tasks, and failure modes become far less likely.
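To make the “lesser tasks” framing concrete, here’s a toy sketch assuming the controller can only pick from a whitelisted set of factory actions and the shutdown check lives entirely outside its decision loop; the action names and scoring rule are made up:

```python
# Toy sketch of a narrowly-scoped factory controller: it can only choose
# from a fixed whitelist of actions, and the shutdown check is handled
# outside its decision loop, so there is nothing for it to reason about
# (or resist) there. Action names and scoring are invented for illustration.
from typing import Callable

ALLOWED_ACTIONS = ["run_line", "pause_line", "order_wire", "schedule_maintenance"]

def choose_action(score: Callable[[str], float]) -> str:
    """Pick the highest-scoring action, but only from the whitelist."""
    return max(ALLOWED_ACTIONS, key=score)

def control_loop(score: Callable[[str], float],
                 shutdown_requested: Callable[[], bool]) -> None:
    while not shutdown_requested():  # operator-controlled, not part of the agent's model
        action = choose_action(score)
        print(f"executing: {action}")
        break  # single step for the demo

# Example: a made-up scoring function that just prefers keeping the line running.
control_loop(score=lambda a: 1.0 if a == "run_line" else 0.0,
             shutdown_requested=lambda: False)
```

The point of the design is that self-preservation never enters the controller’s objective, because shutdown isn’t something it models at all.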
That’s sort of similar to what I keep talking about w/ ‘obedient AI’.