Why don’t I just build a paperclip maximiser with two utility functions, which it tries to satisfy jointly:
1. maximise paperclip production
2. do not exceed safeguards
where 2 is weighted far more highly than 1. This may have drawbacks in making my paperclip maximiser less efficient than it might be (it’ll value 2 so highly that it will make far fewer paperclips than the maximum, to make sure it doesn’t overshoot), but it should prevent it from grinding our bones to make its paperclips.
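To make the weighting concrete, here is a minimal sketch of that combined utility in Python. The limit, the weight, and the function name are hypothetical choices of mine, just to show how a dominant safeguard term keeps the maximiser conservative; nothing here comes from the original comment.

```python
# Minimal sketch of the weighted two-term utility described above.
# SAFEGUARD_LIMIT and SAFETY_WEIGHT are illustrative assumptions.

SAFEGUARD_LIMIT = 1_000_000   # assumed cap on paperclips (hypothetical)
SAFETY_WEIGHT = 1e9           # safeguard term dominates the production term

def utility(paperclips_produced: int) -> float:
    """Combined utility: reward production, heavily penalise exceeding the cap."""
    production_term = paperclips_produced
    # Penalty is zero while within the safeguard, enormous once it is exceeded.
    violation = max(0, paperclips_produced - SAFEGUARD_LIMIT)
    safety_term = -SAFETY_WEIGHT * violation
    return production_term + safety_term

# Because SAFETY_WEIGHT dwarfs the production term, an agent maximising this
# is pushed to stay well below the limit rather than risk overshooting it.
print(utility(900_000))    # within the safeguard
print(utility(1_000_001))  # one paperclip over: utility collapses
```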
Surely the biggest issue is not the paperclip maximiser, but the truly intelligent AI that we want to use to fix many of our problems: the problem being that we’d have to make our safeguards specific, and missing something out could be disastrous. If we could teach an AI to WANT to find out where we had made a mistake, and fix that, that would be better. Hence friendly AI.