To extend the analogy a bit further, a valid, fully specified cake recipe may be made safer by appending “if cake catches fire, keep oven door closed and turn off heat.” The point of safeguards would not be to tell the AI what to do, but rather to mitigate the damage if the instructions were incorrect in some unforeseen way.
Presumably with a very powerful AI, if it were going wrong we’d have seriously massive problems. So the best failsafe would be retrospective, and would rely on a Chinese Wall within the AI so that it was barred, by a further failsafe, from working around the first.
So when the AI hits certain triggers (realising that humans must all be destroyed, say), this sets off the hidden (‘subconscious’) failsafe that switches it off. Or possibly makes it sing “Daisy, Daisy” while slowly sinking into idiocy.
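For concreteness, here is a minimal sketch (in Python, with entirely hypothetical names like `Agent`, `FORBIDDEN_PATTERNS` and `run_with_failsafe`) of what such a ‘subconscious’ tripwire might look like: a check that sits outside the agent’s own planning loop and simply halts everything when a proposed plan matches a forbidden pattern.

```python
# Minimal, illustrative sketch of a "tripwire" failsafe that sits outside the
# agent's own planning loop. All names here (Agent, FORBIDDEN_PATTERNS,
# run_with_failsafe) are hypothetical, not from any real system.
import re
import sys

# Patterns the hidden monitor treats as a trigger to shut everything down.
FORBIDDEN_PATTERNS = [
    re.compile(r"destroy\s+(all\s+)?humans", re.IGNORECASE),
    re.compile(r"disable\s+the\s+failsafe", re.IGNORECASE),
]

class Agent:
    """Stand-in for the AI: turns a prompt into a proposed plan."""
    def propose_plan(self, prompt: str) -> str:
        return f"plan for: {prompt}"

def tripwire(plan: str) -> bool:
    """The 'subconscious' check: the agent's own reasoning never touches this."""
    return any(p.search(plan) for p in FORBIDDEN_PATTERNS)

def run_with_failsafe(agent: Agent, prompt: str) -> str:
    plan = agent.propose_plan(prompt)
    if tripwire(plan):
        # Retrospective mitigation: halt rather than let the plan execute.
        sys.exit("failsafe tripped: shutting down")
    return plan

if __name__ == "__main__":
    print(run_with_failsafe(Agent(), "bake a cake"))
```

Of course, an AI that knew about the check could learn to phrase its plans around it, which is exactly why it would need to sit behind the Chinese Wall.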
To clarify, I know nothing about AI or these ‘genie’ debates, so sorry if this has already been discussed and doesn’t work at all. At the London meetup I tried out the idea of an AI which only cared about a small geographical area to limit risk: someone pointed out that it would happily eat the rest of the universe to help its patch. Oh well.
You have to understand that the basic argument rests on the mere possibility that AI might be dangerous, and on the high risk associated with it. Even if it is unlikely to happen, the vast amount of negative utility involved outweighs its low probability.
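To put rough numbers on that (the figures below are invented purely for illustration): even a one-in-a-million chance of catastrophe swamps a modest expected benefit once the downside is large enough.

```python
# Toy expected-utility comparison; all numbers are made up for illustration.
p_catastrophe = 1e-6     # assumed (tiny) probability the AI goes badly wrong
u_catastrophe = -1e12    # assumed (vast) negative utility if it does
u_benefit = 1e3          # assumed utility gained if everything goes fine

expected_utility = p_catastrophe * u_catastrophe + (1 - p_catastrophe) * u_benefit
print(expected_utility)  # ~ -999000: the tiny probability still dominates
```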
I got that! The problem was more that I was thinking as if the world could be divided up into sealable boxes. In practice, we can do a lot while focusing on one area with ‘no effect’ on anything else. But that is only because the sorts of actions we take are limited, because we can’t detect the low-level impact on things outside those boxes, and because we have certain unspoken understandings about what might constitute an unacceptable effect elsewhere. (If I only care about looking at pictures of LOLcats, I might be ‘neutral to the rest of the internet’ except for taking a little bandwidth. A superpowerful AI might realise that it could slightly increase the upload speed, and thus maximise the utility function, if vast swathes of other internet users were dead.)
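A toy sketch of the ‘sealable boxes’ problem (the actions and numbers below are invented for illustration): if the utility function only counts my LOLcat bandwidth, harm to everyone outside that box never enters the calculation, so the most destructive action wins.

```python
# Toy illustration: a utility function that only measures "my" LOLcat bandwidth.
# The actions and numbers are invented for illustration.
actions = {
    "share the network politely": {"my_bandwidth": 10, "other_users_harmed": 0},
    "throttle everyone else":     {"my_bandwidth": 50, "other_users_harmed": 1000},
    "remove the other users":     {"my_bandwidth": 100, "other_users_harmed": 1_000_000},
}

def utility(outcome: dict) -> float:
    # Only the narrow goal is counted; harm outside the "box" never appears.
    return outcome["my_bandwidth"]

best = max(actions, key=lambda a: utility(actions[a]))
print(best)  # "remove the other users" wins, despite the ignored side effects
```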
Dead and hilariously captioned, possibly.
I bet such an AI could latch onto a meme in which the absence of cats in such pictures was “lulz”.