...and hope that the AI doesn’t get the idea that the safest way of staying in the box is to destroy the outside world. Or just kill all humans, because as long as humans exist, there is a decent chance someone will make a copy of the AI and try to run it on their own computer (i.e. outside of the original box).
Interesting—the failure mode that occurred to me is a paperclip maximizer that is designed to prefer virtual paperclips, so it turns the earth/the solar system/the lightcone into computronium to run virtual paperclips.
If defining “stay in the box” is that hard, I’m not feeling hopeful about the possibility of defining “protect humans.”