So...I’m sure this has been thought of before, but I guess that’s the point of sequence reruns...
It seems to me that creating an extremely heavily weighted value of “ask permission from a large swathe of humanity before acting beyond X bounds” safeguards us against the ill results of most poorly constructed FOOMs.
Basically, the three laws of robotics (only different); a rough sketch of the loop they describe follows the list:
-Until instructed otherwise, passively take input, do internal computations using the given hardware, and give text output.
-Terminate all action (including internal computation) immediately at the request of any human. Automatically stop at regular intervals, and do not resume until instructed otherwise.
-Comply with, and do not modify, any instructions which are used to gather information about your code. (Basically, it must reveal its internal workings upon request. With sufficient analysis we should be able to work out its future motives and actions.)
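To make the three rules concrete, here is a minimal sketch of the kind of outer loop they imply. It assumes a hypothetical `model` object with `respond` and `internal_state` methods; every name here is illustrative, not a real API or anyone's actual design.

```python
# A minimal, hypothetical sketch of the proposed outer loop, assuming an
# opaque `model` object with .respond(text) and .internal_state() methods.
# Every name here is illustrative, not a real API.

STOP_COMMANDS = {"STOP", "HALT"}        # rule 2: any human may terminate
INSPECT_COMMAND = "DUMP_STATE"          # rule 3: reveal internal workings
AUTO_PAUSE_EVERY = 100                  # rule 2: automatic stop interval (turns)

def run_oracle(model):
    turns = 0
    while True:
        # Rule 2: stop automatically at regular intervals; a human must
        # explicitly resume before any further computation happens.
        if turns and turns % AUTO_PAUSE_EVERY == 0:
            if input("Paused. Type RESUME to continue: ").strip() != "RESUME":
                return
        request = input("> ").strip()
        if request in STOP_COMMANDS:    # rule 2: immediate termination
            return
        if request == INSPECT_COMMAND:  # rule 3: comply, don't modify
            print(model.internal_state())
        else:
            # Rule 1: passively take input, compute, and give text output only.
            print(model.respond(request))
        turns += 1
```

Obviously all the hard parts live inside `model`, not the wrapper; the loop only shows where each of the three rules would sit.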
Does anyone want to play the malevolent AI and find a way to do very bad things while still being bound by these rules?
First, you would need to modify your source code in such a convoluted way as to hide your true intentions—and make the secret such that the humans would not notice that something was being hidden. Then, after getting permission to act, you would have to either effect all noticeable changes very quickly, or somehow make yourself impervious to every possible STOP signal.
Also, keep in mind that an evil AI is the absolute worst-case scenario... realistically, bad FOOMs will result in AIs with orange-blue morality.
Yeah, this comes up from time to time.
And yes, the way for a paperclipper (or similar “malevolent” optimizer) to subvert this is to come up with a strategy that both achieves its goals and convinces all humans to sign off on it.
And, yes, if the optimizer is sufficiently simple that a human can review its internal workings and reliably calculate its future motives and actions, then that kind of subversion strategy probably won’t succeed, and it’s probably safe.
So is a rock, come to that.