Rules for an AI:
1. If an action it takes results in more than N logs of dollars in damage to humans, or kills more than N logs of humans, transfer control of all systems it can provide control inputs to a designated backup (a human, a formally proven safe algorithmic system, etc.), then power down.
2. When choosing among actions that affect a system external to it, calculate the probable effect on human lives. If the probability of exceeding the N assigned in rule 1 is greater than some threshold Z, discard that option; if no options remain, loop.
Most systems would be set to N = 1, Z = 1/10,000, giving us four 9s of certainty that the AI won't kill anyone. Some systems (weapons, climate management, emergency-management dispatch systems) will need higher N scores and lower Z scores to maintain effectiveness.
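A minimal sketch of how those two rules could be wired together, reading "N logs" as a threshold of 10^N and treating the risk estimator, the backup handoff, and the power-down as callables supplied from outside (all names here are hypothetical, not any existing system):

```python
import math
import time


def prob_exceeds_threshold(action, n_logs):
    """Hypothetical estimator: probability that `action` causes more than
    10**n_logs deaths. A real system needs a calibrated model here."""
    return action.get("risk", 0.0)


def choose_action(actions, n_logs=1, z=1e-4):
    """Rule 2: discard any option whose estimated probability of exceeding
    the rule 1 threshold is greater than Z; if nothing passes, loop."""
    while True:
        safe = [a for a in actions if prob_exceeds_threshold(a, n_logs) <= z]
        if safe:
            return max(safe, key=lambda a: a.get("value", 0.0))
        time.sleep(1)  # no acceptable option yet: idle and re-evaluate


def post_action_check(deaths, damage_usd, n_logs, hand_off_to_backup, power_down):
    """Rule 1: if realized harm exceeds the N-logs threshold in deaths or
    dollars, transfer control to the designated backup and shut down."""
    if deaths > 10 ** n_logs or math.log10(max(damage_usd, 1.0)) > n_logs:
        hand_off_to_backup()  # human or formally proven safe fallback
        power_down()
```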
JFK had an N of like 9 and a Z score of 'something kind of high', and passed control to Lyndon B. Johnson, of 'I have a minibar and a shotgun in the car I keep on my farm so I can drive and shoot while intoxicated' fame. We survived that; we will be fine.
Are we done?
Are you reinventing Asimov’s Three Laws of Robotics?
I hadn’t thought about it that way.
I do think that either compile-time flags for the AI system, or a second 'monitor' system chained to the AI system to enforce the named rules, would probably limit the damage.
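As a rough sketch of the second option, a chained monitor could sit between the AI and its actuators, assuming the AI exposes a propose/execute interface; everything here is a hypothetical illustration, not an existing framework:

```python
class Monitor:
    """External watchdog that vets each proposed action against the N/Z rules."""

    def __init__(self, risk_estimator, n_logs=1, z=1e-4):
        self.risk_estimator = risk_estimator
        self.n_logs = n_logs
        self.z = z

    def approve(self, action):
        # Reject any action whose estimated chance of exceeding the
        # rule 1 harm threshold is above Z.
        return self.risk_estimator(action, self.n_logs) <= self.z


class MonitoredAI:
    """Wraps the AI so it can only act through the monitor; a rejected
    action hands control to the designated backup and powers down."""

    def __init__(self, ai, monitor, hand_off_to_backup, power_down):
        self.ai = ai
        self.monitor = monitor
        self.hand_off_to_backup = hand_off_to_backup
        self.power_down = power_down

    def step(self, observation):
        action = self.ai.propose(observation)
        if self.monitor.approve(action):
            self.ai.execute(action)
        else:
            self.hand_off_to_backup()  # human or formally proven safe fallback
            self.power_down()
```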
The broader point is that probabilistic AI safety is probably a much more tractable problem than absolute AI safety, for a lot of reasons. To further the nuclear analogy, emergency shutdown is probably a viable safety measure for a lot of the plausible 'paperclip maximizer turns us into paperclips' scenarios.
“I need to disconnect the AI safety monitoring robot from my AI-enabled nanotoaster robot prototype because it keeps deactivating it” might still be the last words a human ever speaks, but hey, we tried.
There seems to be a complexity limit to what humans can build. A full GAI is likely to be somewhere beyond that limit.
The usual solution to that problem (see EY's fooming scenario) is to make the process recursive: let a mediocre AI improve itself, and as it gets better it can improve itself more rapidly. Exponential growth can go fast and far.
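A toy calculation of what "fast and far" means if each self-improvement cycle adds even a modest fraction of current capability (the 10% figure is arbitrary, purely illustrative):

```python
def capability_after(iterations, start=1.0, gain_per_cycle=0.10):
    """Toy model: each cycle the system improves itself by a fixed
    fraction of its current capability, so the growth compounds."""
    c = start
    for _ in range(iterations):
        c *= 1.0 + gain_per_cycle
    return c


print(capability_after(100))    # ~1.4e4 after 100 cycles
print(capability_after(1000))   # ~2.5e41 after 1000 cycles
```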
This, of course, gives rise to another problem: you have no idea what the end product is going to look like. If you’re looking at the gazillionth iteration, your compiler flags were probably lost around the thousandth iteration and your chained monitor system mutated into a cute puppy around the millionth iteration...
Probabilistic safety systems are indeed more tractable, but that’s not the question. The question is whether they are good enough.