Does moving a few ounces of matter from one location to another count as a significant “effect on the world”?
In general, yes; you can and should be much more conservative here than your preferences strictly require, and give it a principle implying that your (1) and (2) are both Very Bad.
But, the waste heat from its computation will move at least a few ounces of air.
Maybe you can get around this by having it not worry (so to speak) about effects other than those that go through its I/O, but this is unsafe if it can use channels you didn't think of to deliberately influence the world. There are certainly other problems too, but (it seems to me) they are problems that have to be solved anyway to implement CEV, which is sort of a special case of Oracle AI.
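To make that side-channel worry concrete, here is a minimal toy sketch in Python (my own illustration, not a proposal from anyone in this thread; every variable name and number in it is invented). It shows an "impact" score that only counts the state variables the designer declared as I/O channels, so anything the AI affects through an unlisted channel, such as waste heat, is invisible to the penalty.

```python
# Toy illustration (hypothetical): an impact score that only counts
# state variables the designer listed as I/O channels. Anything the AI
# affects through an unlisted channel (e.g. waste heat) goes unpenalized.

# Changes caused by answering, relative to the no-action baseline.
# All names and magnitudes are made up for the example.
effects_of_answering = {
    "terminal_output": 1.0,     # the answer itself (intended channel)
    "waste_heat_joules": 50.0,  # side effect the designer didn't list
    "air_displaced_oz": 3.0,    # another unlisted side effect
}

IO_CHANNELS = {"terminal_output"}  # what the designer thought to include

def io_only_impact(effects: dict) -> float:
    """Sum of |change| restricted to declared I/O channels."""
    return sum(abs(v) for k, v in effects.items() if k in IO_CHANNELS)

def total_impact(effects: dict) -> float:
    """Sum of |change| over everything the action actually touched."""
    return sum(abs(v) for v in effects.values())

print(io_only_impact(effects_of_answering))  # 1.0  -> looks harmless
print(total_impact(effects_of_answering))    # 54.0 -> actual footprint
```

The gap between the two printed numbers is the point of the objection: an AI graded only on the I/O-restricted score has no reason to avoid, and some incentive to exploit, the channels that are not being counted.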
But, the waste heat from its computation will move at least a few ounces of air.
Quite so. The waste heat, of course, has very little thermodynamically significant direct impact on the rest of the world—but by the same token, removing someone’s frontal lobe or not has a smaller, more indirect impact on the world than preventing the bomb from detonating or not.
Now, suppose the AI’s grasp of causal structure is sufficient that it will indeed only take actions that truly have minimal impact relative to non-action; in that case it will be unable to communicate with humans in any way that is expected to significantly change their future behavior, making it a singularly useless oracle.
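The "useless oracle" point can be made vivid with a toy decision rule (again my own sketch, with made-up candidate answers and numbers, not anything proposed above): if each candidate answer is scored as usefulness minus a heavy penalty on the predicted deviation from the do-nothing baseline, and the useful answers are exactly the ones that change the questioner's behavior, then the penalized optimum is silence.

```python
# Toy sketch (hypothetical numbers): an oracle that penalizes predicted
# deviation from the "do nothing" baseline. Because informative answers
# change human behavior, a large enough penalty always favors silence.

candidates = {
    # answer: (usefulness to the asker,
    #          predicted world-change relative to the no-answer baseline)
    "say nothing":       (0.0, 0.0),
    "vague platitude":   (0.1, 0.2),
    "genuinely helpful": (1.0, 5.0),  # helpful *because* behavior changes
}

IMPACT_PENALTY = 10.0  # "truly minimal impact" means a very heavy weight

def score(answer: str) -> float:
    usefulness, impact = candidates[answer]
    return usefulness - IMPACT_PENALTY * impact

best = max(candidates, key=score)
print(best)  # "say nothing" -- the minimal-impact optimum is muteness
```

The particular numbers are arbitrary; the structural problem is that usefulness and impact are correlated here, so driving impact toward its minimum drags usefulness down with it.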
My intuition here is that the insight required to specify which causal results of an action are acceptable is roughly equivalent to what is necessary to specify something like CEV (i.e., essentially what Warrigal said above), in that both require that the AI be able to figure out what people actually want, not what they say they want. If you’ve done it right, you don’t need additional safeguards such as preventing significant effects; if you’ve done it wrong, you’re probably screwed anyway.