I’m not sure what you’re saying. The AI is programmed to have reduced impact, conditional on ¬X. If ¬X happens, then outputting the correct y coordinate is reduced impact, which it will thus do (as it is separately motivated to do that).
So, given ¬X, the AI is motivated to: a) output the correct y coordinate (or cause its subagent to do so), b) have a reduced impact overall.
The whole construction is an attempt to generalise a) and b) to X, even though they are in tension/contradiction with each other in X (because outputting the correct y coordinate will have a high impact).
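To make the tension between a) and b) concrete, here is a minimal toy sketch; the numbers and the impact function are illustrative assumptions, not part of the original setup:

```python
# Toy model of the two motivations. All values are illustrative assumptions.
CORRECT_Y = 7  # hypothetical "correct" y coordinate

def impact(output_y, x_ai_on):
    """High impact only if the correct y coordinate is output while the
    x-coordinate AI is also on (event X); otherwise nothing happens."""
    return 10.0 if (x_ai_on and output_y == CORRECT_Y) else 0.0

def utility(output_y, x_ai_on):
    aim_bonus = 1.0 if output_y == CORRECT_Y else 0.0  # motivation a)
    return aim_bonus - impact(output_y, x_ai_on)       # penalised by b)

# Given ¬X: aiming scores 1.0 vs 0.0 for staying silent -- a) and b) agree.
print(utility(CORRECT_Y, x_ai_on=False), utility(0, x_ai_on=False))
# Given X: aiming scores 1.0 - 10.0 = -9.0 -- a) and b) are now in conflict.
print(utility(CORRECT_Y, x_ai_on=True), utility(0, x_ai_on=True))
```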
If ¬X happens, then outputting the correct y coordinate is reduced impact, which it will thus do (as it is separately motivated to do that).
If the x-coordinate AI is not turned on (call this event ¬X), it is motivated to have reduced impact. This motivation is sufficiently strong that it will not want to have the correct y-coordinate outputted.
These bits are contradictory. One tells a story where a tie between two low-impact options is broken by an aiming instinct, so the AI aims anyway. The other says that the “sit tight” instinct will overwhelm the aiming instinct.
If you want to control what happens in X, drives that are conditioned on ¬X are irrelevant. In my understanding, the attempt is to generalise the reduced-impact drive by not having it conditioned on whether X holds. But then what the AI does in ¬X cannot be based on the fact that ¬X holds. It can’t deduce that aiming is low impact even in ¬X, because it must allow that the x-aiming robot could be on, which would make aiming a high-impact decision. It must use the same decision process in both X and ¬X, and the X decision process can’t be based on what it would do if it were allowed to assume ¬X (that is, you are not allowed to know whether the grue object is currently green or blue, and you can’t decide what you would do if it were green based on what you would do if it were blue).
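A hedged sketch of that last point, with an assumed prior over X (the probability and payoffs below are made up for illustration): because the decision process cannot condition on which of X / ¬X actually holds, the expected penalty from the X branch can outweigh the aiming bonus even in worlds where ¬X is true.

```python
# The same decision process must be used in X and ¬X, so the agent chooses
# by expected utility over its uncertainty about X. Numbers are assumptions.
p_X = 0.5              # assumed prior that the x-coordinate AI is turned on
AIM_BONUS = 1.0        # reward for outputting the correct y coordinate
IMPACT_PENALTY = 10.0  # impact penalty, incurred only in the X branch

eu_aim = p_X * (AIM_BONUS - IMPACT_PENALTY) + (1 - p_X) * AIM_BONUS
eu_sit_tight = 0.0     # outputting nothing is low impact in both branches

print(eu_aim, eu_sit_tight)  # -4.0 vs 0.0: "sit tight" wins, even given ¬X
```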
Indeed. I have corrected the top post. Thanks!