If the x-coordinate AI is not turned on (call this event ¬X), it is motivated to have reduced impact. This motivation is sufficiently strong that it will not want to have the correct y-coordinate outputted.
It will produce a robot that will aim the y-coordinate of the laser correctly, given ¬X, and not expand dangerously.
Aren’t these bits contradictory? Wouldn’t the result be not aiming?
The programmer expects ¬X, and it must program the bot with things that are X-agnostic, so it is planning not to aim. Then, because the programmed bot can’t be X-sensitive, it will act essentially as if ¬X.
Suppose the mission is to do the grue thing, and that, given that it is t1, the grue thing to do would be to press the blue button; but the bot presses the green button. Such a solution is neither grue-friendly nor blue-friendly.
If ¬X happened, the result would be misaiming. But since X happens (almost certainly), it aims correctly.

As for the rest, I’m not sure what you’re saying. The AI is programmed to be reduced impact, conditional on ¬X. If ¬X happens, then outputting the correct y coordinates is reduced impact, which it will thus do (as it is separately motivated to do that).
So, given ¬X, the AI is motivated to: a) output the correct y coordinate (or cause its subagent to do so), b) have a reduced impact overall.
The whole construction is an attempt to generalise a) and b) to X, even though they are in tension/contradiction with each other in X (because outputting the correct y coordinate will have a high impact).
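To make that tension concrete, here is a minimal toy sketch; the aiming reward of 1 and the impact penalty of 10 are invented numbers for illustration, not anything from the post:

```python
# Toy model of the two motivations: (a) aim correctly, (b) have reduced impact.
# All numbers are illustrative.

def aim_reward(outputs_correct_y: bool) -> float:
    """Motivation (a): reward for outputting the correct y-coordinate."""
    return 1.0 if outputs_correct_y else 0.0

def impact_penalty(outputs_correct_y: bool, x_ai_on: bool) -> float:
    """Motivation (b): penalty for changing the world. The laser only
    actually fires if both coordinates are supplied, i.e. under X."""
    return 10.0 if (outputs_correct_y and x_ai_on) else 0.0

def utility(outputs_correct_y: bool, x_ai_on: bool) -> float:
    return aim_reward(outputs_correct_y) - impact_penalty(outputs_correct_y, x_ai_on)

# Conditional on ¬X, (a) and (b) agree: aiming costs nothing and earns the reward.
print(utility(True, x_ai_on=False), utility(False, x_ai_on=False))  # 1.0 0.0
# Generalised to X, they conflict: aiming now carries the full impact penalty.
print(utility(True, x_ai_on=True), utility(False, x_ai_on=True))    # -9.0 0.0
```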
If ¬X happens, then outputting the correct y coordinates is reduced impact, which it will thus do (as it is separately motivated to do that).
If the x-coordinate AI is not turned on (call this event ¬X), it is motivated to have reduced impact. This motivation is sufficiently strong that it will not want to have the correct y-coordinate outputted.
These bits are contradictory. One tells a story in which two low-impact options are tie-broken by an aiming instinct, so the AI aims anyway. The other says that the “sit tight” instinct will overwhelm the aiming instinct.
If you want to control what happens in X, drives that are conditioned on ¬X are irrelevant. As I understand it, the attempt is to generalise the reduced-impact drive by not conditioning it on whether X happens. Then what the AI does in ¬X cannot be based on the fact that ¬X. But it can’t deduce that aiming is low impact even in ¬X, because it must allow that the x-aiming robot could be on, and that would make aiming a high-impact decision. It must use the same decision process in both X and ¬X, and the X decision process can’t be based on what it would do if it were allowed to assume ¬X (that is, you are not allowed to know whether the grue object is currently green or blue, and you can’t decide what you would do if it were green based on what you would do if it were blue).
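In toy terms, that X-agnostic rule would look something like the sketch below; the probabilities and payoffs are invented, mirroring the earlier toy numbers:

```python
# Toy model of an X-agnostic decision rule: the same procedure is used
# whether or not the x-coordinate AI is on, so it averages over both
# possibilities instead of conditioning on ¬X. Numbers are illustrative.

AIM_REWARD = 1.0      # for outputting the correct y-coordinate
FIRE_PENALTY = 10.0   # impact of the laser actually firing (only possible under X)

def expected_utility(aim: bool, p_x: float) -> float:
    reward = AIM_REWARD if aim else 0.0
    expected_impact = FIRE_PENALTY * p_x if aim else 0.0
    return reward - expected_impact

# The agent only aims if it is quite confident that the x-AI is off:
for p_x in (0.05, 0.5, 0.99):
    choice = "aim" if expected_utility(True, p_x) > expected_utility(False, p_x) else "sit tight"
    print(f"P(X) = {p_x}: {choice}")
# With these numbers the crossover is at P(X) = 0.1, so an agent that must
# treat X as a live possibility (rather than assuming ¬X) sits tight.
```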
Indeed. I have corrected the top post. Thanks!

I’m also struggling with the above. The first quote says that with event ¬X “it will NOT want to have the correct y-coordinate outputted”. The second says the opposite: the robot WILL output “the y-coordinate of the laser correctly, given ¬X”.
Slider was correct: I made a mistake. The correct sentence would have been “This motivation is sufficiently strong that it will not want to have the correct y-coordinate outputted, if the correct x-coordinate were also there”, but that got too complicated, so I removed the sentence entirely.